Putting OAC-triclustering on MapReduce
Sergey Zudin, Dmitry V. Gnatyshak, and Dmitry I. Ignatov
National Research University Higher School of Economics, Russian Federation
Faculty of Computer Science
CLA 2015, Clermont-Ferrand, France
October 13-16
S. Zudin et al. () OAC-triclustering on MapReduce CLA 2015 1 / 39
Outline
1 Motivation and previous work
2 Prime OAC-triclustering
Triadic Formal concept analysis
Basic algorithm
Online version of the algorithm
3 OAC-triclustering on MapReduce
MapReduce technology
MapReduce implementation
4 Experiments
Description of the experiments
Datasets
Results
5 Conclusion
Motivation
Large amounts of multimodal data:
Gene expression data
Folksonomies
Recommender Systems
Communities in multi-mode (social) networks
Pattern mining in relational databases
. . .
Non-binary data can be scaled (possibly increasing the dimensionality)
Growing volumes of big data: fast and/or distributed algorithms are required (linear or sublinear, one-pass)
Existing methods: finding all n-sets (multimodal clusters) satisfying some conditions (often an exponential number of patterns)
Motivation
IMDB example, [Mirkin et al., 2011]
Clump Movie-Keyword-Genre
Bicluster
{12 Angry Men (1957), To Kill a Mockingbird (1962), Witness for the Prosecution (1957)}, {Murder, Trial}, {n/a}
Tricluster
{12 Angry Men (1957), Double Indemnity (1944), Chinatown (1974), The Big Sleep (1946), Witness for the Prosecution (1957), Dial M for Murder (1954), Shadow of a Doubt (1943)}, {Murder, Trial, Widow, Marriage, Private detective, Blackmail, Letter}, {Crime, Drama, Thriller, Mystery, Film-Noir}
Previous and related work
A short (not full) list
Triadic FCA [Wille, 1995; Lehmann and Wille, 1995] and Polyadic FCA [Voutsadakis, 2002]
TRIAS [Jäschke et al., 2006] for mining (frequent) triconcepts
DataPeeler for closed n-sets [Cerf et al., 2009], MultiDupeHack [Cerf et al., 2013]
TriBox [Mirkin et al., 2011] for mining dense triboxes with LS criterion
Box OAC-triclustering and Spectral Triclustering [Ignatov et al., 2011, 2013]
Multi-way set enumeration in weight tensors [Schölkopf et al., 2011]
Previous and related work
A short (not full) list
Quadri-concepts for personalised folksonomies [Jelassi et al., 2012, 2013]
Prime OAC-triclustering [Gnatyshak et al., 2012–2014]
Triadic Boolean tensor factorisation [Miettinen et al., 2011; Belohlavek et al., 2013] and Boolean tensor clustering [Miettinen et al., 2015]
Closed and connected patterns in multi-relational data [Spyropoulou et al., 2011–14]
Triadic FCA and triclustering: Searching for optimal patterns. Machine
Learning journal [Ignatov et al., 2015] and CLA 2013
. . .
Prime OAC-triclustering
Formal concept analysis: triadic case
Definition
Let G, M, B be sets and the ternary relation I be a subset of their Cartesian
product: I ⊆ G × M × B. Then the tuple K = (G, M, B, I) is called a triadic
formal context.
G is a set of objects, M is a set of attributes, B is a set of conditions.
GM m1 m2 m3 m1 m2 m3 m1 m2 m3
g1 x x x x x x x x
g2 x x x x x
g3 x x x x
g4 x x x x x x
B b1 b2 b3
Definition
Galois operators (prime operators) are defined in a similar way to the dyadic case:
2^G → 2^M × 2^B
2^G × 2^M → 2^B
2^M → 2^G × 2^B
2^G × 2^B → 2^M
2^B → 2^G × 2^M
2^M × 2^B → 2^G
({g1, g2}, {m1, m2})′ = {b1, b3}
m2′ = {(g1, b1), (g2, b1), (g3, b1), (g1, b2), (g1, b3), (g2, b3), (g4, b3)}
Definition
The triple (X, Y, Z) is called a triadic formal concept of the context K = (G, M, B, I) if X ⊆ G, Y ⊆ M, Z ⊆ B, (X, Y)′ = Z, (X, Z)′ = Y, and (Y, Z)′ = X.
X is called the (formal) extent, Y the (formal) intent, and Z the (formal) modus.
Prime OAC-triclustering
Basic algorithm [Gnatyshak et al., 2013]
This method uses the following types of prime operators (for the context K = (G, M, B, I)):
(g, m)′ = {b ∈ B | (g, m, b) ∈ I},
(g, b)′ = {m ∈ M | (g, m, b) ∈ I},
(m, b)′ = {g ∈ G | (g, m, b) ∈ I}.
Definition
The triple T = ((m, b)′, (g, b)′, (g, m)′) is called the prime-based OAC-tricluster for a triple (g, m, b) ∈ I. The components of a tricluster are called, respectively, the tricluster extent, intent, and modus. The triple (g, m, b) is called the generating triple of the tricluster T.
Definition
Density of a tricluster: ρ(X, Y, Z) = |I ∩ (X × Y × Z)| / (|X| |Y| |Z|)
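The definitions above can be illustrated with a minimal Python sketch. It assumes the relation I is held in memory as a set of (g, m, b) tuples; the function names `prime_sets`, `tricluster`, and `density` are hypothetical and are not taken from the authors' code.

```python
from itertools import product

def prime_sets(I):
    """Build the three prime-set dictionaries from a set of (g, m, b) triples."""
    obj_attr, obj_cond, attr_cond = {}, {}, {}
    for g, m, b in I:
        obj_attr.setdefault((g, m), set()).add(b)   # (g, m)' is a set of conditions
        obj_cond.setdefault((g, b), set()).add(m)   # (g, b)' is a set of attributes
        attr_cond.setdefault((m, b), set()).add(g)  # (m, b)' is a set of objects
    return obj_attr, obj_cond, attr_cond

def tricluster(triple, primes):
    """Return (extent, intent, modus) = ((m, b)', (g, b)', (g, m)')."""
    g, m, b = triple
    obj_attr, obj_cond, attr_cond = primes
    return (frozenset(attr_cond[m, b]),
            frozenset(obj_cond[g, b]),
            frozenset(obj_attr[g, m]))

def density(T, I):
    """Share of cells of the cuboid X x Y x Z that belong to I."""
    X, Y, Z = T
    hits = sum(1 for t in product(X, Y, Z) if t in I)
    return hits / (len(X) * len(Y) * len(Z))

# Illustrative data (an assumption, not one of the contexts from the slides).
I = {("g1", "m1", "b1"), ("g1", "m2", "b1"), ("g2", "m1", "b1")}
T = tricluster(("g1", "m1", "b1"), prime_sets(I))
rho = density(T, I)
```

For this toy relation the tricluster generated by (g1, m1, b1) is ({g1, g2}, {m1, m2}, {b1}); the cell (g2, m2, b1) is missing from I, so three of the four cells are filled.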
Prime OAC-triclustering
Basic algorithm
An example of a tricluster based on triple (g, m, b):
Prime OAC-triclustering
Basic algorithm
Input: K = (G, M, B, I), a triadic context;
ρmin, a density threshold
Output: 𝒯 = {T = (X, Y, Z)}
𝒯 := ∅
for all (g, m): g ∈ G, m ∈ M do
    PrimesObjAttr[g, m] := (g, m)′
end for
for all (g, b): g ∈ G, b ∈ B do
    PrimesObjCond[g, b] := (g, b)′
end for
for all (m, b): m ∈ M, b ∈ B do
    PrimesAttrCond[m, b] := (m, b)′
end for
for all (g, m, b) ∈ I do
    T := (PrimesAttrCond[m, b], PrimesObjCond[g, b], PrimesObjAttr[g, m])
    Tkey := hash(T)
    if Tkey ∉ 𝒯.keys ∧ ρ(T) ≥ ρmin then
        𝒯[Tkey] := T
    end if
end for
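A rough Python rendering of the basic algorithm, given as a sketch under a few assumptions: prime sets are built only for pairs that actually occur in I (these are the only ones a generating triple can look up), and the tricluster itself is used as the dictionary key instead of an explicit hash(T). The name `basic_oac_prime` is hypothetical.

```python
from itertools import product

def basic_oac_prime(I, rho_min=0.0):
    """Return {tricluster: density} for all triclusters with density >= rho_min."""
    I = set(I)
    obj_attr, obj_cond, attr_cond = {}, {}, {}
    for g, m, b in I:                      # fill the three prime dictionaries
        obj_attr.setdefault((g, m), set()).add(b)
        obj_cond.setdefault((g, b), set()).add(m)
        attr_cond.setdefault((m, b), set()).add(g)

    triclusters = {}
    for g, m, b in I:                      # one candidate per generating triple
        T = (frozenset(attr_cond[m, b]),
             frozenset(obj_cond[g, b]),
             frozenset(obj_attr[g, m]))
        if T in triclusters:               # already produced by another triple
            continue
        X, Y, Z = T
        rho = sum(t in I for t in product(X, Y, Z)) / (len(X) * len(Y) * len(Z))
        if rho >= rho_min:
            triclusters[T] = rho
    return triclusters

# The nine triples used in the online example of this deck.
triples = [("g1", "m1", "b1"), ("g1", "m2", "b1"), ("g2", "m1", "b1"),
           ("g2", "m2", "b1"), ("g3", "m3", "b1"), ("g1", "m2", "b2"),
           ("g2", "m1", "b2"), ("g2", "m2", "b2"), ("g3", "m3", "b2")]
all_triclusters = basic_oac_prime(triples)
dense_only = basic_oac_prime(triples, rho_min=0.9)
```

On these nine triples the sketch yields five distinct triclusters; the threshold 0.9 drops the one with density 7/8.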
Prime OAC-triclustering
Online version of the algorithm [Gnatyshak et al., 2014]
Let K = (G, M, B, I) be a triadic context. We do not know G, M, B, I, or their
cardinalities in advance.
Input on each iteration: {(g, m, b)} = J ⊆ I.
Goal: maintain an updated version of the results and efficiently update them when
new triples are received.
We need to keep in memory the results of prime operators’ application (prime
sets):
PrimesObjAttr — dictionary with elements of type ((g, m), {b ∈ B}), g ∈ G,
m ∈ M;
PrimesObjCond — dictionary with elements of type ((g, b), {m ∈ M}),
g ∈ G, b ∈ B;
PrimesAttrCond — dictionary with elements of type ((m, b), {g ∈ G}),
m ∈ M, b ∈ B.
Prime OAC-triclustering
Online version of the algorithm
Remark
In this case we must treat triclusters generated by different triples as different, even if their extents, intents, and modi are equal.
Prime OAC-triclustering
Online version of the algorithm
Algorithm for adding triples:
Input: J is a set of triples to add;
𝒯 = {T = (∗X, ∗Y, ∗Z)} is the current tricluster set;
PrimesObjAttr, PrimesObjCond, PrimesAttrCond;
Output: 𝒯 = {T = (∗X, ∗Y, ∗Z)};
PrimesObjAttr, PrimesObjCond, PrimesAttrCond;
for all (g, m, b) ∈ J do
    PrimesObjAttr[g, m] := PrimesObjAttr[g, m] ∪ {b}
    PrimesObjCond[g, b] := PrimesObjCond[g, b] ∪ {m}
    PrimesAttrCond[m, b] := PrimesAttrCond[m, b] ∪ {g}
    𝒯 := 𝒯 ∪ {(&PrimesAttrCond[m, b], &PrimesObjCond[g, b], &PrimesObjAttr[g, m])}
end for
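The pointer-based update can be sketched in Python, where storing a reference to a mutable set plays the role of the `&` pointers: a tricluster stored earlier grows automatically when a later triple extends one of its prime sets. The class name `OnlineOAC` and the choice to key triclusters by their generating triple are assumptions made for this illustration.

```python
from collections import defaultdict

class OnlineOAC:
    """Keeps prime sets and per-triple triclusters up to date as triples arrive."""

    def __init__(self):
        self.obj_attr = defaultdict(set)    # (g, m) -> conditions
        self.obj_cond = defaultdict(set)    # (g, b) -> attributes
        self.attr_cond = defaultdict(set)   # (m, b) -> objects
        self.triclusters = {}               # generating triple -> (X, Y, Z) references

    def add(self, triples):
        for g, m, b in triples:
            self.obj_attr[g, m].add(b)
            self.obj_cond[g, b].add(m)
            self.attr_cond[m, b].add(g)
            # Store references to the live sets, not copies.
            self.triclusters[g, m, b] = (self.attr_cond[m, b],
                                         self.obj_cond[g, b],
                                         self.obj_attr[g, m])

model = OnlineOAC()
model.add([("g1", "m1", "b1")])
model.add([("g1", "m2", "b1"), ("g2", "m1", "b1")])
first = model.triclusters["g1", "m1", "b1"]
```

After the second batch, the tricluster generated by (g1, m1, b1) has already grown to ({g1, g2}, {m1, m2}, {b1}) without being touched explicitly.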
Prime OAC-triclustering
Online version of the algorithm
A user may need to remove triclusters with the same extent, intent, and modus at the post-processing stage. At this stage we can also check various conditions (for instance, a minimal density condition).
Input: 𝒯 = {T = (∗X, ∗Y, ∗Z)} is the current tricluster set;
Output: 𝒯out = {T = (∗X, ∗Y, ∗Z)} is the processed tricluster hash-set;
𝒯out := ∅
for all T ∈ 𝒯 do
    Compute hash(T)
    if hash(T) ∉ 𝒯out.keys() then
        𝒯out[hash(T)] := T
    end if
end for
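The post-processing step might look as follows in Python. This is only a sketch: each tricluster is frozen into a hashable snapshot, and an optional predicate `keep` (a hypothetical parameter) stands in for conditions such as a minimal density.

```python
def postprocess(triclusters, keep=lambda T: True):
    """Drop triclusters with identical extent, intent, and modus; apply a filter."""
    unique = {}
    for X, Y, Z in triclusters:
        T = (frozenset(X), frozenset(Y), frozenset(Z))  # hashable snapshot
        if T not in unique and keep(T):
            unique[T] = T
    return list(unique)

# Illustrative input: the first two entries describe the same tricluster.
raw = [({"g1", "g2"}, {"m1"}, {"b1"}),
       ({"g2", "g1"}, {"m1"}, {"b1"}),
       ({"g3"}, {"m3"}, {"b1", "b2"})]
result = postprocess(raw)
large_extent = postprocess(raw, keep=lambda T: len(T[0]) > 1)
```

Using the frozen tricluster itself as the key avoids relying on a separate hash value, at the cost of copying each set once.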
Prime OAC-triclustering
Online version of the algorithm
Complexity summary:
Time complexity: O(|I|) (as there is a constant number of operations on
each step);
More precisely: 8|I| operations in total;
1 Modification of 3 prime sets (3);
2 Creation of a new tricluster (1);
3 Addition of pointers to its extent, intent, and modus (3);
4 Addition of the tricluster to the set of all triclusters (1).
Memory complexity: O(|I|) (as we need to keep in memory only prime sets,
|I| elements in each dictionary + keys).
Prime OAC-triclustering
Online version of the algorithm
Example:
→ (g1, m1, b1)
1 PrimesObjAttr = {((g1, m1), {b1})}
2 PrimesObjCond = {((g1, b1), {m1})}
3 PrimesAttrCond = {((m1, b1), {g1})}
4 T := T ∪ {PrimesAttrCond[m1, b1], PrimesObjCond[g1, b1], PrimesObjAttr[g1, m1]}
→ (g1, m2, b1)
1 PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1})}
2 PrimesObjCond = {((g1, b1), {m1, m2})}
3 PrimesAttrCond = {((m1, b1), {g1}), ((m2, b1), {g1})}
4 T := T ∪ {PrimesAttrCond[m2, b1], PrimesObjCond[g1, b1], PrimesObjAttr[g1, m2]}
→ (g2, m1, b1)
1 PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1}), ((g2, m1), {b1})}
2 PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1})}
3 PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1})}
4 T := T ∪ {PrimesAttrCond[m1, b1], PrimesObjCond[g2, b1], PrimesObjAttr[g2, m1]}
→ (g2, m2, b1)
1 PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1}), ((g2, m1), {b1}), ((g2, m2), {b1})}
2 PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1, m2})}
3 PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2})}
4 T := T ∪ {PrimesAttrCond[m2, b1], PrimesObjCond[g2, b1], PrimesObjAttr[g2, m2]}
→ (g3, m3, b1)
1 PrimesObjAttr =
{((g1, m1), {b1}), ((g1, m2), {b1}), ((g2, m1), {b1}), ((g2, m2), {b1}), ((g3, m3), {b1})}
2 PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1, m2}), ((g3, b1), {m3})}
3 PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1), {g3})}
4 T := T ∪ {PrimesAttrCond[m3, b1], PrimesObjCond[g3, b1], PrimesObjAttr[g3, m3]}
→ (g1, m2, b2)
1 PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1, b2}), ((g2, m1),
{b1}), ((g2, m2), {b1}), ((g3, m3), {b1})}
2 PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1, m2}), ((g3, b1),
{m3}), ((g1, b2), {m2})}
3 PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1),
{g3}), ((m2, b2), {g1})}
4 T := T ∪ {PrimesAttrCond[m2, b2], PrimesObjCond[g1, b2], PrimesObjAttr[g1, m2]}
→ (g2, m1, b2)
1 PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1, b2}), ((g2, m1), {b1, b2}),
((g2, m2), {b1}), ((g3, m3), {b1})}
2 PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1, m2}), ((g3, b1), {m3}),
((g1, b2), {m2}), ((g2, b2), {m1})}
3 PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1), {g3}),
((m2, b2), {g1}), ((m1, b2), {g2})}
4 T := T ∪ {PrimesAttrCond[m1, b2], PrimesObjCond[g2, b2], PrimesObjAttr[g2, m1]}
→ (g2, m2, b2)
1 PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1, b2}), ((g2, m1), {b1, b2}),
((g2, m2), {b1, b2}), ((g3, m3), {b1})}
2 PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1, m2}), ((g3, b1), {m3}),
((g1, b2), {m2}), ((g2, b2), {m1, m2})}
3 PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1), {g3}),
((m2, b2), {g1, g2}), ((m1, b2), {g2})}
4 T := T ∪ {PrimesAttrCond[m2, b2], PrimesObjCond[g2, b2], PrimesObjAttr[g2, m2]}
→ (g3, m3, b2)
1 PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1, b2}), ((g2, m1), {b1, b2}), ((g2, m2),
{b1, b2}), ((g3, m3), {b1, b2})}
2 PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1, m2}), ((g3, b1), {m3}), ((g1, b2),
{m2}), ((g2, b2), {m1, m2}), ((g3, b2), {m3})}
3 PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1), {g3}),
((m2, b2), {g1, g2}), ((m1, b2), {g2}), ((m3, b2), {g3})}
4 T := T ∪ {PrimesAttrCond[m3, b2], PrimesObjCond[g3, b2], PrimesObjAttr[g3, m3]}
Postprocessing:
1 T(g1,m1,b1) = ({g1, g2}, {m1, m2}, {b1}) ← add
2 T(g1,m2,b1) = ({g1, g2}, {m1, m2}, {b1, b2}) ← add
3 T(g2,m1,b1) = ({g1, g2}, {m1, m2}, {b1, b2}) ← the same as T(g1,m2,b1), skip
4 T(g2,m2,b1) = ({g1, g2}, {m1, m2}, {b1, b2}) ← the same as T(g1,m2,b1), skip
5 T(g3,m3,b1) = ({g3}, {m3}, {b1, b2}) ← add
6 T(g1,m2,b2) = ({g1, g2}, {m2}, {b1, b2}) ← add
7 T(g2,m1,b2) = ({g2}, {m1, m2}, {b1, b2}) ← add
8 T(g2,m2,b2) = ({g1, g2}, {m1, m2}, {b1, b2}) ← the same as T(g1,m2,b1), skip
9 T(g3,m3,b2) = ({g3}, {m3}, {b1, b2}) ← the same as T(g3,m3,b1), skip
The final output set of triclusters:
1 T1 = ({g1, g2}, {m1, m2}, {b1})
2 T2 = ({g1, g2}, {m1, m2}, {b1, b2})
3 T3 = ({g3}, {m3}, {b1, b2})
4 T4 = ({g1, g2}, {m2}, {b1, b2})
5 T5 = ({g2}, {m1, m2}, {b1, b2})
MapReduce Technology
MapReduce scheme [Dean and Ghemawat, 2004]
MapReduce Technology
MapReduce example
Figure: Word counting. Source:
http://blog.trifork.com/2009/08/04/introduction-to-hadoop/
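Since the figure itself is not reproduced here, word counting can be sketched in plain Python as three explicit phases. This is an in-memory illustration of the scheme, not Hadoop code; the sample documents and the function names are assumptions.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word of one input document."""
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)

documents = ["deer bear river", "car car river", "deer car bear"]
pairs = [kv for doc in documents for kv in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Each input word produces exactly one key-value pair, so the replication rate of this job is 1.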
MapReduce Technology
Communication costs: Mining of Massive Datasets [Leskovec et al., 2013]
Chapter 2: MapReduce and the New Software Stack
“Replication Rate and Reducer Size: It is often convenient to measure
communication by the replication rate, which is the communication per input.
Also, the reducer size is the maximum number of inputs associated with any
reducer. For many problems, it is possible to derive a lower bound on replication
rate as a function of the reducer size.”
MapReduce Implementation
Previous lattice-oriented M/R implementations
A version of the Close-by-One algorithm was ported to the M/R framework [Krajca & Vychodil, 2009]
An M/R algorithm for the computation of closed cube lattices was proposed [Kudryavcev & Kuznecov, 2009]
[Xu et al., 2012] demonstrated that iterative algorithms like Ganter’s NextClosure can benefit from iterative M/R schemes
MapReduce Implementation
Technologies and code repositories
Technologies used
Apache Hadoop (http://hadoop.apache.org/)
Apache Maven (framework for automated project builds)
Apache Commons (for working with extended Java collections)
Google Guava (utilities and data structures)
Jackson JSON (open-source library for serialising an object, such as a tricluster, to a string)
TypeTools (for run-time type resolution of inbound and outbound key-value pairs)
. . .
Implementations
Source 1: “Chaining-job” module (https://github.com/zydins/chaining-job)
Source 2: M/R-based OAC triclustering (https://github.com/zydins/DistributedTriclustering)
Two-stage MapReduce Implementation
Distributed OAC-triclustering: First Map
Input: S is a set of input triples as strings;
r is the number of reducers;
i is a grouping index (objects, attributes, or conditions).
Output: J̃ is a list of ⟨key, triple⟩ pairs.
for all s ∈ S do
    t := transform(s)
    key := hash(t[i]) mod r
    J̃ := J̃ ∪ {⟨key, t⟩}
end for
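A possible Python counterpart of the first map is sketched below. Two assumptions are made: `transform` is taken to be a split on tab characters, and `zlib.crc32` replaces `hash` because Python's built-in string hash differs between processes, which would send equal entities to different reducers.

```python
import zlib

def first_map(lines, r, i):
    """Yield (reducer key, triple) pairs, grouping by the i-th component."""
    for s in lines:
        t = tuple(s.strip().split("\t"))                 # assumed input format
        key = zlib.crc32(t[i].encode("utf-8")) % r       # deterministic hash mod r
        yield key, t

lines = ["g1\tm1\tb1", "g1\tm2\tb1", "g2\tm1\tb2"]
pairs = list(first_map(lines, r=4, i=0))
```

With i = 0 all triples that share an object receive the same key and therefore reach the same first-stage reducer.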
Two-stage MapReduce Implementation
Distributed OAC-triclustering: First Reduce
Input: J is a list of triples (for a certain key);
𝒯 = {T = (X, Y, Z)} is the current set of triclusters;
PrimesOA, PrimesOC, PrimesAC.
Output: file of strings (encoded ⟨triple, tricluster⟩ pairs).
Primes ← initialise a new multimap
for all (g, m, b) ∈ J do
    Primes[g, m] := Primes[g, m] ∪ {b}
    Primes[g, b] := Primes[g, b] ∪ {m}
    Primes[m, b] := Primes[m, b] ∪ {g}
end for
for all (g, m, b) ∈ J do
    T := (set(Primes[m, b]), set(Primes[g, b]), set(Primes[g, m]))
    s := encode(⟨(g, m, b), T⟩)
    store s
end for
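The first reduce could be sketched in Python as below. The standard `json` module is used for `encode` (the original implementation lists Jackson JSON), and the multimap keys carry a tag ("OA", "OC", "AC") so that the three kinds of pairs cannot collide; both choices, as well as the record layout, are assumptions of this sketch.

```python
import json
from collections import defaultdict

def first_reduce(triples):
    """Build local prime sets for one reducer and emit encoded (triple, tricluster) records."""
    triples = list(triples)
    primes = defaultdict(set)
    for g, m, b in triples:
        primes["OA", g, m].add(b)
        primes["OC", g, b].add(m)
        primes["AC", m, b].add(g)
    for g, m, b in triples:
        T = [sorted(primes["AC", m, b]),   # local extent
             sorted(primes["OC", g, b]),   # local intent
             sorted(primes["OA", g, m])]   # local modus
        yield json.dumps({"triple": [g, m, b], "tricluster": T})

records = [json.loads(s) for s in first_reduce([("g1", "m1", "b1"), ("g1", "m2", "b1")])]
```

The triclusters written at this stage are partial: a reducer only sees the triples routed to it, so prime sets along the non-grouping dimensions may still be incomplete.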
Two-stage MapReduce Implementation
Distributed OAC-triclustering: Second Map
Input: S is a list of strings.
Output: T̃ is a list of ⟨tricluster, tricluster⟩ pairs.
Primes ← initialise a new multimap
for all s ∈ S do
    ⟨(g, m, b), T⟩ := decode(s)
    update Primes multimap appropriately
    I := I ∪ {(g, m, b)}
end for
for all (g, m, b) ∈ I do
    T := (set(Primes[m, b]), set(Primes[g, b]), set(Primes[g, m]))
    T̃ := T̃ ∪ {⟨T, T⟩}
end for
Two-stage MapReduce Implementation
Distributed OAC-triclustering: Second Reduce
Input: T̂ is a list of ⟨tricluster, list of triclusters⟩ pairs.
Output: File with a final set of triclusters {T = (X, Y, Z)}.
for all ⟨T, [T, . . . , T]⟩ ∈ T̂ do
    store T
end for
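The second stage can be sketched in Python as follows. The step "update Primes multimap appropriately" is interpreted here as merging each partial extent, intent, and modus into the prime set of the corresponding pair; that reading, the JSON record layout, and the function names are assumptions rather than the authors' code.

```python
import json
from collections import defaultdict

def second_map(lines):
    """Merge partial prime sets from all records, then emit (T, T) per triple."""
    primes, triples = defaultdict(set), []
    for s in lines:
        rec = json.loads(s)
        g, m, b = rec["triple"]
        X, Y, Z = rec["tricluster"]
        primes["AC", m, b].update(X)
        primes["OC", g, b].update(Y)
        primes["OA", g, m].update(Z)
        triples.append((g, m, b))
    for g, m, b in triples:
        T = (frozenset(primes["AC", m, b]),
             frozenset(primes["OC", g, b]),
             frozenset(primes["OA", g, m]))
        yield T, T

def second_reduce(grouped):
    """Store each distinct tricluster once, ignoring its duplicate copies."""
    for T, _copies in grouped.items():
        yield T

# Two partial records, as two different first-stage reducers might have written them.
lines = [json.dumps({"triple": ["g1", "m1", "b1"], "tricluster": [["g1"], ["m1"], ["b1"]]}),
         json.dumps({"triple": ["g2", "m1", "b1"], "tricluster": [["g2"], ["m1"], ["b1"]]})]
grouped = defaultdict(list)
for key, value in second_map(lines):
    grouped[key].append(value)
final = list(second_reduce(grouped))
```

Both generating triples end up with the same merged tricluster, so the reducer receives two copies under one key and stores it once.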
Two-stage MapReduce Implementation
Communication costs
The time complexity of the M/R solution is composed of two terms, one per stage: O(|I|/r) (or O(|I|)) and O(|I|).
The replication rate for the first M/R stage is r1 = 1 (each triple is passed as one key-value pair), and the reducer size is q1 = |I|/r.
The replication rate for the second M/R stage is r2 = 1 (it assigns one key-value pair to each tricluster), but the reducer size varies from q2^min = 1 (no duplicate triclusters) to q2^max = |I| (one final tricluster when all the initial triples belong to one absolutely dense cuboid).
Experiments
Description of the experiments
OS X 10, 1.8 GHz Intel Core i5, 4 GB of 1600 MHz DDR3 memory, and 8 GB of free space on the hard drive (typical commodity hardware).
Two M/R modes have been tested: sequential task completion and an emulated distributed mode with 16 first-stage reducers and 32 threads for the second stage.
To evaluate the runtime more reliably, the average over 5 runs of each algorithm has been recorded for every context.
Experiments
Datasets
Synthetic datasets. 1) 20,000 triples (25 unique entities of each type); 2) 100,000 triples (50 unique entities of each type); 3) 1,000,000 triples (all possible combinations of 100 unique entities of each type).
The 1st dataset contains duplicates, since 25 × 25 × 25 gives only 15,625 unique triples. The 2nd one contains fewer triples than 50^3 = 125,000, the number of all possible combinations. The 3rd one is an absolutely dense cuboid 100 × 100 × 100.
The 3rd dataset does not result in 3^min(|G|,|M|,|B|) formal triconcepts; it is an example of the worst-case scenario for the second reducer (q2^max = |I|).
IMDB. Top-250 list of the best movies from Internet Movie Database
Bibsonomy. The data of bibsonomy.org from ECML PKDD discovery challenge 2008.
Context     |G|     |M|      |B|      # triples   Density
20k         25      25       25       20,000      1
100k        50      50       50       100,000     0.8
1m          100     100      100      1,000,000   1
IMDB        250     795      22       3,818       0.00087
BibSonomy   2,337   67,464   28,920   816,197     1.8 · 10^−7
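The density column can be rechecked from the other columns as the number of triples divided by |G| · |M| · |B|. The short Python check below is a sketch; the 20k context is left out because its 20,000 input triples include duplicates, so the raw count cannot be plugged into the formula directly.

```python
# (|G|, |M|, |B|, number of triples) as given in the table above.
contexts = {
    "100k": (50, 50, 50, 100_000),
    "1m": (100, 100, 100, 1_000_000),
    "IMDB": (250, 795, 22, 3_818),
    "BibSonomy": (2_337, 67_464, 28_920, 816_197),
}

density = {name: n / (g * m * b) for name, (g, m, b, n) in contexts.items()}

for name, rho in density.items():
    print(f"{name}: {rho:.2g}")
```

The computed values agree with the table after rounding to two significant digits.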
Experiments
Results
Algorithm/Context      IMDB            20k       100k      1m        Bibsonomy
                       (≈3k triples)   triples   triples   triples   (≈800k triples)
Tribox                 324             800       1,265     >3,000    >3,000
TRIAS                  189             362       862       >3,000    >3,000
OAC Box                374             756       1,265     >3,000    >3,000
OAC Prime              7               8         734       >3,000    >3,000
Online OAC prime       3               3         3         5         >3,000
M/R OAC prime seq.     12              30        81        166       1,534
M/R OAC prime distr.   1               15        20        25        520
Alternative MapReduce decomposition
Variant I: First stage
First Map: Finding primes. During this phase every input triple (g, m, b) is
encoded by three key-value pairs ⟨(g, m), b⟩, ⟨(g, b), m⟩, and ⟨(m, b), g⟩. These
pairs are passed to the first reducer.
The replication rate is r1 = 3.
First Reduce: Finding primes. This reducer fills three corresponding dictionaries
for primes of keys. So, for example, the first dictionary, PrimeOA contains
key-value pairs ⟨(g, m), {b1, b2, . . . , bn}⟩.
The reducer size is q1 = max(|G|, |M|, |B|)
The process can be stopped after the first reduce phase, with all the triclusters found as (Prime[g, m], Prime[g, b], Prime[m, b]) by enumerating the triples (g, m, b) ∈ I. However, to do this faster and keep the result for further computation, M/R can be used for this step as well.
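The first stage of Variant I can be sketched in Python with an in-memory shuffle. The tags "OA", "OC", and "AC" that distinguish the three kinds of keys, and the function names, are assumptions of this illustration.

```python
from collections import defaultdict

def first_map(triple):
    """Encode one triple as three key-value pairs."""
    g, m, b = triple
    yield ("OA", g, m), b
    yield ("OC", g, b), m
    yield ("AC", m, b), g

def first_reduce(key, values):
    """Collect the prime set of one pair."""
    return key, set(values)

I = [("g1", "m1", "b1"), ("g1", "m2", "b1"), ("g2", "m1", "b1")]
shuffled = defaultdict(list)
for triple in I:
    for key, value in first_map(triple):
        shuffled[key].append(value)
primes = dict(first_reduce(k, v) for k, v in shuffled.items())
replication_rate = sum(len(v) for v in shuffled.values()) / len(I)
```

Every triple is emitted three times, which gives the replication rate r1 = 3 stated above.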
Alternative MapReduce decomposition
Variant I: Second stage
Second Map: Tricluster generation. The second map does the tricluster-combining job: for each triple (g, m, b) it composes a new key-value pair ⟨(g, m, b), ∅⟩. For each pair of any of the three types, ⟨(g, m), Prime[g, m]⟩, ⟨(g, b), Prime[g, b]⟩, and ⟨(m, b), Prime[m, b]⟩, it generates the key-value pairs ⟨(g, m, b̃), Prime[g, m]⟩, ⟨(g, m̃, b), Prime[g, b]⟩, and ⟨(g̃, m, b), Prime[m, b]⟩, where g̃ ∈ G, m̃ ∈ M, and b̃ ∈ B.
r2 = (|I| + 3|G||M||B|)/(|I| + |G||M| + |G||B| + |M||B|) ≤
(ρ + 3)/(ρ + 3/max(|G|, |M|, |B|)), where ρ is the input tricontext density.
Second Reduce: Tricluster generation. For each key (g, m, b), the generating triple, the second reducer assembles a single value: its tricluster (Prime[g, m], Prime[g, b], Prime[m, b]). If there is no key-value pair ⟨(g, m, b), ∅⟩ for a particular triple (g, m, b), it outputs nothing for that key.
The reducer size q2 is either 3 (no output) or 4 (tricluster assembled).
Alternative MapReduce decomposition
Variant II: Second stage
Second Map: Tricluster generation with duplicate generating triples.
The second map does the tricluster-combining job: for each triple (g, m, b) it composes a new key-value pair ⟨(Prime[g, m], Prime[g, b], Prime[m, b]), (g, m, b)⟩.
Second Reduce: Tricluster generation with duplicate generating triples.
The second reducer just groups the values for each key: ⟨(X, Y, Z), {(g1, m1, b1), . . . , (gn, mn, bn)}⟩.
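The second stage of Variant II amounts to grouping generating triples by their tricluster. A minimal Python sketch is given below; it writes the key in the (extent, intent, modus) order of the tricluster definition, and the tagged `primes` dictionary and function names are assumptions.

```python
from collections import defaultdict

def build_primes(I):
    """Prime sets for all pairs occurring in I, with tagged keys."""
    primes = defaultdict(set)
    for g, m, b in I:
        primes["OA", g, m].add(b)
        primes["OC", g, b].add(m)
        primes["AC", m, b].add(g)
    return primes

def second_map(triple, primes):
    """Emit (tricluster, generating triple)."""
    g, m, b = triple
    T = (frozenset(primes["AC", m, b]),
         frozenset(primes["OC", g, b]),
         frozenset(primes["OA", g, m]))
    return T, triple

def second_reduce(pairs):
    """Group generating triples under their common tricluster."""
    groups = defaultdict(list)
    for T, triple in pairs:
        groups[T].append(triple)
    return dict(groups)

I = [("g1", "m1", "b1"), ("g1", "m2", "b1"), ("g2", "m1", "b1"), ("g2", "m2", "b1")]
primes = build_primes(I)
groups = second_reduce(second_map(t, primes) for t in I)
```

Here the four triples form a dense 2 × 2 × 1 cuboid, so they all generate the same tricluster and land in one group.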
Conclusion and further work
A MapReduce implementation of prime OAC-triclustering has been proposed.
Communication costs have been analysed.
The online version and the M/R version have been compared.
Further experiments are needed with other M/R variants and other triclustering algorithms.
A proper comparison of the proposed OAC-triclustering with noise-tolerant patterns in n-ary relations, e.g., those found by DataPeeler descendants [Cerf et al., 2013], has not yet been conducted.
Thank you!
Questions?
S. Zudin et al. () OAC-triclustering on MapReduce CLA 2015 39 / 39

More Related Content

What's hot

Slides: Simplifying Gaussian Mixture Models Via Entropic Quantization (EUSIPC...
Slides: Simplifying Gaussian Mixture Models Via Entropic Quantization (EUSIPC...Slides: Simplifying Gaussian Mixture Models Via Entropic Quantization (EUSIPC...
Slides: Simplifying Gaussian Mixture Models Via Entropic Quantization (EUSIPC...Frank Nielsen
 
Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)
Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)
Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)MeetupDataScienceRoma
 
Parallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-JoinsParallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-JoinsJonny Daenen
 
A Note on Correlated Topic Models
A Note on Correlated Topic ModelsA Note on Correlated Topic Models
A Note on Correlated Topic ModelsTomonari Masada
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distancesChristian Robert
 
A study of the worst case ratio of a simple algorithm for simple assembly lin...
A study of the worst case ratio of a simple algorithm for simple assembly lin...A study of the worst case ratio of a simple algorithm for simple assembly lin...
A study of the worst case ratio of a simple algorithm for simple assembly lin...narmo
 
論文紹介 Fast imagetagging
論文紹介 Fast imagetagging論文紹介 Fast imagetagging
論文紹介 Fast imagetaggingTakashi Abe
 
YaPingPresentation
YaPingPresentationYaPingPresentation
YaPingPresentationYa-Ping Wang
 
Incremental and parallel computation of structural graph summaries for evolvi...
Incremental and parallel computation of structural graph summaries for evolvi...Incremental and parallel computation of structural graph summaries for evolvi...
Incremental and parallel computation of structural graph summaries for evolvi...Till Blume
 
Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...
Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...
Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...T. E. BOGALE
 
R-ggplot2 package Examples
R-ggplot2 package ExamplesR-ggplot2 package Examples
R-ggplot2 package ExamplesDr. Volkan OBAN
 
Litvinenko low-rank kriging +FFT poster
Litvinenko low-rank kriging +FFT  posterLitvinenko low-rank kriging +FFT  poster
Litvinenko low-rank kriging +FFT posterAlexander Litvinenko
 
Coordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerCoordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerChristian Robert
 

What's hot (19)

Slides: Simplifying Gaussian Mixture Models Via Entropic Quantization (EUSIPC...
Slides: Simplifying Gaussian Mixture Models Via Entropic Quantization (EUSIPC...Slides: Simplifying Gaussian Mixture Models Via Entropic Quantization (EUSIPC...
Slides: Simplifying Gaussian Mixture Models Via Entropic Quantization (EUSIPC...
 
Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)
Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)
Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)
 
Spsp fw
Spsp fwSpsp fw
Spsp fw
 
Parallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-JoinsParallel Evaluation of Multi-Semi-Joins

Putting OAC-triclustering on MapReduce

  • 1. Putting OAC-triclustering on MapReduce
      Sergey Zudin, Dmitry V. Gnatyshak, and Dmitry I. Ignatov
      National Research University Higher School of Economics, Russian Federation, Faculty of Computer Science
      CLA 2015, Clermont-Ferrand, France, October 13-16
  • 2. Outline
      1 Motivation and previous work
      2 Prime OAC-triclustering
          Triadic formal concept analysis
          Basic algorithm
          Online version of the algorithm
      3 OAC-triclustering on MapReduce
          MapReduce technology
          MapReduce implementation
      4 Experiments
          Description of the experiments
          Datasets
          Results
      5 Conclusion
  • 4. Motivation
      Large amounts of multimodal data:
        - Gene expression data
        - Folksonomies
        - Recommender systems
        - Communities in multi-mode (social) networks
        - Pattern mining in relational databases
        - ...
      Non-binary data can be scaled (possibly increasing the dimensionality).
      Growing volumes of big data: fast and/or distributed algorithms are required (linear or sublinear, one-pass).
      Existing methods: finding all n-sets (multimodal clusters) satisfying some conditions (often an exponential number of patterns).
  • 5. Motivation. IMDB example [Mirkin et al., 2011]
      Clump (Movie, Keyword, Genre):
      Bicluster: movies {12 Angry Men (1957), To Kill a Mockingbird (1962), Witness for the Prosecution (1957)}; keywords {Murder, Trial}; genres {n/a}
      Tricluster: movies {12 Angry Men (1957), Double Indemnity (1944), Chinatown (1974), The Big Sleep (1946), Witness for the Prosecution (1957), Dial M for Murder (1954), Shadow of a Doubt (1943)}; keywords {Murder, Trial, Widow, Marriage, Private detective, Blackmail, Letter}; genres {Crime, Drama, Thriller, Mystery, Film-Noir}
  • 6. Previous and related work. A short (not full) list:
        - Triadic FCA [Wille, 1995; Lehmann and Wille, 1995] and Polyadic FCA [Voutsadakis, 2002]
        - TRIAS [Jäschke et al., 2006] for mining (frequent) triconcepts
        - DataPeeler for closed n-sets [Cerf et al., 2009], MultiDupeHack [Cerf et al., 2013]
        - TriBox [Mirkin et al., 2011] for mining dense triboxes with LS criterion
        - Box OAC-triclustering and Spectral Triclustering [Ignatov et al., 2011, 2013]
        - Multi-way set enumeration in weight tensors [Schölkopf et al., 2011]
  • 7. Previous and related work. A short (not full) list:
        - Quadri-concepts for personalised folksonomies [Jelassi et al., 2012, 2013]
        - Prime OAC-triclustering [Gnatyshak et al., 2012–2014]
        - Triadic Boolean tensor factorisation [Miettinen et al., 2011; Belohlavek et al., 2013] and Boolean tensor clustering [Miettinen et al., 2015]
        - Closed and connected patterns in multi-relational data [Spyropoulou et al., 2011–14]
        - Triadic FCA and triclustering: searching for optimal patterns, Machine Learning journal [Ignatov et al., 2015] and CLA 2013
        - ...
  • 9. Prime OAC-triclustering. Formal concept analysis: triadic case
    Definition. Let G, M, B be sets and let the ternary relation I be a subset of their Cartesian product: I ⊆ G × M × B. Then the tuple K = (G, M, B, I) is called a triadic formal context. G is a set of objects, M is a set of attributes, and B is a set of conditions.
    [Cross-table of an example context: objects g1-g4 as rows, attributes m1-m3 as columns, one block of columns per condition b1-b3.]
  • 10. Prime OAC-triclustering. Formal concept analysis: triadic case
    Definition. Galois operators (prime operators) are defined similarly to the dyadic case:
    2^G → 2^(M×B),   2^G × 2^M → 2^B
    2^M → 2^(G×B),   2^G × 2^B → 2^M
    2^B → 2^(G×M),   2^M × 2^B → 2^G
  • 11. Prime OAC-triclustering. Formal concept analysis: triadic case
    [Cross-table of the example context: objects g1-g4, attributes m1-m3, conditions b1-b3.]
    Example of a prime operator applied to a pair of sets: ({g1, g2}, {m1, m2})′ = {b1, b3}
  • 12. Prime OAC-triclustering. Formal concept analysis: triadic case
    [Cross-table of the example context: objects g1-g4, attributes m1-m3, conditions b1-b3.]
    Example of a prime operator applied to a single attribute: m2′ = {(g1, b1), (g2, b1), (g3, b1), (g1, b2), (g1, b3), (g2, b3), (g4, b3)}
  • 13. Prime OAC-triclustering. Formal concept analysis: triadic case
    Definition. The triple (X, Y, Z) is called a triadic formal concept of the context K = (G, M, B, I) if X ⊆ G, Y ⊆ M, Z ⊆ B, (X, Y)′ = Z, (X, Z)′ = Y, and (Y, Z)′ = X. X is called the (formal) extent, Y the (formal) intent, and Z the (formal) modus.
    [Cross-table of the example context: objects g1-g4, attributes m1-m3, conditions b1-b3.]
  • 14. Prime OAC-triclustering. Basic algorithm [Gnatyshak et al., 2013]
    This method uses the following prime operators (for the context K = (G, M, B, I)):
    (g, m)′ = {b ∈ B | (g, m, b) ∈ I},
    (g, b)′ = {m ∈ M | (g, m, b) ∈ I},
    (m, b)′ = {g ∈ G | (g, m, b) ∈ I}.
    Definition. The triple T = ((m, b)′, (g, b)′, (g, m)′) is called the prime-based OAC-tricluster for a triple (g, m, b) ∈ I. The components of a tricluster are called, respectively, the tricluster extent, intent, and modus. The triple (g, m, b) is called a generating triple of the tricluster T.
    Definition. Density of a tricluster: ρ(X, Y, Z) = |I ∩ (X × Y × Z)| / (|X| |Y| |Z|)
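    The prime operators and the density formula above can be sketched in a few lines of Python. This is only an illustration: the function names `primes` and `density` and the three-triple toy context are assumptions made here, not part of the talk.

```python
from itertools import product

def primes(I):
    """Build the three prime-set dictionaries for a set of triples I."""
    obj_attr, obj_cond, attr_cond = {}, {}, {}
    for g, m, b in I:
        obj_attr.setdefault((g, m), set()).add(b)   # (g, m)' is a subset of B
        obj_cond.setdefault((g, b), set()).add(m)   # (g, b)' is a subset of M
        attr_cond.setdefault((m, b), set()).add(g)  # (m, b)' is a subset of G
    return obj_attr, obj_cond, attr_cond

def density(I, X, Y, Z):
    """rho(X, Y, Z) = |I ∩ (X × Y × Z)| / (|X| |Y| |Z|)."""
    hits = sum((g, m, b) in I for g, m, b in product(X, Y, Z))
    return hits / (len(X) * len(Y) * len(Z))

# Hypothetical toy context (not the one shown on the slides).
I = {("g1", "m1", "b1"), ("g1", "m2", "b1"), ("g2", "m1", "b1")}
obj_attr, obj_cond, attr_cond = primes(I)

# Tricluster generated by the triple (g1, m1, b1): (extent, intent, modus).
g, m, b = "g1", "m1", "b1"
T = (attr_cond[(m, b)], obj_cond[(g, b)], obj_attr[(g, m)])
rho = density(I, *T)
```

    Here the generated tricluster covers a 2 × 2 × 1 box that contains three of the four possible triples, so it is not absolutely dense.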
  • 15. Prime OAC-triclustering. Basic algorithm
    [Figure: an example of a tricluster based on the triple (g, m, b).]
  • 16. Prime OAC-triclustering. Basic algorithm
    Input: K = (G, M, B, I), a triadic context; ρmin, a density threshold
    Output: 𝒯 = {T = (X, Y, Z)}
    𝒯 := ∅
    for all (g, m): g ∈ G, m ∈ M do
        PrimesObjAttr[g, m] = (g, m)′
    end for
    for all (g, b): g ∈ G, b ∈ B do
        PrimesObjCond[g, b] = (g, b)′
    end for
    for all (m, b): m ∈ M, b ∈ B do
        PrimesAttrCond[m, b] = (m, b)′
    end for
    for all (g, m, b) ∈ I do
        T = (PrimesAttrCond[m, b], PrimesObjCond[g, b], PrimesObjAttr[g, m])
        Tkey = hash(T)
        if Tkey ∉ 𝒯.keys ∧ ρ(T) ≥ ρmin then
            𝒯[Tkey] := T
        end if
    end for
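    A compact Python rendering of the basic algorithm might look as follows. It is a sketch under two assumptions that differ from the pseudocode: prime sets are filled by one pass over I instead of loops over all pairs, and the tricluster itself (a tuple of frozensets) serves as the dictionary key in place of an explicit hash. The name `oac_prime` is hypothetical.

```python
from itertools import product

def oac_prime(I, rho_min=0.0):
    """Return {tricluster: density} for all prime OAC-triclusters with density >= rho_min."""
    I = set(I)
    obj_attr, obj_cond, attr_cond = {}, {}, {}
    for g, m, b in I:
        obj_attr.setdefault((g, m), set()).add(b)
        obj_cond.setdefault((g, b), set()).add(m)
        attr_cond.setdefault((m, b), set()).add(g)

    result = {}
    for g, m, b in I:
        T = (frozenset(attr_cond[(m, b)]),
             frozenset(obj_cond[(g, b)]),
             frozenset(obj_attr[(g, m)]))
        if T in result:            # same tricluster from another generating triple
            continue
        X, Y, Z = T
        rho = sum(t in I for t in product(X, Y, Z)) / (len(X) * len(Y) * len(Z))
        if rho >= rho_min:
            result[T] = rho
    return result

# Hypothetical toy context.
I = {("g1", "m1", "b1"), ("g1", "m2", "b1"), ("g2", "m1", "b1")}
all_triclusters = oac_prime(I)
dense_only = oac_prime(I, rho_min=0.8)
```

    Note that the density computation enumerates X × Y × Z, so this step dominates the running time for large triclusters.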
  • 17. Prime OAC-triclustering. Online version of the algorithm [Gnatyshak et al., 2014]
    Let K = (G, M, B, I) be a triadic context. We do not know G, M, B, I, or their cardinalities in advance.
    Input on each iteration: {(g, m, b)} = J ⊆ I.
    Goal: maintain an up-to-date version of the results and update them efficiently when new triples are received.
    We need to keep in memory the results of applying the prime operators (prime sets):
    - PrimesObjAttr: dictionary with elements of type ((g, m), {b ∈ B}), g ∈ G, m ∈ M;
    - PrimesObjCond: dictionary with elements of type ((g, b), {m ∈ M}), g ∈ G, b ∈ B;
    - PrimesAttrCond: dictionary with elements of type ((m, b), {g ∈ G}), m ∈ M, b ∈ B.
  • 18. Prime OAC-triclustering. Online version of the algorithm
    Remark. In this case we must treat triclusters generated by different triples as different, even if their extents, intents, and modi are equal.
  • 19. Prime OAC-triclustering. Online version of the algorithm
    Algorithm of triples addition:
    Input: J is a set of triples to add; 𝒯 = {T = (∗X, ∗Y, ∗Z)} is the current tricluster set; PrimesObjAttr, PrimesObjCond, PrimesAttrCond;
    Output: 𝒯 = {T = (∗X, ∗Y, ∗Z)}; PrimesObjAttr, PrimesObjCond, PrimesAttrCond;
    for all (g, m, b) ∈ J do
        PrimesObjAttr[g, m] := PrimesObjAttr[g, m] ∪ {b}
        PrimesObjCond[g, b] := PrimesObjCond[g, b] ∪ {m}
        PrimesAttrCond[m, b] := PrimesAttrCond[m, b] ∪ {g}
        𝒯 := 𝒯 ∪ {(&PrimesAttrCond[m, b], &PrimesObjCond[g, b], &PrimesObjAttr[g, m])}
    end for
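    The pointer-based update can be illustrated in Python, where a tuple that holds mutable sets already behaves like a triple of pointers. The class name `OnlineOAC` and its attributes are assumptions made for this sketch, not names from the authors' code.

```python
from collections import defaultdict

class OnlineOAC:
    """Keeps prime sets and one tricluster (by reference) per received triple."""

    def __init__(self):
        self.obj_attr = defaultdict(set)   # (g, m) -> conditions
        self.obj_cond = defaultdict(set)   # (g, b) -> attributes
        self.attr_cond = defaultdict(set)  # (m, b) -> objects
        self.triclusters = []              # one entry per generating triple

    def add(self, triples):
        for g, m, b in triples:
            self.obj_attr[(g, m)].add(b)
            self.obj_cond[(g, b)].add(m)
            self.attr_cond[(m, b)].add(g)
            # Store references to the live sets, not copies: later updates
            # of a prime set are visible through every tricluster using it.
            self.triclusters.append(
                (self.attr_cond[(m, b)], self.obj_cond[(g, b)], self.obj_attr[(g, m)])
            )

oac = OnlineOAC()
oac.add([("g1", "m1", "b1")])
oac.add([("g2", "m1", "b1")])
first_extent = oac.triclusters[0][0]
```

    After the second batch, the extent of the first tricluster has grown without the tricluster being touched, which is why each triple costs only a constant number of operations.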
  • 20. Prime OAC-triclustering. Online version of the algorithm
    A user may need to remove triclusters with the same extent, intent, and modus at the post-processing stage. At this stage we can also check various conditions (for instance, a minimal density condition).
    Input: 𝒯 = {T = (∗X, ∗Y, ∗Z)} is the current tricluster set;
    Output: 𝒯′ = {T = (∗X, ∗Y, ∗Z)}, the processed tricluster hash-set;
    𝒯′ := ∅
    for all T ∈ 𝒯 do
        Compute hash(T)
        if hash(T) ∉ 𝒯′.keys() then
            𝒯′ := 𝒯′ ∪ {T}
        end if
    end for
  • 21. Prime OAC-triclustering. Online version of the algorithm
    Complexity summary:
    Time complexity: O(|I|) (as there is a constant number of operations on each step). More precisely, 8|I| operations in total:
    1. Modification of 3 prime sets (3);
    2. Creation of a new tricluster (1);
    3. Addition of pointers to its extent, intent, and modus (3);
    4. Addition of the tricluster to the set of all triclusters (1).
    Memory complexity: O(|I|) (as we need to keep in memory only the prime sets: |I| elements in each dictionary plus the keys).
  • 22. Prime OAC-triclustering. Online version of the algorithm. Example:
  • 23. Prime OAC-triclustering. Online version of the algorithm
    → (g1, m1, b1)
    1. PrimesObjAttr = {((g1, m1), {b1})}
    2. PrimesObjCond = {((g1, b1), {m1})}
    3. PrimesAttrCond = {((m1, b1), {g1})}
    4. 𝒯 := 𝒯 ∪ {(PrimesAttrCond[m1, b1], PrimesObjCond[g1, b1], PrimesObjAttr[g1, m1])}
  • 24. Prime OAC-triclustering. Online version of the algorithm
    → (g1, m2, b1)
    1. PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1})}
    2. PrimesObjCond = {((g1, b1), {m1, m2})}
    3. PrimesAttrCond = {((m1, b1), {g1}), ((m2, b1), {g1})}
    4. 𝒯 := 𝒯 ∪ {(PrimesAttrCond[m2, b1], PrimesObjCond[g1, b1], PrimesObjAttr[g1, m2])}
  • 25. Prime OAC-triclustering. Online version of the algorithm
    → (g2, m1, b1)
    1. PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1}), ((g2, m1), {b1})}
    2. PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1})}
    3. PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1})}
    4. 𝒯 := 𝒯 ∪ {(PrimesAttrCond[m1, b1], PrimesObjCond[g2, b1], PrimesObjAttr[g2, m1])}
  • 26. Prime OAC-triclustering. Online version of the algorithm
    → (g2, m2, b1)
    1. PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1}), ((g2, m1), {b1}), ((g2, m2), {b1})}
    2. PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1, m2})}
    3. PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2})}
    4. 𝒯 := 𝒯 ∪ {(PrimesAttrCond[m2, b1], PrimesObjCond[g2, b1], PrimesObjAttr[g2, m2])}
  • 27. Prime OAC-triclustering. Online version of the algorithm
    → (g3, m3, b1)
    1. PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1}), ((g2, m1), {b1}), ((g2, m2), {b1}), ((g3, m3), {b1})}
    2. PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1, m2}), ((g3, b1), {m3})}
    3. PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1), {g3})}
    4. 𝒯 := 𝒯 ∪ {(PrimesAttrCond[m3, b1], PrimesObjCond[g3, b1], PrimesObjAttr[g3, m3])}
  • 28. Prime OAC-triclustering. Online version of the algorithm
    → (g1, m2, b2)
    1. PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1, b2}), ((g2, m1), {b1}), ((g2, m2), {b1}), ((g3, m3), {b1})}
    2. PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1, m2}), ((g3, b1), {m3}), ((g1, b2), {m2})}
    3. PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1), {g3}), ((m2, b2), {g1})}
    4. 𝒯 := 𝒯 ∪ {(PrimesAttrCond[m2, b2], PrimesObjCond[g1, b2], PrimesObjAttr[g1, m2])}
  • 29. Prime OAC-triclustering. Online version of the algorithm
    → (g2, m1, b2)
    1. PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1, b2}), ((g2, m1), {b1, b2}), ((g2, m2), {b1}), ((g3, m3), {b1})}
    2. PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1, m2}), ((g3, b1), {m3}), ((g1, b2), {m2}), ((g2, b2), {m1})}
    3. PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1), {g3}), ((m2, b2), {g1}), ((m1, b2), {g2})}
    4. 𝒯 := 𝒯 ∪ {(PrimesAttrCond[m1, b2], PrimesObjCond[g2, b2], PrimesObjAttr[g2, m1])}
  • 30. Prime OAC-triclustering. Online version of the algorithm
    → (g2, m2, b2)
    1. PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1, b2}), ((g2, m1), {b1, b2}), ((g2, m2), {b1, b2}), ((g3, m3), {b1})}
    2. PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1, m2}), ((g3, b1), {m3}), ((g1, b2), {m2}), ((g2, b2), {m1, m2})}
    3. PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1), {g3}), ((m2, b2), {g1, g2}), ((m1, b2), {g2})}
    4. 𝒯 := 𝒯 ∪ {(PrimesAttrCond[m2, b2], PrimesObjCond[g2, b2], PrimesObjAttr[g2, m2])}
  • 31. Prime OAC-triclustering. Online version of the algorithm
    → (g3, m3, b2)
    1. PrimesObjAttr = {((g1, m1), {b1}), ((g1, m2), {b1, b2}), ((g2, m1), {b1, b2}), ((g2, m2), {b1, b2}), ((g3, m3), {b1, b2})}
    2. PrimesObjCond = {((g1, b1), {m1, m2}), ((g2, b1), {m1, m2}), ((g3, b1), {m3}), ((g1, b2), {m2}), ((g2, b2), {m1, m2}), ((g3, b2), {m3})}
    3. PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1), {g3}), ((m2, b2), {g1, g2}), ((m1, b2), {g2}), ((m3, b2), {g3})}
    4. 𝒯 := 𝒯 ∪ {(PrimesAttrCond[m3, b2], PrimesObjCond[g3, b2], PrimesObjAttr[g3, m3])}
  • 32. Prime OAC-triclustering. Online version of the algorithm
    Postprocessing:
    1. T(g1,m1,b1) = ({g1, g2}, {m1, m2}, {b1}) ← add
    2. T(g1,m2,b1) = ({g1, g2}, {m1, m2}, {b1, b2}) ← add
    3. T(g2,m1,b1) = ({g1, g2}, {m1, m2}, {b1, b2}) ← the same as T(g1,m2,b1), skip
    4. T(g2,m2,b1) = ({g1, g2}, {m1, m2}, {b1, b2}) ← the same as T(g1,m2,b1), skip
    5. T(g3,m3,b1) = ({g3}, {m3}, {b1, b2}) ← add
    6. T(g1,m2,b2) = ({g1, g2}, {m2}, {b1, b2}) ← add
    7. T(g2,m1,b2) = ({g2}, {m1, m2}, {b1, b2}) ← add
    8. T(g2,m2,b2) = ({g1, g2}, {m1, m2}, {b1, b2}) ← the same as T(g1,m2,b1), skip
    9. T(g3,m3,b2) = ({g3}, {m3}, {b1, b2}) ← the same as T(g3,m3,b1), skip
  • 33. Prime OAC-triclustering. Online version of the algorithm
    The final output set of triclusters:
    1. T1 = ({g1, g2}, {m1, m2}, {b1})
    2. T2 = ({g1, g2}, {m1, m2}, {b1, b2})
    3. T3 = ({g3}, {m3}, {b1, b2})
    4. T4 = ({g1, g2}, {m2}, {b1, b2})
    5. T5 = ({g2}, {m1, m2}, {b1, b2})
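    The nine-triple example can be replayed with a short Python script. The stream order is taken from the slides; the variable names and the use of frozensets as deduplication keys are choices made for this sketch.

```python
from collections import defaultdict

# The nine triples of the example, in the order they arrive.
stream = [("g1", "m1", "b1"), ("g1", "m2", "b1"), ("g2", "m1", "b1"),
          ("g2", "m2", "b1"), ("g3", "m3", "b1"), ("g1", "m2", "b2"),
          ("g2", "m1", "b2"), ("g2", "m2", "b2"), ("g3", "m3", "b2")]

obj_attr, obj_cond, attr_cond = defaultdict(set), defaultdict(set), defaultdict(set)
generated = []  # one tricluster (as references to live sets) per triple

for g, m, b in stream:
    obj_attr[(g, m)].add(b)
    obj_cond[(g, b)].add(m)
    attr_cond[(m, b)].add(g)
    generated.append((attr_cond[(m, b)], obj_cond[(g, b)], obj_attr[(g, m)]))

# Post-processing: freeze each tricluster and keep only the first occurrence.
seen, final = set(), []
for X, Y, Z in generated:
    key = (frozenset(X), frozenset(Y), frozenset(Z))
    if key not in seen:
        seen.add(key)
        final.append(key)

for i, (X, Y, Z) in enumerate(final, start=1):
    print(f"T{i} = ({sorted(X)}, {sorted(Y)}, {sorted(Z)})")
```

    Nine generating triples collapse to five distinct triclusters, in the same order as T1 to T5 above.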
  • 35. MapReduce Technology
    [Figure: MapReduce scheme [Dean and Ghemawat, 2004].]
  • 36. MapReduce Technology. MapReduce example
    Figure: Word counting. Source: http://blog.trifork.com/2009/08/04/introduction-to-hadoop/
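    Since the word-counting figure is not reproduced here, the same idea can be sketched as an in-memory simulation of the map, shuffle, and reduce phases. This is plain Python rather than Hadoop code, and the three input lines are made up for illustration.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word of every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(ones) for word, ones in groups.items()}

lines = ["deer bear river", "car car river", "deer car bear"]
counts = reduce_phase(shuffle(map_phase(lines)))
```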
  • 37. MapReduce Technology. Communication costs: Mining of Massive Datasets [Leskovec et al., 2013], Chapter 2: MapReduce and the New Software Stack
    “Replication Rate and Reducer Size: It is often convenient to measure communication by the replication rate, which is the communication per input. Also, the reducer size is the maximum number of inputs associated with any reducer. For many problems, it is possible to derive a lower bound on replication rate as a function of the reducer size.”
  • 38. MapReduce Implementation. Previous lattice-oriented M/R implementations
    - A version of the Close-by-One algorithm was ported to the M/R framework [Krajca & Vychodil, 2009]
    - An M/R algorithm for computing closed cube lattices was proposed [Kudryavcev & Kuznecov, 2009]
    - [Xu et al., 2012] demonstrated that iterative algorithms like Ganter’s NextClosure can benefit from iterative M/R schemes
  • 39. MapReduce Implementation. Technologies and code repositories
    Technologies used:
    - Apache Hadoop (http://hadoop.apache.org/)
    - Apache Maven (framework for automated project builds)
    - Apache Commons (for working with extended Java collections)
    - Google Guava (utilities and data structures)
    - Jackson JSON (open-source library for converting the object-oriented representation of an object, such as a tricluster, to a string)
    - TypeTools (for run-time type resolution of inbound and outbound key-value pairs)
    - . . .
    Implementations:
    - Source 1: “Chaining-job” module (https://github.com/zydins/chaining-job)
    - Source 2: M/R-based OAC Triclustering (https://github.com/zydins/DistributedTriclustering)
  • 40. Two-stage MapReduce Implementation. Distributed OAC-triclustering: First Map
    Input: S is a set of input triples as strings; r is the number of reducers; i is a grouping index (objects, attributes, or conditions).
    Output: J̃ is a list of ⟨key, triple⟩ pairs.
    for all s ∈ S do
        t := transform(s)
        key := hash(t[i]) mod r
        J̃ := J̃ ∪ {⟨key, t⟩}
    end for
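    A minimal Python sketch of this first map follows. The tab-separated input format is an assumption (the slides do not specify it), and `zlib.crc32` stands in for the unspecified hash function because Python's built-in `hash` for strings changes between runs.

```python
import zlib

def first_map(lines, r, i):
    """Emit (key, triple) pairs; triples sharing the i-th entity get the same key."""
    for s in lines:
        t = tuple(s.strip().split("\t"))           # assumed format: "g<TAB>m<TAB>b"
        key = zlib.crc32(t[i].encode("utf-8")) % r  # stable hash, reduced modulo r
        yield key, t

lines = ["g1\tm1\tb1", "g1\tm2\tb1", "g2\tm1\tb1"]
pairs = list(first_map(lines, r=4, i=0))  # group by object
```

    Grouping by one entity type guarantees that all triples of a given object (or attribute, or condition) reach the same reducer, at the price of possibly uneven reducer loads.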
  • 41. Two-stage MapReduce Implementation. Distributed OAC-triclustering: First Reduce
    Input: J is a list of triples (for a certain key); 𝒯 = {T = (X, Y, Z)} is the current set of triclusters; PrimesOA, PrimesOC, PrimesAC.
    Output: file of strings, the encoded ⟨triple, tricluster⟩ pairs.
    Primes ← initialise a new multimap
    for all (g, m, b) ∈ J do
        Primes[g, m] := Primes[g, m] ∪ {b}
        Primes[g, b] := Primes[g, b] ∪ {m}
        Primes[m, b] := Primes[m, b] ∪ {g}
    end for
    for all (g, m, b) ∈ J do
        T := (set(Primes[m, b]), set(Primes[g, b]), set(Primes[g, m]))
        s := encode(⟨(g, m, b), T⟩)
        store s
    end for
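    The first reduce could be sketched in Python as below. Two details are assumptions of this sketch: keys of the single multimap are tagged with "OA", "OC", or "AC" so that pairs of different kinds cannot collide, and the output is encoded as JSON with hypothetical field names `triple` and `tricluster`.

```python
import json
from collections import defaultdict

def first_reduce(triples):
    """Build local prime sets for one reducer and emit encoded (triple, tricluster) pairs."""
    triples = list(triples)        # iterated twice below
    primes = defaultdict(set)
    for g, m, b in triples:
        primes[("OA", g, m)].add(b)
        primes[("OC", g, b)].add(m)
        primes[("AC", m, b)].add(g)
    for g, m, b in triples:
        tricluster = [sorted(primes[("AC", m, b)]),   # extent
                      sorted(primes[("OC", g, b)]),   # intent
                      sorted(primes[("OA", g, m)])]   # modus
        yield json.dumps({"triple": [g, m, b], "tricluster": tricluster})

out = list(first_reduce([("g1", "m1", "b1"), ("g2", "m1", "b1")]))
record = json.loads(out[0])
```

    The triclusters produced here are only partial: each reducer sees a subset of the triples, so the second stage still has to merge the prime sets.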
  • 42. Two-stage MapReduce Implementation. Distributed OAC-triclustering: Second Map
    Input: S is a list of strings.
    Output: T̃ is a list of ⟨tricluster, tricluster⟩ pairs.
    Primes ← initialise a new multimap
    for all s ∈ S do
        ⟨(g, m, b), T⟩ := decode(s)
        update Primes multimap appropriately
        I := I ∪ {(g, m, b)}
    end for
    for all (g, m, b) ∈ I do
        T := (set(Primes[m, b]), set(Primes[g, b]), set(Primes[g, m]))
        T̃ := T̃ ∪ {⟨T, T⟩}
    end for
  • 43. Two-stage MapReduce Implementation. Distributed OAC-triclustering: Second Reduce
    Input: T̂ is a list of ⟨tricluster, list of triclusters⟩ pairs.
    Output: file with the final set of triclusters {T = (X, Y, Z)}.
    for all ⟨T, [T, . . . , T]⟩ ∈ T̂ do
        store T
    end for
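    The second stage can be sketched as follows. The step "update Primes multimap appropriately" is interpreted here as taking the union of the partial extents, intents, and modi received from the first stage; that reading, the JSON record layout, and the function names are assumptions of this sketch.

```python
import json
from collections import defaultdict

def second_map(strings):
    """Merge partial prime sets and emit (tricluster, tricluster) pairs."""
    primes, triples = defaultdict(set), []
    for s in strings:
        rec = json.loads(s)
        g, m, b = rec["triple"]
        X, Y, Z = rec["tricluster"]
        primes[("AC", m, b)].update(X)   # merge partial extents
        primes[("OC", g, b)].update(Y)   # merge partial intents
        primes[("OA", g, m)].update(Z)   # merge partial modi
        triples.append((g, m, b))
    for g, m, b in triples:
        T = (frozenset(primes[("AC", m, b)]),
             frozenset(primes[("OC", g, b)]),
             frozenset(primes[("OA", g, m)]))
        yield T, T

def second_reduce(pairs):
    """Group by key (emulating the shuffle) and keep one tricluster per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return list(groups)

# Two records as they might arrive from two different first-stage reducers.
strings = [
    json.dumps({"triple": ["g1", "m1", "b1"], "tricluster": [["g1"], ["m1"], ["b1"]]}),
    json.dumps({"triple": ["g2", "m1", "b1"], "tricluster": [["g2"], ["m1"], ["b1"]]}),
]
final = second_reduce(second_map(strings))
```

    Using the tricluster as its own key lets the framework's grouping step remove duplicates, so the reducer has nothing left to compute.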
  • 44. Two-stage MapReduce Implementation. Communication costs
    The time complexity of the M/R solution is composed of two terms, one for each stage: O(|I|/r) (or O(|I|)) and O(|I|).
    The replication rate for the first M/R stage is r1 = 1 (each triple is passed as one key-value pair); the reducer size is q1 = |I|/r.
    The replication rate for the second M/R stage is r2 = 1 (one key-value pair is assigned to each tricluster), but the reducer size varies from q2^min = 1 (no duplicate triclusters) to q2^max = |I| (one final tricluster, when all the initial triples belong to one absolutely dense cuboid).
  • 46. Experiments. Description of the experiments
    Hardware: OS X 10, 1.8 GHz Intel Core i5, 4 GB of 1600 MHz DDR3 RAM, and 8 GB of free hard-drive space (typical commodity hardware).
    Two M/R modes were tested: sequential task completion, and an emulation of the distributed mode with 16 first reducers and 32 threads for the second stage.
    To evaluate the runtime more carefully, the average of 5 runs of each algorithm was recorded for each context.
  • 47. Experiments. Datasets
    Synthetic datasets:
    1) 20,000 triples (25 unique entities of each type);
    2) 100,000 triples (50 unique entities of each type);
    3) 1,000,000 triples (all possible combinations of 100 unique entities of each type).
    The 1st dataset contains duplicates, since 25 × 25 × 25 gives only 15,625 unique triples. The 2nd one contains fewer triples than 50^3 = 125,000, the number of all possible combinations. The 3rd one is an absolutely dense 100 × 100 × 100 cuboid. The 3rd dataset does not result in 3^min(|G|,|M|,|B|) formal triconcepts; it is an example of the worst-case scenario for the second reducer (q2^max = |I|).
    IMDB: the Top-250 list of the best movies from the Internet Movie Database.
    BibSonomy: data from bibsonomy.org, taken from the ECML PKDD Discovery Challenge 2008.
    Context     |G|     |M|      |B|      # triples   Density
    20k         25      25       25       20,000      1
    100k        50      50       50       100,000     0.8
    1m          100     100      100      1,000,000   1
    IMDB        250     795      22       3,818       0.00087
    BibSonomy   2,337   67,464   28,920   816,197     1.8 · 10^-7
  • 48. Experiments. Results
    Algorithm/Context     IMDB (≈3k triples)   20k triples   100k triples   1m triples   Bibsonomy (≈800k triples)
    Tribox                324                  800           1,265          >3,000       >3,000
    TRIAS                 189                  362           862            >3,000       >3,000
    OAC Box               374                  756           1,265          >3,000       >3,000
    OAC Prime             7                    8             734            >3,000       >3,000
    Online OAC prime      3                    3             3              5            >3,000
    M/R OAC prime seq.    12                   30            81             166          1,534
    M/R OAC prime distr.  1                    15            20             25           520
  • 49. Alternative MapReduce decomposition. Variant I: First stage
    First Map: Finding primes. During this phase every input triple (g, m, b) is encoded by three key-value pairs ⟨(g, m), b⟩, ⟨(g, b), m⟩, and ⟨(m, b), g⟩. These pairs are passed to the first reducer. The replication rate is r1 = 3.
    First Reduce: Finding primes. This reducer fills three corresponding dictionaries with the primes of the keys. For example, the first dictionary, PrimeOA, contains key-value pairs ⟨(g, m), {b1, b2, . . . , bn}⟩. The reducer size is q1 = max(|G|, |M|, |B|).
    The process can be stopped after the first reduce phase, and all the triclusters can then be found as (Prime[g, m], Prime[g, b], Prime[m, b]) by enumerating the triples (g, m, b) ∈ I. However, to do this faster and keep the result for further computation, it is possible to use M/R for this step as well.
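    The first stage of Variant I can be simulated in memory with a few lines of Python. The kind tags "OA", "OC", and "AC" in the keys are an assumption of this sketch, added so that the three dictionaries can share one key space.

```python
from collections import defaultdict

def first_map(triples):
    """Emit three key-value pairs per input triple (replication rate 3)."""
    for g, m, b in triples:
        yield ("OA", g, m), b
        yield ("OC", g, b), m
        yield ("AC", m, b), g

def first_reduce(pairs):
    """Collect the prime set of every key."""
    primes = defaultdict(set)
    for key, value in pairs:
        primes[key].add(value)
    return primes

# Hypothetical toy context.
triples = [("g1", "m1", "b1"), ("g1", "m2", "b1"), ("g2", "m1", "b1")]
pairs = list(first_map(triples))
primes = first_reduce(pairs)
```

    Each reducer key receives at most as many values as there are entities of the missing type, which matches the stated reducer size q1 = max(|G|, |M|, |B|).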
  • 50. Alternative MapReduce decomposition. Variant I: Second stage
    Second Map: Tricluster generation. The second map does the tricluster-combining job, i.e. for each triple (g, m, b) it composes a new key-value pair ⟨(g, m, b), ∅⟩. For each pair of any of the three types, ⟨(g, m), Prime[g, m]⟩, ⟨(g, b), Prime[g, b]⟩, and ⟨(m, b), Prime[m, b]⟩, it generates the key-value pairs ⟨(g, m, b̃), Prime[g, m]⟩, ⟨(g, m̃, b), Prime[g, b]⟩, and ⟨(g̃, m, b), Prime[m, b]⟩, where g̃ ∈ G, m̃ ∈ M, and b̃ ∈ B.
    r2 = (|I| + 3|G||M||B|) / (|I| + |G||M| + |G||B| + |M||B|) ≤ (ρ + 3) / (ρ + 3/max(|G|, |M|, |B|)), where ρ is the density of the input tricontext.
    Second Reduce: Tricluster generation. For each key (g, m, b), the generating triple, the second reducer assembles a single value: its tricluster (Prime[g, m], Prime[g, b], Prime[m, b]). If there is no key-value pair ⟨(g, m, b), ∅⟩ for a particular triple (g, m, b), it does not output any key-value pair for that key. The reducer size q2 is either 3 (no output) or 4 (tricluster assembled).
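    As a sanity check of the bound on r2, the two sides can be evaluated for contexts of the sizes listed in the datasets table. The function names are hypothetical, and the formula is used exactly as stated on the slide.

```python
def replication_rate(n_triples, G, M, B):
    """r2 = (|I| + 3|G||M||B|) / (|I| + |G||M| + |G||B| + |M||B|)."""
    return (n_triples + 3 * G * M * B) / (n_triples + G * M + G * B + M * B)

def upper_bound(n_triples, G, M, B):
    """(rho + 3) / (rho + 3 / max(|G|, |M|, |B|)), with rho = |I| / (|G||M||B|)."""
    rho = n_triples / (G * M * B)
    return (rho + 3) / (rho + 3 / max(G, M, B))

# "100k" synthetic context: 100,000 triples over 50 x 50 x 50.
r2_100k = replication_rate(100_000, 50, 50, 50)
bound_100k = upper_bound(100_000, 50, 50, 50)

# IMDB context: 3,818 triples over 250 x 795 x 22.
r2_imdb = replication_rate(3_818, 250, 795, 22)
bound_imdb = upper_bound(3_818, 250, 795, 22)
```

    Dividing numerator and denominator by |G||M||B| gives (ρ + 3) / (ρ + 1/|B| + 1/|M| + 1/|G|), and each reciprocal is at least 1/max(|G|, |M|, |B|), which yields the inequality. When all three sets have the same size, as in the 100k context, the bound is attained (both sides are about 4.42).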
  • 51. Alternative MapReduce decomposition. Variant II: Second stage
    Second Map: Tricluster generation with duplicate generating triples. The second map does the tricluster-combining job, i.e. for each triple (g, m, b) it composes a new key-value pair: ⟨(Prime[g, m], Prime[g, b], Prime[m, b]), (g, m, b)⟩.
    Second Reduce: Tricluster generation with duplicate generating triples. The second reducer just groups the values for each key: ⟨(X, Y, Z), {(g1, m1, b1), . . . , (gn, mn, bn)}⟩.
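    Variant II can be sketched in the same in-memory style: the tricluster becomes the key and the generating triple the value, so grouping by key collects all generating triples of each tricluster. The tagged prime dictionary, the (extent, intent, modus) ordering of the key, and the function names are assumptions of this sketch.

```python
from collections import defaultdict

def second_map(triples, primes):
    """Emit (tricluster, generating triple) pairs."""
    for g, m, b in triples:
        key = (frozenset(primes[("AC", m, b)]),   # extent
               frozenset(primes[("OC", g, b)]),   # intent
               frozenset(primes[("OA", g, m)]))   # modus
        yield key, (g, m, b)

def second_reduce(pairs):
    """Group the generating triples of each distinct tricluster."""
    groups = defaultdict(list)
    for key, triple in pairs:
        groups[key].append(triple)
    return dict(groups)

# Hypothetical toy context: a dense 2 x 1 x 1 box.
triples = [("g1", "m1", "b1"), ("g2", "m1", "b1")]
primes = defaultdict(set)
for g, m, b in triples:
    primes[("OA", g, m)].add(b)
    primes[("OC", g, b)].add(m)
    primes[("AC", m, b)].add(g)

grouped = second_reduce(second_map(triples, primes))
```

    Compared with Variant I, this keeps the list of generating triples for every tricluster, which can be useful when the triples themselves are needed later.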
  • 53. Conclusion and further work
    - A MapReduce implementation of Prime OAC-triclustering has been proposed.
    - Communication costs have been analysed.
    - The online version and the M/R version have been compared.
    - Further experiments are needed with other M/R variants and other triclustering algorithms.
    - A proper comparison of the proposed OAC-triclustering with noise-tolerant patterns in n-ary relations, e.g. those found by DataPeeler descendants [Cerf et al., 2013], has not yet been conducted.
  • 54. Thank you! Questions?