Orpailleur -- triclustering talk

Dmitry V. Gnatyshak, Dmitry I. Ignatov*,
Sergei O. Kuznetsov
School of Applied Mathematics and Information Science & Intelligence Systems and Structural
Analysis Lab
NRU Higher School of Economics, Moscow, Russia
LORIA Orpailleur meeting, Nancy, France, 2013

Outline
1. Motivation and problem setting
2. FCA basic definitions
3. Triclustering methods
4. Experiments
5. Conclusion

Motivation
• Many sources of structured and unstructured data produce triadic data.
• Example: a folksonomy is a set of triples (user, object, tag).
• Examples:
  ◦ Bibsonomy.org: (user, bookmark, tag)
  ◦ Social networking sites: (user, group, interest)
  ◦ Delicious: (user, link, tag)

Main goals
1. Comparison of some triclustering methods
2. Development of a toolbox for triclustering experiments
3. New, possibly better methods
4. Possible applications

FCA: basic definitions
Biology  Mathematics  Computer Science  Chemistry
Kate x x
Mike x x x
Alex x x
Pete x x x

(R. Wille, 1982; B. Ganter, R. Wille, 1999)
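The two derivation (prime) operators of FCA and the formal-concept condition can be sketched in a few lines. This is only an illustration: the cross positions in `context` below are hypothetical (the slide's exact cells are not recoverable here), and the function names are chosen for this example.

```python
# Hypothetical formal context: object -> set of attributes.
# The cross positions are illustrative, not taken from the slide.
context = {
    "Kate": {"Biology", "Mathematics"},
    "Mike": {"Mathematics", "Computer Science", "Chemistry"},
    "Alex": {"Biology", "Chemistry"},
    "Pete": {"Mathematics", "Computer Science", "Chemistry"},
}
ALL_ATTRIBUTES = set().union(*context.values())


def common_attributes(objects):
    """A': attributes shared by every object in `objects`."""
    result = set(ALL_ATTRIBUTES)
    for g in objects:
        result &= context[g]
    return result


def common_objects(attributes):
    """B': objects that have every attribute in `attributes`."""
    wanted = set(attributes)
    return {g for g, attrs in context.items() if wanted <= attrs}


def is_formal_concept(extent, intent):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return (common_attributes(extent) == set(intent)
            and common_objects(intent) == set(extent))


# Closing the attribute set {"Computer Science"} yields a concept.
extent = common_objects({"Computer Science"})
intent = common_attributes(extent)
```

With this hypothetical data, `extent` contains Mike and Pete, and `intent` contains Mathematics, Computer Science and Chemistry; the pair satisfies `is_formal_concept`.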





Triadic FCA: basic definitions

(F. Lehmann, R. Wille, 1995)

OAC-triclusters
(based on box operators)

Box operators
(D. Ignatov et al., 2011)

OAC-triclusters
(based on prime operators)

Prime-operators of singletons
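A minimal sketch of how prime operators of singletons generate triclusters, assuming the usual OAC-prime construction: for every triple (g, m, b) of the triadic context, the tricluster is ((m, b)', (g, b)', (g, m)'). The toy triples and the function names are hypothetical and only serve the illustration.

```python
# Toy triadic context as a set of (user, object, tag) triples (hypothetical data).
triples = {
    ("u1", "o1", "t1"),
    ("u1", "o1", "t2"),
    ("u2", "o1", "t1"),
    ("u2", "o2", "t1"),
}


def prime_tricluster(g, m, b, triples):
    """Tricluster generated by the triple (g, m, b) via prime operators."""
    # (m, b)': all objects of the first mode related to both m and b
    extent = frozenset(g2 for (g2, m2, b2) in triples if m2 == m and b2 == b)
    # (g, b)': all attributes related to both g and b
    intent = frozenset(m2 for (g2, m2, b2) in triples if g2 == g and b2 == b)
    # (g, m)': all conditions related to both g and m
    modus = frozenset(b2 for (g2, m2, b2) in triples if g2 == g and m2 == m)
    return extent, intent, modus


def oac_prime(triples):
    """Set of distinct triclusters generated by all triples of the context."""
    return {prime_tricluster(g, m, b, triples) for (g, m, b) in triples}


example = prime_tricluster("u1", "o1", "t1", triples)
all_triclusters = oac_prime(triples)
```

Here `example` is ({u1, u2}, {o1}, {t1, t2}). Three of its four cells are triples of the context, so it is a dense but not complete box, which is what distinguishes such triclusters from triconcepts.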

TriBox

(A. Kramarenko & B. Mirkin, 2011)

Spectral Triclustering: SpecTric

(D. Ignatov & Z. Sekinaeva, 2011; Ignatov et al. 2013)


TRIAS

(R. Jäschke, 2006)

Experiments
• Main goals:
  ◦ Fault-tolerance test
  ◦ Comparison by criteria: time, quantity, mean density, coverage and diversity
• For TriBox and OAC-triclustering we implemented parallel versions
  ◦ They were included in the comparison
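The quality criteria named above can be sketched as follows. The slides do not spell out the formulas, so the definitions here are assumptions: density is the share of a tricluster's cells that are triples of the context, coverage is the share of context triples lying in at least one tricluster, and diversity is one minus the share of tricluster pairs that overlap in all three modes. The data and function names are hypothetical.

```python
from itertools import combinations, product

# Toy triadic context (hypothetical data).
triples = {
    ("u1", "o1", "t1"),
    ("u1", "o1", "t2"),
    ("u2", "o1", "t1"),
    ("u2", "o2", "t1"),
}


def density(tricluster, triples):
    """Share of the cells of the box X x Y x Z that are triples of the context."""
    X, Y, Z = tricluster
    filled = sum(1 for cell in product(X, Y, Z) if cell in triples)
    return filled / (len(X) * len(Y) * len(Z))


def coverage(triclusters, triples):
    """Share of context triples that fall into at least one tricluster."""
    covered = sum(
        1 for (g, m, b) in triples
        if any(g in X and m in Y and b in Z for (X, Y, Z) in triclusters)
    )
    return covered / len(triples)


def diversity(triclusters):
    """1 minus the share of tricluster pairs sharing elements in all three modes."""
    pairs = list(combinations(triclusters, 2))
    if not pairs:
        return 1.0
    overlapping = sum(
        1 for (X1, Y1, Z1), (X2, Y2, Z2) in pairs
        if X1 & X2 and Y1 & Y2 and Z1 & Z2
    )
    return 1 - overlapping / len(pairs)


t1 = ({"u1", "u2"}, {"o1"}, {"t1", "t2"})
t2 = ({"u2"}, {"o2"}, {"t1"})
result = [t1, t2]
```

On this toy input, `t1` has density 0.75, `t2` has density 1.0, the two triclusters together cover every triple, and they do not overlap, so the diversity is 1.0.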

OAC-prime triclustering example
• IMDB

Results (time, quantity, average density, coverage, diversity)

Method          T, ms       #   Density, %  Coverage, %  Diversity, % (four columns)
OAC (box)         407      73         9.88       100.00    0.00    0.00    0.00    0.00
OAC (prime)       312    2659        32.23       100.00   92.51   60.07   59.80   59.45
SpecTric          277       5         8.74         8.84  100.00  100.00  100.00  100.00
TriBox           6218    1011        74.00        96.02   97.42   66.25   79.53   84.80
TRIAS           29367   38356       100.00       100.00   99.99   99.93    4.07    3.51

IMDB
OAC (box)        2314    1500         1.84       100.00   15.65    9.67    0.70    7.87
OAC (prime)       547    1274        53.85       100.00   96.55   94.56   92.14   28.52
SpecTric        98799      21        17.07        20.88  100.00  100.00  100.00  100.00
TriBox         197136     328        91.65        98.90   98.89   98.46   95.21   30.94
TRIAS          102554    1956       100.00       100.00   99.89   99.69   52.52   26.18

BibSonomy
OAC (box)       19297     398         4.16       100.00   79.59   67.28   42.83   79.54
OAC (prime)     13556    1289        94.66       100.00   99.74   88.58   99.51   99.53
SpecTric      5906563       2        50.00       100.00  100.00  100.00  100.00  100.00
TriBox        Time > 24 hours
TRIAS          110554    1305       100.00       100.00   99.98   91.70   99.78   99.92

Results (time, quantity, average density, coverage, diversity)

Method | Time | Quantity | Average density | Coverage | Diversity | Efficiency of parallel version
OAC (box) | average | large | low | high~very low | very low~average | high
OAC (prime) | small | large | average | high~average | average~high | low
SpecTric | small for small contexts | small | low | average~high | 1 | –
TriBox | high | average | high | high | high | high
TRIAS | very large | 1 | high~low | high~low | –

Conclusion
• There is no single winner across the comparison criteria
• TriBox shows the best results but requires a very long computation time
• OAC-triclustering based on prime operators gives the second-best results and is sufficiently fast

Conclusion
• Details by method:
• TRIAS
  ◦ High elapsed time
  ◦ Too many small, easily interpretable triclusters (triconcepts)

Conclusion
• OAC (box operators)
  ◦ Large triclusters of low density
  ◦ High coverage, low diversity
  ◦ Efficient parallelization
• OAC (prime operators)
  ◦ Fast computation
  ◦ A large number of dense, easily interpretable triclusters
  ◦ Inefficient parallelization

Conclusion
• Spectral triclustering
  ◦ Fast on small contexts
  ◦ Easily interpretable triclusters, but of low density
  ◦ Diversity always equals 1, but at the cost of low coverage
• TriBox
  ◦ A moderate number of easily interpretable triclusters
  ◦ High elapsed time
  ◦ Efficient parallelization
  ◦ Reasonably high coverage and diversity
