Turning Krimp into a Triclustering Technique on
Sets of Attribute-Condition Pairs that Compress
Maxim Yurov and Dmitry I. Ignatov
National Research University Higher School of Economics, Moscow, Russia
Data Analysis and AI Dept. & Intelligent Systems and Structural Analysis Lab @
Computer Science Faculty
IJCRS 2017
Olsztyn, Poland
03.07.2017
Outline
Problem statement
Krimp algorithm
Triadic data and its transformation
Biclustering and Triclustering
Results of experiments
Research Domain
Frequent Itemset Mining (FIM) is one of the basic problems in
Data Mining.
One of the first FIM tasks is market basket analysis (Agrawal et al.,
1993).
One of the first FIM algorithms is Apriori (Agrawal et al., 1994).
Frequent Itemset Mining
Problem: the number of frequent itemsets is often enormous, which
complicates the search for the most interesting patterns among
them.
Q: How to solve it?
A: For example, use the Minimum Description Length (MDL) principle:
MDL principle
The best set of frequent itemsets is the one that compresses the input
data best. [1]
[1] Siebes A., Vreeken J., van Leeuwen M., Itemsets that compress (2011).
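To make the size of the problem concrete, here is a minimal brute-force sketch that enumerates every frequent itemset of a toy database. It is not Apriori and not the authors' code; the function name `frequent_itemsets` and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(db, minsup):
    """Return {itemset: support} for all itemsets with support >= minsup.

    Brute force over all subsets of the item universe, so it is only
    suitable for tiny illustrative databases.
    """
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            supp = sum(1 for t in db if itemset <= t)
            if supp >= minsup:
                result[itemset] = supp
    return result


# Hypothetical toy database: three transactions over four items.
toy_db = [set("ABCD"), set("ABCD"), set("AB")]
found = frequent_itemsets(toy_db, minsup=2)
print(len(found))
```

With only three transactions and minsup = 2, every non-empty subset of {A, B, C, D} is frequent, so 15 itemsets are returned for 3 input rows.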
Krimp Algorithm
Input
A database D of transactions over a set of items I (like purchases in a
supermarket).
Code Table
The code table CT is a table with two columns: itemsets on the left
and their codes on the right.
The left column contains at least all singleton itemsets.
The codes are unique.
Figure: Code table example. The width of the Code column shows the
length of the code. I = {A, B, C}. NB: the column Usage is not a part
of the code table. [2]
[2] Siebes A., Vreeken J., van Leeuwen M., Itemsets that compress (2011).
Figure: Example of a database, its cover, and the encoded database
based on the code table from Fig. 1. I = {A, B, C}. [3]
[3] Siebes A., Vreeken J., van Leeuwen M., Itemsets that compress (2011).
Figure: Example of the standard code table for the database from Fig. 2, its
cover and the encoded database. [4]
[4] Siebes A., Vreeken J., van Leeuwen M., Itemsets that compress (2011).
Minimal Coding Set Problem
Let I be a set of items, D be a dataset of transactions (itemsets)
over I, cover be a cover function, and F be a set of candidate
itemsets. Find the smallest coding set CS ⊆ F such that the resulting
code table CT minimizes the total size L(D, CT) of the encoded
database and the code table:
$$L(D, CT) = L(D \mid CT) + L(CT \mid D)$$
$$L(CT \mid D) = \sum_{X \in CT:\ usage_D(X) \neq 0} \bigl( L(code_{ST}(X)) + L(code_{CT}(X)) \bigr)$$
$$L(D \mid CT) = \sum_{t \in D} L(t \mid CT)$$
$$L(t \mid CT) = \sum_{X \in cover(CT, t)} L(code_{CT}(X))$$
$$L(code_{CT}(X)) = |code_{CT}(X)|$$
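The size formulas can be evaluated directly from usage counts. The sketch below assumes Shannon-optimal code lengths, i.e. a code of -log2(usage / total usage) bits, which is how the cited Krimp work assigns lengths; the slide itself only states L(code) = |code|. The function names and the toy usages are hypothetical.

```python
from math import log2


def code_lengths(usage):
    """Code length in bits for every entry with non-zero usage (assumed Shannon-optimal)."""
    total = sum(usage.values())
    return {x: -log2(u / total) for x, u in usage.items() if u > 0}


def encoded_sizes(usage_ct, usage_st):
    """Return (L(D|CT), L(CT|D)) computed from usage counts.

    usage_ct maps each itemset in CT to its usage in the cover of D.
    usage_st maps each single item to its usage under the standard code table ST.
    """
    len_ct = code_lengths(usage_ct)
    len_st = code_lengths(usage_st)
    # Every use of X in some cover costs L(code_CT(X)) bits, so the sum over
    # transactions equals the sum of usage * code length over the code table.
    l_data = sum(usage_ct[x] * bits for x, bits in len_ct.items())
    # Only itemsets with non-zero usage contribute to the code table size:
    # the itemset spelled out with ST codes, plus its own CT code.
    l_table = sum(sum(len_st[i] for i in x) + bits for x, bits in len_ct.items())
    return l_data, l_table


# Toy example (assumed): D = {ABC, ABC}, each transaction covered by AB and C.
usage_ct = {frozenset("AB"): 2, frozenset("C"): 2, frozenset("A"): 0, frozenset("B"): 0}
usage_st = {"A": 2, "B": 2, "C": 2}
l_data, l_table = encoded_sizes(usage_ct, usage_st)
print(l_data, l_table, l_data + l_table)
```

In this toy case AB and C each get a 1-bit code, so L(D|CT) is 4 bits, while the unused singletons A and B add nothing to L(CT|D).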
Krimp algorithm
The algorithmic strategy
It starts with the standard code table ST, which contains only the
singletons {i}, i ∈ I.
Then it adds the other itemsets (candidates) from F one by one.
If the resulting code table compresses the database better, then
Krimp keeps it and continues the search. Otherwise, Krimp
discards this itemset.
Krimp algorithm
Standard Cover Order
Let us order X ∈ CT by decreasing cardinality, then by decreasing
support, and finally lexicographically:
|X| ↓ suppD(X) ↓ lexicographically ↑
Standard Candidate Order
Frequent and long itemsets have priority:
suppD(X) ↓ |X| ↓ lexicographically ↑
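Both orders are plain multi-key sorts. A minimal sketch, assuming itemsets are represented as frozensets of single-character items; the helper names and the toy database are hypothetical:

```python
def support(itemset, db):
    """Number of transactions in db that contain the itemset."""
    return sum(1 for t in db if itemset <= t)


def standard_cover_order(itemsets, db):
    # cardinality descending, then support descending, then lexicographic ascending
    return sorted(itemsets, key=lambda x: (-len(x), -support(x, db), sorted(x)))


def standard_candidate_order(itemsets, db):
    # support descending, then cardinality descending, then lexicographic ascending
    return sorted(itemsets, key=lambda x: (-support(x, db), -len(x), sorted(x)))


# Toy database and itemsets (assumed for illustration).
db = [frozenset("ABC"), frozenset("AB"), frozenset("A"), frozenset("BC")]
itemsets = [frozenset(s) for s in ("AB", "BC", "A", "ABC", "C")]

cover_order = ["".join(sorted(x)) for x in standard_cover_order(itemsets, db)]
candidate_order = ["".join(sorted(x)) for x in standard_candidate_order(itemsets, db)]
print(cover_order)      # ['ABC', 'AB', 'BC', 'A', 'C']
print(candidate_order)  # ['A', 'AB', 'BC', 'C', 'ABC']
```

The cover order puts long itemsets first so that they are tried first when covering a transaction, while the candidate order lets frequent itemsets enter the code table first.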
Krimp algorithm
Input: D is a transaction database and F is a candidate set over an
input set of items I.
Output: A heuristic solution to the Minimal Coding Set Problem,
code table CT.
CT ← StandardCodeTable(D)
F0 ← F in Standard Candidate Order
for F ∈ F0 \ {{i} | i ∈ I} do
    CTc ← (CT ∪ {F}) in Standard Cover Order
    if L(D, CTc) < L(D, CT) then
        CT ← CTc
    end
end
return CT
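The loop above can be sketched in Python as follows. This is a simplified illustration, not the authors' or the reference implementation: it assumes Shannon-optimal code lengths, a greedy cover in Standard Cover Order, naive support counting, and no pruning step. All function names are hypothetical.

```python
from math import log2


def support(itemset, db):
    return sum(1 for t in db if itemset <= t)


def cover_order(code_table, db):
    return sorted(code_table, key=lambda x: (-len(x), -support(x, db), sorted(x)))


def cover(ordered_ct, t):
    """Greedily cover transaction t with non-overlapping itemsets, in cover order."""
    remaining, used = set(t), []
    for x in ordered_ct:
        if x <= remaining:
            used.append(x)
            remaining -= x
    return used


def total_size(code_table, db, st_len):
    """L(D, CT) = L(D|CT) + L(CT|D), with assumed Shannon code lengths."""
    ordered = cover_order(code_table, db)
    usage = {x: 0 for x in ordered}
    for t in db:
        for x in cover(ordered, t):
            usage[x] += 1
    total = sum(usage.values())
    size = 0.0
    for x, u in usage.items():
        if u == 0:
            continue  # unused itemsets cost nothing
        code_len = -log2(u / total)
        size += u * code_len                          # contribution to L(D|CT)
        size += sum(st_len[i] for i in x) + code_len  # contribution to L(CT|D)
    return size


def krimp(db, candidates):
    items = sorted(set().union(*db))
    counts = {i: support(frozenset([i]), db) for i in items}
    n = sum(counts.values())
    st_len = {i: -log2(c / n) for i, c in counts.items()}
    ct = [frozenset([i]) for i in items]  # standard code table
    best = total_size(ct, db, st_len)
    ordered_candidates = sorted(
        (c for c in candidates if len(c) > 1),  # singletons are already in CT
        key=lambda x: (-support(x, db), -len(x), sorted(x)),
    )
    for f in ordered_candidates:
        trial = ct + [f]
        size = total_size(trial, db, st_len)
        if size < best:  # keep the candidate only if compression improves
            ct, best = trial, size
    return cover_order(ct, db), best


# Toy run (assumed data): both candidates improve compression and are kept.
db = [frozenset("ABC")] * 4 + [frozenset("AB")] * 2
ct, size = krimp(db, [frozenset("AB"), frozenset("ABC")])
print(["".join(sorted(x)) for x in ct], round(size, 2))
```

Recomputing the full cover for every candidate keeps the sketch short; a practical implementation would update usages incrementally.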
Krimp algorithm
Figure: The scheme of Krimp. [5]
[5] Siebes A., Vreeken J., van Leeuwen M., Itemsets that compress (2011).
Triadic Data
A folksonomy is a ternary relation over sets of objects, attributes and
conditions. [6]
From a ternary relation to a dyadic one
(Obj., Attr., Cond.) → (Obj., Attr. × Cond.),
where A × B is the Cartesian product of A and B.
[6] Folksonomy coinage and definition (2007), T. Vander Wal,
http://vanderwal.net/folksonomy.html
Data
1. A sample of Top-250 movies from www.IMDB.com.
The objects are movie titles, the attributes are keywords, and
the conditions are genres.
2. A sample from the bibliography sharing system BibSonomy.org.
The objects are users, the attributes are tags, and the
conditions are electronic bookmarks.
Example of data transformation
If there is a movie description in terms of keywords and genres
{Star Wars} × {Princess, Empire} × {Adventure, Sci-Fi, Action},
then this piece of data can be transformed into object-attribute
form as follows:
{Star Wars} × {(Princess, Adventure), (Princess, Sci-Fi),
(Princess, Action), (Empire, Adventure),
(Empire, Sci-Fi), (Empire, Action)}.
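The transformation amounts to grouping triples by object and treating each (attribute, condition) pair as a single item. A minimal sketch, with the hypothetical helper `flatten_triples` and the Star Wars example from the slide:

```python
from itertools import product


def flatten_triples(triples):
    """Map (object, attribute, condition) triples to {object: set of (attribute, condition)}."""
    transactions = {}
    for g, m, b in triples:
        transactions.setdefault(g, set()).add((m, b))
    return transactions


keywords = ["Princess", "Empire"]
genres = ["Adventure", "Sci-Fi", "Action"]
# The slide's example is a full Cartesian product of keywords and genres.
triples = [("Star Wars", m, b) for m, b in product(keywords, genres)]

transactions = flatten_triples(triples)
for item in sorted(transactions["Star Wars"]):
    print(item)
```

Each resulting set can then be passed to an itemset miner as one transaction whose items are attribute-condition pairs.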
Biclustering
[Mirkin, 1995]
Coining the term bicluster
The term bicluster(ing) was proposed by B. Mirkin in the book
Mathematical Classification and Clustering. Kluwer Academic
Publishers (1996), p. 296:
"The term biclustering refers to simultaneous clustering of
both row and column sets in a data matrix. Biclustering
addresses the problems of aggregate representation of the
basic features of interrelation between rows and columns
as expressed in the data."
Concept-based biclustering
[D. Ignatov and S. Kuznetsov, 2010]
Let K = (G, M, I ⊆ G × M) be a formal context.
Definition 1
If (g, m) ∈ I, then (m′, g′) is called an object-attribute bicluster or
OA-bicluster with density [7]
$$\rho(m', g') = \frac{|I \cap (m' \times g')|}{|m'| \cdot |g'|}.$$
[7] $(\cdot)' : 2^G \to 2^M$ and $(\cdot)' : 2^M \to 2^G$ are the derivation operators applied to
{g} ⊆ G and {m} ⊆ M in the sense of [Ganter & Wille, 1999].
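As a small worked example of Definition 1, the sketch below builds the OA-bicluster (m′, g′) for one pair of a toy context and computes its density. The context, the function names and the set-of-pairs representation of I are assumptions made for illustration.

```python
def object_prime(g, I):
    """g': all attributes of object g."""
    return {m for (h, m) in I if h == g}


def attribute_prime(m, I):
    """m': all objects having attribute m."""
    return {g for (g, n) in I if n == m}


def oa_bicluster(g, m, I):
    """Return (m', g', density) for a pair (g, m) of the incidence relation I."""
    if (g, m) not in I:
        raise ValueError("(g, m) must belong to the incidence relation I")
    extent = attribute_prime(m, I)  # m'
    intent = object_prime(g, I)     # g'
    filled = sum(1 for h in extent for n in intent if (h, n) in I)
    return extent, intent, filled / (len(extent) * len(intent))


# Toy formal context (assumed).
I = {("g1", "a"), ("g1", "b"), ("g2", "a"), ("g3", "b")}
extent, intent, rho = oa_bicluster("g1", "a", I)
print(sorted(extent), sorted(intent), rho)  # ['g1', 'g2'] ['a', 'b'] 0.75
```

Here the rectangle m′ × g′ has four cells and only (g2, b) is missing from I, which gives the density 3/4.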
Geometric interpretation of OA-bicluster: connection with
RST
[D. Ignatov and S. Kuznetsov, 2010]
Figure: Geometric interpretation of the OA-bicluster (m′, g′), with the sets g, m, g′, m′, g′′ and m′′ marked.
Triadic FCA and Triclustering
[Lehman & Wille, 1993]
Consider K = (G, M, B, J ⊆ G × M × B), a triadic context; in
what follows we will refer to a triset T = (X, Y, Z) with X ⊆ G,
Y ⊆ M, Z ⊆ B as an object-attribute-condition tricluster or simply
tricluster. [8]
[8] Ignatov, D.I., Gnatyshak, D.V., Kuznetsov, S.O., Mirkin, B.G.: Triadic
formal concept analysis and triclustering: searching for optimal patterns.
Machine Learning 101(1-3) (2015) 271–302
KRIMP-based triclusters
Each coding set of (attribute, condition) pairs found by Krimp
is contained as a coding block in the description of some
object g ∈ G.
Let S be a coding set returned by Krimp that consists of n
attribute-condition pairs from M × B.
Then the first component X of the corresponding tricluster is
{g | (g, m, b) ∈ J for all (m, b) ∈ S}.
The remaining two components are
Y = {m | (m, b) ∈ S for some b} and Z = {b | (m, b) ∈ S for some m}.
S is not necessarily equal to Y × Z, so some missing triples are
allowed inside T = (X, Y, Z). The quality of
such a tricluster can be assessed by its density.
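The construction of T = (X, Y, Z) from a coding set S can be sketched as below. The function name and the toy relation are hypothetical, and the ternary relation is assumed to be stored as a set of (object, attribute, condition) triples.

```python
def tricluster_from_coding_set(S, J):
    """Build (X, Y, Z) from a coding set S of (attribute, condition) pairs."""
    objects = {g for (g, _, _) in J}
    # X: objects whose description contains every pair of S.
    X = {g for g in objects if all((g, m, b) in J for (m, b) in S)}
    # Y and Z: projections of S onto attributes and conditions.
    Y = {m for (m, _) in S}
    Z = {b for (_, b) in S}
    return X, Y, Z


# Toy ternary relation (assumed).
J = {
    ("g1", "m1", "b1"), ("g1", "m2", "b2"),
    ("g2", "m1", "b1"), ("g2", "m2", "b2"),
    ("g3", "m2", "b2"),
}
S = {("m1", "b1"), ("m2", "b2")}
X, Y, Z = tricluster_from_coding_set(S, J)
print(sorted(X), sorted(Y), sorted(Z))  # ['g1', 'g2'] ['m1', 'm2'] ['b1', 'b2']
```

In this toy case S is a proper subset of Y × Z, so the box X × Y × Z contains triples such as (g1, m1, b2) that are absent from J, which is exactly the situation where density drops below 1.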
Quality measures
Density
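$$\rho(T_i) = \frac{|J \cap (X \times Y \times Z)|}{|X|\,|Y|\,|Z|}$$
For the tricluster collection:
$$\rho(\mathcal{T}) = \frac{\sum_{T_i \in \mathcal{T}} \rho(T_i)}{|\mathcal{T}|}$$
Coverage
$$coverage(\mathcal{T}, K) = \frac{\bigl|\bigcup_{(X,Y,Z) \in \mathcal{T}} (X \times Y \times Z) \cap J\bigr|}{|J|}$$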
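Density and coverage translate directly into set operations. A minimal sketch, assuming J is a set of triples and each tricluster is a tuple of three non-empty sets; the names and toy data are illustrative only.

```python
from itertools import product


def density(tricluster, J):
    """Share of the box X x Y x Z that is filled with triples of J."""
    X, Y, Z = tricluster
    box = set(product(X, Y, Z))
    return len(J & box) / len(box)


def mean_density(triclusters, J):
    """Average density over a tricluster collection."""
    return sum(density(t, J) for t in triclusters) / len(triclusters)


def coverage(triclusters, J):
    """Share of the triples of J covered by at least one tricluster."""
    covered = set()
    for X, Y, Z in triclusters:
        covered |= set(product(X, Y, Z)) & J
    return len(covered) / len(J)


# Toy data (assumed).
J = {(1, "a", "x"), (1, "b", "x"), (2, "a", "x"), (3, "c", "y")}
T1 = ({1, 2}, {"a", "b"}, {"x"})  # 3 of its 4 cells are in J
T2 = ({3}, {"c"}, {"y"})          # fully dense
print(density(T1, J), mean_density([T1, T2], J), coverage([T1], J))  # 0.75 0.875 0.75
```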
Diversity
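$$diversity(\mathcal{T}) = 1 - \frac{\sum_{j} \sum_{i<j} intersect(T_i, T_j)}{|\mathcal{T}|(|\mathcal{T}|-1)/2},$$
where:
$$intersect(T_i, T_j) = \begin{cases} 1, & G_{T_i} \cap G_{T_j} \neq \emptyset \wedge M_{T_i} \cap M_{T_j} \neq \emptyset \wedge B_{T_i} \cap B_{T_j} \neq \emptyset \\ 0, & \text{otherwise} \end{cases}$$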
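Diversity is one minus the share of tricluster pairs that overlap in all three components. A minimal sketch, assuming at least two triclusters, each given as a tuple of three sets; the function names follow the formula and the toy data are hypothetical.

```python
from itertools import combinations


def intersect(Ti, Tj):
    """1 if the two triclusters share objects, attributes and conditions, else 0."""
    return int(all(a & b for a, b in zip(Ti, Tj)))


def diversity(triclusters):
    n = len(triclusters)
    n_pairs = n * (n - 1) / 2
    overlapping = sum(intersect(a, b) for a, b in combinations(triclusters, 2))
    return 1 - overlapping / n_pairs


# Toy collection (assumed): only T1 and T2 overlap in all three components.
T1 = ({1, 2}, {"a"}, {"x"})
T2 = ({2, 3}, {"a", "b"}, {"x", "y"})
T3 = ({4}, {"c"}, {"z"})
print(diversity([T1, T2, T3]))
```

One overlapping pair out of three gives a diversity of 2/3 for this collection.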
IMDB: Top-250 movies
Table: Basic statistics of the IMDB dataset with top-250 movies.
Context  |G|  |M|  |B|  # triples  Density
IMDB     250  795  22   3818       0.00087
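The density of the whole context follows from the other columns as the number of triples divided by |G|·|M|·|B|. A quick check of that arithmetic (variable names are illustrative):

```python
n_objects, n_attributes, n_conditions = 250, 795, 22
n_triples = 3818

# Density of the triadic context: filled cells over all possible cells.
context_density = n_triples / (n_objects * n_attributes * n_conditions)
print(round(context_density, 5))  # 0.00087
```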
Results of experiments with triclustering
Table: Time, cardinality, density, coverage and diversity for Top-250
IMDB movies dataset.
Algorithm                                t, ms   # triclusters  ρ, %   Cov, %  Div, %
OAC ( )                                  2314    1500           1.84   100     15.650
OAC ( )                                  547     1274           53.85  100     96.550
SpecTric                                 98799   21             17.07  20.88   100
TriBox                                   197136  328            91.65  98.90   98.890
TRIAS                                    102554  1956           100    100     99.890
Krimp (minsup = 2, only non-singletons)  87      152            100    24.04   99.556
Krimp (minsup = 2, usage ≠ 0)            87      2859           100    99.97   99.997
Krimp (minsup = 3, only non-singletons)  46      57             100    12.07   98.684
Krimp (minsup = 3, usage ≠ 0)            46      2966           100    99.97   99.998
Examples of triclusters
Three triclusters extracted by Krimp from the IMDB dataset.
Tricluster 1.
Keyword-genre component:
{(Princess,Adventure), (Princess,Fantasy), (Empire,Sci-Fi),
(Empire,Adventure), (Empire,Action), (Princess,Sci-Fi),
(Princess,Action), (Empire,Fantasy), (Death Star,Sci-Fi),
(Death Star,Fantasy), (Death Star,Adventure),
(Death Star,Action)},
(2,2)
Movies component:
{Star Wars: Episode VI – Return of the Jedi (1983),
Star Wars (1977)}
Tricluster 2.
Keyword-genre component:
{(Future,Sci-Fi), (Future,Thriller), (Future,Action), (Cyborg,Thriller),
(Cyborg,Sci-Fi), (Cyborg,Action), (The Terminator,Thriller),
(The Terminator,Sci-Fi), (The Terminator,Action) },
(2,2)
Movies component:
{The Terminator (1984), Terminator 2: Judgment Day (1991)}
Tricluster 3.
Keyword-genre component:
{(Gotham,Thriller), (Gotham,Drama), (Gotham,Crime), (Gotham,Action),
(Batman,Thriller), (Batman,Drama), (Batman,Crime), (Batman,Action)},
(2,2)
Movies component:
{Batman Begins (2005), The Dark Knight (2008)}.
Conclusion
Krimp (or its descendants) can be considered a promising
method for triadic data analysis.
The positive features:
short computation time (although on a dataset of rather
moderate size with the lowest minimal support minsup = 2);
absolutely dense triclusters (however, this may not be the case
for sparse and noisy datasets);
we can select a rather small set of “large” triclusters (e.g., by
imposing higher support for non-singletons).
The negative features:
a strong trade-off between coverage and the number of
triclusters (switching from coding sets with singletons to
itemsets of larger size);
an even higher number of triclusters than the number of
triconcepts when the usage of singletons is allowed.
Thank you!
 
Pattern Mining and Machine Learning for Demographic Sequences
Pattern Mining and Machine Learning for Demographic SequencesPattern Mining and Machine Learning for Demographic Sequences
Pattern Mining and Machine Learning for Demographic Sequences
 
RAPS: A Recommender Algorithm Based on Pattern Structures
RAPS: A Recommender Algorithm Based on Pattern StructuresRAPS: A Recommender Algorithm Based on Pattern Structures
RAPS: A Recommender Algorithm Based on Pattern Structures
 
Поиск частых множеств признаков (товаров) и ассоциативные правила
Поиск частых множеств признаков (товаров) и ассоциативные правилаПоиск частых множеств признаков (товаров) и ассоциативные правила
Поиск частых множеств признаков (товаров) и ассоциативные правила
 
Введение в рекомендательные системы. 3 case-study без NetFlix.
Введение в рекомендательные системы. 3 case-study без NetFlix.Введение в рекомендательные системы. 3 case-study без NetFlix.
Введение в рекомендательные системы. 3 case-study без NetFlix.
 
Intro to Data Mining and Machine Learning
Intro to Data Mining and Machine LearningIntro to Data Mining and Machine Learning
Intro to Data Mining and Machine Learning
 
Boolean matrix factorisation for collaborative filtering
Boolean matrix factorisation for collaborative filteringBoolean matrix factorisation for collaborative filtering
Boolean matrix factorisation for collaborative filtering
 
Online Recommender System for Radio Station Hosting: Experimental Results Rev...
Online Recommender System for Radio Station Hosting: Experimental Results Rev...Online Recommender System for Radio Station Hosting: Experimental Results Rev...
Online Recommender System for Radio Station Hosting: Experimental Results Rev...
 
Aist2014
Aist2014Aist2014
Aist2014
 

Recently uploaded

ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptxANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
RASHMI M G
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
Wasswaderrick3
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
Anemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptxAnemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptx
muralinath2
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
zeex60
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
Nucleophilic Addition of carbonyl compounds.pptx
Nucleophilic Addition of carbonyl  compounds.pptxNucleophilic Addition of carbonyl  compounds.pptx
Nucleophilic Addition of carbonyl compounds.pptx
SSR02
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
Red blood cells- genesis-maturation.pptx
Red blood cells- genesis-maturation.pptxRed blood cells- genesis-maturation.pptx
Red blood cells- genesis-maturation.pptx
muralinath2
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
RASHMI M G
 
Mudde & Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
Mudde &  Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...Mudde &  Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
Mudde & Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
frank0071
 

Recently uploaded (20)

ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptxANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Anemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptxAnemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptx
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
Nucleophilic Addition of carbonyl compounds.pptx
Nucleophilic Addition of carbonyl  compounds.pptxNucleophilic Addition of carbonyl  compounds.pptx
Nucleophilic Addition of carbonyl compounds.pptx
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
Red blood cells- genesis-maturation.pptx
Red blood cells- genesis-maturation.pptxRed blood cells- genesis-maturation.pptx
Red blood cells- genesis-maturation.pptx
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
Mudde & Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
Mudde &  Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...Mudde &  Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
Mudde & Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
 

Turning Krimp into a Triclustering Technique on Sets of Attribute-Condition Pairs that Compress

  • 1. Turning Krimp into a Triclustering Technique on Sets of Attribute-Condition Pairs that Compress. Maxim Yurov and Dmitry I. Ignatov, National Research University Higher School of Economics, Moscow, Russia (Data Analysis and AI Dept. & Intelligent Systems and Structural Analysis Lab @ Computer Science Faculty). IJCRS 2017, Olsztyn, Poland, 03.07.2017.
  • 2. Outline: Problem statement; Krimp algorithm; Triadic data and its transformation; Biclustering and Triclustering; Results of experiments.
  • 3. Research Domain. Frequent Itemset Mining (FIM) is one of the basic problems in Data Mining. One of the first FIM tasks was market basket analysis (Agrawal et al., 1993). One of the first FIM algorithms is Apriori (Agrawal et al., 1994).
  • 4. Frequent Itemset Mining. Problem: the number of frequent itemsets is huge, which complicates the search for the most interesting patterns among them. Q: How to solve it? A: For example, use the Minimum Description Length (MDL) principle. MDL principle: the best set of frequent itemsets is the one that compresses the input data best [1]. [1] Siebes A., Vreeken J., van Leeuwen M., Itemsets that compress (2011).
  • 7. Krimp Algorithm. Input: a database D of transactions over a set of items I (like purchases in a supermarket). Code Table: the code table CT is a table with two columns, with the itemsets on the left and their codes on the right. The left column contains at least all singleton itemsets. The codes are unique.
  • 9. Figure: Code table example. The width of the Code column shows the length of the code. I = {A, B, C}. NB: the column Usage is not a part of the code table [2]. [2] Siebes A., Vreeken J., van Leeuwen M., Itemsets that compress (2011).
  • 10. Figure: Example of a database, its cover, and the encoded database based on the previous code table from Fig. 1. I = {A, B, C} [3]. [3] Siebes A., Vreeken J., van Leeuwen M., Itemsets that compress (2011).
  • 11. Figure: Example of the standard code table for the database from Fig. 2, its cover, and the encoded database [4]. [4] Siebes A., Vreeken J., van Leeuwen M., Itemsets that compress (2011).
  • 12. Minimal Coding Set Problem. Let I be a set of items, D be a dataset of transactions (itemsets) over I, cover be a cover function, and F be a set of candidate itemsets. Find the minimal coding set CS ⊆ F such that the resulting code table CT minimises L(D, CT), the total size of the encoded database and the code table:
      $L(D, CT) = L(D \mid CT) + L(CT \mid D)$
      $L(CT \mid D) = \sum_{X \in CT:\ usage_D(X) \neq 0} \bigl( L(code_{ST}(X)) + L(code_{CT}(X)) \bigr)$
      $L(D \mid CT) = \sum_{t \in D} L(t \mid CT)$
      $L(t \mid CT) = \sum_{X \in cover(CT, t)} L(code_{CT}(X))$
      $L(code_{CT}(X)) = |code_{CT}(X)|$
  • 13. Krimp algorithm. The algorithmic strategy: Krimp starts with the standard code table ST, which contains only the singletons {i}, i ∈ I. Then it adds the other itemsets (candidates) from F one by one. If the resulting code table compresses the database better, Krimp keeps it and continues the search. Otherwise, Krimp discards the candidate itemset.
  • 14. Krimp algorithm. Standard Cover Order: order X ∈ CT by decreasing cardinality, then by decreasing support, and finally lexicographically:
      |X| ↓, supp_D(X) ↓, lexicographically ↑
    Standard Candidate Order: frequent and long itemsets have priority:
      supp_D(X) ↓, |X| ↓, lexicographically ↑
  • 15. Krimp algorithm.
      Input: a transaction database D and a candidate set F over a set of items I.
      Output: a heuristic solution to the Minimal Coding Set Problem, a code table CT.
        CT ← StandardCodeTable(D)
        F0 ← F in Standard Candidate Order
        for X ∈ F0 \ {{i} | i ∈ I} do
            CTc ← (CT ∪ {X}) in Standard Cover Order
            if L(D, CTc) < L(D, CT) then
                CT ← CTc
            end
        end
        return CT
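The loop above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it omits the pruning used in full Krimp, and it assumes that code lengths are Shannon lengths, -log2 of the relative usage (the slides only state L(code) = |code|). All function names (`cover_order`, `cover`, `total_size`, `krimp`) and the toy database are hypothetical.

```python
from math import log2

def cover_order(itemsets, support):
    # Standard Cover Order: cardinality desc, support desc, lexicographic asc.
    return sorted(itemsets, key=lambda x: (-len(x), -support[x], sorted(x)))

def cover(ct, t):
    # Greedily cover transaction t with itemsets of ct (given in cover order).
    rest, used = set(t), []
    for x in ct:
        if x <= rest:
            used.append(x)
            rest -= x
    return used

def total_size(db, ct, st_len):
    # L(D, CT) = L(D | CT) + L(CT | D); code lengths are assumed to be
    # -log2(usage / total usage), which the slides do not spell out.
    usage = {x: 0 for x in ct}
    for t in db:
        for x in cover(ct, t):
            usage[x] += 1
    total = sum(usage.values())
    code_len = {x: -log2(u / total) for x, u in usage.items() if u > 0}
    l_db = sum(usage[x] * code_len[x] for x in code_len)
    # Only itemsets with non-zero usage contribute to the code table size.
    l_ct = sum(sum(st_len[i] for i in x) + code_len[x] for x in code_len)
    return l_db + l_ct

def krimp(db, candidates):
    db = [frozenset(t) for t in db]
    supp = lambda x: sum(1 for t in db if x <= t)
    singletons = [frozenset([i]) for i in sorted({i for t in db for i in t})]
    support = {x: supp(x) for x in singletons}
    n_occ = sum(support.values())
    # Code lengths of single items in the standard code table ST.
    st_len = {next(iter(x)): -log2(support[x] / n_occ) for x in singletons}
    ct = cover_order(singletons, support)
    best = total_size(db, ct, st_len)
    cands = [frozenset(c) for c in candidates if len(c) > 1]
    for c in cands:
        support[c] = supp(c)
    # Standard Candidate Order: support desc, cardinality desc, lexicographic asc.
    cands.sort(key=lambda x: (-support[x], -len(x), sorted(x)))
    for f in cands:
        if f in ct:
            continue
        ct_c = cover_order(ct + [f], support)
        size = total_size(db, ct_c, st_len)
        if size < best:  # keep the candidate only if compression improves
            ct, best = ct_c, size
    return ct, best

# Toy database: A and B always co-occur, so {A, B} should be accepted.
db = [{"A", "B"}] * 8 + [{"C"}] * 2
ct, size = krimp(db, [{"A", "B"}])
_, baseline = krimp(db, [])
```

On this toy database the candidate {A, B} replaces two singleton codes in eight transactions, so the total encoded size drops below the size obtained with the standard code table alone.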
  • 16. Krimp algorithm. Figure: The scheme of Krimp [5]. [5] Siebes A., Vreeken J., van Leeuwen M., Itemsets that compress (2011).
  • 17. Triadic Data. A folksonomy is a ternary relation over sets of objects, attributes and conditions [6]. From a ternary relation to a dyadic one: (Obj., Attr., Cond.) → (Obj., Attr. × Cond.), where A × B is the Cartesian product of A and B. [6] Folksonomy coinage and definition (2007), T. Vander Wal, http://vanderwal.net/folksonomy.html
  • 19. Data.
      1. A sample of Top-250 movies from www.IMDB.com. The objects are movie titles, the attributes are keywords, and the conditions are genres.
      2. A sample from the bibliography sharing system BibSonomy.org. The objects are users, the attributes are tags, and the conditions are electronic bookmarks.
  • 20. Example of data transformation. If there is a movie description in terms of keywords and genres, {Star Wars} × {Princess, Empire} × {Adventure, Sci-Fi, Action}, then this piece of data can be transformed into object-attribute form as follows:
      {Star Wars} × {(Princess, Adventure), (Princess, Sci-Fi), (Princess, Action), (Empire, Adventure), (Empire, Sci-Fi), (Empire, Action)}.
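The transformation can be illustrated with a short Python sketch. The helper name `to_transactions` is hypothetical; it groups (object, attribute, condition) triples by object, so that each object becomes a transaction over attribute-condition pairs that Krimp can process.

```python
from collections import defaultdict
from itertools import product

def to_transactions(triples):
    # Map each object g to the set of (attribute, condition) pairs it occurs with.
    db = defaultdict(set)
    for g, m, b in triples:
        db[g].add((m, b))
    return dict(db)

# The Star Wars example from the slide, written out as triples.
triples = set(product(["Star Wars"],
                      ["Princess", "Empire"],
                      ["Adventure", "Sci-Fi", "Action"]))
transactions = to_transactions(triples)
```

Here `transactions["Star Wars"]` holds the six keyword-genre pairs listed on the slide.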
  • 21. Biclustering [Mirkin, 1995]. Coinage of the term bicluster: the term bicluster(ing) was proposed by B. Mirkin in the book Mathematical Classification and Clustering, Kluwer Academic Publishers (1996), p. 296. The term biclustering refers to simultaneous clustering of both row and column sets in a data matrix. Biclustering addresses the problems of aggregate representation of the basic features of interrelation between rows and columns as expressed in the data.
  • 22. Concept-based biclustering [D. Ignatov and S. Kuznetsov, 2010]. Let K = (G, M, I ⊆ G × M) be a formal context. Definition 1. If (g, m) ∈ I, then (m', g') is called an object-attribute bicluster, or OA-bicluster, with density
      $\rho(m', g') = \frac{|I \cap (m' \times g')|}{|m'| \cdot |g'|}$ [7].
    [7] (·)' : 2^G → 2^M and (·)' : 2^M → 2^G are the derivation operators applied to {g} ⊆ G and {m} ⊆ M in the sense of [Ganter & Wille, 1999].
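As a small illustration of Definition 1, the following Python sketch builds the OA-bicluster (m', g') for a pair (g, m) ∈ I and computes its density. The function name `oa_bicluster`, the variable `incidence`, and the toy context are assumptions made for the example.

```python
def oa_bicluster(incidence, g, m):
    # incidence is the relation I given as a set of (object, attribute) pairs.
    if (g, m) not in incidence:
        raise ValueError("(g, m) must belong to the incidence relation")
    g_prime = {a for (o, a) in incidence if o == g}  # attributes of object g
    m_prime = {o for (o, a) in incidence if a == m}  # objects having attribute m
    hits = sum(1 for o in m_prime for a in g_prime if (o, a) in incidence)
    return m_prime, g_prime, hits / (len(m_prime) * len(g_prime))

# Toy context: object 1 has {a, b}, object 2 has {a}, object 3 has {b}.
incidence = {(1, "a"), (1, "b"), (2, "a"), (3, "b")}
extent, intent, rho = oa_bicluster(incidence, 1, "a")
```

For the pair (1, a) the bicluster is ({1, 2}, {a, b}); three of its four cells belong to I, so the density is 0.75.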
  • 23. Geometric interpretation of an OA-bicluster: connection with RST [D. Ignatov and S. Kuznetsov, 2010]. [Figure labels: g, m, g', m', g'', m''.]
  • 24. Triadic FCA and Triclustering [Lehman & Wille, 1993]. Consider a triadic context K = (G, M, B, J ⊆ G × M × B); in what follows we refer to a triset T = (X, Y, Z) with X ⊆ G, Y ⊆ M, Z ⊆ B as an object-attribute-condition tricluster, or simply a tricluster [8]. [8] Ignatov, D.I., Gnatyshak, D.V., Kuznetsov, S.O., Mirkin, B.G.: Triadic formal concept analysis and triclustering: searching for optimal patterns. Machine Learning 101(1-3) (2015) 271–302.
  • 25. KRIMP-based triclusters. Each coding set of (attribute, condition) pairs found by Krimp is contained as a coding block in the description of some object g ∈ G. Let S be a coding set returned by Krimp that consists of n attribute-condition pairs from M × B. Then the first component X of the corresponding tricluster is {g | (g, m, b) ∈ J for all (m, b) ∈ S}. The remaining two components are the projections of S: Y = {m | (m, b) ∈ S for some b} and Z = {b | (m, b) ∈ S for some m}. S is not necessarily equal to Y × Z, so some missing triples are allowed inside T = (X, Y, Z). The quality of such a tricluster can be assessed by its density.
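A minimal Python sketch of this construction follows. It assumes the database is stored as a mapping from each object to its set of (attribute, condition) pairs; the function name `tricluster_from_coding_set` and the toy objects g1, g2, g3 are hypothetical.

```python
def tricluster_from_coding_set(S, transactions):
    # X: objects whose description contains every pair of the coding set S.
    S = set(S)
    X = {g for g, pairs in transactions.items() if S <= pairs}
    Y = {m for m, _ in S}  # attributes occurring in S
    Z = {b for _, b in S}  # conditions occurring in S
    return X, Y, Z

# Toy data with hypothetical objects g1, g2, g3.
transactions = {
    "g1": {("Gotham", "Action"), ("Batman", "Crime"), ("Batman", "Action")},
    "g2": {("Gotham", "Action"), ("Batman", "Crime")},
    "g3": {("Empire", "Action")},
}
S = {("Gotham", "Action"), ("Batman", "Crime")}
X, Y, Z = tricluster_from_coding_set(S, transactions)
```

In this toy case S has two pairs while Y × Z has four, so the resulting tricluster ({g1, g2}, {Gotham, Batman}, {Action, Crime}) contains cells that are not triples of the data, which is exactly the situation where density falls below 1.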
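  • 26. Quality measures.
    Density of a tricluster T_i = (X, Y, Z):
      $\rho(T_i) = \frac{|J \cap (X \times Y \times Z)|}{|X|\,|Y|\,|Z|}$
    For the tricluster collection $\mathcal{T}$:
      $\rho(\mathcal{T}) = \frac{\sum_{T_i \in \mathcal{T}} \rho(T_i)}{|\mathcal{T}|}$
    Coverage:
      $coverage(\mathcal{T}, K) = \frac{\bigl| \bigcup_{(X, Y, Z) \in \mathcal{T}} (X \times Y \times Z) \cap J \bigr|}{|J|}$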
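These two measures are straightforward to compute when J is stored as a set of triples. The sketch below uses hypothetical function names and a toy relation; it enumerates X × Y × Z explicitly, which is only reasonable for small triclusters.

```python
from itertools import product

def density(J, T):
    # Share of the cells of X x Y x Z that are triples of J.
    X, Y, Z = T
    hits = sum(1 for cell in product(X, Y, Z) if cell in J)
    return hits / (len(X) * len(Y) * len(Z))

def mean_density(J, triclusters):
    # Average density over the tricluster collection.
    return sum(density(J, T) for T in triclusters) / len(triclusters)

def coverage(J, triclusters):
    # Share of the triples of J covered by at least one tricluster.
    covered = {cell for X, Y, Z in triclusters for cell in product(X, Y, Z)} & J
    return len(covered) / len(J)

# Toy relation with four triples and one tricluster of volume four.
J = {(1, "a", "x"), (1, "b", "x"), (2, "a", "x"), (3, "c", "y")}
T1 = ({1, 2}, {"a", "b"}, {"x"})
rho = density(J, T1)
cov = coverage(J, [T1])
```

The tricluster T1 contains three of the four triples of J and misses one of its own four cells, so both its density and the coverage equal 0.75.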
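  • 27. Diversity.
      $diversity(\mathcal{T}) = 1 - \frac{\sum_{j} \sum_{i<j} intersect(T_i, T_j)}{|\mathcal{T}|(|\mathcal{T}| - 1)/2}$,
    where intersect(T_i, T_j) = 1 if $G_{T_i} \cap G_{T_j} \neq \emptyset \wedge M_{T_i} \cap M_{T_j} \neq \emptyset \wedge B_{T_i} \cap B_{T_j} \neq \emptyset$, and 0 otherwise.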
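A short Python sketch of the diversity measure is given below, reading the condition as "all three component intersections are non-empty". The function names and the toy triclusters are assumptions; the collection must contain at least two triclusters.

```python
from itertools import combinations

def intersect(Ti, Tj):
    # 1 if the two triclusters overlap in objects, attributes and conditions.
    return int(all(a & b for a, b in zip(Ti, Tj)))

def diversity(triclusters):
    # One minus the share of overlapping pairs among all unordered pairs.
    n = len(triclusters)
    n_pairs = n * (n - 1) / 2
    overlapping = sum(intersect(Ti, Tj) for Ti, Tj in combinations(triclusters, 2))
    return 1 - overlapping / n_pairs

# Toy collection: only T1 and T2 overlap in all three components.
T1 = ({1, 2}, {"a"}, {"x"})
T2 = ({2, 3}, {"a", "b"}, {"x"})
T3 = ({4}, {"c"}, {"y"})
div = diversity([T1, T2, T3])
```

One of the three pairs overlaps, so the diversity of this toy collection is 1 − 1/3.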
  • 28. IMDB: Top-250 movies. Table: Basic statistics of the IMDB dataset with top-250 movies.
      Context   |G|   |M|   |B|   # triples   Density
      IMDB      250   795   22    3818        0.00087
  • 29. Results of experiments with triclustering. Table: Time, number of triclusters, density, coverage and diversity for the Top-250 IMDB movies dataset.
      Algorithm                                  t, ms    # triclusters   ρ, %    Cov, %   Div, %
      OAC ( )                                    2314     1500            1.84    100      15.650
      OAC ( )                                    547      1274            53.85   100      96.550
      SpecTric                                   98799    21              17.07   20.88    100
      TriBox                                     197136   328             91.65   98.90    98.890
      TRIAS                                      102554   1956            100     100      99.890
      Krimp (minsup = 2, only non-singletons)    87       152             100     24.04    99.556
      Krimp (minsup = 2, usage ≠ 0)              87       2859            100     99.97    99.997
      Krimp (minsup = 3, only non-singletons)    46       57              100     12.07    98.684
      Krimp (minsup = 3, usage ≠ 0)              46       2966            100     99.97    99.998
  • 30. Examples of triclusters. Three triclusters extracted by Krimp from the IMDB dataset.
    Tricluster 1. Keyword-genre component: {(Princess,Adventure), (Princess,Fantasy), (Empire,Sci-Fi), (Empire,Adventure), (Empire,Action), (Princess,Sci-Fi), (Princess,Action), (Empire,Fantasy), (Death Star,Sci-Fi), (Death Star,Fantasy), (Death Star,Adventure), (Death Star,Action)}, (2,2). Movies component: {Star Wars: Episode VI – Return of the Jedi (1983), Star Wars (1977)}.
  • 31. Examples of triclusters (continued).
    Tricluster 2. Keyword-genre component: {(Future,Sci-Fi), (Future,Thriller), (Future,Action), (Cyborg,Thriller), (Cyborg,Sci-Fi), (Cyborg,Action), (The Terminator,Thriller), (The Terminator,Sci-Fi), (The Terminator,Action)}, (2,2). Movies component: {The Terminator (1984), Terminator 2: Judgment Day (1991)}.
    Tricluster 3. Keyword-genre component: {(Gotham,Thriller), (Gotham,Drama), (Gotham,Crime), (Gotham,Action), (Batman,Thriller), (Batman,Drama), (Batman,Crime), (Batman,Action)}, (2,2). Movies component: {Batman Begins (2005), The Dark Knight (2008)}.
  • 32. Conclusion. Krimp (or its descendants) can be considered a promising method for triadic data analysis.
    The positive features:
      - fast computation (although on a dataset of rather moderate size, with the lowest minimal support minsup = 2);
      - absolutely dense triclusters (however, this may not be the case for sparse and noisy datasets);
      - a rather small set of "large" triclusters can be selected (e.g., by imposing higher support for non-singletons).
    The negative features:
      - a strong trade-off between coverage and the number of triclusters (when switching from coding sets with singletons to itemsets of larger size);
      - an even higher number of triclusters than the number of triconcepts when the usage of singletons is allowed.