On the Mining of Numerical Data with
Formal Concept Analysis
Doctoral thesis in computer science
Mehdi Kaytoue
22 April 2011
Amedeo Napoli, Sébastien Duplessis
Somewhere... in a temperate forest...
Context
A biological problem: how does symbiosis work at the cellular level?
Analyse biological processes
Find genes involved in symbiosis
Choose a model for
understanding symbiosis:
Laccaria bicolor
Analysing Gene Expression Data (GED)
F. Martin et al.
The Genome of Laccaria Bicolor Provides Insights into Mycorrhizal Symbiosis.
In Nature, 2008.
Gene expression data (GED)
A numerical dataset, or data-table with
genes in rows
biological situations in columns
expression value of a gene in row for
the situation in column.
A row denotes the expression profile
of a gene (GEP)
m1 m2 m3
g1 5 7 6
g2 6 8 4
g3 4 8 5
g4 4 9 8
g5 5 8 5
Biological hypothesis
A group of genes having a similar expression profile interact together
within the same biological process
With very large datasets...
Gene expression data of Laccaria bicolor
22,294 genes
3 types of biological situations reflecting cells of the organism in
various stages of its biological cycle:
free living mycelium
symbiotic tissues
fruiting bodies
Attribute values range over [0, 65000]
Knowledge discovery in databases
An iterative and interactive process
U. Fayyad, G. Piatetsky-Shapiro and P. Smyth
The KDD process for Extracting Useful Knowledge from Volumes of Data.
In Commun. ACM, 1996.
Mining gene expression data
Extracting (maximal) rectangles in numerical data
A set of genes co-expressed in some biological situations
Local patterns: biological processes may be activated in some
situations only
Overlapping patterns: a gene may be involved in several
biological processes
m1 m2 m3 m4 m5
g1 1 2 2 1 6
g2 2 1 1 0 6
g3 2 2 1 7 6
g4 8 9 2 6 7
Biclustering: A difficult problem relying on heuristics
R. Peeters
The Maximum Edge Biclique Problem is NP-Complete.
In Discrete Applied Math., vol. 131, no. 3, 2003.
Core of the thesis
Mining gene expression data with formal concept analysis
Turning GED into binary data, encoding over/under-expression
Bringing the problem into well-known settings
Allowing a complete and mathematically well-defined approach
Exploiting algorithms and “tools”
m1 m2 m3 m4 m5
g1 1 2 2 1 6
g2 2 1 1 5 6
g3 2 2 1 7 6
g4 8 9 2 6 7
⇒
m1 m2 m3 m4 m5
g1 0 0 0 0 1
g2 0 0 0 0 1
g3 0 0 0 1 1
g4 1 1 0 1 1
Can we work with FCA directly on numerical data?
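The binarization step can be sketched as follows. The cut-off value of 6 is an assumption, chosen only because it reproduces the binary table of this slide; the slides do not state the rule actually used to encode over-expression.

```python
# Numerical table from the slide: rows g1..g4, columns m1..m5.
ged = [
    [1, 2, 2, 1, 6],
    [2, 1, 1, 5, 6],
    [2, 2, 1, 7, 6],
    [8, 9, 2, 6, 7],
]

# Hypothetical cut-off: a value >= THRESHOLD counts as "over-expressed".
THRESHOLD = 6


def binarize(table, threshold):
    """Return a 0/1 table marking the cells at or above the threshold."""
    return [[1 if value >= threshold else 0 for value in row] for row in table]


binary = binarize(ged, THRESHOLD)
for name, row in zip(["g1", "g2", "g3", "g4"], binary):
    print(name, *row)
```

A different cut-off gives a different binary table, which is the motivation for working on the numerical values directly.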
Outline
1 Context
2 Formal Concept Analysis
3 Contributions
Interval pattern structures
Introducing similarity
A KDD-oriented discussion
4 Conclusion and perspectives
Formal Concept Analysis
A binary table as a formal context
Given by (G, M, I) with
G a set of objects
M a set of attributes
I a binary relation between objects and attributes:
(g, m) ∈ I means that “object g owns attribute m”
   m1 m2 m3
g1 ×  -  ×
g2 ×  ×  -
g3 -  ×  ×
g4 -  ×  ×
g5 ×  ×  ×
G = {g1, . . . , g5}
M = {m1, m2, m3}
(g1, m3) ∈ I
B. Ganter and R. Wille
Formal Concept Analysis: Mathematical Foundations.
Springer, 1999.
A maximal rectangle as a formal concept
A Galois connection to characterize formal concepts
A′ = {m ∈ M | ∀g ∈ A : (g, m) ∈ I} for A ⊆ G
B′ = {g ∈ G | ∀m ∈ B : (g, m) ∈ I} for B ⊆ M
(A, B) is a concept with extent A = B′ and intent B = A′
{g3}′ = {m2, m3}
{m2, m3}′ = {g3, g4, g5}
   m1 m2 m3
g1 ×  -  ×
g2 ×  ×  -
g3 -  ×  ×
g4 -  ×  ×
g5 ×  ×  ×
({g3, g4, g5}, {m2, m3}) is a formal concept
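The two derivation operators can be sketched in a few lines. The placement of the crosses below is an assumption: the column positions are lost in the slide text, so the context is filled in to agree with the examples quoted in the slides ((g1, m3) ∈ I, {g3}′ = {m2, m3}, and the concepts given for the lattice order). The function names are illustrative only.

```python
# Formal context (G, M, I) as a dict: object -> set of attributes it owns.
# Assumption: crosses placed so that the context agrees with the slide examples.
context = {
    "g1": {"m1", "m3"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
attributes = {"m1", "m2", "m3"}


def intent(objects):
    """A' : attributes shared by every object of A (all of M if A is empty)."""
    common = set(attributes)
    for g in objects:
        common &= context[g]
    return common


def extent(attrs):
    """B' : objects owning every attribute of B."""
    return {g for g, owned in context.items() if set(attrs) <= owned}


def is_concept(objects, attrs):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return intent(objects) == set(attrs) and extent(attrs) == set(objects)


print(intent({"g3"}))
print(extent({"m2", "m3"}))
print(is_concept({"g3", "g4", "g5"}, {"m2", "m3"}))
```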
Concept lattice
Ordered set of concepts...
(A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1)
({g1, g5}, {m1, m3}) ≤ ({g1, g2, g5}, {m1})
... with interesting properties
Maximality of concepts as rectangles
Overlapping of concepts
Specialization/generalization hierarchy
Synthetic representation of the data without loss of information
Handling numerical data with FCA?
Initial problem
Extracting groups of genes with similar numerical values
Conceptual scaling (discretization or binarization)
An object has an attribute if its value lies in a predefined interval
m1 m2 m3
g1 5 7 6
g2 6 8 4
g3 4 8 5
g4 4 9 8
g5 5 8 5
   m1,[4,5]  m2,[4,7]  m3,[5,6]
g1    ×         ×         ×
g2    -         -         -
g3    ×         -         ×
g4    ×         -         -
g5    ×         -         ×
Different scalings: different interpretations of the data
General problem of the thesis
How to directly build a concept lattice from numerical data?
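The scaling used on this slide can be sketched as below: each numerical attribute is paired with one predefined interval, and an object receives the binary attribute when its value falls inside it. The intervals are those of the slide; the dictionary layout and function name are illustrative choices.

```python
# Numerical table from the slide.
data = {
    "g1": {"m1": 5, "m2": 7, "m3": 6},
    "g2": {"m1": 6, "m2": 8, "m3": 4},
    "g3": {"m1": 4, "m2": 8, "m3": 5},
    "g4": {"m1": 4, "m2": 9, "m3": 8},
    "g5": {"m1": 5, "m2": 8, "m3": 5},
}

# One predefined interval per attribute, as on the slide.
scale = {"m1": (4, 5), "m2": (4, 7), "m3": (5, 6)}


def scale_context(table, intervals):
    """Return the binary context: object -> attributes whose interval holds the value."""
    return {
        g: {m for m, v in row.items() if intervals[m][0] <= v <= intervals[m][1]}
        for g, row in table.items()
    }


scaled = scale_context(data, scale)
for g in sorted(scaled):
    print(g, sorted(scaled[g]))
```

Changing any interval in `scale` changes the resulting context, and therefore the concept lattice built from it.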
Contributions – Interval pattern structures
How to handle complex descriptions
An intersection as a similarity operator
∩ behaves as similarity operator
{m1, m2} ∩ {m1, m3} = {m1}
∩ induces an ordering relation ⊆
N ∩ O = N ⇐⇒ N ⊆ O
{m1} ∩ {m1, m2} = {m1} ⇐⇒ {m1} ⊆ {m1, m2}
∩ has the properties of a meet ⊓ in a semi-lattice:
a commutative, associative and idempotent operation
c ⊓ d = c ⇐⇒ c ⊑ d
A. Tversky
Features of similarity.
In Psychological Review, 84 (4), 1977.
Pattern structure
Given by (G, (D, ⊓), δ)
G a set of objects
(D, ⊓) a semi-lattice of descriptions or patterns
δ a mapping such that δ(g) ∈ D describes object g
A Galois connection
A□ = ⊓_{g ∈ A} δ(g) for A ⊆ G
d□ = {g ∈ G | d ⊑ δ(g)} for d ∈ (D, ⊓)
B. Ganter and S. O. Kuznetsov
Pattern Structures and their Projections.
In International Conference on Conceptual Structures, 2001.
Numerical data are pattern structures
Interval pattern structures
m1 m2 m3
g1 5 7 6
g2 6 8 4
g3 4 8 5
g4 4 9 8
g5 5 8 5
{g1, g2}□ = ⊓_{g ∈ {g1, g2}} δ(g)
= ⟨5, 7, 6⟩ ⊓ ⟨6, 8, 4⟩
= ⟨[5, 6], [7, 8], [4, 6]⟩
⟨[5, 6], [7, 8], [4, 6]⟩□ = {g ∈ G | ⟨[5, 6], [7, 8], [4, 6]⟩ ⊑ δ(g)}
= {g1, g2, g5}
({g1, g2, g5}, ⟨[5, 6], [7, 8], [4, 6]⟩) is a (pattern) concept
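The computation on this slide can be reproduced with a short sketch. It assumes that the meet of two interval vectors is the component-wise convex hull, which is what the worked example shows; the function names are illustrative and not taken from the thesis.

```python
# Numerical table from the slide: object -> (m1, m2, m3).
data = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def describe(g):
    """delta(g): each value v becomes the degenerate interval [v, v]."""
    return tuple((v, v) for v in data[g])


def meet(c, d):
    """Meet of two interval vectors: component-wise convex hull."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(c, d))


def subsumed(c, d):
    """c is subsumed by d iff meet(c, d) == c, i.e. d lies inside c."""
    return meet(c, d) == c


def common_pattern(objects):
    """Pattern shared by a non-empty set of objects: meet of their descriptions."""
    patterns = [describe(g) for g in sorted(objects)]
    result = patterns[0]
    for p in patterns[1:]:
        result = meet(result, p)
    return result


def image(pattern):
    """Objects whose description lies inside the pattern."""
    return {g for g in data if subsumed(pattern, describe(g))}


d = common_pattern({"g1", "g2"})
print(d)
print(sorted(image(d)))
```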
Interval pattern concept lattice
Lowest concepts: few objects, small intervals
Highest concepts: many objects, large intervals
Overwhelming number of concepts/patterns
Links with conceptual scaling
Interordinal scaling [Ganter & Wille]
A scale to encode intervals of attribute values
   m1 ≤ 4  m1 ≤ 5  m1 ≤ 6  m1 ≥ 4  m1 ≥ 5  m1 ≥ 6
4    ×       ×       ×       ×       -       -
5    -       ×       ×       ×       ×       -
6    -       -       ×       ×       ×       ×
Equivalent concept lattice
Example
({g1, g2, g5}, {m1 ≤ 6, m1 ≥ 4, m1 ≥ 5, ... , ... })
({g1, g2, g5}, ⟨[5, 6], ... , ...⟩)
Why use pattern structures when scaling is available?
Processing a pattern structure is more efficient
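As a rough illustration of the interordinal scale, the sketch below builds the scale attributes for m1 and checks that the attributes shared by the values 5 and 6 are exactly those encoding the interval [5, 6]. The representation of a scale attribute as an (operator, bound) pair is an illustrative choice.

```python
def interordinal_scale(values):
    """Map each distinct value v to the scale attributes '<= w' and '>= w' it satisfies."""
    ws = sorted(set(values))
    return {
        v: {("<=", w) for w in ws if v <= w} | {(">=", w) for w in ws if v >= w}
        for v in ws
    }


# Column m1 of the running example.
scale_m1 = interordinal_scale([5, 6, 4, 4, 5])

for value in sorted(scale_m1):
    print(value, sorted(scale_m1[value]))

# Attributes common to the values 5 and 6: the binary encoding of [5, 6].
shared = scale_m1[5] & scale_m1[6]
print(sorted(shared))
```

Each numerical attribute with n distinct values yields 2n binary attributes, which gives an idea of why the scaled context grows quickly.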
Contributions – Introducing similarity
Introducing a similarity relation
How to group objects having similar values in the same concept?
A natural similarity relation on numbers
a ≃_θ b ⇔ |a − b| ≤ θ, e.g. 4 ≃_1 5 but 4 ≄_1 6
Similarity operator in pattern structures
[Figure: meet-semi-lattice of the intervals over {4, 5, 6}, shown for θ = 2, θ = 1 and θ = 0]
4 5 6
[4,5] [5,6]
[4,6]
How can a similarity relation be defined w.r.t. a distance?
Towards a similarity between values
Introduce an element ∗ ∈ (D, ⊓) denoting dissimilarity
c ⊓ d = ∗ iff c ≄_θ d
c ⊓ d ≠ ∗ iff c ≃_θ d
Example with θ = 1
m1 m2 m3
g1 5 7 6
g2 6 8 4
g3 4 8 5
g4 4 9 8
g5 5 8 5
{g3, g4}□ = ⟨[4, 4], [8, 9], ∗⟩
⟨[4, 4], [8, 9], ∗⟩□ = {g3, g4}
({g3, g4}, ⟨[4, 4], [8, 9], ∗⟩) is a concept:
g3 and g4 have similar values for attributes m1 and m2 only
Is {g3, g4} maximal w.r.t. similarity? We can add g5...
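A minimal sketch of the thresholded meet follows. It assumes that two intervals are similar when the length of their convex hull is at most θ, and that ∗ absorbs everything under the meet and covers every description; this extension from numbers to intervals is an assumption of the sketch, and the names are illustrative.

```python
from functools import reduce

STAR = "*"  # stands for the element denoting dissimilarity

data = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def meet_value(c, d, theta):
    """Convex hull of two intervals, or STAR when the hull is longer than theta."""
    if c == STAR or d == STAR:
        return STAR
    lo, hi = min(c[0], d[0]), max(c[1], d[1])
    return (lo, hi) if hi - lo <= theta else STAR


def describe(g):
    return tuple((v, v) for v in data[g])


def common_pattern(objects, theta):
    """Thresholded meet of the descriptions of a non-empty set of objects."""
    patterns = [describe(g) for g in sorted(objects)]
    return reduce(
        lambda p, q: tuple(meet_value(c, d, theta) for c, d in zip(p, q)), patterns
    )


def covers(c, d):
    """True when component c is more general than component d."""
    return c == STAR or (d != STAR and c[0] <= d[0] and d[1] <= c[1])


def image(pattern):
    return {g for g in data if all(covers(c, d) for c, d in zip(pattern, describe(g)))}


p = common_pattern({"g3", "g4"}, theta=1)
print(p, sorted(image(p)))
print(common_pattern({"g3", "g4", "g5"}, theta=1))
```

The last line shows why {g3, g4} is not maximal with respect to similarity: adding g5 still yields intervals of length at most 1 on m1 and m2.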
Classes of tolerance in numerical data
Towards maximal sets of similar values
≃_θ is a tolerance relation: reflexive, symmetric, not necessarily transitive
Consider an attribute taking values in {6, 8, 11, 16, 17} and θ = 5
8 ≃_5 11, 11 ≃_5 16 but 8 ≄_5 16
A class of tolerance is a maximal set of pairwise similar values
{6, 8, 11} {11, 16} {16, 17}
[6, 11] [11, 16] [16, 17]
S. O. Kuznetsov
Galois Connections in Data Analysis: Contributions from the Soviet Era and Modern Russian Research.
In Formal Concept Analysis, Foundations and Applications, 2005.
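For numbers on a line, the classes of tolerance are the maximal windows of sorted values whose extremes differ by at most θ. The sketch below computes them with a single left-to-right sweep; it is one straightforward way to obtain the classes of the example, not necessarily the procedure used in the thesis.

```python
def tolerance_classes(values, theta):
    """Maximal sets of pairwise similar values, where a ~ b iff |a - b| <= theta."""
    vs = sorted(set(values))
    classes = []
    last_end = -1  # index of the right end of the last class kept
    for i, v in enumerate(vs):
        j = i
        # Extend the window as far right as the threshold allows.
        while j + 1 < len(vs) and vs[j + 1] - v <= theta:
            j += 1
        # Keep the window only if it is not contained in an earlier one.
        if j > last_end:
            classes.append(vs[i:j + 1])
            last_end = j
    return classes


classes = tolerance_classes([6, 8, 11, 16, 17], theta=5)
print(classes)
# Each class is summarised by the interval spanning it.
print([(c[0], c[-1]) for c in classes])
```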
Tolerance in pattern structures
Projecting the pattern structure
Each value is replaced by the interval characterizing its class of
tolerance (if unique)
Each pattern d is projected with a mapping ψ such that ψ(d) ⊑ d
(pre-processing)
Example with θ = 1
m1 m2 m3
g1 5 7 6
g2 6 8 4
g3 4 8 5
g4 4 9 8
g5 5 8 5
{g3, g4}□ = ψ(⟨[4, 4], [8, 9], ∗⟩)
= ⟨[4, 5], [8, 9], ∗⟩
⟨[4, 5], [8, 9], ∗⟩□ = {g3, g4, g5}
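The projection can be sketched as a pre-processing pass over the table, under two assumptions that the slide leaves open: a value belonging to several classes of tolerance is kept as the degenerate interval [v, v], and the thresholded meet is then applied to the projected descriptions. All names are illustrative.

```python
STAR = "*"
THETA = 1

data = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def tolerance_classes(values, theta):
    """Maximal windows of sorted values whose extremes differ by at most theta."""
    vs, classes, last_end = sorted(set(values)), [], -1
    for i, v in enumerate(vs):
        j = i
        while j + 1 < len(vs) and vs[j + 1] - v <= theta:
            j += 1
        if j > last_end:
            classes.append(vs[i:j + 1])
            last_end = j
    return classes


def project_value(v, classes):
    """Interval of the unique class containing v; [v, v] if the class is not unique."""
    containing = [c for c in classes if v in c]
    if len(containing) == 1:
        return (containing[0][0], containing[0][-1])
    return (v, v)


# Pre-processing: one list of classes per column, then projected descriptions.
columns = list(zip(*data.values()))
column_classes = [tolerance_classes(col, THETA) for col in columns]
projected = {
    g: tuple(project_value(v, cls) for v, cls in zip(row, column_classes))
    for g, row in data.items()
}


def meet_value(c, d):
    if c == STAR or d == STAR:
        return STAR
    lo, hi = min(c[0], d[0]), max(c[1], d[1])
    return (lo, hi) if hi - lo <= THETA else STAR


def common_pattern(objects):
    patterns = [projected[g] for g in sorted(objects)]
    result = patterns[0]
    for p in patterns[1:]:
        result = tuple(meet_value(c, d) for c, d in zip(result, p))
    return result


def covers(c, d):
    return c == STAR or (d != STAR and c[0] <= d[0] and d[1] <= c[1])


def image(pattern):
    return {g for g in data if all(covers(c, d) for c, d in zip(pattern, projected[g]))}


p = common_pattern({"g3", "g4"})
print(p, sorted(image(p)))
```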
Biological results
One extracted pattern among 2,150 others
Genes present a high expression level in the fruiting-body situations
Some of these genes encode metabolic enzymes involved in the remobilization
of fungal resources towards the new organ in development
Other genes are unknown but specific to Laccaria bicolor: confirming
their role requires biological experiments
Relevant publications
Interval pattern structures and GED analysis
M. Kaytoue, S. Duplessis, S. O. Kuznetsov, and A. Napoli
Two FCA-Based Methods for Mining Gene Expression Data.
In International Conference on Formal Concept Analysis (ICFCA), 2009.
M. Kaytoue, S. O. Kuznetsov, A. Napoli and S. Duplessis
Mining Gene Expression Data with Pattern Structures in Formal Concept Analysis.
In Information Sciences. Spec. Iss.: Lattices (Elsevier), 2011.
Introducing tolerance relations and information fusion
M. Kaytoue, Z. Assaghir, N. Messai and A. Napoli
Two Complementary Classification Methods for Designing a Concept Lattice from Interval Data.
In Foundations of Information and Knowledge Systems, 6th International Symposium (FoIKS), 2010.
M. Kaytoue, Z. Assaghir, A. Napoli and S. O. Kuznetsov
Embedding Tolerance Relations in Formal Concept Analysis: an Application in Information Fusion.
In ACM Conference on Information and Knowledge Management (CIKM), 2010.
Contributions –
Other works
Pattern structures are useful for several tasks
Bi-clustering and tolerance relations
M. Kaytoue, S. O. Kuznetsov, and A. Napoli
Biclustering Numerical Data in Formal Concept Analysis.
In International Conference on Formal Concept Analysis (ICFCA), 2011.
Information fusion: enhancing decision making
Z. Assaghir, M. Kaytoue, A. Napoli and H. Prade
Managing Information Fusion with Formal Concept Analysis.
In Modeling Decisions for Artificial Intelligence, 6th International Conference (MDAI), 2010.
KDD: a study of equivalence classes of interval patterns
M. Kaytoue, S. O. Kuznetsov, and A. Napoli
Revisiting Numerical Pattern Mining with Formal Concept Analysis.
In International Joint Conference on Artificial Intelligence (IJCAI), 2011.
Contributions – A KDD-oriented discussion
Interval pattern search space
Counting all possible interval patterns
⟨[a_m1, b_m1], [a_m2, b_m2], ...⟩
where a_mi, b_mi ∈ W_mi, the set of values taken by attribute mi, and a_mi ≤ b_mi
m1 m2 m3
g1 5 7 6
g2 6 8 4
g3 4 8 5
g4 4 9 8
g5 5 8 5
∏_{i ∈ {1,...,|M|}} |W_mi| × (|W_mi| + 1) / 2
360 possible interval patterns in our small example
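The count can be checked directly from the formula: an attribute with n distinct values admits n(n + 1)/2 intervals with bounds among those values, and the counts multiply across attributes. The helper name below is illustrative.

```python
from math import prod

# Columns of the running example.
columns = {
    "m1": [5, 6, 4, 4, 5],
    "m2": [7, 8, 8, 9, 8],
    "m3": [6, 4, 5, 8, 5],
}


def count_interval_patterns(cols):
    """Product over attributes of |W| * (|W| + 1) / 2, with W the distinct values."""
    return prod(len(set(v)) * (len(set(v)) + 1) // 2 for v in cols.values())


per_attribute = {m: len(set(v)) * (len(set(v)) + 1) // 2 for m, v in columns.items()}
print(per_attribute)
print(count_interval_patterns(columns))
```

Here m1 and m2 each have 3 distinct values (6 intervals) and m3 has 4 (10 intervals), hence 6 × 6 × 10 = 360.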
Semantics for interval patterns
Interval patterns as (hyper) rectangles
m1 m3
g1 5 6
g2 6 4
g3 4 5
g4 4 8
g5 5 5
⟨[4, 5], [5, 6]⟩□ = {g1, g3, g5}
⟨[4, 5], [5, 7]⟩□ = {g1, g3, g5}
⟨[4, 6], [5, 6]⟩□ = {g1, g3, g5}
⟨[4, 5], [4, 6]⟩□ = {g1, g3, g5}
⟨[4, 6], [5, 7]⟩□ = {g1, g3, g5}
⟨[4, 5], [4, 7]⟩□ = {g1, g3, g5}
[Figure: the descriptions δ(g1), ..., δ(g5) plotted as points, with m1 on the horizontal axis and m3 on the vertical axis]
A condensed representation
Equivalence classes of interval patterns
Two interval patterns with same image are said to be equivalent
c ≅ d ⇐⇒ c□ = d□
Equivalence class of a pattern d
[d] = {c | c ≅ d}
with a unique closed pattern: the smallest rectangle
and one or several generators: the largest rectangles
Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, and L. Lakhal.
Mining frequent patterns with counting inference.
SIGKDD Expl., 2(2):66–75, 2000.
In our example: 360 patterns, 18 closed patterns, 44 generators
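The number of closed patterns can be checked by brute force on the running example: every non-empty set of objects is mapped to its smallest enclosing box, and boxes with the same image fall in the same equivalence class. This exhaustive enumeration is only a sketch for a five-object table; it is not the algorithm of the thesis and would not scale.

```python
from itertools import combinations

data = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}
objects = sorted(data)


def box(subset):
    """Smallest interval pattern (hyper-rectangle) containing the given objects."""
    return tuple((min(col), max(col)) for col in zip(*(data[g] for g in subset)))


def image(pattern):
    """Objects falling inside the hyper-rectangle."""
    return frozenset(
        g for g in objects
        if all(lo <= v <= hi for v, (lo, hi) in zip(data[g], pattern))
    )


# One entry per equivalence class: image -> closed pattern.
# The box of a subset equals the box of its image, so it is the closed pattern.
closed = {}
for size in range(1, len(objects) + 1):
    for subset in combinations(objects, size):
        pattern = box(subset)
        closed[image(pattern)] = pattern

print(len(closed))
print(closed[frozenset({"g1", "g3", "g5"})])
```

The count covers non-empty images only, which matches the 18 closed patterns quoted on the slide.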
Algorithms & experiments
Algorithms: MinIntChange, MinIntChangeG[t|h]
4 5 6
[4,5] [5,6]
[4,6]
Experiments
Mining several datasets from Bilkent University Repository
Compression rate varies between 10^7 and 10^9
Interordinal scaling: encodes 30,000 binary patterns
not efficient even with the best algorithms (e.g. LCMv2)
a redundancy problem rules out its use for generator extraction
Discussion
Advantages
Minimum description length principle favours generators
Potential applications
Data privacy and k-anonymisation
k-box problem in computational geometry
Quantitative association rule mining
Data summarization
Problem
With very large datasets, compression is not enough
Numerical data are noisy
Need for fault-tolerant condensed representations
Conclusion and perspectives
Conclusion
A new insight into the mining of numerical data
Our main tools...
Formal Concept Analysis and conceptual scaling
Pattern structures and projections
Tolerance relation
... for numerical data mining
Conceptual representations of numerical data
Bi-clustering
Information fusion
Applications: GED analysis and agricultural practice assessment
Conclusion
An application in GED analysis
With FCA and pattern structures
Many ways of extracting patterns in GED
Biological validation of several patterns
We now need a systematic validation step using new knowledge
transcription factors
biological knowledge base, e.g. Gene Ontology
To be continued...
Short and mid term
Handle other types of biclusters and algorithm comparison
S. C. Madeira and A. L. Oliveira
Biclustering Algorithms for Biological Data Analysis: a survey.
In IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004.
Insert domain knowledge for biological data
Study threshold θ effect w.r.t. the number of tolerance classes
Post-doctoral position
Biclustering (multi-dimensional) numerical data
Numerical pattern based classifier and association rules
Data privacy and pattern projection
Wagner Meira Jr. (Universidade Federal de Minas Gerais, Brazil)
Cross-domain fertilization
Itemset-mining in KDD
Other frameworks for closed patterns
H. Arimura and T. Uno
Polynomial-Delay and Polynomial-Space Algorithms for Mining Closed Sequences, Graphs, and
Pictures in Accessible Set Systems.
In SIAM International Conference on Data Mining, 2009.
G.C. Garriga
Formal Methods for Mining Structured Objects.
PhD Thesis, Universitat Politècnica de Catalunya, 2006.
Condensed representations and fault-tolerant patterns
m1 m2 m3
g1 5 7 6
g2 6 8 4
g3 4 8 5
g4 4 9 8
g5 15 8 5
R. Pensa and J.-F. Boulicaut
Towards Fault-Tolerant Formal Concept Analysis.
In Proc. 9th Congress of the Italian Association for Artificial Intelligence (AI*IA), Springer, 2005.
Cross-domain fertilization
Data-analysis
Symbolic data analysis and distances
P. Agarwal, M. Kaytoue, S. O. Kuznetsov, A. Napoli and G. Polaillon
Symbolic Galois Lattices with Pattern Structures.
In International Conference on Rough Sets, Fuzzy Sets, Data-mining and Granularity Computing
(RSFDGrC), 2011.
Information fusion and fuzzy concept analysis
Fuzzy settings and possibility theory
Z. Assaghir, M. Kaytoue, and H. Prade
A Possibility Theory Oriented Discussion of Conceptual Pattern Structures.
In Scalable Uncertainty Management, 4th International Conference (SUM), 2010.
Thank you
Merci, Danke schön, Spasibo
40 / 40
On the Mining of Numerical Data with Formal Concept Analysis

More Related Content

What's hot

Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Predictiontuxette
 
Pattern-based classification of demographic sequences
Pattern-based classification of demographic sequencesPattern-based classification of demographic sequences
Pattern-based classification of demographic sequencesDmitrii Ignatov
 
A lattice-based consensus clustering
A lattice-based consensus clusteringA lattice-based consensus clustering
A lattice-based consensus clusteringDmitrii Ignatov
 
Similarity Measures in Formal Concept Analysis
Similarity Measures in Formal Concept AnalysisSimilarity Measures in Formal Concept Analysis
Similarity Measures in Formal Concept AnalysisFaris Alqadah
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modelingHiroyuki Kuromiya
 
Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...tuxette
 
Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Daniele Di Mitri
 
Deep learning ensembles loss landscape
Deep learning ensembles loss landscapeDeep learning ensembles loss landscape
Deep learning ensembles loss landscapeDevansh16
 
Learning for Optimization: EDAs, probabilistic modelling, or ...
Learning for Optimization: EDAs, probabilistic modelling, or ...Learning for Optimization: EDAs, probabilistic modelling, or ...
Learning for Optimization: EDAs, probabilistic modelling, or ...butest
 
A sat encoding for solving games with energy objectives
A sat encoding for solving games with energy objectivesA sat encoding for solving games with energy objectives
A sat encoding for solving games with energy objectivescsandit
 
AN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBER
AN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBERAN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBER
AN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBERijfls
 
Locally consistent concept factorization for
Locally consistent concept factorization forLocally consistent concept factorization for
Locally consistent concept factorization foringenioustech
 
Adjoint operator in probabilistic hilbert space
Adjoint operator in probabilistic hilbert spaceAdjoint operator in probabilistic hilbert space
Adjoint operator in probabilistic hilbert spaceAlexander Decker
 
Geometric and Topological Data Analysis
Geometric and Topological Data AnalysisGeometric and Topological Data Analysis
Geometric and Topological Data AnalysisDon Sheehy
 
AN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBER
AN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBERAN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBER
AN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBERijfls
 
Latent Relational Model for Relation Extraction
Latent Relational Model for Relation ExtractionLatent Relational Model for Relation Extraction
Latent Relational Model for Relation ExtractionGaetano Rossiello, PhD
 

What's hot (20)

Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
Pattern-based classification of demographic sequences
Pattern-based classification of demographic sequencesPattern-based classification of demographic sequences
Pattern-based classification of demographic sequences
 
Extracting biclusters of similar values with Triadic Concept Analysis
Extracting biclusters of similar values with Triadic Concept AnalysisExtracting biclusters of similar values with Triadic Concept Analysis
Extracting biclusters of similar values with Triadic Concept Analysis
 
Entropy 19-00079
Entropy 19-00079Entropy 19-00079
Entropy 19-00079
 
QMC: Transition Workshop - Discussion of "Representative Points for Small and...
QMC: Transition Workshop - Discussion of "Representative Points for Small and...QMC: Transition Workshop - Discussion of "Representative Points for Small and...
QMC: Transition Workshop - Discussion of "Representative Points for Small and...
 
A lattice-based consensus clustering
A lattice-based consensus clusteringA lattice-based consensus clustering
A lattice-based consensus clustering
 
Similarity Measures in Formal Concept Analysis
Similarity Measures in Formal Concept AnalysisSimilarity Measures in Formal Concept Analysis
Similarity Measures in Formal Concept Analysis
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modeling
 
Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...
 
Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation
 
Deep learning ensembles loss landscape
Deep learning ensembles loss landscapeDeep learning ensembles loss landscape
Deep learning ensembles loss landscape
 
Learning for Optimization: EDAs, probabilistic modelling, or ...
Learning for Optimization: EDAs, probabilistic modelling, or ...Learning for Optimization: EDAs, probabilistic modelling, or ...
Learning for Optimization: EDAs, probabilistic modelling, or ...
 
A sat encoding for solving games with energy objectives
A sat encoding for solving games with energy objectivesA sat encoding for solving games with energy objectives
A sat encoding for solving games with energy objectives
 
AN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBER
AN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBERAN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBER
AN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBER
 
Locally consistent concept factorization for
Locally consistent concept factorization forLocally consistent concept factorization for
Locally consistent concept factorization for
 
Adjoint operator in probabilistic hilbert space
Adjoint operator in probabilistic hilbert spaceAdjoint operator in probabilistic hilbert space
Adjoint operator in probabilistic hilbert space
 
Geometric and Topological Data Analysis
Geometric and Topological Data AnalysisGeometric and Topological Data Analysis
Geometric and Topological Data Analysis
 
AN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBER
AN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBERAN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBER
AN ARITHMETIC OPERATION ON HEXADECAGONAL FUZZY NUMBER
 
Latent Relational Model for Relation Extraction
Latent Relational Model for Relation ExtractionLatent Relational Model for Relation Extraction
Latent Relational Model for Relation Extraction
 
Application of transportation problem under pentagonal neutrosophic environment
Application of transportation problem under pentagonal neutrosophic environmentApplication of transportation problem under pentagonal neutrosophic environment
Application of transportation problem under pentagonal neutrosophic environment
 


On the Mining of Numerical Data with Formal Concept Analysis

  • 1. On the Mining of Numerical Data with Formal Concept Analysis. Doctoral thesis in computer science. Mehdi Kaytoue, 22 April 2011. Amedeo Napoli, Sébastien Duplessis.
  • 2. Somewhere... in a temperate forest... 2 / 40 On the Mining of Numerical Data with Formal Concept Analysis
  • 3. Context. A biological problem: how does symbiosis work at the cellular level?
      • Analyse biological processes
      • Find genes involved in symbiosis
      • Choose a model for understanding symbiosis: Laccaria bicolor
      • Analyse gene expression data (GED)
    F. Martin et al. The Genome of Laccaria bicolor Provides Insights into Mycorrhizal Symbiosis. Nature, 2008.
  • 4. Context. Gene expression data (GED): a numerical dataset, or data table, with
      • genes in rows
      • biological situations in columns
      • in each cell, the expression value of the gene in that row for the situation in that column.
    A row denotes the gene expression profile (GEP) of a gene.

            m1  m2  m3
        g1   5   7   6
        g2   6   8   4
        g3   4   8   5
        g4   4   9   8
        g5   5   8   5

    Biological hypothesis: a group of genes having a similar expression profile interact together within the same biological process.
  • 5. Context. With very large datasets... Gene expression data of Laccaria bicolor:
      • 22,294 genes
      • 3 types of biological situations reflecting cells of the organism in various stages of its biological cycle: free living mycelium, symbiotic tissues, fruiting bodies
      • Attribute values range in [0, 65000]
  • 6. Context. Knowledge discovery in databases: an iterative and interactive process.
    U. Fayyad, G. Piatetsky-Shapiro and P. Smyth. The KDD Process for Extracting Useful Knowledge from Volumes of Data. Commun. ACM, 1996.
  • 7. Context. Mining gene expression data: extracting (maximal) rectangles in numerical data.
      • A rectangle is a set of genes co-expressed in some biological situations
      • Local patterns: biological processes may be activated in some situations only
      • Overlapping patterns: a gene may be involved in several biological processes

            m1  m2  m3  m4  m5
        g1   1   2   2   1   6
        g2   2   1   1   0   6
        g3   2   2   1   7   6
        g4   8   9   2   6   7

    Biclustering: a difficult problem relying on heuristics.
    R. Peeters. The Maximum Edge Biclique Problem is NP-Complete. Discrete Applied Mathematics, vol. 131, no. 3, 2003.
  • 8. Context. Core of the thesis: mining gene expression data with formal concept analysis.
      • Turning GED into binary data, encoding over/under expression
      • Bringing the problem into well-known settings
      • Allowing a complete and mathematically well-defined approach
      • Exploiting algorithms and "tools"

    Numerical table:
            m1  m2  m3  m4  m5
        g1   1   2   2   1   6
        g2   2   1   1   5   6
        g3   2   2   1   7   6
        g4   8   9   2   6   7

    Binary table (1, or a cross ×, encodes over-expression):
            m1  m2  m3  m4  m5
        g1   0   0   0   0   1
        g2   0   0   0   0   1
        g3   0   0   0   1   1
        g4   1   1   0   1   1

    Can we work with FCA directly on numerical data?
  • 10. Outline
      1 Context
      2 Formal Concept Analysis
      3 Contributions: Interval pattern structures; Introducing similarity; A KDD-oriented discussion
      4 Conclusion and perspectives
  • 11. Formal Concept Analysis. A binary table as a formal context, given by (G, M, I) with
      • G a set of objects
      • M a set of attributes
      • I a binary relation between objects and attributes: (g, m) ∈ I means that "object g owns attribute m"

            m1  m2  m3
        g1   ×       ×
        g2   ×   ×
        g3       ×   ×
        g4       ×   ×
        g5   ×   ×   ×

    G = {g1, ..., g5}, M = {m1, m2, m3}, (g1, m3) ∈ I
    B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer, 1999.
  • 12. Formal Concept Analysis. A maximal rectangle as a formal concept. A Galois connection characterizes formal concepts:
        $A' = \{m \in M \mid \forall g \in A : (g, m) \in I\}$ for $A \subseteq G$
        $B' = \{g \in G \mid \forall m \in B : (g, m) \in I\}$ for $B \subseteq M$
    $(A, B)$ is a concept with extent $A = B'$ and intent $B = A'$.
    Example: $\{g_3\}' = \{m_2, m_3\}$ and $\{m_2, m_3\}' = \{g_3, g_4, g_5\}$

            m1  m2  m3
        g1   ×       ×
        g2   ×   ×
        g3       ×   ×
        g4       ×   ×
        g5   ×   ×   ×

    ({g3, g4, g5}, {m2, m3}) is a formal concept.
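The two derivation operators of slide 12 can be tried out on the small context. The following is a minimal sketch, not the author's implementation: the names `CONTEXT`, `intent`, `extent` and `is_concept` are hypothetical, and the positions of the crosses are an assumption inferred from the concepts quoted on slides 11 to 13.

```python
# Formal context of the running example. Assumption: the cross positions are
# inferred from the concepts quoted on the slides, not read from the figure.
CONTEXT = {
    "g1": {"m1", "m3"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A': the attributes shared by every object of A."""
    common = set(ATTRIBUTES)
    for g in objects:
        common &= CONTEXT[g]
    return common


def extent(attributes):
    """B': the objects that own every attribute of B."""
    wanted = set(attributes)
    return {g for g, owned in CONTEXT.items() if wanted <= owned}


def is_concept(objects, attributes):
    """(A, B) is a formal concept when A' = B and B' = A."""
    return intent(objects) == set(attributes) and extent(attributes) == set(objects)
```

Under these assumptions, `intent({"g3"})` gives `{m2, m3}` and `extent({"m2", "m3"})` gives `{g3, g4, g5}`, the pair quoted on the slide.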
  • 13. Formal Concept Analysis. Concept lattice: an ordered set of concepts...
        (A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1)
        ({g1, g5}, {m1, m3}) ≤ ({g1, g2, g5}, {m1})
    ... with interesting properties:
      • Maximality of concepts as rectangles
      • Overlapping of concepts
      • Specialization/generalisation hierarchy
      • Synthetic representation of the data without loss of information
  • 14. Formal Concept Analysis. Handling numerical data with FCA?
    Initial problem: extracting groups of genes with similar numerical values.
    Conceptual scaling (discretization or binarization): an object has an attribute if its value lies in a predefined interval.

            m1  m2  m3
        g1   5   7   6
        g2   6   8   4
        g3   4   8   5
        g4   4   9   8
        g5   5   8   5

            m1, [4, 5]   m2, [4, 7]   m3, [5, 6]
        g1      ×            ×            ×
        g2
        g3      ×                         ×
        g4      ×
        g5      ×                         ×

    Different scalings give different interpretations of the data.
    General problem of the thesis: how to directly build a concept lattice from numerical data?
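The scaling step of slide 14 amounts to one membership test per cell. A minimal sketch, assuming the intervals shown on the slide; `DATA`, `SCALE` and `scale` are hypothetical names introduced only for illustration.

```python
# Numerical table of the running example (rows g1..g5, columns m1..m3).
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}
# One predefined interval per attribute, as on the slide.
SCALE = [(4, 5), (4, 7), (5, 6)]


def scale(data, intervals):
    """Return a binary context: an object owns m_i if its value lies in interval i."""
    context = {}
    for g, values in data.items():
        context[g] = {
            f"m{i + 1}"
            for i, (v, (lo, hi)) in enumerate(zip(values, intervals))
            if lo <= v <= hi
        }
    return context
```

Changing `SCALE` changes the binary context and hence the resulting lattice, which is the point the slide makes about scalings giving different interpretations.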
  • 15. Outline
      1 Context
      2 Formal Concept Analysis
      3 Contributions: Interval pattern structures; Introducing similarity; A KDD-oriented discussion
      4 Conclusion and perspectives
  • 16. Contributions – Interval pattern structures. How to handle complex descriptions: an intersection as a similarity operator.
      • ∩ behaves as a similarity operator: {m1, m2} ∩ {m1, m3} = {m1}
      • ∩ induces an ordering relation ⊆: N ∩ O = N ⇔ N ⊆ O, e.g. {m1} ∩ {m1, m2} = {m1} ⇔ {m1} ⊆ {m1, m2}
      • ∩ has the properties of a meet $\sqcap$ in a semi-lattice, a commutative, associative and idempotent operation: $c \sqcap d = c \iff c \sqsubseteq d$
    A. Tversky. Features of similarity. Psychological Review, 84(4), 1977.
  • 17. Contributions – Interval pattern structures. A pattern structure is given by $(G, (D, \sqcap), \delta)$ with
      • $G$ a set of objects
      • $(D, \sqcap)$ a semi-lattice of descriptions or patterns
      • $\delta$ a mapping such that $\delta(g) \in D$ describes object $g$
    A Galois connection:
        $A^\square = \bigsqcap_{g \in A} \delta(g)$ for $A \subseteq G$
        $d^\square = \{g \in G \mid d \sqsubseteq \delta(g)\}$ for $d \in (D, \sqcap)$
    B. Ganter and S. O. Kuznetsov. Pattern Structures and their Projections. International Conference on Conceptual Structures, 2001.
  • 18. Contributions – Interval pattern structures. Numerical data are pattern structures: interval pattern structures.

            m1  m2  m3
        g1   5   7   6
        g2   6   8   4
        g3   4   8   5
        g4   4   9   8
        g5   5   8   5

        $\{g_1, g_2\}^\square = \bigsqcap_{g \in \{g_1, g_2\}} \delta(g) = \langle 5, 7, 6 \rangle \sqcap \langle 6, 8, 4 \rangle = \langle [5, 6], [7, 8], [4, 6] \rangle$
        $\langle [5, 6], [7, 8], [4, 6] \rangle^\square = \{g \in G \mid \langle [5, 6], [7, 8], [4, 6] \rangle \sqsubseteq \delta(g)\} = \{g_1, g_2, g_5\}$
    $(\{g_1, g_2, g_5\}, \langle [5, 6], [7, 8], [4, 6] \rangle)$ is a (pattern) concept.
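The computation on slide 18 can be reproduced in a few lines. This is a sketch under the assumption that the meet of two interval vectors is the component-wise convex hull, which is what the worked example shows; the names `describe`, `meet`, `common_pattern` and `image` are hypothetical.

```python
from functools import reduce

DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def describe(values):
    """delta(g): each value v becomes the degenerate interval [v, v]."""
    return tuple((v, v) for v in values)


def meet(c, d):
    """Component-wise convex hull of two interval vectors."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(c, d))


def common_pattern(objects):
    """A-square: the meet of the descriptions of all objects in A."""
    return reduce(meet, (describe(DATA[g]) for g in objects))


def image(pattern):
    """d-square: the objects whose values all fall inside the intervals of d."""
    return {
        g
        for g, values in DATA.items()
        if all(lo <= v <= hi for v, (lo, hi) in zip(values, pattern))
    }
```

Applying `image` to `common_pattern(["g1", "g2"])` adds g5 to the starting set, which is why the concept on the slide has extent {g1, g2, g5}.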
  • 19. Contributions – Interval pattern structures. Interval pattern concept lattice: the lowest concepts have few objects and small intervals; the highest concepts have many objects and large intervals. The number of concepts/patterns can be overwhelming.
  • 20. Contributions – Interval pattern structures. Links with conceptual scaling.
    Interordinal scaling [Ganter & Wille]: a scale to encode intervals of attribute values.

            m1 ≤ 4   m1 ≤ 5   m1 ≤ 6   m1 ≥ 4   m1 ≥ 5   m1 ≥ 6
        4     ×        ×        ×        ×
        5              ×        ×        ×        ×
        6                       ×        ×        ×        ×

    Equivalent concept lattice. Example: ({g1, g2, g5}, {m1 ≤ 6, m1 ≥ 4, m1 ≥ 5, ...}) corresponds to $(\{g_1, g_2, g_5\}, \langle [5, 6], \ldots, \ldots \rangle)$.
    Why should we use pattern structures when we have scaling? Processing a pattern structure is more efficient.
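The correspondence between an interval and its interordinal attributes can be sketched as follows. The attribute labels (`"<= 4"`, `">= 5"`, ...) and both function names are hypothetical choices made for this illustration.

```python
def interordinal_scale(values):
    """Map each value v to the binary attributes '<= s' and '>= s' it satisfies."""
    scale_points = sorted(set(values))
    return {
        v: {f"<= {s}" for s in scale_points if v <= s}
        | {f">= {s}" for s in scale_points if v >= s}
        for v in scale_points
    }


def interval_to_attributes(lo, hi, values):
    """Binary attributes satisfied by every value of the interval [lo, hi]."""
    scale_points = sorted(set(values))
    return {f"<= {s}" for s in scale_points if hi <= s} | {
        f">= {s}" for s in scale_points if s <= lo
    }
```

For the values 4, 5, 6 of m1, the interval [5, 6] maps to the three attributes listed in the slide's example, while each single value owns four attributes, which hints at the redundancy of the binary encoding.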
  • 21. Outline
      1 Context
      2 Formal Concept Analysis
      3 Contributions: Interval pattern structures; Introducing similarity; A KDD-oriented discussion
      4 Conclusion and perspectives
  • 22. Contributions – Introducing similarity. Introducing a similarity relation: how to group in the same concept objects having similar values?
    A natural similarity relation on numbers: $a \simeq_\theta b \iff |a - b| \le \theta$, e.g. $4 \simeq_1 5$ and $4 \not\simeq_1 6$.
    Similarity operator in pattern structures (figure: the semi-lattice over the values 4, 5, 6 and the intervals [4,5], [5,6], [4,6], shown in turn for θ = 2, θ = 1 and θ = 0).
    How to consider a similarity relation w.r.t. a distance?
  • 26. Contributions – Introducing similarity. Towards a similarity between values: introduce an element $* \in (D, \sqcap)$ denoting dissimilarity.
        $c \sqcap d = *$ iff $c \not\simeq_\theta d$
        $c \sqcap d \neq *$ iff $c \simeq_\theta d$
    Example with θ = 1:

            m1  m2  m3
        g1   5   7   6
        g2   6   8   4
        g3   4   8   5
        g4   4   9   8
        g5   5   8   5

        $\{g_3, g_4\}^\square = \langle [4, 4], [8, 9], * \rangle$
        $\langle [4, 4], [8, 9], * \rangle^\square = \{g_3, g_4\}$
    $(\{g_3, g_4\}, \langle [4, 4], [8, 9], * \rangle)$ is a concept: g3 and g4 have similar values for attributes m1 and m2 only.
    Is {g3, g4} maximal w.r.t. similarity? We can add g5...
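The θ-constrained meet of slides 26 and 27 can be sketched by letting a component collapse to the dissimilarity element whenever its convex hull is wider than θ. Representing `*` by `None` is an assumption of this sketch, as are the names `meet`, `common_pattern` and `image`.

```python
from functools import reduce

DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}
STAR = None  # stands for the dissimilarity element *


def meet(c, d, theta):
    """Component-wise hull, replaced by * when the hull is wider than theta."""
    result = []
    for x, y in zip(c, d):
        if x is STAR or y is STAR:
            result.append(STAR)
            continue
        lo, hi = min(x[0], y[0]), max(x[1], y[1])
        result.append((lo, hi) if hi - lo <= theta else STAR)
    return tuple(result)


def common_pattern(objects, theta):
    """Meet of the descriptions of the given objects under threshold theta."""
    descriptions = (tuple((v, v) for v in DATA[g]) for g in objects)
    return reduce(lambda c, d: meet(c, d, theta), descriptions)


def image(pattern):
    """Objects matching the pattern; a * component matches any value."""
    return {
        g
        for g, values in DATA.items()
        if all(iv is STAR or iv[0] <= v <= iv[1] for v, iv in zip(values, pattern))
    }
```

With θ = 1 this yields the concept of the slide for {g3, g4}; g5 is left out although its m1 value 5 is similar to 4, which is the maximality issue the slide raises.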
  • 28. Contributions – Introducing similarity. Classes of tolerance in numerical data: towards maximal sets of similar values.
      • $\simeq_\theta$ is a tolerance relation: reflexive, symmetric, not transitive
      • Consider an attribute taking values in {6, 8, 11, 16, 17} and θ = 5: $8 \simeq_5 11$ and $11 \simeq_5 16$, but $8 \not\simeq_5 16$
      • A class of tolerance is a maximal set of pairwise similar values: {6, 8, 11}, {11, 16}, {16, 17}, characterized by the intervals [6, 11], [11, 16], [16, 17]
    S. O. Kuznetsov. Galois Connections in Data Analysis: Contributions from the Soviet Era and Modern Russian Research. In Formal Concept Analysis, Foundations and Applications, 2005.
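For values on a line, the classes of tolerance of slide 28 can be found with a sliding window over the sorted values. The function below is a hypothetical sketch of that idea, not an algorithm taken from the thesis.

```python
def tolerance_classes(values, theta):
    """Maximal sets of pairwise similar values (|a - b| <= theta), as sorted lists."""
    vals = sorted(set(values))
    classes = []
    last_end = -1
    for start in range(len(vals)):
        # Extend the window as far as the first value stays similar to the last.
        end = start
        while end + 1 < len(vals) and vals[end + 1] - vals[start] <= theta:
            end += 1
        # Keep the window only if it is not contained in the previous one.
        if end > last_end:
            classes.append(vals[start:end + 1])
            last_end = end
    return classes
```

Since the values are sorted, a window is pairwise similar exactly when its two end points are, so each kept window is a maximal class and its end points give the characterizing interval.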
  • 29. Contributions – Introducing similarity. Tolerance in pattern structures: projecting the pattern structure.
      • Each value is replaced by the interval characterizing its class of tolerance (if unique)
      • Each pattern $d$ is projected with a mapping $\psi(d) \sqsubseteq d$ (pre-processing)
    Example with θ = 1:

            m1  m2  m3
        g1   5   7   6
        g2   6   8   4
        g3   4   8   5
        g4   4   9   8
        g5   5   8   5

        $\{g_3, g_4\}^\square = \psi(\langle [4, 4], [8, 9], * \rangle) = \langle [4, 5], [8, 9], * \rangle$
        $\langle [4, 5], [8, 9], * \rangle^\square = \{g_3, g_4, g_5\}$
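One possible reading of the projection on slide 29 is sketched below: each value is first replaced by the interval of its class of tolerance when that class is unique, and the θ-constrained meet is then applied to the projected descriptions. This is an interpretation of the slide, not the thesis code; `*` is represented by `None`, and all names (`project`, `subsumes`, `PROJECTED`, ...) are hypothetical.

```python
from functools import reduce

DATA = {
    "g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5),
    "g4": (4, 9, 8), "g5": (5, 8, 5),
}
THETA = 1
STAR = None  # stands for the dissimilarity element *


def tolerance_classes(values, theta):
    """Maximal sets of pairwise similar values, found with a sliding window."""
    vals, classes, last_end = sorted(set(values)), [], -1
    for start in range(len(vals)):
        end = start
        while end + 1 < len(vals) and vals[end + 1] - vals[start] <= theta:
            end += 1
        if end > last_end:
            classes.append(vals[start:end + 1])
            last_end = end
    return classes


def project(value, classes):
    """Interval of the unique class containing value, else [value, value]."""
    owners = [c for c in classes if value in c]
    return (owners[0][0], owners[0][-1]) if len(owners) == 1 else (value, value)


def meet(c, d):
    """Component-wise hull, replaced by * when wider than THETA."""
    out = []
    for x, y in zip(c, d):
        if x is STAR or y is STAR:
            out.append(STAR)
            continue
        lo, hi = min(x[0], y[0]), max(x[1], y[1])
        out.append((lo, hi) if hi - lo <= THETA else STAR)
    return tuple(out)


def subsumes(d, description):
    """d subsumes a description if each non-* interval of d contains its interval."""
    return all(
        x is STAR or (y is not STAR and x[0] <= y[0] and y[1] <= x[1])
        for x, y in zip(d, description)
    )


CLASSES = [tolerance_classes(col, THETA) for col in zip(*DATA.values())]
PROJECTED = {
    g: tuple(project(v, cls) for v, cls in zip(values, CLASSES))
    for g, values in DATA.items()
}


def common_pattern(objects):
    return reduce(meet, (PROJECTED[g] for g in objects))


def image(pattern):
    return {g for g, desc in PROJECTED.items() if subsumes(pattern, desc)}
```

Under this reading, the pattern of {g3, g4} widens from [4, 4] to [4, 5] on m1, so g5 joins the extent as on the slide.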
  • 30. Contributions – Introducing similarity. Biological results: one extracted pattern among 2,150 others.
      • Genes present a high expression level in the fruit-body situations
      • Some of these genes encode metabolic enzymes involved in the remobilization of fungal resources towards the new organ in development
      • Other genes are unknown but specific to Laccaria bicolor: characterizing them requires biological experiments
  • 31. Contributions – Introducing similarity. Relevant publications.
    Interval pattern structures and GED analysis:
      • M. Kaytoue, S. Duplessis, S. O. Kuznetsov, and A. Napoli. Two FCA-Based Methods for Mining Gene Expression Data. International Conference on Formal Concept Analysis (ICFCA), 2009.
      • M. Kaytoue, S. O. Kuznetsov, A. Napoli and S. Duplessis. Mining Gene Expression Data with Pattern Structures in Formal Concept Analysis. Information Sciences, Spec. Iss.: Lattices (Elsevier), 2011.
    Introducing tolerance relations and information fusion:
      • M. Kaytoue, Z. Assaghir, N. Messai and A. Napoli. Two Complementary Classification Methods for Designing a Concept Lattice from Interval Data. Foundations of Information and Knowledge Systems, 6th International Symposium (FoIKS), 2010.
      • M. Kaytoue, Z. Assaghir, A. Napoli and S. O. Kuznetsov. Embedding Tolerance Relations in Formal Concept Analysis: an Application in Information Fusion. ACM Conference on Information and Knowledge Management (CIKM), 2010.
  • 32. Contributions – Other works. Pattern structures are useful for several tasks.
    Bi-clustering and tolerance relations:
      • M. Kaytoue, S. O. Kuznetsov, and A. Napoli. Biclustering Numerical Data in Formal Concept Analysis. International Conference on Formal Concept Analysis (ICFCA), 2011.
    Information fusion, enhancing decision making:
      • Z. Assaghir, M. Kaytoue, A. Napoli and H. Prade. Managing Information Fusion with Formal Concept Analysis. Modeling Decisions for Artificial Intelligence, 6th International Conference (MDAI), 2010.
    KDD, a study of equivalence classes of interval patterns:
      • M. Kaytoue, S. O. Kuznetsov, and A. Napoli. Revisiting Numerical Pattern Mining with Formal Concept Analysis. International Joint Conference on Artificial Intelligence (IJCAI), 2011.
  • 33. Outline
      1 Context
      2 Formal Concept Analysis
      3 Contributions: Interval pattern structures; Introducing similarity; A KDD-oriented discussion
      4 Conclusion and perspectives
  • 34. Contributions – A KDD-oriented discussion. Interval pattern search space: counting all possible interval patterns
        $\langle [a_{m_1}, b_{m_1}], [a_{m_2}, b_{m_2}], \ldots \rangle$ where $a_{m_i}, b_{m_i} \in W_{m_i}$

            m1  m2  m3
        g1   5   7   6
        g2   6   8   4
        g3   4   8   5
        g4   4   9   8
        g5   5   8   5

        $\prod_{i \in \{1, \ldots, |M|\}} \frac{|W_{m_i}| \times (|W_{m_i}| + 1)}{2}$
    360 possible interval patterns in our small example.
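The count on slide 34 can be checked directly: with 3, 3 and 4 distinct values in the three columns, the product is 6 × 6 × 10 = 360. A small sketch, with `count_interval_patterns` as a hypothetical helper name:

```python
from math import prod

DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def count_interval_patterns(data):
    """Product over attributes of |W| * (|W| + 1) / 2, W being the distinct values."""
    columns = zip(*data.values())
    sizes = [len(set(column)) for column in columns]
    return prod(n * (n + 1) // 2 for n in sizes)
```

Each factor counts the intervals [a, b] with a ≤ b and both bounds taken among the values observed for that attribute.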
  • 35–41. Contributions – A KDD-oriented discussion
    Semantics for interval patterns
      Interval patterns as (hyper) rectangles
            m1  m3
        g1   5   6
        g2   6   4
        g3   4   5
        g4   4   8
        g5   5   5
      image(⟨[4, 5], [5, 6]⟩) = {g1, g3, g5}
      image(⟨[4, 5], [5, 7]⟩) = {g1, g3, g5}
      image(⟨[4, 6], [5, 6]⟩) = {g1, g3, g5}
      image(⟨[4, 5], [4, 6]⟩) = {g1, g3, g5}
      image(⟨[4, 6], [5, 7]⟩) = {g1, g3, g5}
      image(⟨[4, 5], [4, 7]⟩) = {g1, g3, g5}
      [Figure: δ(g1) to δ(g5) plotted as points in the (m1, m3) plane]
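The rectangle semantics can be checked mechanically: an object belongs to the image of an interval pattern when each of its values falls inside the corresponding interval. The following is a minimal Python sketch on the two-attribute table; the names `DATA`, `image`, `closed_pattern` and `PATTERNS` are hypothetical and not taken from the thesis.

```python
# Two-attribute table from the slide: object -> (m1, m3).
DATA = {"g1": (5, 6), "g2": (6, 4), "g3": (4, 5), "g4": (4, 8), "g5": (5, 5)}

def image(pattern):
    """Return the objects whose values lie inside every interval of the pattern."""
    return {
        g for g, values in DATA.items()
        if all(lo <= v <= hi for v, (lo, hi) in zip(values, pattern))
    }

def closed_pattern(objects):
    """Return the smallest rectangle (bounding box) containing the given objects."""
    columns = zip(*(DATA[g] for g in objects))
    return tuple((min(col), max(col)) for col in columns)

# The six interval patterns listed on the slide, as ((m1 bounds), (m3 bounds)).
PATTERNS = [
    ((4, 5), (5, 6)), ((4, 5), (5, 7)), ((4, 6), (5, 6)),
    ((4, 5), (4, 6)), ((4, 6), (5, 7)), ((4, 5), (4, 7)),
]

for p in PATTERNS:
    print(p, sorted(image(p)))  # each line ends with ['g1', 'g3', 'g5']

print(closed_pattern({"g1", "g3", "g5"}))  # ((4, 5), (5, 6))
```

All six rectangles cover the same three points, and the bounding box of those points is the smallest of them.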
  • 42. Contributions – A KDD-oriented discussion
    A condensed representation
      Equivalence classes of interval patterns
        Two interval patterns with the same image are said to be equivalent: c ≅ d ⟺ image(c) = image(d)
        Equivalence class of a pattern d: [d] = {c | c ≅ d}, with
          a unique closed pattern: the smallest rectangle
          one or several generators: the largest rectangles
        Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, and L. Lakhal. Mining frequent patterns with counting inference. SIGKDD Expl., 2(2):66–75, 2000.
      In our example: 360 patterns; 18 closed; 44 generators
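The counts quoted for the example can be partly reproduced by brute force. The sketch below is an illustration, not the thesis's algorithm: it enumerates every interval pattern on the three-attribute table, groups patterns by image, and takes the bounding box of each image as the closed pattern. It assumes that interval bounds are restricted to values occurring in each column and that patterns with an empty image are set aside. Generators are not computed, and all function and variable names are hypothetical.

```python
from itertools import combinations_with_replacement, product

# Full example table from the talk: object -> (m1, m2, m3).
DATA = {
    "g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5),
    "g4": (4, 9, 8), "g5": (5, 8, 5),
}

def intervals(column):
    """All intervals [a, b] with a <= b whose bounds are values of the column."""
    return list(combinations_with_replacement(sorted(set(column)), 2))

def image(pattern):
    """Objects whose values lie inside every interval of the pattern."""
    return frozenset(
        g for g, values in DATA.items()
        if all(lo <= v <= hi for v, (lo, hi) in zip(values, pattern))
    )

def bounding_box(objects):
    """Smallest rectangle containing the given objects."""
    cols = zip(*(DATA[g] for g in objects))
    return tuple((min(c), max(c)) for c in cols)

columns = list(zip(*DATA.values()))
all_patterns = list(product(*(intervals(col) for col in columns)))

# Equivalence classes: patterns grouped by (non-empty) image.
classes = {}
for p in all_patterns:
    img = image(p)
    if img:
        classes.setdefault(img, []).append(p)

# One closed pattern per class: the bounding box of the shared image.
closed = {img: bounding_box(img) for img in classes}

print(len(all_patterns))  # 360
print(len(classes))       # 18
```

Each closed pattern is itself a member of its class, which is what makes it a usable representative of all the equivalent rectangles.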
  • 43. Contributions – A KDD-oriented discussion
    Algorithms & experiments
      Algorithms: MinIntChange, MinIntChangeG[t|h]
      [Figure: the values 4, 5, 6 and the intervals [4,5], [5,6], [4,6]]
      Experiments
        Mining several datasets from the Bilkent University Repository
        Compression rate varies between 10^7 and 10^9
        Interordinal scaling:
          encodes 30,000 binary patterns
          not efficient even with the best algorithms (e.g. LCMv2)
          a redundancy problem rules out its use for generator extraction
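Interordinal scaling, the baseline mentioned in these experiments, replaces each numerical attribute m by binary attributes of the form "m <= v" and "m >= v" for every value v observed in that column. The following is a minimal sketch on the small example table of the talk (not on the Bilkent datasets); the function name `interordinal_scale` and the string encoding of the binary attributes are assumptions made for illustration.

```python
# Example table from the talk: object -> (m1, m2, m3).
DATA = {
    "g1": (5, 7, 6), "g2": (6, 8, 4), "g3": (4, 8, 5),
    "g4": (4, 9, 8), "g5": (5, 8, 5),
}
ATTRIBUTES = ("m1", "m2", "m3")

def interordinal_scale(data, attributes):
    """Map each object to the set of binary attributes 'm <= v' / 'm >= v' it satisfies."""
    columns = list(zip(*data.values()))
    context = {}
    for g, values in data.items():
        row = set()
        for name, value, column in zip(attributes, values, columns):
            for v in set(column):
                if value <= v:
                    row.add(f"{name} <= {v}")
                if value >= v:
                    row.add(f"{name} >= {v}")
        context[g] = row
    return context

context = interordinal_scale(DATA, ATTRIBUTES)
binary_attributes = set().union(*context.values())

print(len(binary_attributes))  # 20
print(sorted(a for a in context["g1"] if a.startswith("m1")))
# ['m1 <= 5', 'm1 <= 6', 'm1 >= 4', 'm1 >= 5']
```

Even on this five-row table, three numerical attributes become twenty binary ones, which hints at why the binary encoding grows quickly on real datasets.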
  • 45. Contributions – A KDD-oriented discussion
    Discussion
      Advantages
        Minimum description length principle favours generators
      Potential applications
        Data privacy and k-anonymisation
        k-box problem in computational geometry
        Quantitative association rule mining
        Data summarization
      Problem
        With very large datasets, compression is not enough
        Numerical data are noisy
        Need for fault-tolerant condensed representations
  • 46.
    1 Context
    2 Formal Concept Analysis
    3 Contributions
        Interval pattern structures
        Introducing similarity
        A KDD-oriented discussion
    4 Conclusion and perspectives
  • 47. Conclusion and perspectives
    Conclusion
      A new insight into the mining of numerical data
      Our main tools...
        Formal Concept Analysis and conceptual scaling
        Pattern structures and projections
        Tolerance relation
      ... for numerical data mining
        Conceptual representations of numerical data
        Bi-clustering
        Information fusion
      Applications: GED analysis and agricultural practice assessment
  • 48. Conclusion and perspectives
    Conclusion
      An application in GED analysis
        With FCA and pattern structures
          Many ways of extracting patterns in GED
          Biological validation of several patterns
        We now need a systematic validation step using new knowledge
          transcription factors
          biological knowledge base, e.g. Gene Ontology
  • 49. Conclusion and perspectives
    To be continued...
      Short- and mid-term
        Handle other types of biclusters and algorithm comparison
          S. C. Madeira and A. L. Oliveira. Biclustering Algorithms for Biological Data Analysis: A Survey. In IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004.
        Insert domain knowledge for biological data
        Study the effect of the threshold θ on the number of tolerance classes
      Post-doctoral position
        Biclustering (multi-dimensional) numerical data
        Numerical pattern based classifier and association rules
        Data privacy and pattern projection
        Wagner Meira Jr. (Universidade Federal de Minas Gerais, Brazil)
  • 50. Conclusion and perspectives
    Cross-domain fertilization
      Itemset-mining in KDD
        Other frameworks for closed patterns
          H. Arimura and T. Uno. Polynomial-Delay and Polynomial-Space Algorithms for Mining Closed Sequences, Graphs, and Pictures in Accessible Set Systems. In SIAM International Conference on Data Mining, 2009.
          G. C. Garriga. Formal Methods for Mining Structured Objects. PhD Thesis, Universitat Politècnica de Catalunya, 2006.
        Condensed representations and fault-tolerant patterns
                m1  m2  m3
            g1   5   7   6
            g2   6   8   4
            g3   4   8   5
            g4   4   9   8
            g5  15   8   5
          R. Pensa and J.-F. Boulicaut. Towards Fault-Tolerant Formal Concept Analysis. In Proc. 9th Congress of the Italian Association for Artificial Intelligence (AI*IA), Springer, 2005.
  • 51. Conclusion and perspectives
    Cross-domain fertilization
      Data-analysis
        Symbolic data analysis and distances
          P. Agarwal, M. Kaytoue, S. O. Kuznetsov, A. Napoli and G. Polaillon. Symbolic Galois Lattices with Pattern Structures. In International Conference on Rough Sets, Fuzzy Sets, Data-mining and Granularity Computing (RSFDGrC), 2011.
        Information fusion and fuzzy concept analysis
        Fuzzy settings and possibility theory
          Z. Assaghir, M. Kaytoue, and H. Prade. A Possibility Theory Oriented Discussion of Conceptual Pattern Structures. In Scalable Uncertainty Management, 4th International Conference (SUM), 2010.
  • 52. Merci, Danke schön, Spasibo