On the Mining of Numerical Data with Formal Concept Analysis
PhD Dissertation Talk, 22 April 2011
----
This thesis addresses the problem of mining numerical data, and especially gene expression data. These data characterize the behaviour of thousands of genes in various biological situations (time, cell, etc.).
A difficult task consists in clustering genes to obtain classes of genes with similar behaviour, which are assumed to be involved together in a biological process.
Accordingly, we are interested in designing and comparing methods in the field of knowledge discovery from biological data. We propose to study how the conceptual classification method called Formal Concept Analysis (FCA) can handle the problem of extracting interesting classes of genes. For this purpose, we have designed and experimented with several original methods based on an extension of FCA called pattern structures. Furthermore, we show that these methods can enhance decision making in agronomy and crop health in the vast formal domain of information fusion.

Transcript

  • 1. On the Mining of Numerical Data with Formal Concept Analysis. PhD thesis in computer science (Thèse de doctorat en informatique). Mehdi Kaytoue, 22 April 2011. Amedeo Napoli, Sébastien Duplessis.
  • 2. Somewhere... in a temperate forest...
  • 3. Context
    A biological problem: how does symbiosis work at the cellular level?
    - Analyse biological processes
    - Find genes involved in symbiosis
    - Choose a model for understanding symbiosis: Laccaria bicolor
    - Analyse gene expression data (GED)
    Reference: F. Martin et al. The Genome of Laccaria bicolor Provides Insights into Mycorrhizal Symbiosis. Nature, 2008.
  • 4. Context: Gene expression data (GED)
    A numerical dataset, or data table, with genes in rows, biological situations in columns, and in each cell the expression value of the row's gene for the column's situation. A row denotes the expression profile of a gene (GEP).

            m1  m2  m3
      g1     5   7   6
      g2     6   8   4
      g3     4   8   5
      g4     4   9   8
      g5     5   8   5

    Biological hypothesis: a group of genes having similar expression profiles interact together within the same biological process.
  • 5. Context: With very large datasets...
    Gene expression data of Laccaria bicolor:
    - 22,294 genes
    - 3 types of biological situations reflecting cells of the organism at various stages of its biological cycle: free-living mycelium, symbiotic tissues, fruiting bodies
    - Attribute values range over [0, 65000]
  • 6. Context: Knowledge discovery in databases
    An iterative and interactive process.
    Reference: U. Fayyad, G. Piatetsky-Shapiro and P. Smyth. The KDD Process for Extracting Useful Knowledge from Volumes of Data. Commun. ACM, 1996.
  • 7. Context: Mining gene expression data
    Extracting (maximal) rectangles in numerical data:
    - A rectangle is a set of genes co-expressed in some biological situations
    - Local patterns: biological processes may be activated in some situations only
    - Overlapping patterns: a gene may be involved in several biological processes

            m1  m2  m3  m4  m5
      g1     1   2   2   1   6
      g2     2   1   1   0   6
      g3     2   2   1   7   6
      g4     8   9   2   6   7

    Biclustering: a difficult problem relying on heuristics.
    Reference: R. Peeters. The Maximum Edge Biclique Problem is NP-Complete. Discrete Applied Math., vol. 131, no. 3, 2003.
  • 8-9. Context: Core of the thesis
    Mining gene expression data with formal concept analysis:
    - Turning GED into binary data, encoding over/under-expression
    - Bringing the problem into well-known settings
    - Allowing a complete and mathematically well-defined approach
    - Exploiting existing algorithms and "tools"

    Numerical table:
            m1  m2  m3  m4  m5
      g1     1   2   2   1   6
      g2     2   1   1   5   6
      g3     2   2   1   7   6
      g4     8   9   2   6   7

    Binary table (slide 8):
            m1  m2  m3  m4  m5
      g1     0   0   0   0   1
      g2     0   0   0   0   1
      g3     0   0   0   1   1
      g4     1   1   0   1   1

    The same binary table as a cross table (slide 9):
            m1  m2  m3  m4  m5
      g1                     ×
      g2                     ×
      g3                 ×   ×
      g4     ×   ×       ×   ×

    Can we work with FCA directly on numerical data?
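The binary table above can be reproduced with a simple threshold. The following is a minimal sketch, assuming that a value counts as over-expressed when it is at least 6. This threshold is not stated on the slides; it was chosen only because it reproduces the table shown. The names `THRESHOLD` and `binarize` are illustrative.

```python
# Assumption: the threshold 6 is NOT given on the slides; it merely
# reproduces the binary table shown there.
THRESHOLD = 6

# Numerical table of slides 8-9 (genes in rows, situations m1..m5 in columns).
expression = {
    "g1": [1, 2, 2, 1, 6],
    "g2": [2, 1, 1, 5, 6],
    "g3": [2, 2, 1, 7, 6],
    "g4": [8, 9, 2, 6, 7],
}


def binarize(table, threshold):
    """Encode each value as 1 (over-expressed) if it reaches the threshold, else 0."""
    return {gene: [int(value >= threshold) for value in values]
            for gene, values in table.items()}


binary = binarize(expression, THRESHOLD)
for gene, row in binary.items():
    print(gene, row)
```

A different threshold gives a different binary table, which is the motivation for working on the numerical data directly.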
  • 10. Outline
    1 Context
    2 Formal Concept Analysis
    3 Contributions: Interval pattern structures; Introducing similarity; A KDD-oriented discussion
    4 Conclusion and perspectives
  • 11. Formal Concept Analysis: A binary table as a formal context
    A formal context is given by (G, M, I) with G a set of objects, M a set of attributes, and I a binary relation between objects and attributes: (g, m) ∈ I means that "object g owns attribute m".

            m1  m2  m3
      g1     ×       ×
      g2     ×   ×
      g3         ×   ×
      g4         ×   ×
      g5     ×   ×   ×

    G = {g1, ..., g5}, M = {m1, m2, m3}, (g1, m3) ∈ I
    Reference: B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer, 1999.
  • 12. Formal Concept Analysis: A maximal rectangle as a formal concept
    A Galois connection characterizes formal concepts:
      A′ = {m ∈ M | ∀g ∈ A : (g, m) ∈ I}   for A ⊆ G
      B′ = {g ∈ G | ∀m ∈ B : (g, m) ∈ I}   for B ⊆ M
    (A, B) is a concept with extent A = B′ and intent B = A′.

            m1  m2  m3
      g1     ×       ×
      g2     ×   ×
      g3         ×   ×
      g4         ×   ×
      g5     ×   ×   ×

    Example: {g3}′ = {m2, m3} and {m2, m3}′ = {g3, g4, g5}, so ({g3, g4, g5}, {m2, m3}) is a formal concept.
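The two derivation operators can be sketched in a few lines of Python. This is an illustrative, naive implementation, not the algorithms used in the thesis. The cross positions of the context are an assumption: they are reconstructed from the examples given on slides 12 and 13, and the function names `intent`, `extent` and `concepts` are hypothetical.

```python
from itertools import combinations

# Formal context (object -> owned attributes). Assumption: cross positions are
# reconstructed from the examples on the slides, not read from the table itself.
context = {
    "g1": {"m1", "m3"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
attributes = {"m1", "m2", "m3"}


def intent(objects):
    """A' : the attributes shared by every object of `objects`."""
    shared = set(attributes)
    for g in objects:
        shared &= context[g]
    return shared


def extent(attrs):
    """B' : the objects owning every attribute of `attrs`."""
    return {g for g, owned in context.items() if set(attrs) <= owned}


def concepts():
    """Naive enumeration: close every attribute subset (only for tiny contexts)."""
    found = set()
    for size in range(len(attributes) + 1):
        for attrs in combinations(sorted(attributes), size):
            ext = extent(attrs)
            found.add((frozenset(ext), frozenset(intent(ext))))
    return found


print(sorted(intent({"g3"})))          # attributes shared by g3 alone
print(sorted(extent({"m2", "m3"})))    # objects owning both m2 and m3
print(len(concepts()), "formal concepts")
```

The enumeration is exponential in the number of attributes, so it only serves to make the definitions concrete on this five-object example.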
  • 13. Formal Concept Analysis: Concept lattice
    An ordered set of concepts...
      (A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1)
      Example: ({g1, g5}, {m1, m3}) ≤ ({g1, g2, g5}, {m1})
    ... with interesting properties:
    - Maximality of concepts as rectangles
    - Overlapping of concepts
    - Specialization/generalization hierarchy
    - Synthetic representation of the data without loss of information
  • 14. Formal Concept Analysis: Handling numerical data with FCA?
    Initial problem: extracting groups of genes with similar numerical values.
    Conceptual scaling (discretization or binarization): an object has an attribute if its value lies in a predefined interval.

    Numerical table:
            m1  m2  m3
      g1     5   7   6
      g2     6   8   4
      g3     4   8   5
      g4     4   9   8
      g5     5   8   5

    Scaled binary table:
            m1, [4, 5]   m2, [4, 7]   m3, [5, 6]
      g1        ×            ×            ×
      g2
      g3        ×                         ×
      g4        ×
      g5        ×                         ×

    Different scalings give different interpretations of the data.
    General problem of the thesis: how to directly build a concept lattice from numerical data?
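Conceptual scaling with one predefined interval per attribute can be sketched as follows. This is a minimal illustration in Python using the table and the three intervals of slide 14; the names `scale` and `scale_context` are hypothetical.

```python
# Numerical table of slide 14.
data = {
    "g1": {"m1": 5, "m2": 7, "m3": 6},
    "g2": {"m1": 6, "m2": 8, "m3": 4},
    "g3": {"m1": 4, "m2": 8, "m3": 5},
    "g4": {"m1": 4, "m2": 9, "m3": 8},
    "g5": {"m1": 5, "m2": 8, "m3": 5},
}

# One closed interval per attribute, as on the slide.
scale = {"m1": (4, 5), "m2": (4, 7), "m3": (5, 6)}


def scale_context(table, intervals):
    """Object g owns attribute m iff its value for m lies in the interval of m."""
    return {
        g: {m for m, value in row.items()
            if intervals[m][0] <= value <= intervals[m][1]}
        for g, row in table.items()
    }


binary_context = scale_context(data, scale)
for g, owned in binary_context.items():
    print(g, sorted(owned))
```

Changing the intervals in `scale` changes the resulting formal context, which illustrates why different scalings lead to different interpretations of the same data.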
  • 16. Contributions – Interval pattern structures: How to handle complex descriptions
    An intersection as a similarity operator:
    - ∩ behaves as a similarity operator: {m1, m2} ∩ {m1, m3} = {m1}
    - ∩ induces an ordering relation ⊆: N ∩ O = N ⇔ N ⊆ O, e.g. {m1} ∩ {m1, m2} = {m1} ⇔ {m1} ⊆ {m1, m2}
    - ∩ has the properties of a meet ⊓ in a semi-lattice, a commutative, associative and idempotent operation: c ⊓ d = c ⇔ c ⊑ d
    Reference: A. Tversky. Features of Similarity. Psychological Review, 84(4), 1977.
  • 17. Contributions – Interval pattern structures: Pattern structure
    A pattern structure is given by (G, (D, ⊓), δ) with G a set of objects, (D, ⊓) a semi-lattice of descriptions or patterns, and δ a mapping such that δ(g) ∈ D describes object g.
    A Galois connection:
      A□ = ⊓_{g ∈ A} δ(g)        for A ⊆ G
      d□ = {g ∈ G | d ⊑ δ(g)}    for d ∈ (D, ⊓)
    Reference: B. Ganter and S. O. Kuznetsov. Pattern Structures and their Projections. International Conference on Conceptual Structures, 2001.
  • 18. Contributions – Interval pattern structures: Numerical data are pattern structures
    Interval pattern structures:

            m1  m2  m3
      g1     5   7   6
      g2     6   8   4
      g3     4   8   5
      g4     4   9   8
      g5     5   8   5

      {g1, g2}□ = ⊓_{g ∈ {g1, g2}} δ(g) = ⟨5, 7, 6⟩ ⊓ ⟨6, 8, 4⟩ = ⟨[5, 6], [7, 8], [4, 6]⟩
      ⟨[5, 6], [7, 8], [4, 6]⟩□ = {g ∈ G | ⟨[5, 6], [7, 8], [4, 6]⟩ ⊑ δ(g)} = {g1, g2, g5}
    ({g1, g2, g5}, ⟨[5, 6], [7, 8], [4, 6]⟩) is a (pattern) concept.
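The interval pattern structure of this example can be sketched in Python. The sketch assumes, as the example suggests, that the meet of two interval vectors is the component-wise convex hull, and that c ⊑ d holds when c ⊓ d = c. The names `describe`, `meet`, `subsumes`, `common_description` and `image` are illustrative and not taken from the thesis.

```python
from functools import reduce

# Numerical table of slide 18 (values of m1, m2, m3 for each object).
data = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def describe(g):
    """delta(g): each value v is seen as the degenerate interval [v, v]."""
    return tuple((v, v) for v in data[g])


def meet(c, d):
    """Component-wise convex hull of two interval vectors."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(c, d))


def subsumes(c, d):
    """c is below d in the semi-lattice order, i.e. meet(c, d) == c."""
    return meet(c, d) == c


def common_description(objects):
    """Meet of the descriptions of a non-empty set of objects."""
    return reduce(meet, (describe(g) for g in objects))


def image(pattern):
    """Objects whose description is subsumed by the pattern."""
    return {g for g in data if subsumes(pattern, describe(g))}


pattern = common_description({"g1", "g2"})
print(pattern)                 # interval vector shared by g1 and g2
print(sorted(image(pattern)))  # all objects falling inside these intervals
```

Applying `image` and then `common_description` again returns the same pattern, which is what makes the pair a pattern concept.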
  • 19. Contributions – Interval pattern structures: Interval pattern concept lattice
    - Lowest concepts: few objects, small intervals
    - Highest concepts: many objects, large intervals
    - An overwhelming number of concepts/patterns
  • 20. Contributions – Interval pattern structures: Links with conceptual scaling
    Interordinal scaling [Ganter & Wille]: a scale to encode intervals of attribute values.

            m1≤4  m1≤5  m1≤6  m1≥4  m1≥5  m1≥6
      4      ×     ×     ×     ×
      5            ×     ×     ×     ×
      6                  ×     ×     ×     ×

    Equivalent concept lattice. Example: ({g1, g2, g5}, {m1 ≤ 6, m1 ≥ 4, m1 ≥ 5, ..., ...}) corresponds to ({g1, g2, g5}, ⟨[5, 6], ..., ...⟩).
    Why should we use pattern structures when we have scaling? Processing a pattern structure is more efficient.
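The interordinal scale for one attribute, and the way an interval maps to a set of binary attributes, can be sketched as follows. This is an illustrative Python sketch; the attribute naming scheme (`"m1<=4"`, `"m1>=5"`) and the function names are assumptions made for the example.

```python
def interordinal_scale(name, values):
    """For each value x, the binary attributes 'name<=v' and 'name>=v' it satisfies."""
    values = sorted(set(values))
    return {
        x: {f"{name}<={v}" for v in values if x <= v}
           | {f"{name}>={v}" for v in values if x >= v}
        for x in values
    }


def interval_to_attributes(name, low, high, values):
    """Binary attributes shared by every value lying in [low, high]."""
    values = sorted(set(values))
    return ({f"{name}<={v}" for v in values if v >= high}
            | {f"{name}>={v}" for v in values if v <= low})


m1_values = [5, 6, 4, 4, 5]  # column m1 of the running example
rows = interordinal_scale("m1", m1_values)
for value, owned in rows.items():
    print(value, sorted(owned))

print(sorted(interval_to_attributes("m1", 5, 6, m1_values)))
```

With three distinct values the scale already needs six binary attributes, which hints at why the scaled context grows quickly on real data.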
  • 21. Contributions – Introducing similarity: Outline
    1 Context
    2 Formal Concept Analysis
    3 Contributions: Interval pattern structures; Introducing similarity; A KDD-oriented discussion
    4 Conclusion and perspectives
  • 22-25. Contributions – Introducing similarity: Introducing a similarity relation
    Grouping in a same concept objects having similar values?
    A natural similarity relation on numbers: a ≃_θ b ⇔ |a − b| ≤ θ, e.g. 4 ≃_1 5 but 4 ≄_1 6.
    Similarity operator in pattern structures: [figure: semi-lattice over the values 4, 5, 6 and the intervals [4,5], [5,6], [4,6], shown successively for θ = 2, θ = 1 and θ = 0 on slides 23 to 25]
    How to consider a similarity relation w.r.t. a distance?
  • 26-27. Contributions – Introducing similarity: Towards a similarity between values
    Introduce an element ∗ ∈ (D, ⊓) denoting dissimilarity:
      c ⊓ d = ∗ iff c ≄_θ d
      c ⊓ d ≠ ∗ iff c ≃_θ d
    Example with θ = 1:

            m1  m2  m3
      g1     5   7   6
      g2     6   8   4
      g3     4   8   5
      g4     4   9   8
      g5     5   8   5

      {g3, g4}□ = ⟨[4, 4], [8, 9], ∗⟩
      ⟨[4, 4], [8, 9], ∗⟩□ = {g3, g4}
    ({g3, g4}, ⟨[4, 4], [8, 9], ∗⟩) is a concept: g3 and g4 have similar values for attributes m1 and m2 only.
    Is {g3, g4} maximal w.r.t. similarity? We can add g5...
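The θ-bounded meet of this example can be sketched in Python. Two assumptions are made and are not stated on the slides: an interval wider than θ is replaced by ∗, and ∗ absorbs any interval and is satisfied by every object. `STAR`, `meet_theta` and `image` are illustrative names.

```python
from functools import reduce

STAR = "*"  # stands for the dissimilarity element of the semi-lattice

data = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def describe(g):
    """Each value v becomes the degenerate interval [v, v]."""
    return tuple((v, v) for v in data[g])


def meet_theta(c, d, theta):
    """Convex hull per component, replaced by STAR when wider than theta."""
    result = []
    for x, y in zip(c, d):
        if x == STAR or y == STAR:   # assumption: STAR absorbs everything
            result.append(STAR)
            continue
        low, high = min(x[0], y[0]), max(x[1], y[1])
        result.append((low, high) if high - low <= theta else STAR)
    return tuple(result)


def common_description(objects, theta):
    """Theta-bounded meet of the descriptions of a non-empty set of objects."""
    return reduce(lambda c, d: meet_theta(c, d, theta),
                  (describe(g) for g in objects))


def image(pattern):
    """Objects whose values fall in every non-STAR interval of the pattern."""
    return {
        g for g, row in data.items()
        if all(p == STAR or p[0] <= v <= p[1] for v, p in zip(row, pattern))
    }


pattern = common_description({"g3", "g4"}, theta=1)
print(pattern, sorted(image(pattern)))

larger = common_description({"g3", "g4", "g5"}, theta=1)
print(larger, sorted(image(larger)))
```

The second call shows the point made on the slide: g5 can be added to {g3, g4} without breaking similarity on m1 and m2, so the first concept is not maximal with respect to similarity.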
  • 28. Contributions – Introducing similarity: Classes of tolerance in numerical data
    Towards maximal sets of similar values:
    - ≃_θ is a tolerance relation: reflexive, symmetric, not transitive
    - Consider an attribute taking values in {6, 8, 11, 16, 17} and θ = 5: 8 ≃_5 11 and 11 ≃_5 16, but 8 ≄_5 16
    - A class of tolerance is a maximal set of pairwise similar values: {6, 8, 11}, {11, 16}, {16, 17}, represented by the intervals [6, 11], [11, 16], [16, 17]
    Reference: S. O. Kuznetsov. Galois Connections in Data Analysis: Contributions from the Soviet Era and Modern Russian Research. In Formal Concept Analysis, Foundations and Applications, 2005.
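For numbers, a set of values is pairwise similar exactly when its largest and smallest elements differ by at most θ, so the classes of tolerance are the maximal "windows" of width θ over the sorted values. A minimal Python sketch of this observation follows; the function name `tolerance_classes` is illustrative and the quadratic filtering step is chosen for readability, not efficiency.

```python
def tolerance_classes(values, theta):
    """Maximal sets of pairwise similar values, where |a - b| <= theta."""
    vals = sorted(set(values))
    # One candidate window per starting value: everything within theta above it.
    windows = [[w for w in vals if v <= w <= v + theta] for v in vals]
    # Keep only the windows that are not strictly contained in another one.
    return [w for w in windows
            if not any(set(w) < set(other) for other in windows)]


classes = tolerance_classes([6, 8, 11, 16, 17], theta=5)
print(classes)
print([(c[0], c[-1]) for c in classes])  # the intervals representing each class
```

The value 11 belongs to two classes, which reflects the fact that the tolerance relation is not transitive and the classes may overlap.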
  • 29. Contributions – Introducing similarity: Tolerance in pattern structures
    Projecting the pattern structure:
    - Each value is replaced by the interval characterizing its class of tolerance (if unique)
    - Each pattern d is projected with a mapping ψ such that ψ(d) ⊑ d (pre-processing)
    Example with θ = 1:

            m1  m2  m3
      g1     5   7   6
      g2     6   8   4
      g3     4   8   5
      g4     4   9   8
      g5     5   8   5

      {g3, g4}□ = ψ(⟨[4, 4], [8, 9], ∗⟩) = ⟨[4, 5], [8, 9], ∗⟩
      ⟨[4, 5], [8, 9], ∗⟩□ = {g3, g4, g5}
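The projection step of this example can be sketched in Python under one reading of the slide, which is an assumption: each interval of a pattern is replaced by the interval of the tolerance class that contains it, when exactly one class does, and is left unchanged otherwise. The names `tolerance_intervals`, `project` and `image` are illustrative.

```python
STAR = "*"   # dissimilarity element
THETA = 1

data = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def tolerance_intervals(values, theta):
    """Intervals [min, max] of the maximal sets of pairwise similar values."""
    vals = sorted(set(values))
    windows = [[w for w in vals if v <= w <= v + theta] for v in vals]
    maximal = [w for w in windows
               if not any(set(w) < set(other) for other in windows)]
    return [(w[0], w[-1]) for w in maximal]


# Tolerance classes of each attribute (columns m1, m2, m3).
classes = [tolerance_intervals(column, THETA) for column in zip(*data.values())]


def project(pattern):
    """Assumed psi: widen each interval to its tolerance class when unique."""
    projected = []
    for component, attr_classes in zip(pattern, classes):
        if component == STAR:
            projected.append(STAR)
            continue
        containing = [c for c in attr_classes
                      if c[0] <= component[0] and component[1] <= c[1]]
        projected.append(containing[0] if len(containing) == 1 else component)
    return tuple(projected)


def image(pattern):
    """Objects whose values fall in every non-STAR interval of the pattern."""
    return {
        g for g, row in data.items()
        if all(p == STAR or p[0] <= v <= p[1] for v, p in zip(row, pattern))
    }


projected = project(((4, 4), (8, 9), STAR))
print(projected, sorted(image(projected)))
```

On this example the projected pattern picks up g5, so the resulting extent is the maximal set of objects that are similar on m1 and m2.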
  • 30. Contributions – Introducing similarity: Biological results
    An extracted pattern among 2,150 others:
    - The genes present a high expression level in the fruiting-body situations
    - Some of these genes encode metabolic enzymes involved in the remobilization of fungal resources towards the new organ in development
    - Other genes are unknown but specific to Laccaria bicolor: studying them requires biological experiments
  • 31. Contributions – Introducing similarity: Relevant publications
    Interval pattern structures and GED analysis:
    - M. Kaytoue, S. Duplessis, S. O. Kuznetsov and A. Napoli. Two FCA-Based Methods for Mining Gene Expression Data. International Conference on Formal Concept Analysis (ICFCA), 2009.
    - M. Kaytoue, S. O. Kuznetsov, A. Napoli and S. Duplessis. Mining Gene Expression Data with Pattern Structures in Formal Concept Analysis. Information Sciences, Special Issue: Lattices (Elsevier), 2011.
    Introducing tolerance relations and information fusion:
    - M. Kaytoue, Z. Assaghir, N. Messai and A. Napoli. Two Complementary Classification Methods for Designing a Concept Lattice from Interval Data. Foundations of Information and Knowledge Systems, 6th International Symposium (FoIKS), 2010.
    - M. Kaytoue, Z. Assaghir, A. Napoli and S. O. Kuznetsov. Embedding Tolerance Relations in Formal Concept Analysis: an Application in Information Fusion. ACM Conference on Information and Knowledge Management (CIKM), 2010.
  • 32. Contributions – Other works: Pattern structures are useful for several tasks
    Biclustering and tolerance relations:
    - M. Kaytoue, S. O. Kuznetsov and A. Napoli. Biclustering Numerical Data in Formal Concept Analysis. International Conference on Formal Concept Analysis (ICFCA), 2011.
    Information fusion, enhancing decision making:
    - Z. Assaghir, M. Kaytoue, A. Napoli and H. Prade. Managing Information Fusion with Formal Concept Analysis. Modeling Decisions for Artificial Intelligence, 6th International Conference (MDAI), 2010.
    KDD, a study of equivalence classes of interval patterns:
    - M. Kaytoue, S. O. Kuznetsov and A. Napoli. Revisiting Numerical Pattern Mining with Formal Concept Analysis. International Joint Conference on Artificial Intelligence (IJCAI), 2011.
  • 33. Contributions – A KDD-oriented discussion: Outline
    1 Context
    2 Formal Concept Analysis
    3 Contributions: Interval pattern structures; Introducing similarity; A KDD-oriented discussion
    4 Conclusion and perspectives
  • 34. Contributions – A KDD-oriented discussion: Interval pattern search space
    Counting all possible interval patterns ⟨[a_m1, b_m1], [a_m2, b_m2], ...⟩ where a_mi, b_mi ∈ W_mi, the set of values taken by attribute mi:

            m1  m2  m3
      g1     5   7   6
      g2     6   8   4
      g3     4   8   5
      g4     4   9   8
      g5     5   8   5

      ∏_{i ∈ {1, ..., |M|}} |W_mi| × (|W_mi| + 1) / 2

    360 possible interval patterns in our small example (6 × 6 × 10, since |W_m1| = |W_m2| = 3 and |W_m3| = 4).
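The counting formula can be checked directly on the example table. A short Python sketch, where `count_interval_patterns` is an illustrative name and W_mi is taken to be the set of distinct values in column mi:

```python
from math import prod

data = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def count_interval_patterns(table):
    """Product over attributes of |W| * (|W| + 1) / 2, W being the distinct values."""
    columns = zip(*table.values())
    sizes = [len(set(column)) for column in columns]
    return prod(n * (n + 1) // 2 for n in sizes)


total = count_interval_patterns(data)
print(total)
```

Each factor counts the intervals [a, b] with a ≤ b whose bounds are values of the attribute, so the count grows quadratically with the number of distinct values of each attribute and multiplicatively across attributes.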
  • 35-41. Contributions – A KDD-oriented discussion: Semantics for interval patterns
    Interval patterns as (hyper)rectangles.

            m1  m3
      g1     5   6
      g2     6   4
      g3     4   5
      g4     4   8
      g5     5   5

    [figure: the descriptions δ(g1), ..., δ(g5) plotted as points in the plane with axes m1 and m3; the rectangles below are added one per slide]
      ⟨[4, 5], [5, 6]⟩□ = {g1, g3, g5}
      ⟨[4, 5], [5, 7]⟩□ = {g1, g3, g5}
      ⟨[4, 6], [5, 6]⟩□ = {g1, g3, g5}
      ⟨[4, 5], [4, 6]⟩□ = {g1, g3, g5}
      ⟨[4, 6], [5, 7]⟩□ = {g1, g3, g5}
      ⟨[4, 5], [4, 7]⟩□ = {g1, g3, g5}
  • 42. Contributions – A KDD-oriented discussion: A condensed representation
    Equivalence classes of interval patterns:
    - Two interval patterns with the same image are said to be equivalent: c ≅ d ⇔ c□ = d□
    - Equivalence class of a pattern d: [d] = {c | c ≅ d}, with a unique closed pattern (the smallest rectangle) and one or several generators (the largest rectangles)
    Reference: Y. Bastide, R. Taouil, N. Pasquier, G. Stumme and L. Lakhal. Mining Frequent Patterns with Counting Inference. SIGKDD Explorations, 2(2):66–75, 2000.
    In our example: 360 patterns; 18 closed; 44 generators.
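The equivalence classes can be explored by brute force on the three-attribute running example (g1 = ⟨5, 7, 6⟩, ..., g5 = ⟨5, 8, 5⟩). The Python sketch below enumerates all interval patterns, groups them by image, and computes the closed pattern of each class with a non-empty image as the bounding box of its objects. It is an illustration only, not the MinIntChange algorithm; generators are not computed here, and all names are hypothetical.

```python
from itertools import product

data = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}
columns = list(zip(*data.values()))


def intervals(values):
    """All intervals [a, b] with a <= b whose bounds are values of the attribute."""
    vals = sorted(set(values))
    return [(a, b) for i, a in enumerate(vals) for b in vals[i:]]


def image(pattern):
    """Objects lying inside the hyper-rectangle described by the pattern."""
    return frozenset(
        g for g, row in data.items()
        if all(low <= v <= high for v, (low, high) in zip(row, pattern))
    )


def closed_pattern(objects):
    """Smallest rectangle containing the given non-empty set of objects."""
    return tuple(
        (min(data[g][i] for g in objects), max(data[g][i] for g in objects))
        for i in range(len(columns))
    )


patterns = list(product(*(intervals(column) for column in columns)))

# Group patterns by image: each group is one equivalence class.
classes = {}
for p in patterns:
    classes.setdefault(image(p), []).append(p)

# Classes with an empty image have no closed pattern among the enumerated ones.
nonempty_classes = {img: ps for img, ps in classes.items() if img}

print(len(patterns), "interval patterns")
print(len(nonempty_classes), "closed patterns (one per non-empty image)")
```

Each class with a non-empty image contains its own closed pattern, which is the smallest rectangle of the class; this is why storing closed patterns alone loses no information about images.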
  • 43-44. Contributions – A KDD-oriented discussion: Algorithms & experiments
    Algorithms: MinIntChange, MinIntChangeG[t|h] [figure: semi-lattice over the values 4, 5, 6 and the intervals [4,5], [5,6], [4,6]]
    Experiments:
    - Mining several datasets from the Bilkent University repository
    - Compression rate varies between 10^7 and 10^9
    - Interordinal scaling: encodes 30,000 binary patterns; not efficient even with the best algorithms (e.g. LCMv2); a redundancy problem discards its use for generator extraction
  • 45. Contributions – A KDD-oriented discussion: Discussion
    Advantages: the minimum description length principle favours generators.
    Potential applications:
    - Data privacy and k-anonymisation
    - k-box problem in computational geometry
    - Quantitative association rule mining
    - Data summarization
    Problems:
    - With very large datasets, compression is not enough
    - Numerical data are noisy
    - Need for fault-tolerant condensed representations
  • 47. Conclusion and perspectives: Conclusion
    A new insight into the mining of numerical data.
    Our main tools...
    - Formal Concept Analysis and conceptual scaling
    - Pattern structures and projections
    - Tolerance relations
    ... for numerical data mining:
    - Conceptual representations of numerical data
    - Biclustering
    - Information fusion
    Applications: GED analysis and agricultural practice assessment.
  • 48. Conclusion and perspectives: Conclusion
    An application in GED analysis with FCA and pattern structures:
    - Many ways of extracting patterns in GED
    - Biological validation of several patterns
    We now need a systematic validation step using new knowledge:
    - Transcription factors
    - Biological knowledge bases, e.g. Gene Ontology
  • 49. Conclusion and perspectives: To be continued...
    Short- and mid-term:
    - Handle other types of biclusters and compare algorithms (S. C. Madeira and A. L. Oliveira. Biclustering Algorithms for Biological Data Analysis: A Survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004.)
    - Insert domain knowledge for biological data
    - Study the effect of the threshold θ on the number of tolerance classes
    Post-doctoral position (Wagner Meira Jr., Universidade Federal de Minas Gerais, Brazil):
    - Biclustering (multi-dimensional) numerical data
    - Numerical pattern based classifier and association rules
    - Data privacy and pattern projection
  • 50. Conclusion and perspectives: Cross-domain fertilization
    Itemset mining in KDD.
    Other frameworks for closed patterns:
    - H. Arimura and T. Uno. Polynomial-Delay and Polynomial-Space Algorithms for Mining Closed Sequences, Graphs, and Pictures in Accessible Set Systems. SIAM International Conference on Data Mining, 2009.
    - G. C. Garriga. Formal Methods for Mining Structured Objects. PhD thesis, Universitat Politècnica de Catalunya, 2006.
    Condensed representations and fault-tolerant patterns:

            m1  m2  m3
      g1     5   7   6
      g2     6   8   4
      g3     4   8   5
      g4     4   9   8
      g5    15   8   5

    - R. Pensa and J.-F. Boulicaut. Towards Fault-Tolerant Formal Concept Analysis. Proc. 9th Congress of the Italian Association for Artificial Intelligence (AI*IA), Springer, 2005.
  • 51. Conclusion and perspectives: Cross-domain fertilization
    Data analysis: symbolic data analysis and distances.
    - P. Agarwal, M. Kaytoue, S. O. Kuznetsov, A. Napoli and G. Polaillon. Symbolic Galois Lattices with Pattern Structures. International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC), 2011.
    Information fusion and fuzzy concept analysis: fuzzy settings and possibility theory.
    - Z. Assaghir, M. Kaytoue and H. Prade. A Possibility Theory Oriented Discussion of Conceptual Pattern Structures. Scalable Uncertainty Management, 4th International Conference (SUM), 2010.
  • 52. Thank you (Merci, Danke schön, Spasibo)
