# Mining Implications from Lattices of Closed Trees

Talk about mining implications from datasets of unlabeled trees, using lattices of closed trees.



### Transcript

• 1. Mining Implications from Lattices of Closed Trees — José L. Balcázar, Albert Bifet and Antoni Lozano, Universitat Politècnica de Catalunya. Extraction et Gestion des Connaissances (EGC'2008), Sophia Antipolis, France, 2008.
• 2. Introduction — Problem: Given a dataset D of rooted, unlabelled and unordered trees, find a "basis": a set of rules that are sufficient to infer all the rules that hold in the dataset D. [figure: example implications between trees of D]
• 3. Introduction — Set of rules: A → Γ_D(A). Antecedents are obtained through a computation akin to a hypergraph transversal; consequents follow from an application of the closure operator. [lattice diagram: nodes 1, 2, 3, 12, 13, 23, 123]
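The "computation akin to a hypergraph transversal" can be illustrated with a brute-force enumeration of minimal hitting sets. This is a hedged sketch, not the authors' algorithm; the function name `minimal_transversals` and the toy hypergraph are illustrative only.

```python
from itertools import chain, combinations

def minimal_transversals(hypergraph):
    """All minimal transversals (hitting sets) of a hypergraph: vertex
    sets that intersect every edge and contain no smaller such set."""
    vertices = sorted(set(chain.from_iterable(hypergraph)))

    def hits(s):
        return all(s & set(edge) for edge in hypergraph)

    # Enumerate every vertex subset that hits all edges...
    candidates = [frozenset(c)
                  for r in range(len(vertices) + 1)
                  for c in combinations(vertices, r)
                  if hits(frozenset(c))]
    # ...and keep only the inclusion-minimal ones.
    return [c for c in candidates if not any(o < c for o in candidates)]

# Edges {1,2} and {2,3}: the minimal transversals are {2} and {1,3}.
print(minimal_transversals([{1, 2}, {2, 3}]))
```

The enumeration is exponential, which is fine for a sketch; the slide's point is only that rule antecedents arise as transversals of a suitable hypergraph.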
• 4. Introduction — Set of rules: A → Γ_D(A). [lattice diagram with the tree rules at each node: 1, 2, 3, 12, 13, 23, 123]
• 5. Trees — Our trees are: rooted, unlabeled, unordered. Our subtrees are: induced, top-down. [figure: two different ordered trees that are the same unordered tree]
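The tree and subtree notions of this slide can be made concrete with a small sketch. The tuple encoding and the brute-force matcher below are illustrative assumptions, not the talk's implementation; nodes are unlabeled, so a tree is fully described by its shape.

```python
from itertools import permutations

# A rooted tree encoded as a tuple of child subtrees.
LEAF = ()

def is_top_down_subtree(s, t):
    """True iff s is an induced, top-down subtree of t: the root of s
    maps to the root of t, and the children of each node of s map
    injectively to children of the corresponding node of t. Trying all
    injective assignments makes the comparison order-independent."""
    if len(s) > len(t):
        return False
    for perm in permutations(range(len(t)), len(s)):
        if all(is_top_down_subtree(c, t[i]) for c, i in zip(s, perm)):
            return True
    return False

# Two different ordered trees that are the same unordered tree:
t1 = (LEAF, (LEAF,))
t2 = ((LEAF,), LEAF)
```

Since `is_top_down_subtree(t1, t2)` and `is_top_down_subtree(t2, t1)` both hold, the two ordered trees coincide as unordered trees, which is exactly the slide's point.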
• 6. Deterministic association rules — Logical implications are the traditional means of representing knowledge in formal AI systems. In the field of data mining they are known as association rules. Example dataset M over attributes {a, b, c, d}: m1 = {a, b, d}, m2 = {b, c, d}, m3 = {b, d}; rules that hold: a → b, d; d → b; a, b → d. Deterministic association rules are implications with 100% confidence. An advantage of deterministic association rules is that they can be studied in purely logical terms with propositional Horn logic.
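The 100%-confidence criterion can be checked directly on the slide's example. A minimal sketch, with each model represented as the set of attributes it makes true; the function name `confidence` is ours, not from the talk.

```python
# The example dataset M from the slide.
M = {
    "m1": {"a", "b", "d"},
    "m2": {"b", "c", "d"},
    "m3": {"b", "d"},
}

def confidence(antecedent, consequent, models):
    """Confidence of antecedent -> consequent: among the models that
    contain the antecedent, the fraction that also contain the
    consequent. Deterministic rules are those with confidence 1.0."""
    supporting = [m for m in models.values() if antecedent <= m]
    if not supporting:
        return 1.0  # vacuously satisfied: no model contains the antecedent
    return sum(consequent <= m for m in supporting) / len(supporting)

assert confidence({"a"}, {"b", "d"}, M) == 1.0   # a -> b, d
assert confidence({"d"}, {"b"}, M) == 1.0        # d -> b
assert confidence({"a", "b"}, {"d"}, M) == 1.0   # a, b -> d
```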
• 7. Propositional Horn Logic — The rules of the example as clauses: a → b, d corresponds to (¬a ∨ b) ∧ (¬a ∨ d); d → b to ¬d ∨ b; a, b → d to ¬a ∨ ¬b ∨ d. Assume a finite number of variables, V = {a, b, c, d}. A clause is Horn iff it contains at most one positive literal, e.g. ¬a ∨ ¬b ∨ d, that is, a, b → d. A model is a complete truth assignment from variables to {0, 1}, e.g. m(a) = 0, m(b) = 1, m(c) = 1, ... Given a set of models M, the Horn theory of M corresponds to the conjunction of all Horn clauses satisfied by all models from M.
• 8. Propositional Horn Logic — Theorem: Given a set of models M, there is exactly one minimal Horn theory containing it. Semantically, it contains all the models that are intersections of models of M. This is sometimes called the empirical Horn approximation. We propose a closure-operator translation of the tree set of rules to a specific propositional theory.
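The semantic characterization on this slide is computable: closing a set of models under pairwise intersection yields the models of the empirical Horn approximation. A minimal sketch, with models as frozensets of their true variables; the function name is illustrative.

```python
from itertools import combinations

def intersection_closure(models):
    """Close a set of models under pairwise intersection. The result is
    the model set of the minimal Horn theory containing the input:
    the empirical Horn approximation."""
    closed = set(models)
    changed = True
    while changed:
        changed = False
        for x, y in combinations(list(closed), 2):
            z = x & y
            if z not in closed:
                closed.add(z)
                changed = True
    return closed

# The slide's models: closing them adds nothing new here, because
# m1 ∩ m2 = {b, d} = m3 already belongs to the set.
models = {frozenset("abd"), frozenset("bcd"), frozenset("bd")}
```

Dropping m3 from the input shows the mechanism at work: `intersection_closure` then regenerates {b, d} as the intersection of the other two models.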
• 9. Closure Operator — D: the finite input dataset of trees; T: the (infinite) set of all trees. Definition: We define the following Galois connection pair. For finite A ⊆ D, σ(A) is the set of common subtrees of the trees of A: σ(A) = {t ∈ T | ∀ t′ ∈ A (t ⊑ t′)}. For finite B ⊂ T, τ_D(B) is the set of trees of D that are supertrees of the trees of B: τ_D(B) = {t′ ∈ D | ∀ t ∈ B (t ⊑ t′)}. Closure operator: the composition Γ_D = σ ∘ τ_D is a closure operator.
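The shape of Γ_D = σ ∘ τ_D can be sketched with itemsets standing in for the talk's subtree order (a full tree implementation is beyond a sketch, so this simplification is an assumption): τ_D collects the transactions containing A, the role of the supertrees, and σ intersects them, the role of the common subtrees.

```python
def gamma(A, dataset):
    """Sketch of the closure operator Gamma_D = sigma o tau_D on
    itemsets (frozensets), a stand-in for the subtree order:
      tau_D(A): the transactions of D containing A ('supertrees'),
      sigma:    their common part ('subtrees' shared by all of them)."""
    tau = [t for t in dataset if A <= t]
    if not tau:  # A occurs nowhere; the closure degenerates to everything
        return frozenset().union(*dataset) if dataset else frozenset()
    closed = frozenset(tau[0])
    for t in tau[1:]:
        closed &= t
    return closed

# The example dataset used on the earlier slides.
D = [frozenset("abd"), frozenset("bcd"), frozenset("bd")]
```

On this example `gamma` behaves as a closure operator must: it is extensive (A ⊆ Γ_D(A)) and idempotent (applying it twice changes nothing).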
• 10. Galois Lattice of closed sets of trees — [lattice diagram: nodes 1, 2, 3, 12, 13, 23, 123]
• 11. Model transformation — Intuition: One propositional variable v_t is assigned to each possible subtree t. A set of trees A corresponds in a natural way to a model m_A. We impose on m_A the constraint that if m_A(v_t) = 1 for a variable v_t, then m_A(v_t′) = 1 for all those variables v_t′ such that t′ is a subtree of the tree represented by v_t: R0 = {v_t → v_t′ | t′ ⊑ t; t ∈ U, t′ ∈ U}.
• 12. Model transformation — Theorem: The following propositional formulas are logically equivalent: (1) the conjunction of all the Horn formulas that are satisfied by all the models m_t for t ∈ D; (2) the conjunction of R0 and all the propositional translations of the formulas in R_D, where R_D = ⋃_C {A → t | Γ_D(A) = C, t ∈ C}; (3) the conjunction of R0 and all the propositional translations of the formulas in a subset of R_D obtained through transversals of the hypergraph of differences between the nodes of the lattice.
• 13. Association Rule Computation Example — [lattice diagram: 1, 2, 3, 12, 13, 23, 123]
• 14. Association Rule Computation Example — [lattice diagram, next step]
• 15. Association Rule Computation Example — [lattice diagram with a derived rule]
• 16. Implicit rules — Implicit Rule: Given three trees t1, t2, t3, we say that t1 ∧ t2 → t3 is an implicit Horn rule (abbreviated: an implicit rule) if for every tree t it holds that t1 ⊑ t ∧ t2 ⊑ t ↔ t3 ⊑ t. Trees t1 and t2 have implicit rules if t1 ∧ t2 → t is an implicit rule for some t.
• 17. Implicit rules — [a further example of an implicit rule in D; same definition]
• 18. Implicit rules — [an example that is NOT an implicit rule; same definition]
• 19. Implicit rules — NOT an implicit rule: this supertree of the antecedents is NOT a supertree of the consequent.
• 20. Implicit rules — [another example that is NOT an implicit rule; same definition]
• 21. Implicit Rules — Theorem: All trees a, b such that a ⊑ b have implicit rules. Theorem: Suppose that b has only one component. Then a and b have implicit rules if and only if a has a maximum component which is a subtree of the component of b. [figure illustrating the construction on components a_1, ..., a_{n−1}, a_n and b_1, with a_i ⊑ a_n for all i < n]
• 22. Experimental Validation: CSLOGS — [plot: number of rules, number of rules not implicit, and number of detected rules (y-axis, 0–800) against support (x-axis, 5000–30000)]
• 23. Summary — Conclusions: A way of extracting high-confidence association rules from datasets consisting of unlabeled trees: antecedents are obtained through a computation akin to a hypergraph transversal; consequents follow from an application of the closure operator. Detection of some cases of implicit rules: rules that always hold, independently of the dataset.