RuleML2015: Learning Characteristic Rules in Geographic Information Systems

We provide a general framework for learning characterization
rules of a set of objects in Geographic Information Systems (GIS) relying
on the definition of distance quantified paths. Such expressions specify
how to navigate between the different layers of the GIS starting from
the target set of objects to characterize. We have defined a generality
relation between quantified paths and proved that it is monotone with
respect to the notion of coverage, which allows us to develop an interactive
and effective algorithm to explore the search space of possible rules. We
describe GISMiner, an interactive system that we have developed based
on our framework. Finally, we present our experimental results from a
real GIS about mineral exploration.


  1. Learning Characteristic Rules in Geographic Information Systems
     A. Salleb-Aouissi (1), C. Vrain (2), D. Cassard (3)
     (1) CCLS, Columbia University, New York
     (2) LIFO, Université d’Orléans, France
     (3) French Geological Survey (BRGM)
     RuleML 2015
     Salleb, Vrain, Cassard (CCLS, LIFO, BRGM), Rules learning, RuleML 2015, 1 / 24
  2. Plan
     1 Introduction
     2 Distance-based characteristic rules
     3 Experiments
  4. The characterization task
     Characterization is a descriptive data mining task: given a target set of objects (denoted by X0), find a description of these objects, X0 → p (measure).
     Example: a set of movies (for instance, the movies produced by S. Spielberg)
       Movie(Sp) → date ∈ [1974, 2010] (86%)
     Main advantages:
     - focused on a set of positive examples
     - negative examples can be used to focus on important properties
     ⇒ Supervised Descriptive Rule Discovery: mining emerging patterns, subgroup discovery, mining contrast sets
     ⇒ differs from discrimination and classification
  5. Extension to relational databases [PKDD 03]
     An intermediate language based on existential and universal quantifiers.
     Example: a set of movies (movies produced by S. Spielberg) and a relation between movies and awards
       Movie(Sp) → ∃Award Award.kind in {Oscar, GoldenPalm} (25%)
       Movie(Sp) → ∀Award Award.kind in {Oscar, GoldenPalm} (10%)
     General form: X0 → Q1 X1 . . . Qn Xn p
     - X0: the target objects
     - Xi: a type of objects; there exists a relation between Xi−1 and Xi
     - Qi = ∀ or ∃
     The quantifier can be indexed by the name of the relation if needed.
  6. Contributions
     Extension of the work presented in [PKDD 03] for relational databases:
     ⇒ Flexible quantifiers $\exists^{e}$, $\forall^{f}$
       Movie(Sp) → $\exists^{2}$ Actor Actor.nationality = French (xxx%)
       Movie(Sp) → $\forall^{20\%}$ Actor Actor.nationality = French (xxx%)
     ⇒ Application to GIS: management of spatial data and spatial relations between objects
       Introduction of distance-based relations for GIS, which makes it possible to model spatial buffers around objects, as suggested in [PKDD 03]
     ⇒ Extension of the generality relation between rules
       People → ∃Movie ∃Award p $\preceq$ People → ∀Movie ∃Award p
       $\exists^{2}_{10km}$ Fault $\preceq$ $\exists^{2}_{5km}$ Fault $\preceq$ $\exists^{3}_{3km}$ Fault
     ⇒ Experiments on GIS Andes with an interactive algorithm
  7. Geographic Information Systems
     A GIS handles geographic, spatially referenced data: each object has a position and a shape in space.
     → organization into thematic layers, linked by geography
     → descriptions of the geographical objects by attribute-value tables
     ⇒ Experiments on a homogeneous GIS, a tool for mineral exploration and development:
     - extends some 8,500 km, from the Guajira Peninsula (northern Colombia) to Cape Horn (Tierra del Fuego), an area of 3.83 million km²
     - more than 70 thousand geographic objects
     - geographic, geologic, seismic, volcanic, mineralogy, gravimetric, . . . layers
     - mines, volcanoes, faults
  8. Plan: 1 Introduction, 2 Distance-based characteristic rules, 3 Experiments
  9. Specification of the characterization task
     Inputs:
     - E: a set of geographic objects organized into layers, $E = E_1 \cup E_2 \cup \dots \cup E_n$, where each $E_i$ represents a set of objects with the same type $T_i$
     - A set of attributes for each type of objects; objects are described by attribute-value pairs
     - Two kinds of relations between objects:
       - classical relations between objects: intersect, overlap, . . .
       - distance relations $r^{\lambda}_{ij}$ for each pair of object types $E_i$ and $E_j$: $r^{\lambda}_{ij}(o_i, o_j)$ is true when $d(o_i, o_j) \le \lambda$
     - A measure: support, novelty, . . .
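The distance relation $r^{\lambda}_{ij}$ can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that objects are points given by planar (x, y) coordinates and that d is the Euclidean distance; the slides do not specify d, and real GIS objects also have a shape. The names `within` and `neighbours` and the sample coordinates are hypothetical.

```python
from math import hypot

def within(o_i, o_j, lam):
    """r^lambda_ij(o_i, o_j): true when d(o_i, o_j) <= lam (Euclidean d assumed)."""
    (x1, y1), (x2, y2) = o_i, o_j
    return hypot(x1 - x2, y1 - y2) <= lam

def neighbours(o, layer, lam):
    """Objects of `layer` linked to `o` by the distance relation of radius `lam`."""
    return [p for p in layer if within(o, p, lam)]

# Invented sample layers: two mines and three faults.
mines = [(0.0, 0.0), (10.0, 0.0)]
faults = [(3.0, 4.0), (0.0, 1.0), (20.0, 0.0)]

# Number of faults in the 5-unit buffer of each mine.
print([len(neighbours(m, faults, 5.0)) for m in mines])  # prints [2, 0]
```

Precomputing such neighbour lists once per pair of layers, for a large radius, is one way to avoid recomputing distances for every candidate rule.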
  10. Distance quantified paths
      $X_0 - Q_1 X_1 \dots Q_n X_n$
      where
      - $n \ge 0$
      - $X_0$ represents the target set of objects to characterize
      - for each $i \neq 0$, $X_i$ is a type of objects
      - for each $i \neq 0$, $Q_i$ is one of $\forall^{f}_{r_{ij}}$, $\exists^{e}_{r_{ij}}$, $\forall^{f}_{\lambda}$, $\exists^{e}_{\lambda}$
      - $f$ is a percentage ($f \neq 0$) and $e$ is a natural number ($e \neq 0$)
      - the index $\lambda$ stands for the distance relation $r^{\lambda}_{(i-1)i}$ between $X_{i-1}$ and $X_i$
      - $\forall^{100\%}$ (resp. $\exists^{1}$) stands for $\forall$ (resp. $\exists$)
  11. Language of properties
      Given, for each type $T_i$:
      - a language $L_i$ specifying the properties that can be built
      - a boolean function $V$ determining, for each object $o$ of type $T_i$ and each property $p$ in $L_i$, whether $V_p(o) = true$ or $V_p(o) = false$
      A geographic characteristic rule on a target set $X_0$ is the conjunction of a distance quantified path $\delta$ and a property $p$: $X_0 - \delta \to p$
      - Mines − $\exists^{3}_{5km}$ Faults → True: there exist at least 3 faults within 5 km of a target object (mineral deposit).
      - Mines − $\exists^{1}_{1km}$ Volcano → (active = yes): there exists at least one active volcano within 1 km of a target object (mineral deposit).
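  12. Generality order between paths
      Let $\delta_1$ and $\delta_2$ be two distance quantified paths. $\delta_1$ is more general than $\delta_2$ ($\delta_1 \preceq \delta_2$) iff
      - $length(\delta_1) = length(\delta_2)$
      - $\delta_1$ and $\delta_2$ involve the same types of objects in the same order
      - for $1 \le i \le length(\delta_1)$, one of the following holds:
        - $Q^{1}_{i} \equiv Q^{2}_{i}$
        - $Q^{1}_{i} = \exists_{r_{ij}}$ and $Q^{2}_{i} = \forall_{r_{ij}}$
        - $Q^{1}_{i} = \exists_{\lambda}$ and $Q^{2}_{i} = \forall_{\lambda}$
        - $Q^{1}_{i} = \exists^{e}_{r_{ij}}$ and $Q^{2}_{i} = \exists^{e'}_{r_{ij}}$, with $e \le e'$
        - $Q^{1}_{i} = \exists^{e}_{\lambda}$ and $Q^{2}_{i} = \exists^{e'}_{\lambda'}$, with $\lambda \ge \lambda'$ and $e \le e'$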
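The generality order between paths can be sketched as a small Python check. This is only an illustration under stated assumptions: the class `Q` and the functions `q_more_general` and `path_more_general` are hypothetical names, f is encoded as a fraction (1.0 for 100%), a link is either a relation name (string) or a buffer radius (number), and only the five quantifier cases listed on the slide are encoded.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Q:
    kind: str                # "exists" or "forall"
    param: float             # e (a count) for "exists", f (a fraction) for "forall"
    link: Union[str, float]  # relation name r_ij, or buffer radius lambda
    target: str              # type of the objects reached by this step

def q_more_general(q1: Q, q2: Q) -> bool:
    """True when q1 is at least as general as q2, per the five listed cases."""
    if q1.target != q2.target:
        return False
    if q1 == q2:  # identical quantifiers
        return True
    if q1.kind == "exists" and q2.kind == "forall":
        # plain "exists" versus plain "forall" over the same relation or buffer
        return q1.param == 1 and q2.param == 1.0 and q1.link == q2.link
    if q1.kind == "exists" and q2.kind == "exists":
        if isinstance(q1.link, str) or isinstance(q2.link, str):
            # named relation: same relation, e <= e'
            return q1.link == q2.link and q1.param <= q2.param
        # distance buffers: lambda >= lambda' and e <= e'
        return q1.link >= q2.link and q1.param <= q2.param
    return False

def path_more_general(d1, d2) -> bool:
    """Same length, same object types in the same order, stepwise generality."""
    return len(d1) == len(d2) and all(
        q_more_general(a, b) for a, b in zip(d1, d2))

a = Q("exists", 2, 10.0, "Fault")
b = Q("exists", 2, 5.0, "Fault")
c = Q("exists", 3, 3.0, "Fault")
print(path_more_general([a], [b]), path_more_general([b], [c]))  # True True
```

Comparisons between two "forall" quantifiers with different buffers are not among the listed cases, so this sketch reports no relation for them.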
  13. Generality order between rules
      $r_1 = \delta_1 \to p_1$ is more general than $r_2 = \delta_2 \to p_2$ ($r_1 \preceq r_2$) iff
      - either $\delta_1 \preceq \delta_2$ and $p_1 \preceq p_2$,
      - or $length(\delta_1) < length(\delta_2)$, $\delta_1$ is more general than the prefix of $\delta_2$ of length $length(\delta_1)$, and $p_1 = True$.
      Examples:
      - $\exists^{2}_{10km}$ Fault $\preceq$ $\exists^{2}_{5km}$ Fault $\preceq$ $\exists^{3}_{3km}$ Fault
      - True is more general than $\exists^{2}_{10km}$ Fault
      - We have $\forall_{3km}$ Fault $\preceq$ $\forall_{5km}$ Fault $\preceq$ $\forall_{10km}$ Fault, but there is no relation between $\forall^{40\%}_{5km}$ Fault and $\forall^{20\%}_{10km}$ Fault.
  14. Notion of coverage
      Let $o$ be an object and let $\delta \to p$ be a rule. $\delta$ is decomposed into $Q_{\lambda} X.\delta'$, and we consider the objects $o_1, \dots, o_n$ of type $X$ at a distance less than $\lambda$ from $o$.
      - If $n = 0$ (no objects of $X$ at a distance less than $\lambda$ from $o$): $V_{\forall^{f}_{\lambda} X.\delta' \to p}(o) = V_{\exists^{e}_{\lambda} X.\delta' \to p}(o) = False$
      - $V_{\forall^{f}_{\lambda} X.\delta' \to p}(o) = True$ if $\frac{|\{o_i \mid V_{\delta' \to p}(o_i) = True\}|}{n} \ge f$, False otherwise
      - $V_{\exists^{e}_{\lambda} X.\delta' \to p}(o) = True$ if $|\{o_i \mid V_{\delta' \to p}(o_i) = True\}| \ge e$, False otherwise
      Notice that
      - $V_{\forall_{\lambda} X.\delta' \to p}(o) = V_{\delta' \to p}(o_1) \wedge \dots \wedge V_{\delta' \to p}(o_n)$
      - $V_{\exists_{\lambda} X.\delta' \to p}(o) = V_{\delta' \to p}(o_1) \vee \dots \vee V_{\delta' \to p}(o_n)$
      The same definition extends to a relation $r_{ij}$ by considering the objects $o_1, \dots, o_n$ linked to $o$ by the relation $r_{ij}$.
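The recursive definition of V can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: objects are assumed to be dictionaries with planar "x" and "y" coordinates, d is assumed Euclidean, the buffer test uses d ≤ λ as in the definition of the distance relation, and a path is encoded as a list of hypothetical tuples (kind, param, lam, layer_name). The function names `holds` and `coverage` and all sample data are invented.

```python
from math import hypot

def holds(o, path, prop, layers):
    """V_{delta -> p}(o): evaluate a quantified path followed by a property."""
    if not path:  # empty path: test the property on the object itself
        return prop(o)
    (kind, param, lam, layer), rest = path[0], path[1:]
    near = [q for q in layers[layer]
            if hypot(o["x"] - q["x"], o["y"] - q["y"]) <= lam]
    if not near:  # n = 0: both quantifiers evaluate to False
        return False
    hits = sum(holds(q, rest, prop, layers) for q in near)
    if kind == "forall":
        return hits / len(near) >= param  # param is f, a fraction in (0, 1]
    return hits >= param                  # param is e, a positive integer

def coverage(path, prop, targets, layers):
    """Fraction of target objects for which the rule holds."""
    return sum(holds(o, path, prop, layers) for o in targets) / len(targets)

# Invented data: three mines and three volcanoes.
layers = {"Volcano": [
    {"x": 0, "y": 1, "active": True},
    {"x": 0, "y": 2, "active": False},
    {"x": 50, "y": 50, "active": True},
]}
mines = [{"x": 0, "y": 0}, {"x": 50, "y": 49}, {"x": 100, "y": 100}]
is_active = lambda v: v["active"]

some = coverage([("exists", 1, 3.0, "Volcano")], is_active, mines, layers)
every = coverage([("forall", 1.0, 3.0, "Volcano")], is_active, mines, layers)
print(some >= every)  # True: the "exists" rule covers at least as many mines
```

On this toy data the existential rule covers two of the three mines and the universal rule covers one, which is the behaviour the generality order predicts.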
  15. Geographic Information Systems
      Let $E_{targ}$ be a given target set of objects.
      $coverage(r, E_{targ}) = \frac{|\{o \mid o \in E_{targ},\ V_r(o) = true\}|}{|E_{targ}|}$
      Proposition. Let $r_1$ ($\delta_1 \to p_1$) and $r_2$ ($\delta_2 \to p_2$) be two geographic rules. Then $r_1 \preceq r_2 \Rightarrow coverage(r_1, E_{targ}) \ge coverage(r_2, E_{targ})$.
      Corollary. If $r_1$ is not frequent, then $r_2$ is not frequent.
  16. Link-coverage
      Definition of the link-coverage of a rule $r$ ($\delta \to p$):
      $L\text{-}coverage(r, E_{targ}) = coverage(open(\delta) \to True, E_{targ})$
      where $open(\delta)$ is obtained by setting all the quantifiers of $\delta$ to $\exists$ (with no constraint on the number of elements).
      Proposition. If $L\text{-}coverage(r, E_{targ}) \le \epsilon$ then $coverage(r, E_{targ}) \le \epsilon$.
      Corollary. If $open(\delta) \to True$ is not frequent, then none of its specializations is frequent.
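The open(δ) transformation is simple enough to sketch directly. The snippet below assumes a hypothetical encoding of a path as a list of tuples (kind, param, link, layer_name); the function names `open_path` and `worth_exploring` are illustrative and do not come from the slides.

```python
def open_path(path):
    """open(delta): every quantifier becomes a plain 'exists' (at least one object)."""
    return [("exists", 1, link, layer) for (_kind, _param, link, layer) in path]

def worth_exploring(l_coverage, min_cov):
    """Pruning test: coverage never exceeds link-coverage, so a path whose
    link-coverage is below min_cov cannot yield a frequent rule."""
    return l_coverage >= min_cov

delta = [("forall", 0.75, 10.0, "Geology"), ("exists", 1, 20.0, "Fault")]
print(open_path(delta))
# [('exists', 1, 10.0, 'Geology'), ('exists', 1, 20.0, 'Fault')]
```

Because the link-coverage depends only on which objects are reachable, it can be computed once per path and reused for every property tested on that path.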
  17. GISMiner
      Input:
      - $E_{targ}$, $E_i$, $P_i$, $i \in \{1..n\}$
      - $R_{ij}$: binary relations between $E_i$ and $E_j$, $i, j \in \{1..n\}$
      - MinCov
      Output:
      - A set of characterization rules R and a tree representing the rules
      Algorithm:
        QP = empty string; response = T
        while response do
          Choose a quantifier q ∈ {∀, ∃}
          Choose a buffer λ or a relation r_ij
          Choose a parameter k for the quantifier
          Choose a set of objects Ej ∈ {Ei, i ∈ {1..n}}
          QP = QP.q^k_λ Ej
          if L-coverage(Etarg − QP → True) ≥ MinCov then
            foreach property p ∈ Pj do
              if coverage(Etarg − QP → p, Etarg) ≥ MinCov then
                if interesting(Etarg − QP → p) then
                  R = R ∪ {Etarg − QP → p}
          if user no longer wishes to extend QP then
            response = F
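The control flow of this loop can be sketched in Python under some simplifying assumptions. The interactive choices are replaced by a fixed list of steps, and the coverage, link-coverage and interestingness measures are passed in as functions; the name `mine_rules`, the tuple encoding of steps, and the stub measures in the example are all hypothetical.

```python
def mine_rules(choices, properties, coverage, l_coverage, interesting, min_cov):
    """Non-interactive sketch: each element of `choices` stands in for one round
    of user input (quantifier kind, parameter, buffer or relation, layer)."""
    rules, path = [], ()
    for step in choices:
        path = path + (step,)              # QP = QP . q^k_lambda Ej
        if l_coverage(path) < min_cov:     # link-coverage too low: test no property
            continue
        layer = step[3]
        for p in properties.get(layer, []):
            if coverage(path, p) >= min_cov and interesting(path, p):
                rules.append((path, p))
    return rules

# Stub measures for illustration only; a real system would evaluate the GIS data.
step = ("exists", 1, 15.0, "Faults")
found = mine_rules(
    choices=[step],
    properties={"Faults": ["True"]},
    coverage=lambda path, p: 0.63,
    l_coverage=lambda path: 0.63,
    interesting=lambda path, p: True,
    min_cov=0.5,
)
print(found)  # [((('exists', 1, 15.0, 'Faults'),), 'True')]
```

As in the pseudocode, the path keeps growing even when a step fails the link-coverage test; an interactive tool would typically let the user backtrack at that point.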
  18. Plan: 1 Introduction, 2 Distance-based characteristic rules, 3 Experiments
  19. GIS Andes
      Figure: Database schema of GIS Andes. Links represent an “is_distant” relationship.
      - Pre-computation of the distances between objects, given a large distance threshold
      - Pre-computation of relation tables between objects
      - Only rules with |novelty| ≥ 0.05 are kept.
      $novelty(r) = \frac{|\{o \mid o \in E_{targ},\ V_r(o) = true\}|}{|E|} - \frac{|E_{targ}|}{|E|} \cdot \frac{|\{o \mid o \in E,\ V_r(o) = true\}|}{|E|}$
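The novelty measure is plain arithmetic on four counts, as the following Python sketch shows. The function and parameter names are hypothetical, and the counts in the example are invented for illustration; they are not results from GIS Andes.

```python
def novelty(n_target_hits, n_target, n_all_hits, n_total):
    """novelty(r) = |targets satisfying r| / |E|
                    - (|E_targ| / |E|) * (|objects of E satisfying r| / |E|)."""
    return n_target_hits / n_total - (n_target / n_total) * (n_all_hits / n_total)

def keep(rule_novelty, threshold=0.05):
    """Filter used on the slide: keep rules with |novelty| >= threshold."""
    return abs(rule_novelty) >= threshold

# Invented counts: |E| = 1000, |E_targ| = 200, rule true on 150 targets
# and on 300 objects of E overall.
value = novelty(150, 200, 300, 1000)
print(round(value, 2), keep(value))  # 0.09 True
```

A novelty of zero means the rule holds on the targets exactly as often as independence would predict, which is why the filter uses the absolute value.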
  20. An example
      Figure: Example of tree exploration in GISMiner.
  21. Classical learned rules
      Rule: Coverage
      - Mines → Mines.Era ∈ {Mesozoic, Cretaceous}: 4%
      - Mines → Mines.Era ∈ {Mesozoic, Jurassic, Cretaceous}: 6%
      - Mines → Mines.Lithology = sedimentary deposits: 5%
      - Mines → Mines.Lithology = volcanic deposits: 64%
      - Mines → Mines.Distance_Benioff ∈ [170..175]: 67%
      - Mines_gold → substance = Gold/Copper: 12%
      - Mines_gold → Country = Peru: 31%
      - Mines_gold → Country = Chile: 16%
      - Mines_gold → Country = Argentina: 22%
      - Mines_gold → Morphology = Present-day or recent placers: 16%
      - Mines_gold → Morphology = Discordant lode or vein (thickness > 50 cm), . . .: 30%
      - Mines_gold → Gitology = Alluvial-eluvial placers: 14%
  22. More complex rules
      Rule: Coverage
      - Mines_gold − $\exists^{1}_{10km}$ Geology → True: 95%
      - Mines_gold − $\exists^{1}_{10km}$ Geology → Geology.Age ∈ {Cenozoic, Tertiary}: 58%
      - Mines_gold − $\exists^{1}_{10km}$ Geology → Geology.Age ∈ {Cenozoic, Quaternary}: 40%
      - Mines_gold − $\exists^{1}_{10km}$ Geology → Geology.Age = Paleozoic: 38%
      - Mines_gold − $\exists^{1}_{10km}$ Geology → Geology.System = Neogene: 41%
      - Mines_gold − $\exists^{1}_{10km}$ Geology → Geology.GeolType = Sedimentary: 35%
      - Mines_gold − $\exists^{1}_{15km}$ Faults → True: 63%
      - Mines_gold − $\exists^{2}_{15km}$ Faults → True: 51%
      - Mines_gold − $\exists^{3}_{15km}$ Faults → True: 43%
      - Mines_gold − $\forall^{75\%}_{10km}$ Geology $\exists^{1}_{20km}$ Fault → True: 58%
  23. Conclusion
      - Extension of the framework based on quantified paths
      - Introduction of distance-based relations for GIS ⇒ makes it possible to model spatial buffers around objects, as suggested in [PKDD 03]
      - Introduction of flexible operators $\exists^{e}$ and $\forall^{f}$, allowing much more interesting rules ⇒ $\exists^{e}$ is more interesting than $\forall^{f}$ from the point of view of generality
      - An interactive algorithm for mining distance-based geographic rules
      - In progress: an implementation of a relational rule mining system performing a breadth-first search
      - Interest of the formalism for learning in Description Logics?
  24. Links with description logics
      Given $X_0 - Q_{R_0} X_1 \dots Q_{R_{n-1}} X_n \to p$, we associate
      - the atomic concept $X_i$ to each type of object $X_i$
      - the role $R_i$ to each relation $R_i$ linking $X_i$ to $X_{i+1}$
      - the concept $P$ to the property $p$
      quantified path + property: representation in DL
      - ∅ p: P
      - ∀Xi p: Xi ∀Ri.P
      - ∃Xi p: Xi ∃Ri.P
      $\exists^{e}$ is a cardinality constraint.
