What is concept drift and how to measure it?

  1. What is concept drift and how to measure it? Shenghui Wang, Stefan Schlobach, Michel Klein (Vrije Universiteit Amsterdam). EKAW 2010, Lisbon. (Sections: Introduction; A theory of concept drift; Case studies; Summary and future work.)
  2. Outline: (1) Introduction; (2) A theory of concept drift; (3) Case studies: concept drift in political communication, in DBpedia, and in LKIF-Core; (4) Summary and future work.
  3. Introduction: Knowledge organisation systems (KOS) play a crucial role in providing semantic interoperability. They include formal ontologies (modelled in OWL), thesauri or taxonomies (described in SKOS), and other term classification schemes. Concepts are their central constructs. However, it is also recognised that concepts drift: the meaning of a concept changes over time, location, or culture.
  5. Example 1: Follow the Fashion?
  6. Example 2: Women’s role? The suffragettes said that women’s role in society is unacceptable. The Pope says that women’s role in society is unacceptable.
  7. Example 3: European Union.
     (1979) "The European Community is a common denominator for the European Economic Community (EEC), the European Coal and Steel Community (ECSC), and the European Atomic Energy Community (EAEC)." (DTV Atlas)
     (1999) "The European Community is the new stage in the implementation of increasing the Union of the European people." (Brockhaus: Europaeische Gemeinschaft)
     (2003) "The European Union or EU is an international organisation of European states, established by the Treaty on European Union." (Wikipedia 2003)
     (2006) "The European Union (EU) is a supranational and intergovernmental union of 25 independent, democratic member states." (Wikipedia 2006)
     (2010) "The European Union is an international organisation comprising 27 European countries and governing common economic, social, and security policies." (Encyclopædia Britannica)
  8. Research questions: (1) What is concept drift, and how can it be formalised? (2) Can we identify the impact of concept drift?
  9. The meaning of a concept: We consider the intension, extension and label as the three components of the meaning of a concept. Definition: The meaning $C_t$ of a concept $C$ at some moment in time $t$ is a triple $(\mathrm{label}_t(C), \mathrm{int}_t(C), \mathrm{ext}_t(C))$, where $\mathrm{label}_t(C)$ is a string, $\mathrm{int}_t(C)$ is a set of properties (the intension of $C$), and $\mathrm{ext}_t(C)$ is a subset of the universe (the extension of $C$).
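As a minimal sketch (not the authors' implementation), the meaning triple can be modelled as an immutable Python record. The class name `ConceptMeaning`, its field names and the example values are hypothetical, and intension and extension are simplified to sets of strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptMeaning:
    """The meaning C_t of a concept C at time t: (label_t(C), int_t(C), ext_t(C))."""

    label: str                 # label_t(C): a string
    intension: frozenset[str]  # int_t(C): a set of properties
    extension: frozenset[str]  # ext_t(C): a subset of the universe (here: instance ids)


# Hypothetical variants of one concept at two moments in time.
childcare_t1 = ConceptMeaning(
    label="Childcare",
    intension=frozenset({"family", "costs"}),
    extension=frozenset({"s1", "s2", "s3"}),
)
childcare_t2 = ConceptMeaning(
    label="Childcare",
    intension=frozenset({"family", "subsidy"}),
    extension=frozenset({"s3", "s4"}),
)
```

Freezing the record keeps each variant immutable and hashable, so variants can be stored in sets or used as dictionary keys when building chains over time.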
  10. Identity: Identity allows us to compare two variants of the same concept at different moments in time, even if its meaning (label, extension, or the non-rigid part of its intension) has changed. Definition: Two concepts $C_1$ and $C_2$ are considered identical if and only if their rigid intensions are equivalent, i.e., $\mathrm{int}_r(C_1) = \mathrm{int}_r(C_2)$.
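The identity criterion can be sketched as a comparison of rigid intensions. The slides do not say how rigid properties are determined, so this sketch assumes a hypothetical, externally supplied set `RIGID_PROPERTIES` (for instance, chosen by a domain expert); the property strings are illustrative, with the member counts taken from the European Union example.

```python
from dataclasses import dataclass

# ASSUMPTION: the rigid properties are supplied from outside (e.g. by a domain
# expert); the slides do not say how rigidity is decided.
RIGID_PROPERTIES = frozenset({"is_a:Organisation", "members:EuropeanStates"})


@dataclass(frozen=True)
class ConceptVariant:
    label: str
    intension: frozenset[str]  # all properties, rigid and non-rigid


def rigid_intension(c: ConceptVariant, rigid: frozenset[str] = RIGID_PROPERTIES) -> frozenset[str]:
    """int_r(C): the rigid part of the intension of C."""
    return c.intension & rigid


def identical(c1: ConceptVariant, c2: ConceptVariant, rigid: frozenset[str] = RIGID_PROPERTIES) -> bool:
    """C1 and C2 are identical iff int_r(C1) == int_r(C2)."""
    return rigid_intension(c1, rigid) == rigid_intension(c2, rigid)


# Hypothetical variants: only a non-rigid property (the member count) differs.
eu_2006 = ConceptVariant(
    "European Union",
    frozenset({"is_a:Organisation", "members:EuropeanStates", "member_count:25"}),
)
eu_2010 = ConceptVariant(
    "European Union",
    frozenset({"is_a:Organisation", "members:EuropeanStates", "member_count:27"}),
)
other = ConceptVariant("Some organisation", frozenset({"is_a:Organisation"}))
```

Here `eu_2006` and `eu_2010` count as the same concept although their intensions differ, while `other` does not, because its rigid part is smaller.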
  12. Concept drift: This definition of drift is based on the idea that a concept retains its identity over time, i.e., it remains the same concept, at least temporarily. Definition: A concept $C$ has extensionally drifted between times $t_i$ and $t_j$ if and only if $\mathrm{sim}_{\mathrm{ext}}(C_{t_i}, C_{t_j}) \neq 1$. Intensional and label drift are defined similarly. The meaning of a concept has drifted if any one of these aspects has drifted.
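A minimal sketch of the drift test, assuming that variants are plain dictionaries with hypothetical keys `label`, `intension` and `extension`, that sets are compared with Jaccard similarity, and that labels are compared by exact string match (a deliberately crude stand-in; the political case study uses edit distance). Comparing against exactly 1 follows the definition literally; with noisy similarity measures a tolerance might be more practical.

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def exact_match(a: str, b: str) -> float:
    """Crude label similarity: 1.0 if the labels are equal, else 0.0."""
    return 1.0 if a == b else 0.0


# ASSUMPTION: one similarity function per aspect of the meaning.
SIMILARITY = {"label": exact_match, "intension": jaccard, "extension": jaccard}


def drifted_aspects(c_ti: dict, c_tj: dict, sims: dict = SIMILARITY) -> set[str]:
    """Aspects in which C drifted between t_i and t_j, i.e. where sim(C_ti, C_tj) != 1."""
    return {aspect for aspect, sim in sims.items() if sim(c_ti[aspect], c_tj[aspect]) != 1}


def meaning_drifted(c_ti: dict, c_tj: dict, sims: dict = SIMILARITY) -> bool:
    """The meaning has drifted if any one of its aspects has drifted."""
    return bool(drifted_aspects(c_ti, c_tj, sims))


# Hypothetical variants: label and intension unchanged, extension partly replaced.
c_t1 = {"label": "Childcare", "intension": frozenset({"family", "costs"}),
        "extension": frozenset({"s1", "s2", "s3"})}
c_t2 = {"label": "Childcare", "intension": frozenset({"family", "costs"}),
        "extension": frozenset({"s2", "s3", "s4"})}
```

In this example only the extension has drifted: the two extensions share 2 of 4 distinct instances, giving a Jaccard similarity of 0.5.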
  13. Concept shift: Definition: The meaning of a concept $C$ extensionally shifts between two of its variants $C_{t_i}$ and $C_{t_j}$ if the extension of $C_{t_j}$ is more similar to the extension of a non-identical concept than to the extension of $C_{t_i}$. Intensional and label shift are defined similarly. [Figure: the variants $C_{1,t_1}$, $C_{1,t_2}$ and $C_{2,t_2}$ placed on a timeline from $t_1$ to $t_2$.]
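The sketch below tests for extensional shift under one reading of the definition, which matches the Childcare example in the political case study: $C$ shifts if, at $t_j$, some non-identical concept is more similar to $C_{t_i}$ than $C$'s own variant $C_{t_j}$ is. The function name `find_shift` and the data are hypothetical; for simplicity the extensions share instance ids across time, whereas the political case study uses instance-matching-based similarity instead.

```python
from typing import Callable, Optional


def jaccard(a: frozenset, b: frozenset) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_shift(
    ext_ti: frozenset,
    concepts_tj: dict[str, frozenset],
    identical_id: str,
    sim: Callable[[frozenset, frozenset], float] = jaccard,
) -> Optional[str]:
    """Return the id of the non-identical concept at t_j most similar to C_ti,
    provided it beats C_tj (the variant identical to C_ti); otherwise None."""
    best_id, best_sim = None, sim(ext_ti, concepts_tj[identical_id])
    for cid, ext in concepts_tj.items():
        if cid != identical_id and sim(ext_ti, ext) > best_sim:
            best_id, best_sim = cid, sim(ext_ti, ext)
    return best_id


# Hypothetical extensions, loosely modelled on the Childcare example.
childcare_2003 = frozenset({"a1", "a2", "a3", "a4"})
concepts_2006 = {
    "Childcare": frozenset({"a4", "a9"}),
    "Free Childcare": frozenset({"a1", "a2", "a3", "a8"}),
}
```

With these values, `childcare_2003` has similarity 0.2 to its identical variant but 0.6 to `Free Childcare`, so `find_shift(childcare_2003, concepts_2006, "Childcare")` reports a shift towards `Free Childcare`.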
  14. (In)stability: The more the meaning of a concept drifts, the more unstable it becomes. We put the variants of one concept at different moments into a chain, i.e., $\mathrm{chain}(C, t_1, t_n) = C_{t_1} \to C_{t_2} \to \dots \to C_{t_n}$, and take the average similarity of all steps along this chain as the stability measure. As a relative measure, it tells whether one concept is more stable than another over a certain period of time.
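Written out, averaging over the $n-1$ steps of the chain gives $S(C) = \frac{1}{n-1}\sum_{k=1}^{n-1} \mathrm{sim}(C_{t_k}, C_{t_{k+1}})$. A minimal sketch, assuming Jaccard similarity over intensions and hypothetical example chains:

```python
from statistics import mean


def jaccard(a: frozenset, b: frozenset) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stability(chain: list, sim=jaccard) -> float:
    """Average similarity of consecutive steps C_t1 -> C_t2 -> ... -> C_tn."""
    if len(chain) < 2:
        raise ValueError("a chain needs at least two variants")
    return mean([sim(a, b) for a, b in zip(chain, chain[1:])])


# Hypothetical intensions of two concepts at three moments in time.
volatile = [frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"c", "d"})]
steady = [frozenset({"a", "b"}), frozenset({"a", "b"}), frozenset({"a", "b", "c"})]
```

Here `stability(volatile)` is 1/3 and `stability(steady)` is 5/6; as on the slide, the values are mainly useful for comparing concepts over the same period.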
  15. Applying the framework: To apply our framework for concept drift to a specific use case, the following steps are required: (1) define the intension, the extension and a labelling function; (2) define the identity of concepts; (3) define similarity functions over intensions, extensions and labels.
  16. Case studies: political communication (a political vocabulary described in SKOS); DBpedia (a general-purpose ontology modelled in RDF(S)); LKIF-Core (a legal ontology modelled in OWL).
  17. Concept drift in political communication. Concept drift in political vocabularies: Communication scientists use certain vocabularies to annotate newspapers so that they can perform content analysis. We studied five variants of a SKOS vocabulary of political concepts used during five recent Dutch national election campaigns, which took place in 1994, 1998, 2002, 2003 and 2006. We also collected all newspaper articles that were manually annotated with the concepts from the variant of that year. Manual mappings are used as the identities.
  18. Intension, extension and label of political concepts: The label of a concept is obtained using the SKOS Core labelling property skos:prefLabel. The extension $\mathrm{ext}(C_t)$ of a concept $C_t \in V_t$ at time $t$ is the set of all sentences annotated by $C_t$, i.e., $\mathrm{ext}_s(C_t) = \{ s \in \Delta_t \mid s \text{ is annotated by } C_t \}$. The intension $\mathrm{int}(C_t)$ of a concept is determined by its most associated concepts: for each concept $C$, its intension is the set of concepts that co-occur with it most often in the sentences they annotate at one moment in time.
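A minimal sketch of these two definitions, assuming the annotations of one election year are given as a mapping from sentence id to the set of concepts annotating it. The slides do not fix the association measure or the size of the intension, so raw co-occurrence counts and the cut-off `k` are assumptions, and the annotation data are hypothetical.

```python
from collections import Counter


def extension(concept: str, annotations: dict[str, set[str]]) -> set[str]:
    """ext(C_t): all sentences of year t annotated with the concept."""
    return {s for s, concepts in annotations.items() if concept in concepts}


def intension(concept: str, annotations: dict[str, set[str]], k: int = 3) -> set[str]:
    """int(C_t): the k concepts co-occurring most often with the concept.

    ASSUMPTION: raw co-occurrence counts and a fixed cut-off k; the slides
    specify neither the association measure nor the size of the intension.
    """
    counts = Counter()
    for s in extension(concept, annotations):
        counts.update(c for c in annotations[s] if c != concept)
    return {c for c, _ in counts.most_common(k)}


# Hypothetical sentence annotations for a single election year.
annotations_year = {
    "s1": {"Democracy", "Referendum", "Bureaucracy"},
    "s2": {"Democracy", "Referendum"},
    "s3": {"Democracy", "Referendum", "Islam"},
    "s4": {"Democracy", "Bureaucracy"},
    "s5": {"Employers", "Unions"},
}
```

With `k=2`, the intension of `Democracy` is `{"Referendum", "Bureaucracy"}` (co-occurring 3 and 2 times); `Islam`, co-occurring once, falls outside the cut-off.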
  22. Similarity measures: edit distance between concept labels; Jaccard similarity between concept intensions; instance-matching-based similarity between concept extensions.
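The label and intension measures can be sketched as follows. The normalisation of edit distance to a similarity in [0, 1] (dividing by the longer label's length) is an assumption, since the slide only names edit distance. Instance-matching-based similarity between extensions is not sketched, because the slides do not describe how instances are matched.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def label_similarity(a: str, b: str) -> float:
    """ASSUMPTION: edit distance normalised by the longer label, mapped to [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two intensions: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For example, `Childcare` and `Free Childcare` differ by 5 insertions, so under this normalisation their label similarity is 1 - 5/14.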
  23. Stability of political concepts: [Figure: intension of the concept Democracy in 2002, 2003 and 2006, showing associated concepts such as Environmental Activist, Moroccans, Rechtsstaat, High Incomes, Referendum, Bureaucracy, Islam, Voting Computers and Sharia.] Figure: Intension of the concept Democracy in three years, with an average intensional stability of $S_{\mathrm{int}} = 0.02$.
  24. Stability of political concepts: [Figure: intension of the concept Employers in 2002, 2003 and 2006, showing associated concepts such as unions, employees, Socio-Economic Council, social pact, work migration and discrimination.] Figure: Intension of the concept Employers in three years, with an average intensional stability of $S_{\mathrm{int}} = 0.15$.
  25. Concept shifts of political concepts: [Figure: (a) label shift across 1994, 1998 and 2006, involving the concepts Military and Dutch military deployment; (b) extensional shift between 2003 and 2006, involving the concepts Childcare and Free Childcare.] Figure: Examples of label shift and extensional shift. Red links indicate that two concepts are identical according to our domain experts; blue links connect the most similar concepts in terms of the corresponding aspect.
  26. Concept drift in DBpedia: We studied four versions of DBpedia (3.2, 3.3, 3.4 and 3.5) and use URI references as the identities of concepts.
  28. RDF(S) concepts and their meaning: Definition: Let $O$ be the DBpedia ontology, i.e., a set of triples $(s, p, o)$, and $O^*$ the semantic closure of $O$. The rdf-label $\mathrm{lab}_r(C)$ of $C$ is defined as the object $o$ of the triple $(C, \texttt{rdfs:label}, o)$. The rdf-extension $\mathrm{ext}_r(C)$ of $C$ is defined as the set of resources $r$ such that $(r, \texttt{rdf:type}, C) \in O^*$. The rdf-intension $\mathrm{int}_r(C)$ of $C$ is defined as the set of all triples $(C, p, o) \in O^*$ where $p = \texttt{rdfs:subClassOf}$, together with all triples $(s, p, C) \in O^*$ where $p \in \{\texttt{rdfs:subClassOf}, \texttt{rdfs:domain}, \texttt{rdfs:range}\}$.
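A minimal sketch of the three RDF(S) definitions over plain Python tuples. It assumes the closure $O^*$ has already been computed (in practice with an RDF library and an RDFS reasoner, which this sketch does not use), and the triples below are a hypothetical fragment, not actual DBpedia content.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Predicates whose incoming triples (s, p, C) belong to the rdf-intension.
INCOMING = {"rdfs:subClassOf", "rdfs:domain", "rdfs:range"}


def rdf_label(c: str, closure: set[Triple]) -> Optional[str]:
    """lab_r(C): the object o of (C, rdfs:label, o), or None if there is none."""
    return next((o for s, p, o in closure if s == c and p == "rdfs:label"), None)


def rdf_extension(c: str, closure: set[Triple]) -> set[str]:
    """ext_r(C): all resources r with (r, rdf:type, C) in O*."""
    return {s for s, p, o in closure if p == "rdf:type" and o == c}


def rdf_intension(c: str, closure: set[Triple]) -> set[Triple]:
    """int_r(C): outgoing rdfs:subClassOf triples plus incoming subClassOf/domain/range triples."""
    return {
        (s, p, o) for s, p, o in closure
        if (s == c and p == "rdfs:subClassOf") or (o == c and p in INCOMING)
    }


# Hypothetical fragment of an already-computed closure O*.
closure = {
    ("dbo:City", "rdfs:label", "city"),
    ("dbo:City", "rdfs:subClassOf", "dbo:PopulatedPlace"),
    ("dbo:Capital", "rdfs:subClassOf", "dbo:City"),
    ("dbo:mayor", "rdfs:domain", "dbo:City"),
    ("dbr:Lisbon", "rdf:type", "dbo:City"),
    ("dbr:Lisbon", "rdf:type", "dbo:PopulatedPlace"),
    ("dbr:Amsterdam", "rdf:type", "dbo:City"),
    ("dbr:Amsterdam", "rdf:type", "dbo:PopulatedPlace"),
}
```

For `dbo:City`, the extension is the two typed resources, and the intension holds the three schema triples that mention the class, but not the `rdf:type` triples.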
  29. Stability ranking of DBpedia concepts:

      | Rank | Extensional | Intensional |
      |------|-------------|-------------|
      | 1 | Planet | SportsEvent |
      | 2 | Road | FormulaOneRacer |
      | 3 | Infrastructure | WineRegion |
      | 4 | Cyclist | Cleric |
      | 5 | LunarCrater | WrestlingEvent |
      | ... | ... | ... |
      | 163 | OfficeHolder | Vein |
      | 164 | Politician | BasketballPlayer |
      | 165 | City | EthnicGroup |
      | 166 | College | Band |
      | 167 | ChemicalCompound | BritishRoyalty |

      Table: The 5 most stable and the 5 least stable DBpedia concepts in terms of their extension and intension (out of the 167 concepts present in all four versions).
  30. Concept shifts in DBpedia:

      | dbpedia32 | dbpedia33 | dbpedia34 | dbpedia35 |
      |-----------|-----------|-----------|-----------|
      | SportsEvent | SportsEvent (0.98) | SportsEvent (0.98) | SportsEvent (0.97) |
      | Protista | Protista (0.89) | Fungus (0.77) | Fungus (0.89) |
      | City | City (0.99) | City (0.84) | Settlement (0.60) |
      | River | River (0.99) | River (0.78) | Stream (0.62) |
      | ChemicalCompound | ChemicalCompound (0.64) | ChemicalCompound (0.47) | ChemicalCompound (0.71) |
  31. Concept drift in LKIF-Core. LKIF-Core ontology: The Legal Knowledge Interchange Format (LKIF) Core Ontology is a core ontology of basic legal concepts, developed by the ESTRELLA consortium. We study four major versions of LKIF-Core: 1.0, 1.0.2, 1.0.3 and 1.1. Unfortunately, rdfs:label was rarely used: only 4 concepts specify labels, and these stay constant across all variants. There are no instances associated with these legal concepts.
  33. The meaning of OWL concepts: Definition: Let $O$ be the OWL ontology and $O^*$ its semantic closure as inferred by OWLIM. The owl-label $\mathrm{lab}_o(C)$ of $C$ is defined as the object $o$ of the triple $(C, \texttt{rdfs:label}, o)$. The owl-intension $\mathrm{int}_o(C)$ of $C$ is defined as: (1) all triples $(C, p, o) \in O^*$ and $(s, p, C) \in O^*$; (2) all triples in chains $(C, p_1, o_1) \circ (s_2, p_2, o_2) \circ \dots \circ (s_n, p_n, o_n)$ where $s_k = o_{k-1}$; plus (3) all triples in chains $(s_1, p_1, o_1) \circ (s_2, p_2, o_2) \circ \dots \circ (s_n, p_n, C)$ where $s_{k+1} = o_k$, the linking nodes being blank nodes.
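A minimal sketch of the owl-intension as a graph walk over a precomputed closure. Two assumptions are labelled here and in the code: blank nodes are written with an N-Triples-style `_:` prefix, and chains in both case (2) and case (3) are followed only through blank nodes (the slide states the blank-node condition next to case (3)). All `ex:` names are hypothetical.

```python
from collections import deque

Triple = tuple[str, str, str]


def is_blank(node: str) -> bool:
    """ASSUMPTION: blank nodes are written with the prefix '_:'."""
    return node.startswith("_:")


def owl_intension(c: str, closure: set[Triple]) -> set[Triple]:
    """int_o(C) following the slide's three cases, over a precomputed closure O*."""
    # (1) every triple with C as subject or object
    result = {t for t in closure if t[0] == c or t[2] == c}

    # (2) forward chains (C, p1, o1) ∘ (s2, p2, o2) ∘ ... with s_k = o_{k-1}
    # ASSUMPTION: only continued through blank nodes.
    frontier = deque({o for s, _, o in closure if s == c and is_blank(o)})
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        for s, p, o in closure:
            if s == node:
                result.add((s, p, o))
                if is_blank(o) and o not in seen:
                    seen.add(o)
                    frontier.append(o)

    # (3) backward chains ... ∘ (s_n, p_n, C) with s_{k+1} = o_k, via blank nodes
    frontier = deque({s for s, _, o in closure if o == c and is_blank(s)})
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        for s, p, o in closure:
            if o == node:
                result.add((s, p, o))
                if is_blank(s) and s not in seen:
                    seen.add(s)
                    frontier.append(s)
    return result


# Hypothetical closure: two OWL restrictions encoded with blank nodes.
closure = {
    ("ex:SpeechAct", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", "ex:actor"),
    ("_:r1", "owl:allValuesFrom", "ex:Agent"),
    ("ex:Promise", "rdfs:subClassOf", "ex:SpeechAct"),
    ("_:r2", "owl:someValuesFrom", "ex:SpeechAct"),
    ("_:r2", "owl:onProperty", "ex:medium"),
    ("ex:Document", "rdfs:subClassOf", "_:r2"),
    ("ex:Unrelated", "rdfs:subClassOf", "ex:Thing"),
}
```

Read literally, case (3) only walks backwards along triples whose object is the current node, so for `ex:SpeechAct` the triple `(_:r2, owl:onProperty, ex:medium)` is not included; whether that matches the authors' intent is not clear from the slide.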
  34. Stable and unstable concepts:

      | Most stable concepts | Most unstable concepts |
      |----------------------|------------------------|
      | norm.owl#Custom | legal-action.owl#Mandate |
      | expression.owl#Promise | legal-action.owl#Public Law |
      | norm.owl#Potestative Expression | legal-action.owl#Asignment |
      | norm.owl#Hohfeldian Power | legal-action.owl#Act of Law |
      | relative-places.owl#Place | legal-action.owl#Delegation |

      Table: Top 5 stable and unstable concepts.
  35. Intensional shift in LKIF-Core:

      | Variant in earlier version | Variant in later version |
      |----------------------------|--------------------------|
      | lkif1.0:action.owl#Speech Act | lkif1.0.2:expression.owl#Speech Act |
      | lkif1.0:action.owl#Termination | lkif1.0.2:process.owl#Termination |
      | lkif1.0.2:lkif-top.owl#Mental Concept | lkif1.0.3:lkif-top.owl#Mental Entity |
      | lkif1.0.2:lkif-top.owl#Physical Concept | lkif1.0.3:lkif-top.owl#Physical Entity |

      Table: Examples of confirmed intensional shift in LKIF-Core.
  36. Summary: We proposed a general theory for studying concept drift based on concept identity. We introduced a theoretical foundation for the notions of drift, shift and stability. We applied the general mechanism in three practical applications, modelled in SKOS, RDFS and OWL respectively.
  37. Future work: investigate alternative theories of concept drift, such as ones based on morphing; develop systematic evaluation methods; develop applications that leverage the detected concept drift.
