Rough Set Semantics for Identity Management on the Web

Presented at the AAAI Fall Symposium for Big Data on 2013-11-15.

Published in: Education, Technology

  1. Rough Set Semantics for Identity Management on the Web
     Wouter Beek (wouterbeek.com), Stefan Schlobach, Frank van Harmelen
  2. Problems of identity
     • Statements only hold in certain contexts (no substitution salva veritate).
     • Identity is mistaken for representation.
     • Identity is mistaken for (close) relatedness.
     But more importantly:
     • Semantics: identity assertion (claim about meaning)
     • Pragmatics: data linking (import additional properties)
     • Due to: Open World Assumption
  3. owl:differentFrom(Semantics,Pragmatics)
     SEMANTICS: $\langle a_1, a_2 \rangle \in Ext(I(\text{owl:sameAs}))$ iff $a_1 = a_2$
     PRACTICE: “Link your data to other people’s data to provide context.” [5-star LOD] “RDF links often have the owl:sameAs predicate.” [VoID]
  4. Can Leibniz help?
     • Indiscernibility of identicals (Leibniz’ principle)
       • $a = b \rightarrow \forall\phi\,(\phi(a) = \phi(b))$
     • Identity of indiscernibles
       • $\forall\phi\,(\phi(a) = \phi(b)) \rightarrow a = b$
       • Trivially true, since $\lambda x.(x = b)$ is one of the $\phi$’s.
  5. Solutions (as identified in the literature) [1/2]
     1) Weaken owl:sameAs
        E.g. skos:closeMatch
     2) Extend owl:sameAs
        Annotate with fuzziness or uncertainty.
     3) Make contexts explicit
        E.g. use named graphs
        E.g. use namespaces
        “That is the star that can be seen in the morning, but not in the evening”@geolocation
  6. Solutions (as identified in the literature) [2/2]
     4) Use domain-specific identity relations
        “x and y have the same medical use”@medicine
        “x and y are the same molecule”@chemistry
     5) Change modeling practice
        Notification upon read.
        Require reciprocal confirmation upon change.
     “On the Web of Data, anybody can say anything about anything.” [Van Harmelen]
  7. Indiscernibility
     Identity is the smallest equivalence relation.
     Indiscernibility: resources are the same w.r.t. a limited set of predicates.
     Indiscernibility is an equivalence relation (reasoning!), although not necessarily the smallest one.
     Every indiscernibility relation is also an identity relation, but over a different domain:
     • Example: Take the set of people and property $P_i \subseteq People \times Income$. Context $\{P_i\}$ induces the identity relation between income groups.
  8. Indiscernibility 1
     Two resources are indiscernible w.r.t. a set of predicates $P \subseteq P_G$ (predicate terms in G) if they share the predicate-object pairs for $P$.
     $IND(P) = \{\langle x, y \rangle \in S_G^2 \mid \forall p \in P\,(f_p(x) = f_p(y))\}$
     where $f_p(x) = \{y \mid \langle I(x), y \rangle \in Ext(I(p))\}$
     Example: “Wouter and Stefan have the same employer, so they are indiscernible w.r.t. predicate hasEmployer.”
  9. Indiscernibility 2
     • We take a given identity relation and partition it into subsets (i.e. identity sub-relations) which are described in terms of the vocabulary.
     • Subsets of the given identity relation are $P^*$-indiscernible, for sets of predicates $P^* \subseteq \wp(P_G)$.
     Example:
     • “(Wouter and Albert) and (Stefan and Paul) belong to the same identity sub-relation, since they are indiscernible w.r.t. the same collections of properties.”
     • Wouter and Albert are “employedAs PhD”; Stefan and Paul are “employedAs Assistant Professor”.
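  10. Indiscernibility 2
      $P^* \subseteq \wp(P_G)$
      $IND(P^*) = \{\langle\langle x_1, y_1\rangle, \langle x_2, y_2\rangle\rangle \in (S_G^2)^2 \mid \forall P \in P^*\,(\langle x_1, y_1\rangle \in IND(P) \leftrightarrow \langle x_2, y_2\rangle \in IND(P))\}$
      For comparison:
      $P \subseteq P_G$
      $IND(P) = \{\langle x, y \rangle \in S_G^2 \mid \forall p \in P\,(f_p(x) = f_p(y))\}$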
  11. Example of an indiscernibility partition
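  12. Rough set approximation
      Higher approximation: $x \approx_H y \Leftrightarrow \exists u, v\,(\langle u, v\rangle \mathcal{R} \langle x, y\rangle \wedge u \approx v)$
      Lower approximation: $x \approx_L y \Leftrightarrow \forall u, v\,(\langle u, v\rangle \mathcal{R} \langle x, y\rangle \rightarrow u \approx v)$
      But what is $\mathcal{R}$ (‘resemblance’)? $\mathcal{R} = IND(\wp(P_G))$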
  13. Example of indiscernibility approximations
  14. Quality
      $\propto(\approx) = \frac{|\approx_L|}{|\approx_H|}$
      • Based on the rough set approximation $\langle \approx_L, \approx_H \rangle$.
      • Since a consistently applied identity relation has relatively many partition sets that contain either no identity pairs (small value for $|\approx_H|$) or only identity pairs (large value for $|\approx_L|$), a more consistent identity relation has a higher quality metric.
  15. Generalizations
      • This works for any binary relation (not only owl:sameAs).
      • We only discussed the identity of non-property resources, but properties can also be identical.
      • We skipped the treatment of blank nodes and typed literals (which have special identity criteria).
      • The indiscernibility ‘language’ can be made much stronger, allowing more fine-grained identity sub-relations:
        • Length-1 paths, e.g. “Wouter lives in the Netherlands.”
        • Length-2 paths, e.g. “Wouter lives in a country which borders Germany.”
        • Length-$n$ paths.
        • Intervals in the value space of typed literals, e.g. “was published between 1901 and 1905”
        • Natural language translation, e.g. “lives in Germany” and “lives in Deutschland”
  16. Depth-$n$ Predicate Path Map (PPM)
      A sequence of $n$ predicates denoting a (functional) mapping from subject terms into sets of object terms:
      $f_{p_1,\ldots,p_n}(s) = \{o \in O_G \mid \exists x_1, \ldots, x_{n-1}\,(x_n = o \wedge \bigwedge_{i=1}^{n-1} \bigvee_{p \in p_{i+1}} \langle I(x_i), I(x_{i+1})\rangle \in Ext(I(p)))\}$
  17. Indiscernibility 1 (generalized)
      Two resources are indiscernible w.r.t. a set of PPMs $P \subset P_G^n$ if they share the properties denoted by $P$.
      $IND(P) = \{\langle x, y \rangle \in S_G^2 \mid \forall p \in P\,(f_p(x) \asymp f_p(y))\}$
      Example: “Wouter and Stefan have the same employer, so they are indiscernible w.r.t. has-employer.”
      Details:
      • $P = \bigcup_{\langle p_1,\ldots,p_n\rangle \in P} p_1 \times \cdots \times p_n$
  18. Indiscernibility 2 (generalized)
      We take a given set of pairs (e.g. an identity relation) and partition it into subsets which are described in terms of the schema.
      Subsets of the given (identity) relation are $P^*$-indiscernible, for sets of PPMs $P^* \subseteq \wp(P_G^n)$.
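  19. Indiscernibility 2 (generalized)
      $P^* \subseteq \wp(P_G^n)$
      $IND(P^*) = \{\langle\langle x_1, y_1\rangle, \langle x_2, y_2\rangle\rangle \in (S_G^2)^2 \mid \forall P \in P^*\,(\langle x_1, y_1\rangle \in IND(P) \leftrightarrow \langle x_2, y_2\rangle \in IND(P))\}$
      For comparison:
      $P \subset P_G^n$
      $IND(P) = \{\langle x, y \rangle \in S_G^2 \mid \forall p \in P\,(f_p(x) \asymp f_p(y))\}$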
  20. Conclusion
      Problem:
      • There is a conflict between semantics and pragmatics of identity.
      • This will not be fixed in the short term by using extensions to existing logics (e.g. contexts, fuzziness, probability).
      Solution:
      • Identify different identity relations automatically, and in terms of the domain predicates (no extra constructs are needed!).
      • Define the meaning of a specific identity relation in terms of its indiscernibility criteria.
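
The indiscernibility partition, the rough set approximations, and the quality ratio from slides 8 to 14 can be tried out on a toy graph. The following is a minimal sketch under stated assumptions, not the authors' implementation: the triples, the placeholder employers `org1` and `org2`, the asserted identity pairs in `SAME`, and all function names are hypothetical, and pairs are treated as unordered and non-reflexive for brevity.

```python
from itertools import combinations

# Hypothetical toy graph; the employer names are placeholders.
TRIPLES = {
    ("wouter", "hasEmployer", "org1"), ("wouter", "employedAs", "PhD"),
    ("albert", "hasEmployer", "org1"), ("albert", "employedAs", "PhD"),
    ("stefan", "hasEmployer", "org1"), ("stefan", "employedAs", "AssistantProfessor"),
    ("paul", "hasEmployer", "org2"), ("paul", "employedAs", "AssistantProfessor"),
}
SUBJECTS = sorted({s for s, _, _ in TRIPLES})
PREDICATES = sorted({p for _, p, _ in TRIPLES})


def f(p, x):
    """f_p(x): the set of objects that x has for predicate p."""
    return frozenset(o for s, q, o in TRIPLES if s == x and q == p)


def agreement(x, y):
    """The predicates p for which f_p(x) = f_p(y)."""
    return frozenset(p for p in PREDICATES if f(p, x) == f(p, y))


def partition(pairs):
    """Group pairs into blocks of the resemblance relation IND(powerset(P_G)).

    A pair is in IND(P) exactly when P is a subset of its agreement set,
    so two pairs resemble each other iff their agreement sets are equal.
    """
    blocks = {}
    for x, y in pairs:
        blocks.setdefault(agreement(x, y), set()).add((x, y))
    return blocks


def approximate(same, pairs):
    """Return the (lower, higher) approximations of the relation `same`."""
    lower, higher = set(), set()
    for block in partition(pairs).values():
        if block <= same:  # block holds only identity pairs
            lower |= block
        if block & same:   # block holds at least one identity pair
            higher |= block
    return lower, higher


def quality(lower, higher):
    """|lower| / |higher|; 1.0 for an empty higher set is an assumed convention."""
    return len(lower) / len(higher) if higher else 1.0


ALL_PAIRS = set(combinations(SUBJECTS, 2))           # unordered, non-reflexive
SAME = {("albert", "wouter"), ("albert", "stefan")}  # hypothetical identity assertions
LOWER, HIGHER = approximate(SAME, ALL_PAIRS)
print(sorted(LOWER), sorted(HIGHER), quality(LOWER, HIGHER))
```

In this toy data the pair (albert, stefan) shares a block with the unasserted pair (stefan, wouter), since both agree only on hasEmployer. That block therefore enters the higher approximation but not the lower one, which pulls the ratio below 1.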
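
The predicate path maps of slides 15 to 17 can be illustrated in the same spirit. This sketch makes two simplifying assumptions that are not taken from the slides: each step of a path is a single predicate rather than a set of predicates, and the comparison between path-map values is read as plain set equality. The triples and the names `path_map` and `indiscernible` are hypothetical.

```python
# Hypothetical toy graph; who lives where is made up for illustration.
TRIPLES = {
    ("wouter", "livesIn", "Netherlands"),
    ("stefan", "livesIn", "Netherlands"),
    ("paul", "livesIn", "France"),
    ("Netherlands", "borders", "Germany"),
    ("Netherlands", "borders", "Belgium"),
    ("France", "borders", "Germany"),
    ("France", "borders", "Belgium"),
    ("France", "borders", "Spain"),
}


def path_map(path, s):
    """Objects reachable from s by following the predicates in `path` in order."""
    frontier = {s}
    for p in path:
        frontier = {o for x, q, o in TRIPLES if x in frontier and q == p}
    return frozenset(frontier)


def indiscernible(x, y, paths):
    """True if x and y have equal path-map values for every path in `paths`."""
    return all(path_map(path, x) == path_map(path, y) for path in paths)


# A length-1 path and a length-2 path, as in the slide 15 examples.
PATHS = [("livesIn",), ("livesIn", "borders")]
print(sorted(path_map(("livesIn", "borders"), "wouter")))
print(indiscernible("wouter", "stefan", PATHS))  # True
print(indiscernible("wouter", "paul", PATHS))    # False
```

Restricting `PATHS` to a coarser set of paths makes more resources indiscernible, which is the sense in which longer paths allow more fine-grained identity sub-relations.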
