
How many truths can you handle?


A set of practical strategies and techniques for tackling vagueness in data modeling and creating models that are semantically more accurate and interoperable.



  1. 1. How many truths can you handle? Strategies and techniques for handling vagueness in conceptual data models. Panos Alexopoulos, Data and Knowledge Technologies Professional, http://www.panosalexopoulos.com, p.alexopoulos@gmail.com, @PAlexop
  2. 2. Talk Identity
  3. 3. Conceptual Data Models
  4. 4. Semantics
  5. 5. Semantic gap
  6. 6. Vagueness
  7. 7. Topics covered: UNDERSTANDING VAGUENESS (what it is and how it differs from other phenomena); VAGUENESS RAMIFICATIONS (why you should care); DETECTING VAGUENESS (guidelines and (automatic) techniques); MEASURING VAGUENESS (metrics and methods); TACKLING VAGUENESS (approaches and trade-offs)
  8. 8. Understanding Vagueness What it is and what it is not
  9. 9. The Sorites Paradox ● 1 grain of wheat does not make a heap. ● If 1 grain doesn’t make a heap, then 2 grains don’t. ● If 2 grains don’t make a heap, then 3 grains don’t. ● … ● If 999,999 grains don’t make a heap, then 1 million grains don’t. ● Therefore, 1 million grains don’t make a heap!
  10. 10. What is vagueness “Vagueness is a semantic phenomenon where predicates admit borderline cases, namely cases where it is not determinately true that the predicate applies or not” —Shapiro 2006
  11. 11. What is not vagueness: AMBIGUITY, e.g., “Last week I visited Tripoli”; INEXACTNESS, e.g., “My height is between 165 and 175 cm”; UNCERTAINTY, e.g., “The temperature in Amsterdam right now might be 15 degrees”
  12. 12. Vagueness Types: QUANTITATIVE, where borderline cases stem from the lack of precise boundaries along some measurable dimension (e.g., “Bald”, “Tall”, “Near”); QUALITATIVE, where borderline cases stem from not being able to decide which dimensions and conditions are sufficient and/or necessary for the predicate to apply (e.g., “Religion”, “Expert”)
  13. 13. Vagueness Ramifications Why should we care
  14. 14. Miscommunication
  15. 15. Disagreements
  16. 16. How would you model this?
  17. 17. Problematic Scenarios USING VAGUE DATA REUSING VAGUE DATA INTEGRATING VAGUE DATA
  18. 18. Detecting Vagueness Where and what to look for
  19. 19. How to detect vagueness ● Identify which of your data model’s elements are potentially vague ● Investigate whether these elements are indeed vague ● Investigate and determine potential dimensions and applicability contexts
  20. 20. Where to look ● Classes: E.g. “Tall Person”, “Strategic Customer”, “Experienced Researcher” ● Relations and attributes: E.g., “hasGenre”, “hasIdeology” ● Attribute values: E.g., the “price” of a restaurant could take as values the vague terms “cheap”, “moderate” and “expensive”
  21. 21. What to look for ● Vague terms in names and definitions ● Disagreements and inconsistencies among data modelers, domain experts, and data stewards during model development and maintenance ● Disagreements and inconsistencies in user feedback during model application.
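As a rough illustration of the “what to look for” guideline above, here is a minimal sketch (not from the talk) that flags potentially vague model elements by scanning their names and definitions for terms from a small, assumed vague-term lexicon; the lexicon and the example elements are illustrative assumptions.

```python
import re

# Illustrative lexicon of vague terms; in practice this would be curated per domain.
VAGUE_TERMS = {"tall", "strategic", "experienced", "cheap", "moderate",
               "expensive", "expert", "low", "high", "near", "recent"}

def tokens(text: str) -> set:
    """Split camelCase names and plain text into lowercase word tokens."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return set(re.findall(r"[a-z]+", spaced.lower()))

def potentially_vague(name: str, definition: str = "") -> bool:
    """Flag an element whose name or definition contains a known vague term."""
    return bool(tokens(name + " " + definition) & VAGUE_TERMS)

print(potentially_vague("StrategicCustomer", "A customer that generates high revenue."))  # True
print(potentially_vague("Invoice", "A document issued for a completed sale."))            # False
```

Such a heuristic only surfaces candidates; each flagged element still needs the manual investigation described above.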
  22. 22. Examples from Wordnet. Vague senses: Yellowish (of the color intermediate between green and orange in the color spectrum; of something resembling the color of an egg yolk); Impenitent (impervious to moral persuasion); Notorious (known widely and usually unfavorably). Non-vague senses: Compound (composed of more than one part); Biweekly (occurring every two weeks); Outermost (situated at the farthest possible point from a center).
  23. 23. Examples from the Citation Ontology. Vague relations: plagiarizes (a property indicating that the author of the citing entity plagiarizes the cited entity, by including textual or other elements from the cited entity without formal acknowledgement of their source); citesAsAuthority (the citing entity cites the cited entity as one that provides an authoritative description or definition of the subject under discussion); supports (the citing entity provides intellectual or factual support for statements, ideas or conclusions presented in the cited entity). Non-vague relations: sharesAuthorInstitutionWith (each entity has at least one author that shares a common institutional affiliation with an author of the other entity); retracts (the citing entity constitutes a formal retraction of the cited entity); includesExcerptFrom (the citing entity includes one or more excerpts from the cited entity).
  24. 24. Measuring Vagueness Key metrics
  25. 25. Vagueness spread ● The ratio of model elements (classes, relations, datatypes, etc.) that are vague ● A data model with a high vagueness spread is less explicit and shareable than one with a low spread.
  26. 26. Vagueness intensity ● The degree to which the model’s users disagree on the validity of the (potential) instances of the elements. ● The higher this disagreement is for an element, the more problems the element is likely to cause. ● Calculation: ○ Consider a sample set of vague element instances ○ Have human judges denote whether and to what extent they believe these instances are valid ○ Measure the inter-rater agreement among the judges (e.g., using Cohen’s kappa)
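A minimal sketch of the intensity calculation outlined above, assuming two judges who labeled the same sample of candidate instances as valid (1) or not (0) and that scikit-learn is available; the labels are made up. With more than two judges, a multi-rater measure such as Fleiss’ kappa could be used instead.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical validity judgments (1 = valid instance, 0 = not valid) from two judges
# for ten candidate instances of a vague class such as "StrategicClient".
judge_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
judge_b = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

# Low kappa (low agreement) suggests high vagueness intensity for the element.
kappa = cohen_kappa_score(judge_a, judge_b)
print(f"Inter-rater agreement (Cohen's kappa): {kappa:.2f}")
```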
  27. 27. Tackling Vagueness Approaches and trade-offs
  28. 28. Three (complementary) techniques VAGUENESS AWARENESS TRUTH CONTEXTUALIZATION TRUTH FUZZIFICATION
  29. 29. Vagueness-aware data models Data models whose vague elements are accompanied by meta-information that describes the nature and characteristics of their vagueness in an explicit way.
  30. 30. What to make explicit: VAGUENESS EXISTENCE, e.g., “Tall Person” is vague and “Adult” is non-vague; VAGUENESS TYPE, e.g., “Low Budget” has quantitative vagueness and “Expert Consultant” qualitative; VAGUENESS DIMENSIONS, e.g., “Strategic Client” is vague in the dimension of the generated revenue; VAGUENESS PROVENANCE, e.g., “Strategic Client” is vague in the dimension of the generated revenue according to the Financial Manager; APPLICABILITY CONTEXTS, e.g., “Strategic Client” is vague in the dimension of the generated revenue in the context of Financial Reporting
  31. 31. A Vagueness Metamodel
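The slide above presents the talk’s vagueness metamodel as a figure; what follows is only a rough sketch, using plain Python dataclasses, of how such vagueness meta-information (existence, type, dimensions, provenance, applicability contexts) could be attached to a model element. All names are illustrative, not the talk’s actual metamodel.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VaguenessDescription:
    """Meta-information describing the vagueness of a model element."""
    vagueness_type: str                                # "quantitative" or "qualitative"
    dimensions: List[str] = field(default_factory=list)
    provenance: Optional[str] = None                   # who asserted the vagueness
    applicability_contexts: List[str] = field(default_factory=list)

@dataclass
class ModelElement:
    name: str
    vagueness: Optional[VaguenessDescription] = None   # None means the element is not vague

strategic_client = ModelElement(
    name="StrategicClient",
    vagueness=VaguenessDescription(
        vagueness_type="quantitative",
        dimensions=["generated revenue"],
        provenance="Financial Manager",
        applicability_contexts=["Financial Reporting"],
    ),
)
print(strategic_client)
```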
  32. 32. Truth contextualization ● The same statement in the data model can be true in some contexts and false in other contexts. ● E.g., “Stephen Curry is short” is true in the context of “Basketball Playing” but false in all others. ● Potential contexts: ○ Cultures ○ Locations ○ Industries ○ Processes ○ Demographics ○ ...
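A minimal sketch of truth contextualization, assuming a simple statement-plus-context store in Python; the statement and context names are illustrative.

```python
# The same vague statement is stored with an explicit truth value per context.
contextualized_statements = {
    ("StephenCurry", "isInstanceOf", "ShortPerson"): {
        "Basketball Playing": True,
        "General Population": False,
    },
}

def holds(statement, context):
    """Return the statement's truth value in the given context (None if unknown)."""
    return contextualized_statements.get(statement, {}).get(context)

print(holds(("StephenCurry", "isInstanceOf", "ShortPerson"), "Basketball Playing"))   # True
print(holds(("StephenCurry", "isInstanceOf", "ShortPerson"), "General Population"))   # False
```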
  33. 33. Contextualized poverty
  34. 34. When to contextualize? ● When vagueness intensity is high and consensus is impossible ● When you are able to identify truth contexts ● When the applications that use the model can actually handle the contexts ● When contextualization actually manages to reduce disagreements and has a positive effect on the model’s applications ● When the contextualization benefits outweigh the context management overhead
  35. 35. Truth fuzzification ● The basic idea is that we can assign a real number to a vague statement, within a range from 0 to 1. ○ A value of 1 would mean that the statement is completely true ○ A value of 0 that it is completely false ○ Any value in between that it is “partly true” to a given, quantifiable extent. ● For example: ○ “John is an instance of YoungPerson to a degree of 0.8” ○ “Google hasCompetitor Microsoft to a degree of 0.4” ● The premise is that fuzzy degrees can reduce the disagreements around the truth of a vague statement.
  36. 36. Truth degrees are not probabilities ● A probability statement quantifies the likelihood that events or facts whose truth conditions are well defined will come true ○ e.g., “It will rain tomorrow with a probability of 0.8” ● A fuzzy statement quantifies the extent to which events or facts whose truth conditions are not well defined are perceived as true ○ e.g., “It’s now raining to a degree of 0.6” ● That’s why they are supported by different mathematical frameworks, namely probability theory and fuzzy logic
  37. 37. What fuzzification involves 1. Detect and analyze all vague elements in your model 2. Decide how to fuzzify each element 3. Harvest truth degrees 4. Assess fuzzy model quality 5. Represent fuzzy degrees 6. Apply the fuzzy model
  38. 38. Fuzzification options ● The number and kind of fuzzy degrees you need to acquire for your model’s vague elements depend on the latter’s vagueness type and dimensions. ● If your element has quantitative vagueness in one dimension, then all you need is a fuzzy membership function that maps numerical values of the dimension to fuzzy degrees in the range [0,1]
  39. 39. Fuzzy membership functions
  40. 40. Fuzzy membership functions
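The two slides above show fuzzy membership functions as figures; as a hedged illustration, here is one possible membership function for a class with quantitative vagueness in a single dimension, e.g. “TallPerson” over height. The linear shape and thresholds are illustrative assumptions, not values from the talk.

```python
def tall_person_degree(height_cm: float, lower: float = 170.0, upper: float = 190.0) -> float:
    """Linearly map a height to a truth degree in [0, 1] for "TallPerson"."""
    if height_cm <= lower:
        return 0.0
    if height_cm >= upper:
        return 1.0
    return (height_cm - lower) / (upper - lower)

print(tall_person_degree(165))  # 0.0  (clearly not tall)
print(tall_person_degree(180))  # 0.5  (borderline case)
print(tall_person_degree(195))  # 1.0  (clearly tall)
```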
  41. 41. Fuzzification options ● If an element has quantitative vagueness in more than one dimension then you can either: ○ Define a multivariate fuzzy membership function ○ Define one membership function per dimension and then combine these via some fuzzy logic operation, like fuzzy conjunction or fuzzy disjunction
  42. 42. Multivariate fuzzy membership function
  43. 43. Fuzzy conjunction and disjunction
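For the multi-dimensional case sketched above, a common choice (assumed here, not prescribed by the talk) is to use the standard min/max operators for fuzzy conjunction and disjunction; the per-dimension degrees below are illustrative.

```python
# Per-dimension degrees for a hypothetical "StrategicClient" instance.
revenue_degree = 0.7   # degree w.r.t. generated revenue
tenure_degree = 0.4    # degree w.r.t. length of the client relationship

strategic_and = min(revenue_degree, tenure_degree)  # fuzzy conjunction -> 0.4
strategic_or = max(revenue_degree, tenure_degree)   # fuzzy disjunction -> 0.7
print(strategic_and, strategic_or)
```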
  44. 44. Fuzzification options ● A third option is to just define one direct degree per statement. ○ “John is tall to a degree of 0.8” ○ “Maria is expert in data modeling to a degree of 0.6” ● This approach makes sense when: ○ Your element is vague in too many dimensions and you cannot find a proper membership function ○ The element’s vagueness is qualitative and, thus, you have no dimensions to use ● The drawback is that you will have to harvest a lot of degrees!
  45. 45. Harvesting truth degrees ● Remember that vague statements provoke disagreements and debates among people or even between people and systems. ● To generate fuzzy degrees for these statements you practically need to capture and quantify these disagreements. ● How to capture: ○ Ask people directly ○ Ask people indirectly ○ Mine from data
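One simple way to turn directly harvested judgments into a truth degree, sketched here under the assumption that judges provide degrees in [0, 1] for each statement, is to average them; the judgments below are made up for illustration.

```python
# Hypothetical degrees given by five judges for one vague statement.
judgments = {
    ("Maria", "isExpertIn", "data modeling"): [1.0, 0.8, 0.4, 0.6, 0.7],
}

def truth_degree(statement):
    """Aggregate individual judgments into a single degree by averaging."""
    votes = judgments[statement]
    return sum(votes) / len(votes)

print(truth_degree(("Maria", "isExpertIn", "data modeling")))  # 0.7
```

The spread of the votes themselves is also informative: wide disagreement on the degree may point back to high vagueness intensity or to missing contexts.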
  46. 46. Explanation and feedback based harvesting
  47. 47. Multiple fuzzy truths ● Even with fuzzification you still may be getting disagreements ● This can be an indication of context-dependence ● Different contexts may require different fuzzy degrees or membership functions ● In other words, contextualization and fuzzification are orthogonal approaches.
  48. 48. Fuzzy model quality ● Main questions you need to consider: ○ Have I fuzzified the correct elements? ○ Are the truth degrees consistent? ○ Are the truth degrees accurate? ○ Is the provenance of the truth degrees well documented? ● Both accuracy and consistency are best treated not as binary metrics but rather as distances
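As an illustration of treating accuracy as a distance rather than a binary check, the following sketch computes the mean absolute error between a model’s truth degrees and a hypothetical gold standard; all values are illustrative.

```python
# Hypothetical model degrees vs. a gold standard gathered from trusted judges.
model_degrees = {"john_is_tall": 0.8, "maria_is_expert": 0.6, "acme_is_strategic": 0.9}
gold_degrees = {"john_is_tall": 0.7, "maria_is_expert": 0.65, "acme_is_strategic": 0.5}

# Mean absolute error: the smaller the distance, the more accurate the fuzzy model.
mae = sum(abs(model_degrees[k] - gold_degrees[k]) for k in model_degrees) / len(model_degrees)
print(f"Mean absolute error of truth degrees: {mae:.2f}")  # 0.18
```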
  49. 49. Fuzzy model representation ● To represent a truth degree for a relation you simply need to define a relation attribute named “truth degree” or similar. ● This is straightforward if you work with E-R models or property graphs, but also possible in RDF or OWL, even if these languages do not directly support relation attributes. ● Things can become more difficult when you need to represent fuzzy membership functions or more complex fuzzy rules and axioms, along with their necessary reasoning support.
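A minimal, property-graph-style sketch of the representation described above: the fuzzy relation is stored as an edge carrying a “truth_degree” attribute. Plain Python dictionaries stand in here for the actual graph or RDF machinery.

```python
# A fuzzy relation stored as an edge with a "truth_degree" attribute.
edges = [
    {
        "source": "Google",
        "relation": "hasCompetitor",
        "target": "Microsoft",
        "truth_degree": 0.4,
    },
]

# Retrieve all competitor relations above an illustrative threshold of 0.3.
strong_enough = [e for e in edges if e["relation"] == "hasCompetitor" and e["truth_degree"] >= 0.3]
print(strong_enough)
```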
  50. 50. Fuzzy model application ● This last step might not look like a semantic modeling task, yet it is a crucial one if you want your fuzzification effort to pay off ● A fuzzy data model can be helpful in: ○ Semantic tagging and disambiguation ○ Semantic search and match ○ Decision support systems ○ Conversational agents (aka chatbots) ● In all cases, proper design and adaptation of the underlying algorithms is needed
  51. 51. When to fuzzify? ● Questions you need to consider: ○ Which elements in your model are unavoidably vague? ○ How severe and impactful are the disagreements you (expect to) get on the veracity of these vague elements? ○ Are these disagreements caused by vagueness or other factors?
  52. 52. When to fuzzify? ● Questions you need to consider: ○ If your model’s elements had fuzzy degrees, would you get less disagreement? ○ Are the applications that use the model able to exploit and benefit from truth degrees? ○ Can you develop a scalable way to get and maintain fuzzy degrees that costs less than the benefits they bring you?
  53. 53. How would you tackle this?
  54. 54. How would you tackle this?
  55. 55. Take Aways ● Data and information quality can be negatively affected by vagueness: (perceived) inaccuracy; disagreements and misinterpretations; reduced semantic interoperability ● Treating vagueness as noise doesn’t help: it’s how we think and communicate; insisting on crispness is unproductive; but leaving things as-is is also bad ● Three complementary weapons to tackle vagueness: make your data models vagueness-aware; contextualize truth; fuzzify truth
  56. 56. Currently writing a book on semantic data modeling, to be published by O’Reilly in September 2020. Early release expected on the O’Reilly Learning Platform in December 2019. To get news about the book’s progress and a free preview chapter, send me an email at p.alexopoulos@gmail.com
