How many truths can you handle?

A set of practical strategies and techniques for tackling vagueness in data modeling and creating models that are semantically more accurate and interoperable.

  1. How many truths can you handle? Strategies and techniques for handling vagueness in conceptual data models
     Panos Alexopoulos, Data and Knowledge Technologies Professional
     http://www.panosalexopoulos.com, p.alexopoulos@gmail.com, @PAlexop
  2. Talk Identity
  3. Conceptual Data Models
  4. Semantics
  5. Semantic gap
  6. Vagueness
  7. Topics covered
     ● Understanding vagueness: what it is and how it differs from other phenomena
     ● Vagueness ramifications: why you should care
     ● Detecting vagueness: guidelines and (automatic) techniques
     ● Measuring vagueness: metrics and methods
     ● Tackling vagueness: approaches and trade-offs
  8. Understanding Vagueness: What it is and what it is not
  9. The Sorites Paradox
     ● 1 grain of wheat does not make a heap.
     ● If 1 grain doesn’t make a heap, then 2 grains don’t.
     ● If 2 grains don’t make a heap, then 3 grains don’t.
     ● …
     ● If 999,999 grains don’t make a heap, then 1 million grains don’t.
     ● Therefore, 1 million grains don’t make a heap!
  10. What is vagueness
      “Vagueness is a semantic phenomenon where predicates admit borderline cases, namely cases where it is not determinately true that the predicate applies or not.” (Shapiro, 2006)
  11. What is not vagueness
      ● Ambiguity, e.g., “Last week I visited Tripoli”
      ● Inexactness, e.g., “My height is between 165 and 175 cm”
      ● Uncertainty, e.g., “The temperature in Amsterdam right now might be 15 degrees”
  12. Vagueness Types
      ● Quantitative: borderline cases stem from the lack of precise boundaries along some measurable dimension (e.g., “Bald”, “Tall”, “Near”).
      ● Qualitative: borderline cases stem from not being able to decide which dimensions and conditions are sufficient and/or necessary for the predicate to apply (e.g., “Religion”, “Expert”).
  13. Vagueness Ramifications: Why should we care?
  14. Miscommunication
  15. Disagreements
  16. How would you model this?
  17. Problematic Scenarios
      ● Using vague data
      ● Reusing vague data
      ● Integrating vague data
  18. Detecting Vagueness: Where to look and what to look for
  19. How to detect vagueness
      ● Identify which of your data model’s elements may be vague.
      ● Investigate whether these elements are indeed vague.
      ● Investigate and determine their potential dimensions and applicability contexts.
  20. Where to look
      ● Classes: e.g., “Tall Person”, “Strategic Customer”, “Experienced Researcher”
      ● Relations and attributes: e.g., “hasGenre”, “hasIdeology”
      ● Attribute values: e.g., the “price” of a restaurant could take as values the vague terms “cheap”, “moderate”, and “expensive”
  21. What to look for
      ● Vague terms in names and definitions
      ● Disagreements and inconsistencies among data modelers, domain experts, and data stewards during model development and maintenance
      ● Disagreements and inconsistencies in user feedback during model application
  22. Examples from WordNet
      ● Vague senses:
        ○ Yellowish: of the color intermediate between green and orange in the color spectrum, of something resembling the color of an egg yolk.
        ○ Impenitent: impervious to moral persuasion.
        ○ Notorious: known widely and usually unfavorably.
      ● Non-vague senses:
        ○ Compound: composed of more than one part.
        ○ Biweekly: occurring every two weeks.
        ○ Outermost: situated at the farthest possible point from a center.
  23. Examples from the Citation Ontology
      ● Vague relations:
        ○ plagiarizes: A property indicating that the author of the citing entity plagiarizes the cited entity, by including textual or other elements from the cited entity without formal acknowledgement of their source.
        ○ citesAsAuthority: The citing entity cites the cited entity as one that provides an authoritative description or definition of the subject under discussion.
        ○ supports: The citing entity provides intellectual or factual support for statements, ideas or conclusions presented in the cited entity.
      ● Non-vague relations:
        ○ sharesAuthorInstitutionWith: Each entity has at least one author that shares a common institutional affiliation with an author of the other entity.
        ○ retracts: The citing entity constitutes a formal retraction of the cited entity.
        ○ includesExcerptFrom: The citing entity includes one or more excerpts from the cited entity.
  24. Measuring Vagueness: Key metrics
  25. Vagueness spread
      ● The ratio of model elements (classes, relations, datatypes, etc.) that are vague.
      ● A data model with a high vagueness spread is less explicit and shareable than a model with a low one.
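As a rough illustration, vagueness spread can be computed once each model element has been flagged as vague or not during detection. The sketch below assumes a hypothetical `ModelElement` record with an `is_vague` flag set by the modeler; the element names come from the examples in these slides.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ModelElement:
    name: str
    kind: str       # e.g. "class", "relation", "attribute"
    is_vague: bool  # hypothetical flag, set by a modeler after the detection step

def vagueness_spread(elements: List[ModelElement]) -> float:
    """Ratio of model elements flagged as vague (0.0 for an empty model)."""
    if not elements:
        return 0.0
    vague = sum(1 for e in elements if e.is_vague)
    return vague / len(elements)

model = [
    ModelElement("TallPerson", "class", True),
    ModelElement("Adult", "class", False),
    ModelElement("hasGenre", "relation", True),
    ModelElement("sharesAuthorInstitutionWith", "relation", False),
]
print(vagueness_spread(model))  # 0.5
```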
  26. Vagueness intensity
      ● The degree to which the model’s users disagree on the validity of the (potential) instances of the elements.
      ● The higher this disagreement is for an element, the more problems the element is likely to cause.
      ● Calculation:
        ○ Consider a sample set of vague element instances.
        ○ Have human judges denote whether and to what extent they believe these instances are valid.
        ○ Measure the inter-rater agreement between the judges (e.g., by using Cohen’s kappa).
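A minimal Cohen's kappa sketch for two judges who labeled the same sample of instances. The labels and the `StrategicClient` sample are made up for illustration; with more than two judges, a multi-rater measure such as Fleiss' kappa would be the usual substitute. Lower agreement (kappa near 0 or negative) would suggest higher vagueness intensity for the element.

```python
from collections import Counter
from typing import Hashable, Sequence

def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if p_expected == 1.0:
        # Both raters used one and the same label everywhere: kappa is undefined,
        # so this sketch treats it as perfect agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical judgments on whether six clients are valid "StrategicClient" instances.
judge_1 = ["yes", "yes", "no", "yes", "no", "no"]
judge_2 = ["yes", "no", "no", "yes", "yes", "no"]
print(round(cohens_kappa(judge_1, judge_2), 3))  # 0.333
```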
  27. Tackling Vagueness: Approaches and trade-offs
  28. Three (complementary) techniques
      ● Vagueness awareness
      ● Truth contextualization
      ● Truth fuzzification
  29. Vagueness-aware data models
      Data models whose vague elements are accompanied by meta-information that describes the nature and characteristics of their vagueness in an explicit way.
  30. What to make explicit
      ● Vagueness existence: e.g., “Tall Person” is vague and “Adult” is non-vague.
      ● Vagueness dimensions: e.g., “Strategic Client” is vague in the dimension of the generated revenue.
      ● Vagueness provenance: e.g., “Strategic Client” is vague in the dimension of the generated revenue according to the Financial Manager.
      ● Vagueness type: e.g., “Low Budget” has quantitative vagueness and “Expert Consultant” qualitative.
      ● Applicability contexts: e.g., “Strategic Client” is vague in the dimension of the generated revenue in the context of Financial Reporting.
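One possible way to attach this meta-information to model elements in code is a small annotation record with one field per item listed above. This is a hypothetical structure (the `VaguenessAnnotation` class, its field names, and the consistency check are assumptions), not a rendering of the vagueness metamodel of slide 31.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class VaguenessType(Enum):
    QUANTITATIVE = "quantitative"
    QUALITATIVE = "qualitative"

@dataclass
class VaguenessAnnotation:
    element: str                                         # name of the annotated model element
    is_vague: bool                                       # vagueness existence
    vagueness_type: Optional[VaguenessType] = None       # vagueness type
    dimensions: List[str] = field(default_factory=list)  # vagueness dimensions
    provenance: Optional[str] = None                     # who considers the element vague
    contexts: List[str] = field(default_factory=list)    # applicability contexts

    def __post_init__(self) -> None:
        # Assumed rule: a non-vague element should not carry any vagueness details.
        if not self.is_vague and (self.vagueness_type or self.dimensions or self.contexts):
            raise ValueError(f"{self.element} is marked non-vague but has vagueness details")

strategic_client = VaguenessAnnotation(
    element="StrategicClient",
    is_vague=True,
    vagueness_type=VaguenessType.QUANTITATIVE,  # revenue is a measurable dimension
    dimensions=["generated revenue"],
    provenance="Financial Manager",
    contexts=["Financial Reporting"],
)
adult = VaguenessAnnotation(element="Adult", is_vague=False)
```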
  31. A Vagueness Metamodel
  32. Truth contextualization
      ● The same statement in the data model can be true in some contexts and false in others.
      ● E.g., “Stephen Curry is short” is true in the context of “Basketball Playing” but false in all others.
      ● Potential contexts:
        ○ Cultures
        ○ Locations
        ○ Industries
        ○ Processes
        ○ Demographics
        ○ ...
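A minimal sketch of a contextualized statement: a default truth value plus per-context overrides, using the Stephen Curry example. The `ContextualizedStatement` class and the `isShort` predicate name are hypothetical; a real model might instead rely on mechanisms such as named graphs or context-qualified relations.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class ContextualizedStatement:
    subject: str
    predicate: str
    default_truth: bool  # truth value in every context without an explicit override
    truth_by_context: Dict[str, bool] = field(default_factory=dict)

    def is_true(self, context: Optional[str] = None) -> bool:
        """Return the statement's truth value in the given context (or the default)."""
        if context is not None and context in self.truth_by_context:
            return self.truth_by_context[context]
        return self.default_truth

curry_is_short = ContextualizedStatement(
    subject="Stephen Curry",
    predicate="isShort",
    default_truth=False,
    truth_by_context={"Basketball Playing": True},
)
print(curry_is_short.is_true("Basketball Playing"))  # True
print(curry_is_short.is_true("Everyday Life"))       # False
```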
  33. Contextualized poverty
  34. When to contextualize?
      ● When vagueness intensity is high and consensus is impossible.
      ● When you are able to identify truth contexts.
      ● When the applications that use the model can actually handle the contexts.
      ● When contextualization actually manages to reduce disagreements and has a positive effect on the model’s applications.
      ● When the benefits of contextualization outweigh the context management overhead.
  35. Truth fuzzification
      ● The basic idea is that we can assign to a vague statement a real number in the range from 0 to 1:
        ○ A value of 1 means that the statement is completely true.
        ○ A value of 0 means that it is completely false.
        ○ Any value in between means that it is “partly true” to a given, quantifiable extent.
      ● For example:
        ○ “John is an instance of YoungPerson to a degree of 0.8”
        ○ “Google has competitor Microsoft to a degree of 0.4”
      ● The premise is that fuzzy degrees can reduce the disagreements around the truth of a vague statement.
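A fuzzy statement can be sketched as an ordinary subject-predicate-object statement that carries a truth degree, validated to lie in [0, 1]. The `FuzzyStatement` class and the `instanceOf`/`hasCompetitor` predicate names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FuzzyStatement:
    subject: str
    predicate: str
    obj: str
    degree: float  # truth degree: 1 = completely true, 0 = completely false

    def __post_init__(self) -> None:
        # Rejects values outside [0, 1] (and NaN, which fails both comparisons).
        if not 0.0 <= self.degree <= 1.0:
            raise ValueError(f"truth degree must be in [0, 1], got {self.degree}")

    def describe(self) -> str:
        if self.degree == 1.0:
            return "completely true"
        if self.degree == 0.0:
            return "completely false"
        return f"partly true (degree {self.degree})"

john_young = FuzzyStatement("John", "instanceOf", "YoungPerson", 0.8)
google_rival = FuzzyStatement("Google", "hasCompetitor", "Microsoft", 0.4)
print(john_young.describe())    # partly true (degree 0.8)
print(google_rival.describe())  # partly true (degree 0.4)
```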
  36. Truth degrees are not probabilities
      ● A probability statement quantifies the likelihood that events or facts with well-defined truth conditions will come true.
        ○ E.g., “It will rain tomorrow with a probability of 0.8”
      ● A fuzzy statement quantifies the extent to which events or facts with undefined truth conditions are perceived as true.
        ○ E.g., “It’s now raining to a degree of 0.6”
      ● That is why they are supported by different mathematical frameworks, namely probability theory and fuzzy logic.
  37. What fuzzification involves
      1. Detect and analyze all vague elements in your model.
      2. Decide how to fuzzify each element.
      3. Harvest truth degrees.
      4. Assess fuzzy model quality.
      5. Represent fuzzy degrees.
      6. Apply the fuzzy model.
  38. Fuzzification options
      ● The number and kind of fuzzy degrees you need to acquire for your model’s vague elements depend on these elements’ vagueness type and dimensions.
      ● If your element has quantitative vagueness in one dimension, then all you need is a fuzzy membership function that maps numerical values of the dimension to fuzzy degrees in the range [0, 1].
  39. Fuzzy membership functions
  40. Fuzzy membership functions
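A common shape for a one-dimensional membership function is the trapezoid: 0 outside an interval, 1 on a core, and linear in between. The sketch below is a generic trapezoidal function; the breakpoints for “Tall” (in cm) and for a “moderate” restaurant price (in euros) are made-up values for illustration only.

```python
import math

def trapezoidal(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear on the slopes.

    Requires a <= b <= c <= d. Use c = d = math.inf for a right shoulder
    (e.g. "tall": the higher the value, the more true the statement).
    """
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising slope
    return (d - x) / (d - c)      # falling slope

# Hypothetical breakpoints, for illustration only.
def tall(height_cm: float) -> float:
    return trapezoidal(height_cm, 170, 190, math.inf, math.inf)

def moderate_price(euros: float) -> float:
    return trapezoidal(euros, 15, 25, 40, 50)

print(tall(165), tall(180), tall(200))                             # 0.0 0.5 1.0
print(moderate_price(20), moderate_price(30), moderate_price(45))  # 0.5 1.0 0.5
```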
  41. Fuzzification options
      ● If an element has quantitative vagueness in more than one dimension, then you can either:
        ○ Define a multivariate fuzzy membership function, or
        ○ Define one membership function per dimension and then combine these via some fuzzy logic operation, like fuzzy conjunction or fuzzy disjunction.
  42. Multivariate fuzzy membership function
  43. Fuzzy conjunction and disjunction
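The per-dimension option can be sketched by computing one degree per dimension and combining the degrees with a fuzzy conjunction or disjunction. This sketch uses the minimum and maximum (the standard Zadeh operators), with the product as an alternative conjunction; the two dimensions of “Strategic Client” (generated revenue and years as a client) and their breakpoints are hypothetical.

```python
import math

def ramp(x: float, lower: float, upper: float) -> float:
    """Right-shoulder membership: 0 at or below `lower`, 1 at or above `upper`, linear between."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return (x - lower) / (upper - lower)

def fuzzy_and(*degrees: float) -> float:
    """Zadeh (minimum) conjunction."""
    return min(degrees)

def fuzzy_or(*degrees: float) -> float:
    """Zadeh (maximum) disjunction."""
    return max(degrees)

def product_and(*degrees: float) -> float:
    """Product conjunction, an alternative t-norm."""
    return math.prod(degrees)

# Hypothetical dimensions and breakpoints for "Strategic Client".
revenue_degree = ramp(550_000, 100_000, 1_000_000)  # 0.5
loyalty_degree = ramp(4, 1, 5)                      # 0.75 (years as a client)

print(fuzzy_and(revenue_degree, loyalty_degree))    # 0.5
print(fuzzy_or(revenue_degree, loyalty_degree))     # 0.75
print(product_and(revenue_degree, loyalty_degree))  # 0.375
```

The minimum lets the weakest dimension decide, while the product penalizes every partial degree; which one fits better depends on how the model's users actually judge such combinations.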
  44. Fuzzification options
      ● A third option is to just define one direct degree per statement:
        ○ “John is tall to a degree of 0.8”
        ○ “Maria is an expert in data modeling to a degree of 0.6”
      ● This approach makes sense when:
        ○ Your element is vague in too many dimensions and you cannot find a proper membership function, or
        ○ The element’s vagueness is qualitative and, thus, you have no dimensions to use.
      ● The drawback is that you will have to harvest a lot of degrees!
  45. Harvesting truth degrees
      ● Remember that vague statements provoke disagreements and debates among people, or even between people and systems.
      ● To generate fuzzy degrees for these statements, you practically need to capture and quantify these disagreements.
      ● How to capture:
        ○ Ask people directly
        ○ Ask people indirectly
        ○ Mine from data
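For the “ask people directly” option, one simple aggregation scheme (an assumption here, not the only possible one) is to take a statement's truth degree as the share of judges who consider it true. The judge IDs and votes below are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Vote = Tuple[str, str, bool]  # (statement, judge, does the judge consider it true?)

def degrees_from_votes(votes: Iterable[Vote]) -> Dict[str, float]:
    """Truth degree per statement = share of judges who voted it true.

    If a judge votes on the same statement more than once, only the last vote counts.
    """
    latest: Dict[Tuple[str, str], bool] = {}
    for statement, judge, agrees in votes:
        latest[(statement, judge)] = agrees
    yes: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for (statement, _judge), agrees in latest.items():
        total[statement] += 1
        yes[statement] += int(agrees)
    return {statement: yes[statement] / total[statement] for statement in total}

votes = [
    ("John is tall", "j1", True),
    ("John is tall", "j2", True),
    ("John is tall", "j3", False),
    ("John is tall", "j3", True),  # j3 changes their mind; this vote replaces the previous one
    ("John is tall", "j4", False),
    ("Maria is an expert in data modeling", "j1", True),
    ("Maria is an expert in data modeling", "j2", False),
]
print(degrees_from_votes(votes))
# {'John is tall': 0.75, 'Maria is an expert in data modeling': 0.5}
```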
  46. Explanation- and feedback-based harvesting
  47. Multiple fuzzy truths
      ● Even with fuzzification, you may still get disagreements.
      ● This can be an indication of context-dependence.
      ● Different contexts may require different fuzzy degrees or membership functions.
      ● In other words, contextualization and fuzzification are orthogonal approaches.
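Combining the two approaches can be as simple as keying the parameters of a membership function by context. The sketch below uses a linear ramp for “Tall” with one pair of breakpoints per context, falling back to a default pair for unknown contexts; both the contexts and the breakpoint values are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical (lower, upper) breakpoints in cm for "Tall", one pair per context.
TALL_BREAKPOINTS: Dict[str, Tuple[float, float]] = {
    "default": (170.0, 190.0),
    "Basketball Playing": (195.0, 215.0),
}

def tall_degree(height_cm: float, context: str = "default") -> float:
    """Degree to which a person is tall, using the context's breakpoints (or the default ones)."""
    lower, upper = TALL_BREAKPOINTS.get(context, TALL_BREAKPOINTS["default"])
    if height_cm <= lower:
        return 0.0
    if height_cm >= upper:
        return 1.0
    return (height_cm - lower) / (upper - lower)

print(tall_degree(185))                        # 0.75
print(tall_degree(185, "Basketball Playing"))  # 0.0
print(tall_degree(205, "Basketball Playing"))  # 0.5
```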
  48. Fuzzy model quality
      ● Main questions you need to consider:
        ○ Have I fuzzified the correct elements?
        ○ Are the truth degrees consistent?
        ○ Are the truth degrees accurate?
        ○ Is the provenance of the truth degrees well documented?
      ● Both accuracy and consistency are best treated not as binary metrics but rather as distances.
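One way (among several) to treat accuracy and consistency as distances: accuracy as the mean absolute difference from a set of reference degrees (for example, degrees harvested from a separate group of judges), and consistency as the total amount by which expected orderings between degrees are violated. Both metric definitions and the sample data are assumptions for illustration.

```python
from typing import Dict, Iterable, Tuple

def accuracy_distance(model: Dict[str, float], reference: Dict[str, float]) -> float:
    """Mean absolute difference over statements present in both maps (0 = perfect match)."""
    common = model.keys() & reference.keys()
    if not common:
        raise ValueError("no statements in common")
    return sum(abs(model[s] - reference[s]) for s in common) / len(common)

def consistency_distance(degrees: Dict[str, float],
                         orderings: Iterable[Tuple[str, str]]) -> float:
    """Total violation of constraints (a, b) meaning 'a should have at least b's degree'."""
    return sum(max(0.0, degrees[b] - degrees[a]) for a, b in orderings)

model = {"John is tall": 0.8, "Mary is tall": 0.6, "Nick is tall": 0.7}
reference = {"John is tall": 0.7, "Mary is tall": 0.6}
# Hypothetical background fact: Mary is taller than Nick, so her degree should not be lower.
orderings = [("Mary is tall", "Nick is tall")]

print(round(accuracy_distance(model, reference), 3))     # 0.05
print(round(consistency_distance(model, orderings), 3))  # 0.1
```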
  49. Fuzzy model representation
      ● To represent a truth degree for a relation, you simply need to define a relation attribute named “truth degree” or similar.
      ● This is straightforward if you work with E-R models or property graphs, but also possible in RDF or OWL, even if these languages do not directly support relation attributes.
      ● Things can become more difficult when you need to represent fuzzy membership functions or more complex fuzzy rules and axioms, along with their necessary reasoning support.
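In a property-graph style model, the truth degree is simply an attribute on the edge. The sketch below uses a hypothetical in-memory `Edge` record with a `truth_degree` attribute (defaulting to 1.0 for crisp statements) and an alpha-cut query that keeps only the edges whose degree reaches a threshold. In RDF, the same attribute would typically need a workaround such as reification or an n-ary relation pattern.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    truth_degree: float = 1.0  # relation attribute; 1.0 for crisp (non-vague) statements

def alpha_cut(edges: List[Edge], relation: str, alpha: float) -> List[Edge]:
    """Edges of the given relation whose truth degree is at least `alpha`."""
    return [e for e in edges if e.relation == relation and e.truth_degree >= alpha]

graph = [
    Edge("Google", "hasCompetitor", "Microsoft", 0.4),
    Edge("John", "instanceOf", "YoungPerson", 0.8),
    Edge("John", "instanceOf", "Person"),  # crisp statement
]

print([e.target for e in alpha_cut(graph, "instanceOf", 0.5)])  # ['YoungPerson', 'Person']
print([e.target for e in alpha_cut(graph, "instanceOf", 0.9)])  # ['Person']
```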
  50. Fuzzy model application
      ● This last step might not look like a semantic modeling task, yet it is a crucial one if you want your fuzzification effort to pay off.
      ● A fuzzy data model can be helpful in:
        ○ Semantic tagging and disambiguation
        ○ Semantic search and match
        ○ Decision support systems
        ○ Conversational agents (aka chatbots)
      ● In all cases, proper design and adaptation of the underlying algorithms is needed.
  51. When to fuzzify?
      ● Questions you need to consider:
        ○ Which elements in your model are unavoidably vague?
        ○ How severe and impactful are the disagreements you get (or expect to get) on the veracity of these vague elements?
        ○ Are these disagreements caused by vagueness or by other factors?
  52. When to fuzzify?
      ● Questions you need to consider:
        ○ If your model’s elements had fuzzy degrees, would you get less disagreement?
        ○ Are the applications that use the model able to exploit and benefit from truth degrees?
        ○ Can you develop a scalable way to obtain and maintain fuzzy degrees that costs less than the benefits they bring?
  53. How would you tackle this?
  54. How would you tackle this?
  55. Take Aways
      ● Data and information quality can be negatively affected by vagueness:
        ○ (Perceived) inaccuracy
        ○ Disagreements and misinterpretations
        ○ Reduced semantic interoperability
      ● Treating vagueness as noise doesn’t help:
        ○ It’s how we think and communicate.
        ○ Insisting on crispness is unproductive.
        ○ But leaving things as-is is also bad.
      ● Three complementary weapons to tackle vagueness:
        ○ Make your data models vagueness-aware
        ○ Contextualize truth
        ○ Fuzzify truth
  56. Currently writing a book on semantic data modeling
      ● To be published by O’Reilly in September 2020
      ● Early release expected on the O’Reilly Learning Platform in December 2019
      ● To get news about the book’s progress and a free preview chapter, send me an email at p.alexopoulos@gmail.com

    Nov. 14, 2019
