# How many truths can you handle?

Intelligent Data & Knowledge Technologies Professional
Nov. 11, 2019

• 1. How many truths can you handle? Strategies and techniques for handling vagueness in conceptual data models. Panos Alexopoulos, Data and Knowledge Technologies Professional. http://www.panosalexopoulos.com, p.alexopoulos@gmail.com, @PAlexop
• 7. Topics covered ● UNDERSTANDING VAGUENESS: what it is and how it differs from other phenomena ● DETECTING VAGUENESS: guidelines and (automatic) techniques ● TACKLING VAGUENESS: approaches and trade-offs ● VAGUENESS RAMIFICATIONS: why you should care ● MEASURING VAGUENESS: metrics and methods
• 9. The Sorites Paradox ● 1 grain of wheat does not make a heap. ● If 1 grain doesn’t make a heap, then 2 grains don’t. ● If 2 grains don’t make a heap, then 3 grains don’t. ● … ● If 999,999 grains don’t make a heap, then 1 million grains don’t. ● Therefore, 1 million grains don’t make a heap!
• 10. What is vagueness “Vagueness is a semantic phenomenon where predicates admit borderline cases, namely cases where it is not determinately true that the predicate applies or not” —Shapiro 2006
• 11. What is not vagueness ● AMBIGUITY: e.g., “Last week I visited Tripoli” ● INEXACTNESS: e.g., “My height is between 165 and 175 cm” ● UNCERTAINTY: e.g., “The temperature in Amsterdam right now might be 15 degrees”
• 12. Vagueness Types QUANTITATIVE Borderline cases stem from the lack of precise boundaries along some measurable dimension (e.g. “Bald”, “Tall”, “Near”) QUALITATIVE Borderline cases stem from not being able to decide which dimensions and conditions are sufficient and/or necessary for the predicate to apply. (e.g., “Religion”, “Expert”)
• 17. How would you model this?
• 21. How to detect vagueness ● Identify which of your data model’s elements are vague ● Investigate whether these elements are indeed vague. ● Investigate and determine potential dimensions and applicability contexts.
• 22. Where to look ● Classes: E.g. “Tall Person”, “Strategic Customer”, “Experienced Researcher” ● Relations and attributes: E.g., “hasGenre”, “hasIdeology” ● Attribute values: E.g., the “price” of a restaurant could take as values the vague terms “cheap”, “moderate” and “expensive”
• 23. What to look for ● Vague terms in names and definitions ● Disagreements and inconsistencies among data modelers, domain experts, and data stewards during model development and maintenance ● Disagreements and inconsistencies in user feedback during model application.
• 24. Examples from WordNet ● Vague senses: Yellowish (of the color intermediate between green and orange in the color spectrum; of something resembling the color of an egg yolk), Impenitent (impervious to moral persuasion), Notorious (known widely and usually unfavorably) ● Non-vague senses: Compound (composed of more than one part), Biweekly (occurring every two weeks), Outermost (situated at the farthest possible point from a center)
• 25. Examples from the Citation Ontology ● Vague relations: plagiarizes (the author of the citing entity plagiarizes the cited entity, by including textual or other elements from the cited entity without formal acknowledgement of their source), citesAsAuthority (the citing entity cites the cited entity as one that provides an authoritative description or definition of the subject under discussion), supports (the citing entity provides intellectual or factual support for statements, ideas or conclusions presented in the cited entity) ● Non-vague relations: sharesAuthorInstitutionWith (each entity has at least one author that shares a common institutional affiliation with an author of the other entity), retracts (the citing entity constitutes a formal retraction of the cited entity), includesExcerptFrom (the citing entity includes one or more excerpts from the cited entity)
• 27. Vagueness spread ● The ratio of model elements (classes, relations, datatypes, etc.) that are vague ● A data model with a high vagueness spread is less explicit and shareable than one with a low spread.
• 28. Vagueness intensity ● The degree to which the model’s users disagree on the validity of the (potential) instances of the elements. ● The higher this disagreement is for an element, the more problems the element is likely to cause. ● Calculation: ○ Consider a sample set of vague element instances ○ Have human judges denote whether and to what extent they believe these instances are valid ○ Measure the inter-agreement between users (e.g. by using Cohen’s kappa)
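The calculation sketched above can be made concrete. Below is a minimal pure-Python implementation of Cohen's kappa for two judges who labelled the same sample of candidate instances as valid (1) or invalid (0); the element name and the vote lists are illustrative assumptions, not data from the talk.

```python
from collections import Counter

def cohens_kappa(judge_a, judge_b):
    """Cohen's kappa: agreement between two judges, corrected for chance."""
    assert len(judge_a) == len(judge_b)
    n = len(judge_a)
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # Expected chance agreement, from each judge's marginal label distribution.
    freq_a, freq_b = Counter(judge_a), Counter(judge_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two judges label sampled instances of a hypothetical "StrategicCustomer"
# class as valid (1) or not (0).
a = [1, 1, 0, 1, 0, 1, 0, 0]
b = [1, 0, 0, 1, 1, 1, 0, 0]
print(cohens_kappa(a, b))  # 0.5 — moderate agreement hints at vagueness
```

A kappa near 1 means the judges largely agree (low vagueness intensity); a kappa near 0 or below means their agreement is no better than chance, flagging the element as a likely troublemaker.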
• 31. Vagueness-aware data models Data models whose vague elements are accompanied by meta-information that describes the nature and characteristics of their vagueness in an explicit way.
• 32. What to make explicit ● VAGUENESS EXISTENCE: e.g., “Tall Person” is vague and “Adult” is non-vague ● VAGUENESS TYPE: e.g., “Low Budget” has quantitative vagueness and “Expert Consultant” qualitative ● VAGUENESS DIMENSIONS: e.g., “Strategic Client” is vague in the dimension of the generated revenue ● VAGUENESS PROVENANCE: e.g., “Strategic Client” is vague in the dimension of the generated revenue according to the Financial Manager ● APPLICABILITY CONTEXTS: e.g., “Strategic Client” is vague in the dimension of the generated revenue in the context of Financial Reporting
• 34. Truth contextualization ● The same statement in the data model can be true in some contexts and false in other contexts. ● E.g., “Stephen Curry is short” is true in the context of “Basketball Playing” but false in most other contexts. ● Potential contexts: ○ Cultures ○ Locations ○ Industries ○ Processes ○ Demographics ○ ...
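One way to realize contextualization in a data model is to key each vague assertion by context rather than storing a single global truth value. The sketch below uses plain Python dictionaries; the subject, predicate, and context names are illustrative assumptions.

```python
# Context-qualified assertions: the truth of a vague statement is stored
# per context instead of as one global value.
assertions = {
    ("StephenCurry", "isShort"): {
        "BasketballPlaying": True,    # short for an NBA player
        "GeneralPopulation": False,   # not short by everyday standards
    },
}

def holds(subject, predicate, context):
    """Look up the truth of a vague statement in a given context."""
    return assertions.get((subject, predicate), {}).get(context)

print(holds("StephenCurry", "isShort", "BasketballPlaying"))   # True
print(holds("StephenCurry", "isShort", "GeneralPopulation"))   # False
```

Applications that query the model must then always supply a context, which is exactly the management overhead the next slide's checklist asks you to weigh.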
• 36. When to contextualize? ● When vagueness intensity is high and consensus is impossible ● When you are able to identify truth contexts ● When the applications that use the model can actually handle the contexts ● When contextualization actually manages to reduce disagreements and has a positive effect on the model’s applications ● When the contextualization benefits outweigh the context management overhead.
• 37. Truth fuzzification ● The basic idea is that we can assign a real number to a vague statement, within a range from 0 to 1. ○ A value of 1 would mean that the statement is completely true ○ A value of 0 that it is completely false ○ Any value in between that it is “partly true” to a given, quantifiable extent. ● For example: ○ “John is an instance of YoungPerson to a degree of 0.8” ○ “Google hasCompetitor Microsoft to a degree of 0.4”. ● The premise is that fuzzy degrees can reduce the disagreements around the truth of a vague statement.
• 38. Truth degrees are not probabilities ● A probability statement quantifies the likelihood that an event or fact whose truth conditions are well defined will come true ○ e.g., “it will rain tomorrow with a probability of 0.8” ● A fuzzy statement quantifies the extent to which an event or fact whose truth conditions are not well defined is perceived as true ○ e.g., “It’s now raining to a degree of 0.6” ● That’s why they are supported by different mathematical frameworks, namely probability theory and fuzzy logic
• 39. What fuzzification involves 1. Detect and analyze all vague elements in your model 2. Decide how to fuzzify each element 3. Harvest truth degrees 4. Assess fuzzy model quality 5. Represent fuzzy degrees 6. Apply the fuzzy model
• 40. Fuzzification options ● The number and kind of fuzzy degrees you need to acquire for your model’s vague elements depend on the latter’s vagueness type and dimensions. ● If your element has quantitative vagueness in one dimension, then all you need is a fuzzy membership function that maps numerical values of the dimension to fuzzy degrees in the range [0,1]
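For the single-dimension quantitative case, a membership function can be as simple as a linear ramp. The sketch below maps height in centimetres to a degree of membership in a "Tall Person" class; the 170/190 cm thresholds are illustrative assumptions, not values from the talk.

```python
def tall_membership(height_cm, lower=170.0, upper=190.0):
    """Ramp membership function for a vague "Tall Person" class:
    0 below `lower`, 1 above `upper`, linear in between.
    The thresholds are assumed for illustration."""
    if height_cm <= lower:
        return 0.0
    if height_cm >= upper:
        return 1.0
    return (height_cm - lower) / (upper - lower)

print(tall_membership(160))  # 0.0 — clearly not tall
print(tall_membership(180))  # 0.5 — borderline case
print(tall_membership(195))  # 1.0 — clearly tall
```

Other common shapes (trapezoidal, sigmoid) follow the same pattern: the function encodes where the borderline region lies and how quickly membership transitions across it.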
• 43. Fuzzification options ● If an element has quantitative vagueness in more than one dimension, then you can either: ○ Define a multivariate fuzzy membership function ○ Define one membership function per dimension and then combine these via some fuzzy logic operation, like fuzzy conjunction or fuzzy disjunction
• 45. Fuzzy conjunction and disjunction
• 46. Fuzzification options ● A third option is to just define one direct degree per statement. ○ “John is tall to a degree of 0.8” ○ “Maria is expert in data modeling to a degree of 0.6” ● This approach makes sense when: ○ Your element is vague in too many dimensions and you cannot find a proper membership function ○ The element’s vagueness is qualitative and, thus, you have no dimensions to use. ● The drawback is that you will have to harvest a lot of degrees!
• 47. Harvesting truth degrees ● Remember that vague statements provoke disagreements and debates among people, or even between people and systems. ● To generate fuzzy degrees for these statements you practically need to capture and quantify these disagreements. ● How to capture: ○ Ask people directly ○ Ask people indirectly ○ Mine from data
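The "ask people directly" route can be quantified in the simplest possible way: the degree of a statement is the fraction of judges who affirm it. The sketch below assumes binary yes/no votes and plain averaging as the aggregation scheme; weighted or context-split aggregations are equally possible.

```python
def harvest_degree(votes):
    """Aggregate judges' yes (1) / no (0) answers to a vague statement
    into a single truth degree by averaging. Averaging is an assumed
    aggregation choice, not the only one."""
    return sum(votes) / len(votes)

# Ten judges are asked directly whether "John is tall" holds.
votes = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1]
print(harvest_degree(votes))  # 0.8 — "John is tall to a degree of 0.8"
```

The same function accepts graded answers in [0, 1] (indirect elicitation on a slider, or scores mined from data) without modification.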
• 48. Explanation and feedback based harvesting
• 49. Multiple fuzzy truths ● Even with fuzzification you still may be getting disagreements ● This can be an indication of context-dependence ● Different contexts may require different fuzzy degrees or membership functions ● In other words, contextualization and fuzzification are orthogonal approaches.
• 50. Fuzzy model quality ● Main questions you need to consider: ○ Have I fuzzified the correct elements? ○ Are the truth degrees consistent? ○ Are the truth degrees accurate? ○ Is the provenance of the truth degrees well documented? ● Both accuracy and consistency are best treated not as a binary metric but rather as a distance
• 51. Fuzzy model representation ● To represent a truth degree for a relation you simply need to define a relation attribute named “truth degree” or similar. ● This is straightforward if you work with E-R models or property graphs, but also possible in RDF or OWL, even if these languages do not directly support relation attributes. ● Things can become more difficult when you need to represent fuzzy membership functions or more complex fuzzy rules and axioms, along with their necessary reasoning support.
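In a property-graph setting, the relation attribute described above is just an extra key on each edge. Below is a minimal dictionary-based sketch; the edge schema, attribute name `truth_degree`, and the example statements are illustrative assumptions.

```python
# Property-graph-style edges where each relation carries a "truth_degree"
# attribute alongside source, relation name, and target.
edges = [
    {"source": "Google", "relation": "hasCompetitor",
     "target": "Microsoft", "truth_degree": 0.4},
    {"source": "John", "relation": "instanceOf",
     "target": "YoungPerson", "truth_degree": 0.8},
]

def degrees_above(edges, threshold):
    """Keep only statements whose truth degree clears a threshold,
    a typical query pattern over a fuzzy model."""
    return [e for e in edges if e["truth_degree"] >= threshold]

print([e["target"] for e in degrees_above(edges, 0.5)])  # ['YoungPerson']
```

In RDF or OWL, where edges cannot carry attributes directly, the same information is typically attached via reification or annotated axioms, which is part of the extra representational difficulty the slide mentions.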
• 52. Fuzzy model application ● This last step might not look like a semantic modeling task, yet it is a crucial one if you want your fuzzification effort to pay off ● A fuzzy data model can be helpful in: ○ Semantic tagging and disambiguation ○ Semantic search and match ○ Decision support systems ○ Conversational agents (aka chatbots) ● In all cases, proper design and adaptation of the underlying algorithms is needed
• 53. When to fuzzify? ● Questions you need to consider: ○ Which elements in your model are unavoidably vague? ○ How severe and impactful are the disagreements you (expect to) get on the veracity of these vague elements? ○ Are these disagreements caused by vagueness or other factors?
• 54. When to fuzzify? ● Questions you need to consider: ○ If your model’s elements had fuzzy degrees, would you get less disagreement? ○ Are the applications that use the model able to exploit and benefit from truth degrees? ○ Can you develop a scalable way to get and maintain fuzzy degrees that costs less than the benefits they bring you?
• 55. How would you tackle this?
• 56. How would you tackle this?
• 58. Take Aways ● Data and information quality can be negatively affected by vagueness: (perceived) inaccuracy, disagreements and misinterpretations, reduced semantic interoperability ● Treating vagueness as noise doesn’t help: it’s how we think and communicate, insisting on crispness is unproductive, but leaving things as-is is also bad ● Three complementary weapons to tackle vagueness: make your data models vagueness-aware, contextualize truth, fuzzify truth
• 60. Currently writing a book on semantic data modeling To be published by O’Reilly in September 2020 Early release expected at O’Reilly Learning Platform in December 2019 To get news about the book progress and a free preview chapter send me an email to p.alexopoulos@gmail.com