Objective Fiction
The semantic construction of (web) reality

Aldo Gangemi
aldo.gangemi@cnr.it, @lipn.univ-paris13.fr
Objective Fiction, i-semantics keynote

"Objective fiction: the semantic construction of web reality" talks about current challenges for semantic technologies, and the Semantic Web in particular, focusing on cognitive and social dimensions of human semantics.


  1. 1. Objective Fiction: The semantic construction of (web) reality. Aldo Gangemi, aldo.gangemi@cnr.it, @lipn.univ-paris13.fr. RCLN, LIPN, Université Paris 13, UMR CNRS, Sorbonne Cité; Semantic Technology Lab, Institute for Cognitive Sciences, CNR, Rome, Italy. Work described jointly with STLab people in the last years: Valentina Presutti, Francesco Draicchio, Andrea Nuzzolese, Diego Reforgiato, Alessandro Adamou, Eva Blomqvist, Enrico Daga, Alfio Gliozzo, Alberto Musetti
  2. 2. My arguments Semantic technologies only sparsely address real semantic phenomena
  3. 3. My arguments Semantic technologies only sparsely address real semantic phenomena Semantic Framing is not explicit
  4. 4. My arguments Semantic technologies only sparsely address real semantic phenomena Semantic Framing is not explicit We need a semantic data science
  5. 5. Objectivity or fiction? • Objective fiction! • Data scientists contribute to build the reality we live in • The Web gives them more empowerment • Semantics should give them even more • Is that happening?
  6. 6. Objective fiction • Bare fact: Berlusconi has been sentenced for tax fraud • Opinion: I like the decision of Berlusconi’s trial court • Framing: The arrest of Berlusconi would be an offense to millions of voters; Like Al Capone, he’s been sentenced for the least severe crime • Story-based repositioning: Berlusconi’s sentencing is like the story of Enzo Tortora (a talk show host on Italian television, who was falsely accused of drug trafficking) • Disinformation: The law that allows banning Berlusconi from public offices is unconstitutional • A snakes and ladders game about Berlusconi’s conviction
  7. 7. Semantic technologies have progressed significantly, but they still miss most relevant semantic phenomena in social life
  8. 8. How to synthetically tell what a text is about, in the open domain, i.e. without specific training? What is its core meaning, emotional content, position in the evolution of its topic, etc.?
  9. 9. How to analytically mashup data in relevant ways, in the open domain, i.e. without someone putting the necessary intelligence within?
  10. 10. A sample correlation fallacy • In 2010, a data scientist presented an analysis of Facebook status updates • Focus was on terms break up, broken up • Results show a curve that peaks in early summer and before Christmas Reported by: Has the Internet changed Science?, by E. Pisani, Prospect, November 2010
  11. 11. Interpretation of the speaker “relationships melt down because of the stress of spending time together” Reported by: Has the Internet changed Science?, by E. Pisani, Prospect, November 2010
  12. 12. Question by an attendee “maybe not about relationships, but the end of terms, e.g. broke up for Christmas last Tuesday” Reported by: Has the Internet changed Science?, by E. Pisani, Prospect, November 2010
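The attendee's objection on slides 10 to 12 can be made concrete with a small sketch. The status updates, the cue words, and both function names below are invented for illustration; they are not the data or the method from the reported talk. The sketch only shows that a raw phrase count mixes at least two frames.

```python
import re

# Hypothetical status updates, invented for illustration only.
STATUSES = [
    "We broke up last night, I can't believe it",
    "School broke up for Christmas last Tuesday",
    "She wants to break up with me",
    "Term has broken up, holidays at last",
]

PHRASE = re.compile(r"\b(break|broke|broken)\s+up\b", re.IGNORECASE)
# Crude cue list for the "end of term" frame; an assumption, not a real lexicon.
TERM_CUES = {"school", "term", "holidays", "christmas", "class"}


def naive_count(statuses):
    """Count every status that mentions the phrase, whatever it means."""
    return sum(1 for s in statuses if PHRASE.search(s))


def count_by_frame(statuses):
    """Split the matches by a guessed frame, using the cue words above."""
    counts = {"relationship": 0, "end_of_term": 0}
    for s in statuses:
        if not PHRASE.search(s):
            continue
        words = set(re.findall(r"[a-z]+", s.lower()))
        frame = "end_of_term" if words & TERM_CUES else "relationship"
        counts[frame] += 1
    return counts
```

On this toy input the naive count attributes every match to relationships, while the frame-aware count assigns half of them to school terms. A real system would need proper frame detection rather than a cue list.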
  13. 13. Partly open problems • Data integration interpretation without designers (risk of correlation fallacy) • Opportunistic reasoning: travel planning, financial opportunities, team building, etc. • Smart text summarization • Opinion mining on the right spots • Domain dynamics: science evolution, scholar changes, market dynamics, ...
  14. 14. Human, social knowledge management is not exempt from framing: it is modulated by frames, metaphors, and stories that make something relevant through neural activation patterns
  15. 15. People can be aware of framing ... Pareidolia
  16. 16. ... but often they are not Think about how to frame an issue using your value For example, if the issue is poverty and your value is protection From a Foot Print Strategies Inc. spin doctor’s presentation
  17. 17. ... but often they are not Think about how to frame an issue using your value For example, if the issue is poverty and your value is protection “Every family deserves a safe and healthy place to live” Being_at_risk frame (FrameNet) Restructured from a Foot Print Strategies Inc. spin doctor’s presentation
  18. 18. Political reframing • Conservatives say: • We need tax relief • Same sex marriage will undermine family life • Trial lawyer. Frivolous lawsuits • We need a strong president to protect us • Progressives say: • Taxes are investments • Marriage is the realization of love in a lifetime commitment • Public protection attorney • We have a weak president who didn’t protect us. Restructured from a Foot Print Strategies Inc. spin doctor’s presentation
  19. 19. I’m in good company • Myself (you never know!) (ESWC2009 keynote): knowledge patterns as objects of empirical investigation • Frank Van Harmelen (ISWC2011 keynote): route to empirical research: data science, data patterns • Martin Hepp (EKAW2012 keynote): web semantics not necessarily coincident with DL and traditional OE • David Karger (ESWC2013 keynote): what can the SW do for average users? Not much until now • Enrico Motta (ESWC2013 keynote): what semantics in the current SW? Different forces, still faith in good old logic • John Sowa (SemTech2013 lecture): patterns exist at different levels of data, ontologies, and reality
  20. 20. ... on the shoulders of • Köhler, Bartlett, Piaget, Fillmore, Minsky, and many cognitive scientists and neuroscientists ...
  21. 21. Hypotheses from cognitive semantics Cf. 2012 Dagstuhl Seminar on Cognitive Science for the Semantic Web
  22. 22. Reality is “framed” by our prior knowledge, expectations, opportunities, goals, which are organized as conceptual frames and stories, and linked to our emotions
  23. 23. Frames are diverse • more or less abstract, complex, metaphoric, or specific • many of them stay unconscious most of the time • most are evident in human artifacts • they play a role in narratives (idealized stories) used to drive our decisions and motivate our plans. Cf. George Lakoff’s The Political Mind. [Figure labels, sample frames: Public services; Becoming visible; Desirability; Quantity IS Vertical position; Price per unit; Arithmetic commutative; Risky situation; Institution IS Family; Precariousness; Criminal investigation; Part-whole; Affectivity IS Temperature; Being ‘mbari in Catania; Family in Italian Law; Facebook Timeline format; Address microformat]
  24. 24. Frames are part of our biology. Frames create our individual counterpart to reality. They are activated in neural binding, emotional paths, somatic markers and mirror neurons. Neural binding allows us to “connect the pieces” that come from perception, recall, abstraction, and imagination. Neural binding works according to emotional paths (dopaminergic, noradrenergic), linked to narratives and frames (hypothesis). Mirror neurons activate the same circuitry for actual perception, recall of perception, abstraction of perception, and imagination of new perceptions
  25. 25. • The semantics of social reality is implemented in media applications, but it is hard-coded • It remains in the mind of designers • The Semantic Web neglects to make it explicit
  26. 26. Semantic expressivity? • Is our semantics enough to support extraction, representation, and harnessing of social semantics? • triples are simple structures • classes represent arbitrary concepts • where is cognitive adequacy? • when does a class represent arbitrary data, and when is it a counterpart of a human knowledge pattern? • is that difference important in general?
  27. 27. DBpedia: when triplifying Wikipedia infoboxes, its designers lost the framing of boxes and internal sub-boxes. [Figure labels: Administrative frames; Geographic frames; Communication frames]
  28. 28. The case of Infoboxes • Infobox framing is missing in the DBpedia ontology too • If we mine the ontology to check what properties can be applied to what classes, the result is partial and often does not correspond to the original frame • Scraping heuristics may be more cognitively sound ...
  29. 29. Interaction semantics • Interfaces and interaction patterns convey frame semantics • Schema induction and triplification of databases can be improved by exploiting interfaces exposing data, cf. data.cnr.it ontology design • HTML pages and stylesheets contain a lot of framing knowledge, cf. Craig Knoblock’s work • Infographics can change the way we interpret the same data
  30. 30. Cognitive principles of KR on the Web Empirical Conservativeness Relevance in Modeling Structured Provenance To be added to LOD principles
  32. 32. Empirical conservativeness What is present with a function in (evident, extracted, emerging) empirical data should be preserved in its semantic representation
  33. 33. Empirical conservativeness • It is a measure against “oversimplification” • The case of Infobox framing loss is a sample violation of this principle
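A minimal sketch of what the infobox violation on slide 33 looks like in data, assuming a made-up infobox with two sub-boxes. The `ex:` identifiers, the values, and the function names are placeholders, and keeping the sub-box name as a fourth tuple element is only one possible way to preserve the framing, not the approach used by DBpedia or by the author.

```python
# Hypothetical infobox with sub-boxes; all identifiers and values are placeholders.
INFOBOX = {
    "subject": "ex:SomeCity",
    "boxes": {
        "Administrative": {"region": "ex:SomeRegion", "mayor": "ex:SomePerson"},
        "Geographic": {"elevation": "12 m", "area": "100 km2"},
    },
}


def flat_triples(infobox):
    """Triplify by dropping the sub-box grouping (the lossy option)."""
    s = infobox["subject"]
    return [(s, p, o) for box in infobox["boxes"].values() for p, o in box.items()]


def framed_quads(infobox):
    """Keep the sub-box name as a fourth element, so the framing survives."""
    s = infobox["subject"]
    return [
        (s, p, o, name)
        for name, box in infobox["boxes"].items()
        for p, o in box.items()
    ]


def properties_in_box(quads, box_name):
    """Recover which properties the infobox presented together."""
    return sorted(p for _, p, _, box in quads if box == box_name)
```

With the flat triples there is no way to ask which properties were shown together; with the fourth element, `properties_in_box(framed_quads(INFOBOX), "Geographic")` recovers the grouping.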
  34. 34. Special case • Keeping interaction boundaries is a special case of empirical conservativeness • Like neural binding (at the neural level) and linguistic framing (at the cognitive level), relevant boundaries of logical representations need to be represented. Cf. Marvin Minsky’s original frames: “representations that mirror cognitive mechanisms”
  35. 35. Is there anything like that in OWL or RDF? Maybe ontology modules, classes, named graphs, hasKey axioms Not specific to the boundary problem, nor to framing or neural binding *Very recent: new spec for named graphs accepts typing
  36. 36. • With infoboxes, we are still discussing a case of basic semantics, since it is fully presented in data • More cases from social reality include slippery relations: counting as, intentionality, action schemes, normative constraints, emerging patterns, frames, metaphors, irony, stories, socio-technical task semantics
  37. 37. • Those relations are partly implicit, and the modeling practices we use on SW/LOD are not designed for that, contra Minsky
  38. 38. More semantics or more distinctions? • I am not advocating for “more semantics” in terms of complexity, rather for more distinctions • Human knowledge is relational in nature • Classes are powerful primitives in logical languages, especially in description logics and triple-based languages • We need n-ary and multigrade relations, but arbitrary relations would be too much in current KR scenarios, so we can use them through smart reification patterns
  39. 39. More semantics or more distinctions? • Fixation on classes goes with a trade-off • Classes need to be distinguished in terms of design • Class-oriented representation needs a “push-up” to partly recover the lost structure
  40. 40. Class types? Types of classes have been distinguished in the past • AI: sorts and types • Formal Ontology: OntoClean metaclasses, based on formal criteria • OWL2 punning: arbitrary typing of classes. Solutions span between the two extremes: heavy principles (OntoClean) and no principle at all (punning)
  41. 41. Pragmatic meta-classes • How about distinguishing classes that implement interesting social relations? • This classification should produce “reference frames” when querying, reasoning, or reusing data and ontologies, as well as when aligning, extracting, and discovering data and ontologies
  42. 42. Relevance in modeling When modeling a class, its design motivation (relevance) must be explicit: it should be typed with the reason for that relevance
  43. 43. Relevance in modeling • “What’s special in that class?” • E.g., it’s central in the data, it’s a frame, it’s an n-ary reification mechanism, it’s the result of a discovery algorithm, etc. • A new vocabulary for metaclasses?
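One way to read "typed with the reason for that relevance" is to give each class a second type drawn from a small metaclass vocabulary, in the spirit of OWL2 punning. The sketch below assumes such a vocabulary; the `rel:` terms, the `ex:` classes, and the function names are all invented here and do not correspond to any published vocabulary.

```python
# Hypothetical vocabulary: the rel: terms are invented for this sketch.
RELEVANCE_TYPES = {
    "rel:Frame",
    "rel:NaryReification",
    "rel:CentralInData",
    "rel:Discovered",
}

# Classes typed both as owl:Class and, where known, with a relevance metaclass.
TRIPLES = {
    ("ex:Marriage", "rdf:type", "owl:Class"),
    ("ex:Marriage", "rdf:type", "rel:Frame"),
    ("ex:TimedMembership", "rdf:type", "owl:Class"),
    ("ex:TimedMembership", "rdf:type", "rel:NaryReification"),
    ("ex:Thing42", "rdf:type", "owl:Class"),
}


def relevance_of(cls, triples):
    """Return the relevance metaclasses asserted for a class."""
    return sorted(
        o for s, p, o in triples
        if s == cls and p == "rdf:type" and o in RELEVANCE_TYPES
    )


def classes_without_relevance(triples):
    """List classes whose design motivation has not been made explicit."""
    classes = {s for s, p, o in triples if p == "rdf:type" and o == "owl:Class"}
    return sorted(c for c in classes if not relevance_of(c, triples))
```

A check like `classes_without_relevance` is the kind of procedure that could flag classes whose design motivation is still only in the designer's mind.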
  44. 44. • Top-down principles do not work unless people adopt them or there are procedures to discover them in data • Disseminating a good practice • A research program of discovering and reusing class types for the common good
  45. 45. Structured provenance When merging RDF triples, they should come with their provenance data
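The principle on slide 45 can be sketched by merging into quads rather than bare triples. The two sources and their statements below are invented, and the fourth element stands in for a named graph or any richer provenance record; this is an illustration of the idea, not the sig.ma implementation.

```python
from collections import defaultdict

# Hypothetical sources and statements, for illustration only.
SOURCE_A = (
    "http://example.org/source-a",
    [("ex:Person1", "ex:birthYear", "1950")],
)
SOURCE_B = (
    "http://example.org/source-b",
    [("ex:Person1", "ex:birthYear", "1951"), ("ex:Person1", "ex:name", "P. One")],
)


def merge_with_provenance(*sources):
    """Merge into quads (s, p, o, source) instead of a bare set of triples."""
    return [(s, p, o, src) for src, triples in sources for s, p, o in triples]


def conflicts(quads):
    """Return (s, p) pairs with more than one value, and which source said what."""
    values = defaultdict(dict)
    for s, p, o, src in quads:
        values[(s, p)].setdefault(o, []).append(src)
    return {key: by_value for key, by_value in values.items() if len(by_value) > 1}
```

Without the fourth element the merged data would simply contain two birth years for the same person, with no way to tell which source asserted which.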
  46. 46. The RDF mix case (sig.ma)
  47. 47. Some results from STLab
  48. 48. http://wit.istc.cnr.it/stlab-tools/ We (STLab and RCLN) are researching how to discover, reengineer, collect, and use knowledge patterns as keys for accessing meaning on the Web
  49. 49. A broader vision: knowledge patterns and their façades. The Knowledge Pattern (KP) Model is a discovery and analogy structure <C,L,I,D,W,T,V,X,F,S,R,U> such that a KP emerges out of invariances across multiple façades: • C: concept graphs • L: logical forms (some axiomatization of multigrade predicates) • I: local inference rules (either classical or approximate) • D: (relational) data • W: linguistic data • T: social tagging data • V: interaction/visualization/formatting structures • X: provenance data • F: framing information • S: sentic information • R: relations to other KP • U: use cases. All façades can provide features for stochastic processes. All façades should be encoded in RDF for interoperability and joint reasoning. [Figure labels: Informal graphs; DL & Rule patterns; Data patterns; Cognitive patterns; Ontology design patterns; Lexical and NLP data patterns; Socio-patterns; Web and interaction patterns; Task patterns] Aldo Gangemi, Valentina Presutti: Towards a pattern science for the Semantic Web. Semantic Web 1(1-2): 61-68 (2010)
  50. 50. A pattern framework: Descriptions & Situations. Sample application to modeling legal cases with norms, conflicts, and entrenchment. [Figure: diagram with a social layer, a normative layer, and an assessment layer; Description nodes: Social norm, Norm, Meta-Norm, Compatibility scenario; Situation nodes: Case in point, Jurisprudential Conflict Case, Entrenchment Case; relations: satisfies, hasSetting] Aldo Gangemi, Peter Mika: Understanding the Semantic Web through Descriptions and Situations. ODBASE 2003: 689-706
  51. 51. Layered pattern morphisms. An ontology design pattern describes a formal expression that can be exemplified, morphed, instantiated, and expressed in order to solve a domain modelling problem • Logical Pattern (MBox): owl:Class:_:x rdfs:subClassOf owl:Restriction:_:y • exemplifiedAs Generic Content Pattern (TBox): Inflammation rdfs:subClassOf (localizedIn some BodyPart) • morphedAs Specific Content Pattern (TBox): Colitis rdfs:subClassOf (localizedIn some Colon) • instantiatedAs Data Pattern (ABox): John’s_colitis isLocalizedIn John’s_colon • expressedAs Linguistic Pattern: “John’s colon is inflamed”, “John has got colitis”, “Colitis is the inflammation of colon”. [Figure labels along the abstraction axis: Logic; Meaning; Reference; Expression] Aldo Gangemi, Valentina Presutti: Ontology Design Patterns. Handbook on Ontologies 2nd ed. (2009)
  52. 52. N-ary patterns in KR • Temporal indexing pattern – (R(a,b))+t sentence indexing: quads, external time stamps – R(a,b)+t relation indexing: reified n-ary relations (3D frames) – R(a+t,b+t) individual indexing: fluents, 4D, tropes, “context slices” (4D frames) – tR name nesting: ad hoc naming of binary relations • More indexes for additional arguments. Aldo Gangemi, Valentina Presutti: A Multi-dimensional Comparison of Ontology Design Patterns for Representing n-ary Relations. SOFSEM 2013: 86-105. Andreas Scheuermann, Enrico Motta, Paul Mulholland, Aldo Gangemi and Valentina Presutti: An Empirical Perspective on Representing Time. K-CAP 2013
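The four temporal indexing options on slide 52 can be compared side by side by encoding the same statement, R(a,b) at time t, in each style. The sketch below uses plain tuples; the `ex:` property names, the blank node label, and the `@` naming of time slices are assumptions made for illustration, not names taken from the cited papers.

```python
def sentence_indexing(r, a, b, t):
    """(R(a,b))+t: the whole statement is indexed, here as a quad."""
    return [(a, r, b, t)]


def relation_indexing(r, a, b, t, node="_:rel1"):
    """R(a,b)+t: the relation is reified as a node that carries its arguments."""
    return [
        (node, "rdf:type", r),
        (node, "ex:arg1", a),
        (node, "ex:arg2", b),
        (node, "ex:atTime", t),
    ]


def individual_indexing(r, a, b, t):
    """R(a+t,b+t): time slices of the two individuals are related."""
    a_t, b_t = f"{a}@{t}", f"{b}@{t}"
    return [
        (a_t, "ex:sliceOf", a),
        (b_t, "ex:sliceOf", b),
        (a_t, "ex:atTime", t),
        (b_t, "ex:atTime", t),
        (a_t, r, b_t),
    ]


def name_nesting(r, a, b, t):
    """tR: the time index is folded into an ad hoc property name."""
    return [(a, f"{r}_{t}", b)]
```

Calling each function with the same arguments, for example `("ex:worksFor", "ex:Ann", "ex:Acme", "2013")`, shows the trade-off directly: one quad, four triples, five triples, or a single triple whose property name hides the time.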
  53. 53. Centrality discovery in datasets. [Figure: Music Ontology graph with the terms mo:Track, mo:Playlist, mo:track, mo:Record, mo:available_as, mo:MusicArtist, mo:ED2K, mo:Torrent, foaf:maker, tags:taggedWithTag, tags:Tag, mo:image, dc:date, dc:title, dc:description, rdfs:Literal] Valentina Presutti, Lora Aroyo, Alessandro Adamou, Balthasar Schopman, Aldo Gangemi, Guus Schreiber: Extracting Core Knowledge from Linked Data. COLD2011, CEUR-WS.org Vol-782.
  54. 54. Encyclopedic Knowledge Patterns: example • An Encyclopedic Knowledge Pattern (EKP) is discovered from the paths emerging from Wikipedia page link invariances • They are represented as OWL2 ontologies. Andrea Giovanni Nuzzolese, Aldo Gangemi, Valentina Presutti, Paolo Ciancarini: Encyclopedic Knowledge Patterns from Wikipedia Links. International Semantic Web Conference (1) 2011: 520-536
  55. 55. Encyclopedic KP: input data • Wikipedia page links generate 107.9M triples • Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links
  56. 56. Encyclopedic KP: input data • Wikipedia page links generate 107.9M triples • Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links dbpo:MusicalArtist dbpo:MusicalArtist dbpo:Organisation dbpo:Place
  57. 57. Encyclopedic KP: input data • Wikipedia page links generate 107.9M triples • Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links • Paths are used to discover Encyclopedic Knowledge Patterns – Such patterns should bring out the most typical types of things that the Wikipedia crowd uses to describe a resource of a given type. [Figure: dbpo:MusicalArtist with paths linksToMusicalArtist (to dbpo:MusicalArtist), linksToPlace (to dbpo:Place), and linksToOrganisation (to dbpo:Organisation)]
  58. 58. Encyclopedic KP: input data • Wikipedia page links generate 107.9M triples • Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links • Paths are used to discover Encyclopedic Knowledge Patterns – Such patterns should bring out the most typical types of things that the Wikipedia crowd uses to describe a resource of a given type. Path: P_i,j = [S_i, p, O_j]. [Figure: dbpo:MusicalArtist with paths linksToMusicalArtist (to dbpo:MusicalArtist), linksToPlace (to dbpo:Place), and linksToOrganisation (to dbpo:Organisation)]
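A small sketch of how paths P_i,j = [S_i, p, O_j] could be collected from typed page links. The slides do not define pathPopularity, so the sketch assumes one plausible reading: the share of resources of type S_i that have at least one link to a resource of type O_j. The resources, the links, and the `ex:linksTo` property are invented for illustration.

```python
from collections import defaultdict

LINK = "ex:linksTo"  # placeholder for the page-link property p

# Hypothetical typed resources and page links.
TYPES = {
    "ex:Artist1": "dbpo:MusicalArtist",
    "ex:Artist2": "dbpo:MusicalArtist",
    "ex:Artist3": "dbpo:MusicalArtist",
    "ex:Artist4": "dbpo:MusicalArtist",
    "ex:Label1": "dbpo:Organisation",
    "ex:City1": "dbpo:Place",
}
LINKS = [
    ("ex:Artist1", "ex:Artist2"),
    ("ex:Artist1", "ex:City1"),
    ("ex:Artist2", "ex:City1"),
    ("ex:Artist2", "ex:Label1"),
    ("ex:Artist3", "ex:City1"),
    ("ex:Artist3", "ex:Artist1"),
    ("ex:Artist3", "ex:Artist2"),
]


def path_popularity(types, links):
    """Map each path (S_i, p, O_j) to the assumed popularity ratio."""
    subjects_per_path = defaultdict(set)
    for s, o in links:
        if s in types and o in types:
            subjects_per_path[(types[s], LINK, types[o])].add(s)
    resources_per_type = defaultdict(int)
    for t in types.values():
        resources_per_type[t] += 1
    return {
        path: len(subjects) / resources_per_type[path[0]]
        for path, subjects in subjects_per_path.items()
    }
```

In this toy data three of the four musical artists link to a place, so that path scores highest, which mirrors the intuition that places are among the typical things used to describe an artist.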
  59. 59. k-means clustering on Path Popularity. Sample distribution of pathPopularity for DBpedia paths. The y-axis indicates how many paths (on average) are above a certain value t for pathPopularity. Encyclopedic Knowledge Patterns from Wikipedia Wikilinks (@ISWC2011). Andrea Giovanni Nuzzolese, Aldo Gangemi, Valentina Presutti, Paolo Ciancarini: Encyclopedic Knowledge Patterns from Wikipedia Links. International Semantic Web Conference (1) 2011: 520-536
  60. 60. k-means clustering on Path Popularity. Sample distribution of pathPopularity for DBpedia paths. The y-axis indicates how many paths (on average) are above a certain value t for pathPopularity. 1 big cluster (4-cluster) with ranks below 18.18%. Encyclopedic Knowledge Patterns from Wikipedia Wikilinks (@ISWC2011). Andrea Giovanni Nuzzolese, Aldo Gangemi, Valentina Presutti, Paolo Ciancarini: Encyclopedic Knowledge Patterns from Wikipedia Links. International Semantic Web Conference (1) 2011: 520-536
  61. 61. k-means clustering on Path Popularity. Sample distribution of pathPopularity for DBpedia paths. The y-axis indicates how many paths (on average) are above a certain value t for pathPopularity. 3 small clusters with ranks above 22.67%. 1 big cluster (4-cluster) with ranks below 18.18%. Encyclopedic Knowledge Patterns from Wikipedia Wikilinks (@ISWC2011). Andrea Giovanni Nuzzolese, Aldo Gangemi, Valentina Presutti, Paolo Ciancarini: Encyclopedic Knowledge Patterns from Wikipedia Links. International Semantic Web Conference (1) 2011: 520-536
  63. 63. k-means clustering on Path Popularity. Sample distribution of pathPopularity for DBpedia paths. The y-axis indicates how many paths (on average) are above a certain value t for pathPopularity. 3 small clusters with ranks above 22.67%. 1 big cluster (4-cluster) with ranks below 18.18%. 1 alternative cluster (6-cluster) with ranks below 11.89%. Encyclopedic Knowledge Patterns from Wikipedia Wikilinks (@ISWC2011). Andrea Giovanni Nuzzolese, Aldo Gangemi, Valentina Presutti, Paolo Ciancarini: Encyclopedic Knowledge Patterns from Wikipedia Links. International Semantic Web Conference (1) 2011: 520-536
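The clustering step can be illustrated with a plain one-dimensional k-means (Lloyd's iterations) over popularity values. The slides do not say how k or the initial centroids were chosen, so the values below, the choice of two clusters, and the starting centroids are all assumptions made for this sketch.

```python
# Hypothetical pathPopularity values, in percent.
POPULARITY = [2.0, 5.0, 8.0, 11.0, 40.0, 45.0, 50.0]


def kmeans_1d(values, centroids, iterations=20):
    """Plain Lloyd iterations on scalars, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = updated
    return centroids, clusters


centroids, clusters = kmeans_1d(POPULARITY, centroids=[2.0, 50.0])
```

The gap between the largest value of the low cluster and the smallest value of the high cluster then suggests a cut-off for which paths to keep in a pattern, which is how the percentages reported on the slides can be read.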
  64. 64. Serendipity in exploratory browsing http://www.aemoo.org Andrea Giovanni Nuzzolese, Valentina Presutti, Aldo Gangemi, Alberto Musetti, Paolo Ciancarini: Aemoo: exploring knowledge on the web. WebSci 2013: 272-275. Aemoo: exploratory search based on EKP. Semantic Web Challenge @ISWC 2011: short listed, 4th place
  65. 65. Machine reading with FRED http://wit.istc.cnr.it/stlab-tools/fred/ Valentina Presutti, Francesco Draicchio, Aldo Gangemi: Knowledge Extraction Based on Discourse Representation Theory and Linguistic Frames. EKAW 2012: 114-129
  66. 66. Event and Frames from text http://wit.istc.cnr.it/stlab-tools/fred/ Sample input: “The New York Times reported that John McCarthy died. He invented the programming language LISP.” [Figure labels on the output graph: custom namespace; meta-level; frames/events; semantic roles; vocabulary alignment; co-reference; resolution and linking; taxonomy; types] Valentina Presutti, Francesco Draicchio, Aldo Gangemi: Knowledge Extraction Based on Discourse Representation Theory and Linguistic Frames. EKAW 2012: 114-129
  67. 67. Sentic frames from text http://wit.istc.cnr.it/stlab-tools/sentilo
  68. 68. Sentic frames from text http://wit.istc.cnr.it/stlab-tools/sentilo Sample input: “Paul Newman thinks that Barack Obama is a great president!”
  69. 69. But beware “patternicity”
  70. 70. Psychopathology of big data • Pattern recognition vs. patternicity: • Simple correlation fallacy • Synchronicity • Pareidolia • Conspiracy theories • Schizophrenic apophany
  71. 71. Web helps spreading patternicity • More information • Faster spread of information • More difficult provenance checking
  72. 72. Finally, back to the objective fiction problem. Logic, ontologies, and data design practices construct a reality, much as social institutions have done for millennia by playing with the many layers at which means of communication can be morphed
  73. 73. The reality currently built by triple-based languages is basic. It looks more like a quiz show than a full-fledged social reality
  74. 74. • On the contrary, the reality constructed by current institutions and media is a glorious “objective fiction” • A powerful system feeding a giant, analogical Matrix that is partly opaque to most people
  75. 75. • Our use of semantics should try to approximate the level of sophistication that objective fiction has reached so far • We have a responsibility to create added value: making objective fiction explicit
  76. 76. • When simple objective data are provided, their semantic representation is too simple to provide added value to users • We need to raise our semantic grasp to institutional relations, hidden knowledge patterns, action, situation, sentic and metaphoric frames, leading stories, socio-technical tasks • Semantic technologies should make us aware of real semantics, not just bare facts
  77. 77. That's very difficult, but the alternative is leaving real semantics to spin doctors
  78. 78. • Worse, simple triple-based semantics gives them more power • Transparency and distributed decision making being easily accessible on an open Semantic Web: the Holy Grail?
  79. 79. Thanks for your attention!
