Meaning on the Web    An Empirical Design Perspective                          Aldo Gangemi                         aldo.g...
What is this?                2
Are these similar?                     3
What do you mean?• Mathieu did it slowly and deliberately, at  midnight, in the bathroom, with a knife                    ...
Killing?           5
Shaving?           6
Hanging puppets?                   7
Actually ...• Mathieu buttered the toast slowly and  deliberately, at midnight, in the bathroom,  with a knife  Donald	  D...
What is the key to access meaning?• When interpreting images? pattern recognition• When understanding text? parsing, resol...
Cognitive background                       10
Blink and “thin slicing”• Many organisms share the ability to gather a  snap judgment about a situation in a blink of an  ...
Blink and “thin slicing”• Many organisms share the ability to gather a  snap judgment about a situation in a blink of an  ...
Foreground and background• People tend to remember things because they  stick out from the background (“profiling”)  – but...
Schema-based memory• People tend to remember items that fit into a  schema. – Things that are associated through some func...
Origin of modern frames and knowledge                   patterns• «When one encounters a new situation (or makes a substan...
How many frames?• We (STLab) are collecting, reengineering, and  aligning knowledge patterns from different  knowledge for...
Limited keys totext/discourse understanding                               16
What is a window?• (i) opening in a wall   – An opening, usually covered by one or more panes     of clear glass, to allow...
Genres of polysemy in WordNet• Adam’s apple (metaphor)  – thyroid cartilage_1 (dul:PhysicalObject)  – crape jasmine_1 (dul...
Dot objects and co-predication               (Pustejovsky, Asher)• This book is heavy but interesting  – physical vs. info...
“Fictive motion” (Talmy)•   The path descended abruptly•   The road runs along the coast for two hours•   The fence zigzag...
Meaning as relation                      21
Meaning as relation• Event discovery and recognition   – { Water in pot ; pot on the hot stove ; egg in water }     ∴ ???•...
Meaning as relation• Event discovery and recognition   – { Water in pot ; pot on the hot stove ; egg in water }     ∴ ???•...
Meaning as relation• Event discovery and recognition   – { Water in pot ; pot on the hot stove ; egg in water }     ∴ ???•...
Limited keys tosocial tagging analysis                          22
Flickr, desire tag                     23
Flickr, desire tag                     24
Flickr, desire tag                     25
Similarly in other folksonomies• like in FB, G+, etc.  – like what? thing, comment to thing, content of    cited thing, im...
Limited keys tolinked data analysis                       27
Aggregated RDF data     for “Barack  Obama” (sig.ma)                      28
What about the Knowledge Graph?     however,	  where	  are	  the	  IRIs?	  named	  en11es	  are	  only	       searchable	 ...
Ontologies disagreeIs that a problem?  For many, yes                      30
schema.org• The use cases that motivated schema.org are  related to search engine requirements, probably  to meet populari...
Pragmatic issue: we can communicate with landforms                                             32
Pragmatic issue: we can communicate with landforms                                             32
Pragmatic issue: we can communicate with landforms                                             32
Pragmatic issue: we can communicate with landforms                                             32
Pragmatic issue: we can communicate with landforms                                             32
Application issue: bone pathologies, locations,            functions are all literals                                     ...
Application issue: bone pathologies, locations,            functions are all literals                                     ...
Application issue: bone pathologies, locations,            functions are all literals                                     ...
Application issue: bone pathologies, locations,            functions are all literals                                     ...
Creating keys for the Semantic Web                             34
Top-down: expertise patterns• Evidence that units of expertise are larger than what we have  from average linked data trip...
Layered pattern morphisms• A logical design pattern describes a formal expression that  can be exemplified, morphed, insta...
Bottom-up: schema extraction        (COLD2011)                           37
Results • A method for extracting the main knowledge components   of a LD dataset                                         ...
Path identification (length 3)                                                                      (Jamendo)     mo:Music...
Path identification (length 3)                                                                         (Jamendo)     mo:Mu...
Centrality (types)                                                                     (Jamendo)     mo:MusicAr?st        ...
Centrality (properties)                                                                               (Jamendo)     mo:Mus...
Centrality in Jamendo                                                                     betweenness                     ...
Emerging Knowledge Pattern              mo:Track                                                   mo:Playlist            ...
Bottom-up: Encyclopedic Knowledge        Patterns (EKP) (ISWC2011)• Improving knowledge exploration and  summarization by:...
Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9...
Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9...
Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9...
Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9...
Encyclopedic Knowledge Patterns• An Encyclopedic Knowledge Pattern (EKP) is discovered from the  paths emerging from Wikip...
Paths and indicators• Emerging paths are stored in RDF according  to the “Knowledge Architecture” vocabulary –Cf. our COLD...
nRes(dbpo:MusicalArtist) dbpo:MusicalArtist           Anthony_Kiedis         Michael_Jacksonrdf:type                      ...
nSubjectRes(Pi,j)                                           Jackson_5        Dave_Grohl           Michael_Jackson         ...
nSubjectRes(Pi,j)                                               Jackson_5          Dave_Grohl             Michael_Jackson ...
pathPopularity(Pi,j,Si)                                            Jackson_5        Dave_Grohl          Michael_Jackson   ...
Path Popularity distribution example                             Path            pathPopularity      [MusicFestival,Band] ...
Boundaries of EKPs• An EKP(Si) is a set of paths, such that        Pi,j ∈ EKP(Si) ! pathPopularity(Pi,j, Si) ≥ t• t is a t...
Boundary inductionStep                                       Description   1. For each path, calculate the path popularity...
k-means clustering   Sample distribution of pathPopularity for DBpedia                     paths. The x-axis indicates how...
k-means clustering       Sample distribution of pathPopularity for DBpedia                         paths. The x-axis indic...
k-means clustering        Sample distribution of pathPopularity for DBpedia                          paths. The x-axis ind...
k-means clustering        Sample distribution of pathPopularity for DBpedia                          paths. The x-axis ind...
k-means clustering        Sample distribution of pathPopularity for DBpedia                          paths. The x-axis ind...
What is the “agreement” between      DBpedia and our sample users?Average multiple correlation (Spearman ρ)  between users...
What is the “agreement” between        DBpedia and our sample users?  Average multiple correlation (Spearman ρ)    between...
What is the “agreement” between         DBpedia and our sample users?   Average multiple correlation (Spearman ρ)     betw...
Aemoo• http://aemoo.org exploratory search application based on  EKP
Comparing entities of the same type  Nicolas Sarkozy   Ronald Regan   Angela Merkel          Barack Obama      Silvio Berl...
Applying topic-sensitive EKP as lenses                                         59
Limited keys totext/discourse analysis                          60
Ontology learning: what structures?• State of art: what do we learn?  – instances (NER): BarackObama  – classes (sense tag...
Kinds of text analyses• Lead example: –“In early 1527, Cabeza De Vaca departed Spain  as the treasurer of the Narvaez roya...
Shallow parsing (Alchemy)                                   •   Language                                                  ...
Entity resolution (Wikifier)Named	  en?ty	  resolu?on   Wikipedia-­‐based	  topic	  detec?on   64
Multimedia enrichment                            (Zemanta)Related	  links	  and	  geo-­‐referencing   Related	  images    ...
Linked data positioning (RelFinder)                               66
Deep parsing• Usage of syntactic parse trees• Formal semantic interpretation on top of  syntax                            ...
Categorial parsingw(1,   1, In, in, IN, I-PP, O, (S[X]/S[X])/NP).w(1,   2, early, early, JJ, I-NP, I-DAT, N/N).w(1,   3, 1...
Relation extraction with OIE•   Cabeza De Vaca ___departed___Spain•   0.9059944442645228•   In early 1527 , Cabeza De Vaca...
Creating keys for NL                       70
DRT• Discourse Representation Theory  – Hans Kamp: “A theory of truth and semantic representation”, 1981• A sentence meani...
DRT notation               72
Boxer• Implementation of computational semantics (J. Bos)  with DRT output and Davidsonian predicates:  reification of n-a...
DRT semantic parsing +semantic role labellling in Boxer                                74
Issues when porting                   (Boxer) DRT to RDF• Discourse referent variables: implicit or explicit?• No terminol...
Using FRED (EKAW2012)http://wit.istc.cnr.it/stlab-tools/fred/                                           76
FRED RDF graph output                    77
Another exampleThe	  New	  York	  Times	  reported	  that	  John	  McCarthy	  died.	  He	  invented	  the	  programming	  ...
Heuristics to map                    Boxer DRT to RDF (1/2)• Default predicates   – H_pred1: special vocabulary for defaul...
Heuristics to map                    Boxer DRT to RDF (2/2)• Redundant variables   – H_red: unify variables based on local...
Wikipedia typing• Typing Wikipedia entities is not easy• A lot of resources, limited alignment, limited  coverage, differe...
Tìpalo: a FRED apphttp://wit.istc.cnr.it/stlab-tools/tipalo/                                                  82
Definition of chaise longueA chaise longue is an upholstered sofa in the shape of a chair that is long enough to support t...
Conclusions• Cognitive science is quite explicit on what meaning is  for humans: an activity of “framing reality for a  pu...
References•   Alchemy (shallow parsing): http://www.alchemyapi.com/api/demo.html•   Wikifier (entity resolution): http://w...
Upcoming SlideShare
Loading in …5
×

Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

1,931 views

Published on

Presentation from Aldo Gangemi at SSSW 2012

Published in: Education, Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,931
On SlideShare
0
From Embeds
0
Number of Embeds
1,143
Actions
Shares
0
Downloads
33
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

  1. 1. Meaning on the Web An Empirical Design Perspective Aldo Gangemi aldo.gangemi@cnr.it Semantic Technology Lab Institute for Cognitive Sciences and Technologies National Research Council, Rome, Italy Work described is by STLab people:Francesco Draicchio, Alberto Musetti, Andrea Nuzzolese, Valentina Presutti Alessandro Adamou, Eva Blomqvist, Enrico Daga 1
  2. 2. What is this? 2
  3. 3. Are these similar? 3
  4. 4. What do you mean?• Mathieu did it slowly and deliberately, at midnight, in the bathroom, with a knife 4
  5. 5. Killing? 5
  6. 6. Shaving? 6
  7. 7. Hanging puppets? 7
  8. 8. Actually ...• Mathieu buttered the toast slowly and deliberately, at midnight, in the bathroom, with a knife Donald  Davidson:  The  Logical  Form  of  Ac1on  Sentences,  1967 “the  it  ...  seems  to  refer  to  some  en?ty” Davidson’s  analysis  started  formal  analysis  of  language  in  a  modern  way,  and  gave  a   founda?on  to  frame  logics,  which  were  the  original  use  case  for  descrip?on  logics  ... That  ‘it’  is  key  to  meaning 8
  9. 9. What is the key to access meaning?• When interpreting images? pattern recognition• When understanding text? parsing, resolution, expectations (background knowledge), hypothesis formation• For querying databases? key discovery• For social data mashups? application design (e.g. FB)• For linked data mashups? sparql queries, ontologies• For event recognition, situation awareness? context ontologies, cqels queries• Such “keys” provide access by putting the pieces together. Can we make ontology design as an empirical science of putting pieces of meaning together? 9
  10. 10. Cognitive background 10
  11. 11. Blink and “thin slicing”• Many organisms share the ability to gather a snap judgment about a situation in a blink of an eye (cf. M. Gladwell, Blink) –making sense of situations based on thin slices of experience: few color spots, a quick movement, a hat, the length of hair, a facial expression, the relative position of two persons, ...• Positive and negative slicing –quick reaction and expectation building vs. stereotypes and prejudices 11
  12. 12. Blink and “thin slicing”• Many organisms share the ability to gather a snap judgment about a situation in a blink of an eye (cf. M. Gladwell, Blink) –making sense of situations based on thin slices of experience: few color spots, a quick movement, a hat, the length of hair, a facial expression, the relative position of two persons, ...• Positive and negative slicing –quick reaction and expectation building vs. stereotypes and prejudices 11
  13. 13. Foreground and background• People tend to remember things because they stick out from the background (“profiling”) – but what makes a background as such?• Expectations create scenarios – even things that are not there become part of the scenario if activated by an expectation• Cf. Gestalt psychology (Köhler, Langacker, etc.) 12
  14. 14. Schema-based memory• People tend to remember items that fit into a schema. – Things that are associated through some functional similarity (cf. Gibson’s affordances)• Schemata seem to be learnt mostly inductively –blocks world, repeated verbalization of invariant scenes, peek-a-boo, etc. Cf. Deb Roy’s TED talk• Schema similar to (conceptual) frame, script, knowledge pattern, etc. 13
  15. 15. Origin of modern frames and knowledge patterns• «When one encounters a new situation (or makes a substantial change in ones view of the present problem) one selects from memory a structure called a Frame. This is a remembered framework to be adapted to fit reality by changing details as necessary … a frame is a data-structure for representing a stereotyped situation» (Minsky 1975)• Frames, schemas, scripts … «These large-scale knowledge configurations supply top-down input for a wide range of communicative and interactive tasks … the availability of global patterns of knowledge cuts down on non-determinacy enough to offset idiosyncratic bottom-up input that might otherwise be confusing» (Beaugrande, 1980)
  16. 16. How many frames?• We (STLab) are collecting, reengineering, and aligning knowledge patterns from different knowledge formats into RDF and OWL – Web data: microformats, microdata, XML patterns – Linked data: induced schemas, data patterns – (Semantic web) ontologies: ontology design patterns, extracted modules, query patterns – Linguistics: FrameNet and VerbNet frames, NLP- extracted selectional restrictions, situations from parsed NL text 15
  17. 17. Limited keys totext/discourse understanding 16
  18. 18. What is a window?• (i) opening in a wall – An opening, usually covered by one or more panes of clear glass, to allow light and air from outside to enter a building or vehicle. (Wiktionary)• (ii) glass-filled frame fitting an opening in a wall – Welcome to Window World, Americas Largest Replacement Windows Company! (...) installs over 1 million replacement windows annually. (Ad)• (iii) glass-filled frame – A window is a transparent or translucent opening in a wall or door that allows the passage of light and, if not closed or sealed, air and sound. (Wikipedia)• (iv) glass from glass-filled frame – A pane of glass in a window. (WordNet)• (v) frame from glass-filled frame – A framework of wood or metal that contains a glass windowpane and is built into a wall or roof to admit light or air. (WordNet) 17
  19. 19. Genres of polysemy in WordNet• Adam’s apple (metaphor) – thyroid cartilage_1 (dul:PhysicalObject) – crape jasmine_1 (dul:Organism)• Bucket shop (diachronic metaphor) – bucket shop_2 (dul:PhysicalObject) • “(formerly) a cheap saloon selling liquor by the bucket” – bucket shop_1 (dul:Organization) • “an unethical or overly aggressive brokerage firm”• Stadium (established metonymy, “dot object”) – stadium_1 (dul:PhysicalObject) • “a large structure for open-air sports or entertainments” – arena_4 (d0:Location) • “a playing field where sports events take place” – do we need a relation between these senses, or a coarser concept? 18
  20. 20. Dot objects and co-predication (Pustejovsky, Asher)• This book is heavy but interesting – physical vs. information• Lunch was delicious but took forever – food vs. event• I saw the Coliseum in my tourist guide and wanted to go there – artifact vs. place• Actually relevant? – Power of ambiguity (“systematic polysemy”) – Minimal effort seems to count in human evolution of lexical knowledge, but only if we can easily reconstruct the context (or frame, relation, ...) • “The communicative function of ambiguity in language” (Piantadosi, Tilyb, Gibson): ambiguity allows for greater ease of processing by permitting efficient linguistic units to be re-used. “All efficient communication systems will be ambiguous, assuming that context is informative about meaning” – Also in science: inflammation has six interrelated meanings 19
  21. 21. “Fictive motion” (Talmy)• The path descended abruptly• The road runs along the coast for two hours• The fence zigzags from the plateau to the valley• The highway crawls through the city• The road leads us to Cercedilla• Need for “type coercion” to satisfy hidden frame – highway is actually a path that “can be crawled”, therefore crawling frame here is descriptive of a state, not of an action – fence is actually an object whose shape “can be followed by zigzaging” – road is actually an object that “can be followed as an indication” to our destination – Actually an inversion of roles: the path descends because it can be descended: another version of systematic polysemy 20
  22. 22. Meaning as relation 21
  23. 23. Meaning as relation• Event discovery and recognition – { Water in pot ; pot on the hot stove ; egg in water } ∴ ???• Meaning as a relational thing – Does “Egg” have a meaning independently of its relations? no, if meaning is what someone does (or might do) with something? – cf. Bartlett, Davidson, Fillmore, Minsky, Schank, Brian Smith, Gibson, ...• Dynamics (induction, abduction) of general and domain relations – Cook(Egg, Stove) as Absorb_heat(Entity, Heat_source) as Becoming(Entity, Cause) 21
  24. 24. Meaning as relation• Event discovery and recognition – { Water in pot ; pot on the hot stove ; egg in water } ∴ ???• Meaning as a relational thing – Does “Egg” have a meaning independently of its relations? no, if meaning is what someone does (or might do) with something? – cf. Bartlett, Davidson, Fillmore, Minsky, Schank, Brian Smith, Gibson, ...• Dynamics (induction, abduction) of general and domain relations – Cook(Egg, Stove) as Absorb_heat(Entity, Heat_source) as Becoming(Entity, Cause) FrameNet  frames have  a  par?al  order 21
  25. 25. Meaning as relation• Event discovery and recognition – { Water in pot ; pot on the hot stove ; egg in water } ∴ ???• Meaning as a relational thing – Does “Egg” have a meaning independently of its relations? no, if meaning is what someone does (or might do) with something? – cf. Bartlett, Davidson, Fillmore, Minsky, Schank, Brian Smith, Gibson, ...• Dynamics (induction, abduction) of general and domain relations – Cook(Egg, Stove) as Absorb_heat(Entity, Heat_source) as Becoming(Entity, Cause) FrameNet  frames have  a  par?al  order 21
  26. 26. Limited keys tosocial tagging analysis 22
  27. 27. Flickr, desire tag 23
  28. 28. Flickr, desire tag 24
  29. 29. Flickr, desire tag 25
  30. 30. Similarly in other folksonomies• like in FB, G+, etc. – like what? thing, comment to thing, content of cited thing, implicit negative attitude to thing?• positive/negative polarity in sentiment analysis – polarity of what? tweet, citation, irony, ... 26
  31. 31. Limited keys tolinked data analysis 27
  32. 32. Aggregated RDF data for “Barack Obama” (sig.ma) 28
  33. 33. What about the Knowledge Graph? however,  where  are  the  IRIs?  named  en11es  are  only   searchable  within  the  Google  engine,  content  is   completely  encrypted 29
  34. 34. Ontologies disagreeIs that a problem? For many, yes 30
  35. 35. schema.org• The use cases that motivated schema.org are related to search engine requirements, probably to meet popularity and advertisement• We received schemas from the “Sponsors”. As often happens, “sponsors” create meaning, because they have either authority or money• Will ontologies be swallowed by the Sponsors?• Anyway, schemas from schema.org seem to address keys more accurately, with some exceptions ... 31
  36. 36. Pragmatic issue: we can communicate with landforms 32
  37. 37. Pragmatic issue: we can communicate with landforms 32
  38. 38. Pragmatic issue: we can communicate with landforms 32
  39. 39. Pragmatic issue: we can communicate with landforms 32
  40. 40. Pragmatic issue: we can communicate with landforms 32
  41. 41. Application issue: bone pathologies, locations, functions are all literals 33
  42. 42. Application issue: bone pathologies, locations, functions are all literals 33
  43. 43. Application issue: bone pathologies, locations, functions are all literals 33
  44. 44. Application issue: bone pathologies, locations, functions are all literals 33
  45. 45. Creating keys for the Semantic Web 34
  46. 46. Top-down: expertise patterns• Evidence that units of expertise are larger than what we have from average linked data triples, or ontology learning – Cf. cognitive scientist Dedre Gentner: “uniform relational representation is a hallmark of expertise” – We need to create expertise-oriented boundaries unifying multiple triples – “Competency questions” are used to link ontology design patterns to requirements: • Which objects take part in a certain event? • Which tasks should be executed in order to achieve a certain goal? • What’s the function of that artifact? • What norms are applicable to a certain case? • What inflammation is active in what body part with what morphology? 35
  47. 47. Layered pattern morphisms• A logical design pattern describes a formal expression that can be exemplified, morphed, instantiated, and expressed in order to solve a domain modelling problem• owl:Class:_:x rdfs:subClassOf owl:Restriction:_:y• Inflammation rdfs:subClassOf (localizedIn some BodyPart)• Colitis rdfs:subClassOf (localizedIn some Colon)• John’s_colitis isLocalizedIn John’s_colon• “John’s colon is inflammated”, “John has got colitis”, “Colitis is the inflammation of colon” expres sedAs Generic Specific Logical exemplifiedAs morphedAs instan1atedAs Data expressedAs Content Content Linguis?c PaPern PaPern   PaPern   PaPern   PaPern (MBox) (ABox) (TBox) (TBox) Logic Meaning Reference Expression Abstrac1on
  48. 48. Bottom-up: schema extraction (COLD2011) 37
  49. 49. Results • A method for extracting the main knowledge components of a LD dataset mo:Track mo:track mo:Track Knowledge  PaPern mo:track mo:MusicAr1st mo:Record foaf:maker mo:available_as mo:Record foaf:namePath dc:1tle mo:format mo:available_as mo:Playlist mo:Playlist Cluster  of  paths mo:Record mo:track mo:Track mo:available_as mo:Playlist dc:format rdfs:Literal mo:Signal mo:published_as mo:Track mo:available_as mo:Playlist dc:format rdfs:Literal mo:MusicArtist foaf:made mo:Record mo:available_as mo:Torrent dc:format rdfs:Literal mo:MusicArtist foaf:made mo:Record mo:available_as mo:ED2K dc:format rdfs:Literal mo:MusicArtist foaf:made mo:Record mo:available_as mo:Playlist dc:format rdfs:Literal
  50. 50. Path identification (length 3) (Jamendo) mo:MusicAr?st mo:Signal mo:recorded_as mo:published_as rdfs:Resource rdfs:Literalmo:Record dc:1tle mo:Track foaf:Document mo:Playlist PREFIX mo : http://purl.org/ontology/mo/MusicArtist
  51. 51. Path identification (length 3) (Jamendo) mo:MusicAr?st mo:Signal mo:recorded_as Path  Element Posi?on  2 mo:published_as rdfs:Resource rdfs:Literalmo:Record dc:1tle mo:Track foaf:Document mo:Playlist PREFIX mo : http://purl.org/ontology/mo/MusicArtist
  52. 52. Centrality (types) (Jamendo) mo:MusicAr?st mo:Signal mo:published_as rdfs:Resource rdfs:Literalmo:Record mo:track dc:1tle mo:track_number mo:Track mo:available_as mo:license foaf:Document mo:Playlist PREFIX mo : http://purl.org/ontology/mo/MusicArtist
  53. 53. Centrality (properties) (Jamendo) mo:MusicAr?st mo:Signal mo:recorded_as mo:published_as rdfs:Resource rdfs:Literalmo:Record dc:1tle mo:track_number mo:Track mo:available_as mo:license foaf:Document mo:Playlist PREFIX mo : http://purl.org/ontology/mo/MusicArtist
  54. 54. Centrality in Jamendo betweenness mo:Lyricstags2:taggedWithTag tags2:Tag mo:time mo:Torrent mo:track mo:ED2K mo:published_as time:Interval mo:available_as mo:Signal foaf:maker mo:Playlist foaf:made mo:MusicArtist 0 2 4 6 8 10 12 mo:Track mo:Record 0 2 4 6 8 10 12 foaf:maker foaf:made mo:track mo:MusicArtist mo:time mo:Record mo:published_as mo:Lyricstags2:taggedWithTag tags2:Tag mo:available_as mo:Torrent 0 20000 40000 60000 80000 100000 120000 140000 mo:ED2K time:Interval mo:Signal mo:Track frequency mo:Playlist 0 20000 40000 60000 80000 100000 120000 42
  55. 55. Emerging Knowledge Pattern mo:Track mo:Playlist mo:track mo:Record mo:available_asmo:MusicAr.st mo:ED2K foaf:maker mo:available_as mo:available_as tags:taggedWithTag mo:image dc:date mo:Torrent dc:1tle dc:descrip1on tags:Tag rdfs:Literal
  56. 56. Bottom-up: Encyclopedic Knowledge Patterns (EKP) (ISWC2011)• Improving knowledge exploration and summarization by: – Empirically discovering invariances in conceptual organization of knowledge – encyclopedic knowledge patterns – from Wikipedia crowd-sourced page links – Understanding the most intuitive way of selecting relevant entities used to describe a given entity – Identifying the typical / atypical types of things that people use for describing other things – Enabling serendipitous search
  57. 57. Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M)• “Unmapped” object value triples are only 7% of page links
  58. 58. Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M)• “Unmapped” object value triples are only 7% of page links dbpo:MusicalArtistdbpo:MusicalArtist dbpo:Organisation dbpo:Place
  59. 59. Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M)• “Unmapped” object value triples are only 7% of page links• Paths are used to discover Encyclopedic Knowledge Patterns – Such patterns should make it emerge the most typical types of things that the Wikipedia crowd uses to describe a resource of a given type dbpo:MusicalArtist dbpo:MusicalArtist rtist links dbpo:Place usicalA ToPl li nksToM a ce linksToOrganisation dbpo:Organisation
  60. 60. Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M)• “Unmapped” object value triples are only 7% of page links• Paths are used to discover Encyclopedic Knowledge Patterns – Such patterns should make it emerge the most typical types of things that the Wikipedia crowd uses to describe a resource of a given type Path  Pi,j= [Si, p, Oj] dbpo:MusicalArtist dbpo:MusicalArtist rtist links dbpo:Place usicalA ToPl li nksToM a ce linksToOrganisation dbpo:Organisation
  61. 61. Encyclopedic Knowledge Patterns• An Encyclopedic Knowledge Pattern (EKP) is discovered from the paths emerging from Wikipedia page link invariances• They are represented as OWL2 ontologies
  62. 62. Paths and indicators• Emerging paths are stored in RDF according to the “Knowledge Architecture” vocabulary –Cf. our COLD2011 paper “Extracting core knowledge from linked data”• Paths and types are associated with a set of indicators
  63. 63. nRes(dbpo:MusicalArtist) dbpo:MusicalArtist Anthony_Kiedis Michael_Jacksonrdf:type Jackie_Jackson rdf:type rdf:type rdf:type Chad_Smith dbpo:MusicalArtist dbpo:MusicalArtistNumber of resources having type dbpo:MusicalArtist rdf:type John_Lennon Paul_McCartney rdf:type dbpo:MusicalArtistPREFIX dbpo : http://dbpedia.org/ontology/
  64. 64. nSubjectRes(Pi,j) Jackson_5 Dave_Grohl Michael_Jackson Jackie_Jackson Nirvana Si dbpo:wikiPageWikiLink Oj dbpo:MusicalArtist dbpo:BandFoo Fighters Beatles John_Lennon Paul_McCartneyPREFIX dbpo : http://dbpedia.org/ontology/
  65. 65. nSubjectRes(Pi,j) Jackson_5 Dave_Grohl Michael_Jackson Jackie_Jackson Nirvana Si dbpo:wikiPageWikiLink Oj dbpo:MusicalArtist dbpo:Band Foo Fighters Beatles Number of distinct resourcesthat participate in a path as subjects John_Lennon Paul_McCartney PREFIX dbpo : http://dbpedia.org/ontology/
  66. 66. pathPopularity(Pi,j,Si) Jackson_5 Dave_Grohl Michael_Jackson Jackie_Jackson Nirvana Madonna Prince Charlie_Parker Keith_JarrettFoo Fighters Beatles nSubjectRes(Pi,j)/nRes(Si) John_Lennon Paul_McCartney
  67. 67. Path Popularity distribution example Path pathPopularity [MusicFestival,Band] 82.33 [MusicFestival,MusicalArtist] 74.17 [MusicFestival,Country] 74.02 [MusicFestival,MusicGenre] 72.81 [MusicFestival,City] 38.37 [MusicFestival,AdministrativeRegion] 32.78 [MusicFestival,MusicFestival] 23.26 [MusicFestival,Album] 18.13 [MusicFestival,Film] 12.39 [MusicFestival,Stadium] 9.52 [MusicFestival,RadioStation] 9.52 [MusicFestival,Single] 8.76 [MusicFestival,Town] 8.61 [MusicFestival,Magazine] 8.46 [MusicFestival,Broadcast] 7.55 [MusicFestival,Newspaper] 6.95 [MusicFestival,TelevisionShow] 6.04 [MusicFestival,University] 5.74 [MusicFestival,Continent] 5.59 [MusicFestival,Comedian] 5.29 [MusicFestival,OfficeHolder] 4.98 [MusicFestival,Island] 4.98
  68. 68. Boundaries of EKPs• An EKP(Si) is a set of paths, such that Pi,j ∈ EKP(Si) ! pathPopularity(Pi,j, Si) ≥ t• t is a threshold, under which a path is not included in an EKP• How to get a good value for t?
  69. 69. Boundary inductionStep Description 1. For each path, calculate the path popularity  2. For each subject type, get the 40 top-ranked path popularity values*  3. Apply multiple correlation (Pearson ρ) between the paths of all subject types  .9 by rank, and check for homogeneity of ranks across subject types  4. For each of the 40 path popularity ranks, calculate its mean across all subject types 5. Apply k-means clustering on the 40 ranks  6. Decide threshold(s) based on k-means as well as other indicators (e.g.  FrameNet roles distribution) * 40 covers most “core” path popularity values, as well as many of the unusual ones.
  70. 70. k-means clustering Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (onon Path Popularity average) are above a certain value t for pathPopularity
  71. 71. k-means clustering Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (onon Path Popularity average) are above a certain value t for pathPopularity 1 big cluster (4-cluster) with ranks below 18.18%
  72. 72. k-means clustering Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (onon Path Popularity average) are above a certain value t for pathPopularity 3 small clusters with ranks above 22.67% 1 big cluster (4-cluster) with ranks below 18.18%
  73. 73. k-means clustering Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (onon Path Popularity average) are above a certain value t for pathPopularity 3 small clusters with ranks above 22.67% 1 big cluster (4-cluster) with ranks below 18.18%
  74. 74. k-means clustering Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (onon Path Popularity average) are above a certain value t for pathPopularity 3 small clusters with ranks above 22.67% 1 big cluster (4-cluster) with ranks below 18.18% 1 alternative cluster (6- cluster) with ranks below 11.89%
  75. 75. What is the “agreement” between DBpedia and our sample users?Average multiple correlation (Spearman ρ) between users’ assigned scores, and pathPopularityDBpedia based scores Spearman ρ range: [-1,1] −1 = no agreement +1 = complete agreement) Multiple correlation coefficient (Spearman ρ) between users’s assigned score, and pathPopularityDBpedia based score
  76. 76. What is the “agreement” between DBpedia and our sample users? Average multiple correlation (Spearman ρ) between users’ assigned scores, and pathPopularityDBpedia based scores Spearman ρ range: [-1,1] −1 = no agreement +1 = complete agreement) Multiple correlation coefficient (Spearman ρ) between users’s assigned score, and pathPopularityDBpedia based scoreGood correlation Satisfactory precision
  77. 77. What is the “agreement” between DBpedia and our sample users? Average multiple correlation (Spearman ρ) between users’ assigned scores, and pathPopularityDBpedia based scores Spearman ρ range: [-1,1] −1 = no agreement +1 = complete agreement)Pi,j ∈ EKP (Si) pathPopularity(Pi,j, Si) ≥ Multiple correlation coefficient (Spearman ρ) between users’s assigned score, and pathPopularityDBpedia based score11%correlation Good Satisfactory precision
  78. 78. Aemoo• http://aemoo.org exploratory search application based on EKP
  79. 79. Comparing entities of the same type Nicolas Sarkozy Ronald Regan Angela Merkel Barack Obama Silvio Berlusconi 58
  80. 80. Applying topic-sensitive EKP as lenses 59
  81. 81. Limited keys totext/discourse analysis 60
  82. 82. Ontology learning: what structures?• State of art: what do we learn? – instances (NER): BarackObama – classes (sense tagging): Person, or even TelevisionActor – relations between instances: CabezaDeVaca departed Spain – axioms: mostly taxonomic or disjointness, some work also on learning restrictions: TelevisionActor disjointWith TVSeries – entire ontologies, but only as collections of independently learnt axioms 61
  83. 83. Kinds of text analyses• Lead example: –“In early 1527, Cabeza De Vaca departed Spain as the treasurer of the Narvaez royal expedition to occupy the mainland of North America. After landing near Tampa Bay, Florida on April 15, 1528, Cabeza De Vaca and three other men would be the only survivors of the expedition party of 600 men.” 62
  84. 84. Shallow parsing (Alchemy) • Language Language  recogni?on Rela?on  (ac?on)  extrac?on – english • Person (1) – the only survivors • GeographicFeature (1) – Tampa Bay • Continent (1) – North America • Country (1) Named  en?ty  recogni?on – Spain • StateOrCounty (1) – Florida • Concept Tags (8) – United States Concept  tagging – Florida – Gulf of Mexico – North America – Tampa, Florida – Narváez expedition – American Civil War – Tampa Bay Subject   • Category classifica?on – Culture & Politics (confidence: 0.6598)
  85. 85. Entity resolution (Wikifier)Named  en?ty  resolu?on Wikipedia-­‐based  topic  detec?on 64
  86. 86. Multimedia enrichment (Zemanta)Related  links  and  geo-­‐referencing Related  images Related  news 65
  87. 87. Linked data positioning (RelFinder) 66
  88. 88. Deep parsing• Usage of syntactic parse trees• Formal semantic interpretation on top of syntax 67
  89. 89. Categorial parsingw(1, 1, In, in, IN, I-PP, O, (S[X]/S[X])/NP).w(1, 2, early, early, JJ, I-NP, I-DAT, N/N).w(1, 3, 1527, 1527, CD, I-NP, I-DAT, N).w(1, 4, ,, ,, ,, O, I-DAT, ,).w(1, 5, Cabeza, Cabeza, NNP, I-NP, I-ORG, N/N).w(1, 6, De, De, NNP, I-NP, I-ORG, N/N).w(1, 7, Vaca, Vaca, NNP, I-NP, I-ORG, N).w(1, 8, departed, depart, VBD, I-VP, O, ((S[dcl]NP)/PP)/NP).w(1, 9, Spain, Spain, NNP, I-NP, O, N).w(1, 10, as, as, IN, I-PP, O, PP/NP).w(1, 11, the, the, DT, I-NP, O, NP[nb]/N).w(1, 12, treasurer, treasurer, NN, I-NP, O, N).w(1, 13, of, of, IN, I-PP, O, (NPNP)/NP).w(1, 14, the, the, DT, I-NP, O, NP[nb]/N).w(1, 15, Narvaez, Narvaez, NNP, I-NP, I-ORG, N/N).w(1, 16, royal, royal, NN, I-NP, O, N/N).w(1, 17, expedition, expedition, NN, I-NP, O, N).w(1, 18, to, to, TO, I-VP, O, (S[to]NP)/(S[b]NP)).w(1, 19, occupy, occupy, VB, I-VP, O, (S[b]NP)/NP).w(1, 20, the, the, DT, I-NP, O, NP[nb]/N).w(1, 21, mainland, mainland, NN, I-NP, O, N).w(1, 22, of, of, IN, I-PP, O, (NPNP)/NP).w(1, 23, North, North, NNP, I-NP, I-LOC, N/N).w(1, 24, America, America, NNP, I-NP, I-LOC, N).w(1, 25, ., ., ., O, O, .).w(1, 26, After, after, IN, I-PP, O, (S[X]/S[X])/NP).w(1, 27, landing, landing, NN, I-NP, O, N).w(1, 28, near, near, IN, I-PP, O, (NPNP)/NP).w(1, 29, Tampa, Tampa, NNP, I-NP, I-LOC, N/N).w(1, 30, Bay, Bay, NNP, I-NP, I-LOC, N).w(1, 31, ,, ,, ,, O, O, ,).w(1, 32, Florida, Florida, NNP, I-NP, I-LOC, N).w(1, 33, on, on, IN, I-PP, O, (NPNP)/NP).w(1,w(1, 34, April, April, NNP, I-NP, I-DAT, N/N[num]). 35, 15, 15, CD, I-NP, I-DAT, N[num]). 68w(1, 36, ,, ,, ,, I-NP, I-DAT, ,).w(1, 37, 1528, 1528, CD, I-NP, I-DAT, NN).
  90. 90. Relation extraction with OIE• Cabeza De Vaca ___departed___Spain• 0.9059944442645228• In early 1527 , Cabeza De Vaca departed Spain as the treasurer of the Narvaez royal expedition to occupy the mainland of North America .• IN JJ CD , NNP NNP NNP VBD NNP IN DT NN IN DT NNP JJ NN TO VB DT NN IN NNP NNP .• B-PP B-NP I-NP O B-NP I-NP I-NP B-VP B-NP B-PP B-NP I-NP I-NP I-NP I-NP I-NP I-NP B-VP I- VP B-NP I-NP I-NP I-NP I-NP O• cabeza de vaca___depart___spain• three other men___would be the only survivors of___the expedition party of 600 men• 0.36592485864619345• After landing near Tampa Bay , Florida on April 15 , 1528 , Cabeza De Vaca and three other men would be the only survivors of the expedition party of 600 men .• IN NN IN NNP NNP , NNP IN NNP CD , CD , NNP NNP NNP CC CD JJ NNS MD VB DT JJ NNS IN DT NN NN IN CD NNS .• B-PP B-NP B-PP B-NP I-NP O B-NP B-PP B-NP I-NP I-NP I-NP O B-NP I-NP I-NP O B-NP I-NP I- NP B-VP I-VP B-NP I-NP I-NP I-NP I-NP I-NP I-NP I-NP I-NP I-NP O• # other men___be survivor of___the expedition party of # men 69
  91. 91. Creating keys for NL 70
  92. 92. DRT• Discourse Representation Theory – Hans Kamp: “A theory of truth and semantic representation”, 1981• A sentence meaning is taken to be an update operation on a context• DRT represents “the discourse context” as a discourse representation structure (or DRS). A DRS includes: – A set of referents: the entities which have been introduced into the context – A set of conditions: the predicates which are known to hold of these entities – Basically a fragment of FOL 71
  93. 93. DRT notation 72
  94. 94. Boxer• Implementation of computational semantics (J. Bos) with DRT output and Davidsonian predicates: reification of n-ary relations – E.g. preparing a coffee: agent, instrument, mix, place, time, method, ...• Semantic role labelling with VerbNet or FrameNet roles• Pragmatic grasp, with statistical NER and sense tagging, tense logic, co-reference, presupposition, sentence integration, entailment, ... 73
  95. 95. DRT semantic parsing +semantic role labellling in Boxer 74
  96. 96. Issues when porting (Boxer) DRT to RDF• Discourse referent variables: implicit or explicit?• No terminology recognition/extraction• No term compositionality• No periphrastic relations for properties (e.g. survivorOf)• Redundant variables• Missing definitional pragmatics• Implicit local restrictions• Many types of redundant “boxing” for embedded propositions, non- standard negation, etc.• How to map efficiently to RDF/OWL? 75
  97. 97. Using FRED (EKAW2012)http://wit.istc.cnr.it/stlab-tools/fred/ 76
  98. 98. FRED RDF graph output 77
  99. 99. Another exampleThe  New  York  Times  reported  that  John  McCarthy  died.  He  invented  the  programming  language  LISP. 78
  100. 100. Heuristics to map Boxer DRT to RDF (1/2)• Default predicates – H_pred1: special vocabulary for defaults, e.g. boxer:Per rdf:type owl:Class – H_pred2: mappings to existing vocabularies, e.g. foaf:Person ; dul:associatedWith ; time:Interval• Default roles (SRL binary predicates) – H_rol: VerbNet/FrameNet vocabularies, e.g. verbnet:agent rdf:type owl:ObjectProperty• Domain predicates – H_dom: default namespace, customizable, e.g. domain:Expedition ; domain:of• Discourse referent variables: implicit or explicit? – H_ref: only referents of predicates are materialized, e.g. domain:survivor_1 rdf:type domain:Survivor ; depart_1 rdf:type domain:Depart• No terminology recognition/extraction – H_term: co-referential predicate merging, e.g. domain:ExpeditionParty• No term compositionality – H_termc: create taxonomies by “unmerging” predicates, e.g. domain:ExpeditionParty rdfs:subClassOf domain:Party• No periphrastic relations – H_peri: generate “webby” properties, e.g. domain:survivorOf rdf:type owl:ObjectProperty 79
  101. 101. Heuristics to map Boxer DRT to RDF (2/2)• Redundant variables – H_red: unify variables based on local Unique Name Assumption• Boolean constructs – H_boo: special relations between domain predicates or individuals, e.g. domain:man_1 owl:differentFrom domain:man_2• Propositions as arguments – H_pro: special relation between events, e.g. domain:report_1 vn:theme domain:depart_1• Missing definitional pragmatics – H_def: bypassing discourse referents and forcing universal quantification, e.g. domain:WindInstrument rdfs:subClassOf domain:MusicalInstrument• Implicit local restrictions (optional) – H_res: inducing anonymous classes, e.g. domain:WindInstrument rdfs:subClassOf (locationOf some domain:Contain) 80
  102. 102. Wikipedia typing• Typing Wikipedia entities is not easy• A lot of resources, limited alignment, limited coverage, different granularity• We realized a fresh approach by – extracting NL definitions – producing RDF with FRED – refactoring FRED output to distill types – linking to DBpedia entities – linking to WordNet 3.0 – linking to SuperSenses and DOLCE+DnS (DUL) 81
  103. 103. Tìpalo: a FRED apphttp://wit.istc.cnr.it/stlab-tools/tipalo/ 82
  104. 104. Definition of chaise longueA chaise longue is an upholstered sofa in the shape of a chair that is long enough to support the legs 83
  105. 105. Conclusions• Cognitive science is quite explicit on what meaning is for humans: an activity of “framing reality for a purpose”• Frames can be keys to that relational meaning• The Semantic Web is now growing a lot of data: our tools are trying to make sense of them by using appropriate keys• Empirical extraction and discovery of frames/ knowledge patterns is feasible• But let’s join forces: meaning is what we do, and should be shared 84
  106. 106. References• Alchemy (shallow parsing): http://www.alchemyapi.com/api/demo.html• Wikifier (entity resolution): http://wit.istc.cnr.it/stlab-tools/wikifier• Zemanta (multimedia enrichment): http://www.zemanta.com/demo/• RelFinder (linked data visualization): http://www.visualdataweb.org/relfinder/demo.swf• ReVerb (relation extraction): http://openie.cs.washington.edu/• C&C (categorial grammar): http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Demo• Boxer (DRT deep parsing): http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer• FRED (DRT to RDF): http://wit.istc.cnr.it/stlab-tools/fred/• Tìpalo (Wikipedia definitions in RDF): http://wit.istc.cnr.it/stlab-tools/tipalo• Aemoo (serendipitous search): http://aemoo.org/• ontologydesignpatterns.org portal: http://www.ontologydesignpatterns.org• Talmy’s fictive motion: http://en.wikipedia.org/wiki/Fictive_motion• Pustejovsky’ dot objects: http://pages.cs.brandeis.edu/~jamesp/publications.php• Knowledge Architecture paper: http://ceur-ws.org/Vol-782/PresuttiEtAl_COLD2011.pdf• FRED paper: http://ekaw2012.ekaw.org/node/137• EKP paper: http://www.stlab.istc.cnr.it/documents/papers/wikilinkpatterns.pdf 85

×