Representing Texts as Contextualized Entity-Centric Linked Data Graphs

The integration of even a small fraction of the information present in the Web of Documents into the Linked Data Web can provide a significant shift in the amount of information available to data consumers. However, information extracted from text does not easily fit into the usually highly normalized structure of ontology-based datasets. While the representation of structured data assumes a high level of regularity and relatively simple and consistent conceptual models, the representation of information extracted from texts needs to take into account large terminological variation, complex contextual/dependency patterns, and fuzzy or conflicting semantics. This work focuses on bridging the gap between structured and unstructured data, proposing the representation of text as structured discourse graphs (SDGs), targeting an RDF representation of unstructured data. The representation focuses on a semantic best-effort information extraction scenario, where information from text is extracted under a pay-as-you-go data quality perspective, trading terminological normalization for domain independence, context capture, wider representation scope and maximization of textual information capture.

Published in: Technology, Education

  1. Representing Texts as Contextualized Entity-Centric Linked Data Graphs. André Freitas (1), João C. P. da Silva (2), Danilo S. Carvalho (2), Seán O’Riain (1), Edward Curry (1). (1) DERI, Ireland; (2) Universidade Federal Rio de Janeiro, Brazil. August 27, 2013.
  2. Outline: Motivation & Objective; Existing Approaches; Structured Discourse Graphs (SDGs): Representation Requirements, Semantic Model Elements & Graph Patterns, Semantic Model - Formalization; Extraction: Graphia Extractor, Preliminary Evaluation, Extraction Examples; Conclusion & Future Work.
  3. Motivation & Objective
  4. Motivation
  5. Motivation
  6. Motivation: Integration of unstructured information into the Linked Data Web. Natural language texts: terminological variation, complex context patterns, ambiguous sentences. Linked Data: terminological and structural regularity, consensual semantics (agreement between data consumers).
  7. Motivation: Traditional information extraction (triples) has as primary concerns accuracy, consistency, and lexical and structural normalization (schema/ontology). It is complemented by best-effort information extraction: domain independence, context capture, maximization of the text semantics representation, and wider extraction scope.
  8. Motivation: What is the relationship between Barack Obama and Indonesia?
  9. Objective: Structured Discourse Graph (SDG) model. Improve semantic integration, representation and interpretation of unstructured text within the context of Linked Data.
  10. Objective - Features: entity-centric data model; ontology/vocabulary-agnostic representation; context representation; graph extraction; interpretation (navigational) model.
  11. Existing Approaches
  12. Existing Approaches: simple relations (triples); Discourse Representation Structure (DRS); ontology-based.
  13. Our Work focuses on: providing a principled description of a representation model that is complementary to existing approaches and supports context representation; defining a graph representation approach for texts which can be easily integrated into the Linked Data Web.
  14. Structured Discourse Graphs - SDGs: Representation Requirements
  15. SDGs - Representation Requirements. Entity-centric graph model: the entity pivot is a named entity (subject or object); document-centric ⇒ entity-centric. Maximized representation of text semantics: information in sentence ⇒ graph representation; the resulting graph should support an algorithmic interpretation of the extracted SDG. Conceptual model independence: ontology agnostic.
  16. SDGs - Representation Requirements. Context capture & representation: contextual information related to a triple statement should be represented in the extracted graph; contextual statements (such as temporality) define the context in which another statement holds; dependencies between sentences in the text should also be made explicit.
  17. SDGs - Representation Requirements. Pay-as-you-go semantic reference: unstructured text may contain complex semantic dependencies, so the representation should support the evolution and refinement of the semantic model. Standardized representation compatibility: maximize compatibility with a standards-based data representation format, facilitating graph integration and interoperability on the Web.
  18. Structured Discourse Graphs - SDGs: Semantic Model Elements & Graph Patterns
  19. SDGs - Semantic Model Elements. Example sentence: "In late 1988, Obama entered Harvard Law School." Text segmentation: (subject, predicate, object). Named entities: refer to the description of entities for which one or many rigid designators stand for the referent. Rigid designators: people, places, organizations, biological species, substances, etc.
  20. SDGs - Semantic Model Elements. Example sentence: "In late 1988, Obama entered Harvard Law School." Text segmentation: (subject, predicate, object). Named entities: refer to the description of entities for which one or many rigid designators stand for the referent. Rigid designators: people, places, organizations, biological species, substances, etc.
  21. SDGs - Semantic Model Elements. Example sentence: "In late 1988, Obama entered Harvard Law School." Text segmentation: (subject, predicate, object). Named entities: refer to the description of entities for which one or many rigid designators stand for the referent. Rigid designators: people, places, organizations, biological species, substances, etc.
  22. SDGs - Semantic Model Elements. Example sentence: "In late 1988, Obama entered Harvard Law School." Co-referential elements: there are two types of co-reference, pronominal and non-pronominal; co-references can be intra-sentence or inter-sentence; substituting the co-referent term by the named entity can corrupt the semantics of the representation.
  23. SDGs - Semantic Model Elements. Example sentence: "In late 1988, Obama entered Harvard Law School." Context elements: a semantic interpretation may depend on different contexts (e.g. temporal context); the main contextual information is intra-sentence; intra-sentence context ⇒ reification.
  24. SDGs - Semantic Model Elements. Example sentence: "In late 1988, Obama entered Harvard Law School." Quantifiers & generic operators. Quantifier: one, two (cardinal numbers), many (much), some, all, thousands of, one of, several, only, most of. Negation: not. Modal: could, may, shall, need to, have to, must, maybe, always, possibly. Comparative: largest, smallest, most, the same, is equal, like, similar to, more than, less than.
  25. SDGs - Semantic Model Elements & Graph Patterns. Example sentence: "In late 1988, Obama entered Harvard Law School." Text segmentation: (Obama, entered, Harvard Law School). Named entities: Obama, Harvard Law School. Resolved co-reference: Barack Obama. Context representation: time.
  26. SDGs - Semantic Model Elements & Graph Patterns. Example sentence: "In late 1988, Obama entered Harvard Law School." Resolved & normalized entities. Resolved entities: a node substitution in the graph was made from a co-reference to a named entity. Normalized entities: entities are transformed to a normalized form (e.g. "September 1st of 2010" to 01/09/2010).
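The normalization of a date expression can be sketched as follows. This is a minimal illustration, assuming the DD/MM/YYYY target format implied by the slide's example and only the "Month Nth of YYYY" surface form; the function name `normalize_date` and the regular expression are hypothetical and are not taken from Graphia.

```python
import re
from datetime import date

# Month names mapped to month numbers (January -> 1, ..., December -> 12).
MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Hypothetical pattern: matches only forms such as "September 1st of 2010".
DATE_PATTERN = re.compile(
    r"^([A-Z][a-z]+) (\d{1,2})(?:st|nd|rd|th)? of (\d{4})$")


def normalize_date(text: str) -> str:
    """Return the date in DD/MM/YYYY form, or raise ValueError."""
    match = DATE_PATTERN.match(text.strip())
    if match is None or match.group(1) not in MONTHS:
        raise ValueError(f"unsupported date expression: {text!r}")
    month, day, year = match.groups()
    # date() validates the day/month combination (e.g. rejects 31 February).
    parsed = date(int(year), MONTHS[month], int(day))
    return parsed.strftime("%d/%m/%Y")


print(normalize_date("September 1st of 2010"))
```

A real extractor would need many more surface forms (implicit dates, ranges, "late 1988"); the sketch only shows the idea of mapping a textual entity to one canonical form so that nodes from different sentences can be merged.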
  27. SDGs - Semantic Model Elements & Graph Patterns. Example sentence: "Later in 2007, Obama sponsored an amendment to the Defense Authorization Act to add safeguards for personality-disorder military discharges."
  28. SDGs - Semantic Model Elements & Graph Patterns
  29. SDGs - Semantic Model Elements & Graph Patterns. Example sentence: "He served three terms representing the 13th District in the Illinois Senate from 1997 to 2004." Non-named (generic) entities: map to non-rigid designators; are more subject to vocabulary variation; have more complex compositional patterns.
  30. SDGs - Semantic Model Elements & Graph Patterns. Example sentence: "Following high school, Obama moved to Los Angeles in 1979 to attend Occidental College." Triple trees: not all extracted facts can be represented in one triple; the syntactic tree is transformed into a set of triples; the sentence subject defines the root node; interpretation: DFS traversal of the tree.
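The interpretation of a triple tree by depth-first traversal can be sketched as below. The tree layout for the example sentence is an assumption made for illustration (the slide's actual graph is not reproduced in the transcript), the temporal context "in 1979" is left out, and the names `Node` and `dfs_triples` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    # Outgoing edges as (predicate, child) pairs, in extraction order.
    edges: list = field(default_factory=list)


def dfs_triples(node):
    """Yield (subject, predicate, object) triples in depth-first pre-order."""
    for predicate, child in node.edges:
        yield (node.label, predicate, child.label)
        # Descend into the child before visiting the next sibling edge.
        yield from dfs_triples(child)


# Assumed tree: the sentence subject ("Obama") is the root node.
college = Node("Occidental College")
city = Node("Los Angeles", [("to attend", college)])
root = Node("Obama", [("moved to", city)])

for triple in dfs_triples(root):
    print(triple)
```

Reading the triples in this order reconstructs the chain of facts from the subject outwards, which is why the root is fixed to the sentence subject.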
  31. SDGs - Semantic Model Elements & Graph Patterns. Example sentence: "He won election to the U.S. Senate in Illinois in November 2004." Pronominal co-reference.
  32. SDGs - Semantic Model Elements & Graph Patterns. Example sentence: "On August 23, Obama announced his election of Delaware Senator Joe Biden as his vice presidential running mate." Pronominal co-reference.
  33. SDGs - Semantic Model Elements & Graph Patterns. Example sentence: "As a member of the Senate Foreign Relations Committee, Obama made official trips to Eastern Europe, the Middle East, Central Asia and Africa." Conjunctive co-reference.
  34. SDGs - Semantic Model Elements & Graph Patterns
  35. Structured Discourse Graphs - SDGs: Semantic Model - Formalization
  36. Semantic Model - Formalization. Graph pattern: an atomic graph structure which maps to a discourse structure. Named and non-named entities: $[[ne]], [[\sim ne]] \in U$, where $U$ is a set of IRIs. Basic triple: $tr = (e_s, p, e_o)$, where $e_s$, $e_o$ represent named/non-named entities associated with the subject ($s$) and object ($o$), and $p$ represents a relation between $e_s$ and $e_o$, interpreted by $[[tr]] = ([[e_s]], [[p]], [[e_o]]) \in U \times U \times U$. Core triple: $tr_c = (ne_s, p, ne_o)$.
  37. Semantic Model - Formalization. Reification triple: $tr_{rei} = (tr, rei_{link}, rei_{obj})$, where $tr$ is a basic triple, $rei_{link}$ is a relation and $rei_{obj}$ is the reification object. Temporal reification: a special reification triple with $rei_{link}$: time and $rei_{obj}$: explicit or implicit data references. Interpretation: $[[tr_{rei}]] = ([[tr]], [[rei_{link}]], [[rei_{obj}]]) \in U \times U \times U$.
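A basic triple and a temporal reification over it can be sketched with plain tuples. This is an illustrative encoding, not the RDF serialization used by the authors: the class name `Triple`, the helper `is_reified`, and the un-normalized literal "late 1988" are assumptions.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: object   # an entity label, or another Triple when reified
    predicate: str
    obj: object


def is_reified(triple: Triple) -> bool:
    """A reification triple has a (basic) triple in subject position."""
    return isinstance(triple.subject, Triple)


# Basic triple tr = (e_s, p, e_o) for the running example sentence.
basic = Triple("Obama", "entered", "Harvard Law School")

# Temporal reification tr_rei = (tr, rei_link, rei_obj) with rei_link = "time".
temporal = Triple(basic, "time", "late 1988")

print(is_reified(basic), is_reified(temporal))
```

Keeping the reified statement as a nested value makes the dependency explicit: the temporal statement qualifies the whole basic triple rather than either of its entities.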
  38. Semantic Model - Formalization. Quantifier operators & generic operators: $op_t = (e_o, op_{link}, op)$, where $e_o$ is the object element in a basic triple $tr$ and $op$ is the specific operator of $e_o$ with respect to $op_{link}$. Interpretation: $[[op_t]] = ([[proj_3(tr)]], [[op_{link}]], [[op]]) = ([[e_o]], [[op_{link}]], [[op]])$, where $proj_3(tr) = proj_3((e_s, p, e_o)) = e_o$.
  39. Semantic Model - Formalization. Conjunctive co-reference: $ccr = \bigcup_{i=0}^{n} \{(e, conj_{link,i}, ne_i)\}$. Interpretation: $[[ccr]] = \{([[e]], [[conj_{link,i}]], [[ne_i]]) \in U \times U \times U$ such that $[[e]] = [[proj_3(tr)]]$ and $(\bigwedge_{i=0}^{n} [[ne_i]] \; sameas \; [[e]])\}$.
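The conjunctive co-reference pattern can be sketched as a function that links the object of a basic triple to each conjoined named entity. The node label `_:destinations` and the link names `conjlink_0`, `conjlink_1`, ... are hypothetical placeholders; the sketch only mirrors the set construction above and does not claim to match Graphia's output.

```python
def conjunctive_coreference(tr, named_entities):
    """Build ccr: one (e, conjlink_i, ne_i) triple per conjoined entity,
    where e is the object of the basic triple tr, i.e. proj_3(tr)."""
    e = tr[2]
    return {(e, f"conjlink_{i}", ne) for i, ne in enumerate(named_entities)}


# Basic triple whose object stands for the whole conjunction (assumed label).
tr = ("Obama", "made official trips to", "_:destinations")
places = ["Eastern Europe", "the Middle East", "Central Asia", "Africa"]

ccr = conjunctive_coreference(tr, places)
for triple in sorted(ccr):
    print(triple)
```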
  40. Semantic Model - Formalization. Possessive/reflexive/demonstrative co-reference: $pcr = \{(\sim ne_i, coref_{link}, pr), (pr, coref_{link}, e_j)\}$. Interpretation: $[[pcr]] = \{([[proj_1(tr)]], [[coref_{link}]], [[pr]]), ([[pr]], [[coref_{link}]], [[proj_3(tr)]]) \in U \times U \times U$ such that $tr = (\sim ne_i, p, e_j)\}$.
  41. Semantic Model - Formalization. Extracted graph: a set of basic and reified triples and generic, quantifier and co-reference operators. Paths. Basic: a sequence of basic triples. Reified: basic paths with some reified triples associated. Operational: basic paths with some operators associated. Complex: contains both reified and operational paths.
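The four path types can be told apart with a small classifier. This is a sketch under stated assumptions: triples are plain tuples, a reification triple is "associated" with a path when its subject is one of the path's basic triples, and an operator triple is associated when its subject is the object of one of them. The function name `classify_path` and the link labels are hypothetical.

```python
def classify_path(path, reifications=(), operators=()):
    """Return 'basic', 'reified', 'operational' or 'complex' for a path,
    given as a list of basic (subject, predicate, object) triples."""
    in_path = set(path)
    objects = {triple[2] for triple in path}
    # Reification triples have a basic triple in subject position.
    has_reified = any(rei[0] in in_path for rei in reifications)
    # Operator triples attach to the object element of a basic triple.
    has_operator = any(op[0] in objects for op in operators)
    if has_reified and has_operator:
        return "complex"
    if has_reified:
        return "reified"
    if has_operator:
        return "operational"
    return "basic"


entered = ("Obama", "entered", "Harvard Law School")
served = ("Obama", "served", "terms")

print(classify_path([entered]))
print(classify_path([entered], reifications=[(entered, "time", "late 1988")]))
print(classify_path([served], operators=[("terms", "quantifier", "three")]))
```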
  42. Semantic Model - Formalization. Context triples: $(tr, context_{link}, ct)$, which indicates that a basic triple $tr$ can be associated with a specific context $ct$: $[[context]] = ([[tr]], [[context_{link}]], [[ct]]) \in U^3 \times U \times U$. Multi-context graph: an extracted graph with more than one context associated with its triples. If all basic triples in a path belong to the same context, the path is a unique-context (basic, reified, operational or complex) path; otherwise, it is a multi-context path.
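The distinction between unique-context and multi-context paths can be sketched as follows. The context identifiers, the link label `context` and the function name `path_context_kind` are hypothetical; the sketch also assumes that a triple with no context triple contributes no context, a case the slide does not address.

```python
def path_context_kind(path, context_triples):
    """Classify a path as 'unique-context' or 'multi-context'.

    path: list of basic triples; context_triples: (tr, context_link, ct)."""
    contexts_of = {}
    for tr, _link, ct in context_triples:
        contexts_of.setdefault(tr, set()).add(ct)
    seen = set()
    for tr in path:
        # Assumption: triples without a context triple add nothing here.
        seen |= contexts_of.get(tr, set())
    return "unique-context" if len(seen) <= 1 else "multi-context"


t1 = ("Obama", "entered", "Harvard Law School")
t2 = ("Harvard Law School", "located in", "Cambridge")
contexts = [
    (t1, "context", "ctx:document-A"),
    (t2, "context", "ctx:document-B"),
]

print(path_context_kind([t1], contexts))
print(path_context_kind([t1, t2], contexts))
```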
  43. Extraction: Graphia
  44. Graphia (http://graphia.dcc.ufrj.br): Graphia is an information extraction pipeline that takes factual text as input and produces SDGs as output. Graphia’s modules combine state-of-the-art NLP tools with an efficient set of heuristics. It can build graphs for sentences and entire documents.
  45. Graphia - Extraction Pipeline Architecture
  46. Graphia - Preliminary Evaluation: 1033 relations (triples) from 150 sentences from 5 randomly selected Wikipedia articles. The graphs were manually classified. Correct graphs: fully consistent with the semantic model. Complete graphs: correct graphs which map all the information of a sentence. Interpretable graphs: graph fragments which have the correct semantics of their basic triple paths.
  47. Graphia - Extraction Examples: "In 2002 GE acquired the wind power assets of Enron during its bankruptcy proceedings."
  48. Graphia - Extraction Examples: "In 1935, GE was one of the top 30 companies traded at the London Stock Exchange."
  49. Graphia - Extraction Examples: "The Radio Corporation of America (RCA) was founded by GE in 1919 to further international radio."
  50. Graphia - Extraction Examples: "It acquired ScanWind in 2009."
  51. Graphia - Extraction Examples: "The new company, named GXS, is based in Gaithersburg, Maryland."
  52. Conclusion & Future Work
  53. Conclusion & Future Work. Conclusion: SDGs provide a discourse representation as a set of contextualized relationships; they support entity-centric integration between graphs; conceptual model independence; it is worth putting effort into enumerable patterns (timestamps, operators). Future work: more complex sentence patterns.
  54. Representing Texts as Contextualized Entity-Centric Linked Data Graphs. André Freitas (1), João C. P. da Silva (2), Danilo S. Carvalho (2), Seán O’Riain (1), Edward Curry (1). (1) DERI, Ireland; (2) Universidade Federal Rio de Janeiro, Brazil. August 27, 2013.
