Representing Texts as Contextualized
Entity-Centric Linked Data Graphs
André Freitas1 João C. P. da Silva2 Danilo S. Carvalho2
Seán O’Riain1 Edward Curry1
1DERI - Ireland
2Universidade Federal Rio de Janeiro - Brazil
August 27, 2013
Freitas, Silva, Carvalho, O’Riain, Curry 1/ 54
Outline
Motivation & Objective
Existing Approaches
Structured Discourse Graphs (SDGs)
Representation Requirements
Semantic Model Elements & Graph Patterns
Semantic Model - Formalization
Extraction
Graphia Extractor
Preliminary Evaluation
Extraction Examples
Conclusion & Future Work
Motivation & Objective
Motivation
Integration of unstructured information into the Linked Data Web
Natural Language Texts
Terminological variation
Complex context patterns
Ambiguous sentences
Linked Data
Terminological and structural regularity
Consensual semantics - agreement between data consumers
Traditional Information Extraction
Triples - Primary concerns
accuracy
consistency
lexical and structural normalization (schema/ontology)
Complemented - Best-Effort Information Extraction
domain independence
context capture
maximization of the text semantics representation
wider extraction scope
What is the relationship between Barack Obama and Indonesia?
Objective
Structured Discourse Graph (SDG) Model
Improve semantic integration, representation and interpretation of
unstructured text within the context of Linked Data.
Features
Entity-Centric Data Model
Ontology/Vocabulary Agnostic Representation
Context Representation
Graph Extraction
Interpretation (Navigational) Model
Existing Approaches
Simple Relations (Triples)
Discourse Representation Structure (DRS)
Ontology-based
Our Work
Focuses on:
Providing a principled description of a representation model
complementary to existing approaches
Supports context representation
Defining a graph representation approach for texts which can
be easily integrated into the Linked Data Web
Structured Discourse Graphs - SDGs
Representation Requirements
SDGs - Representation Requirements
Entity-centric graph model
Entity pivot: named entity (subject or object)
Document Centric ⇒ Entity Centric
Maximized representation of text semantics
Information in sentence ⇒ graph representation
Resulting graph should support an algorithmic interpretation of
the extracted SDG
Conceptual model independence
Ontology agnostic
Context Capture & Representation
Contextual information related to a triple statement should be
represented in the extracted graph
Contextual statements (such as temporality) define the context
in which another statement holds
Dependencies between sentences in the text should also be
made explicit
Pay-as-you-go semantic reference
Unstructured text may contain complex semantic dependencies
Should support the evolution and refinement of the semantic
model
Standardized representation compatibility
Maximize compatibility with a standards-based data
representation format, facilitating
graph integration
interoperability on the Web
Structured Discourse Graphs - SDGs
Semantic Model Elements & Graph Patterns
SDGs - Semantic Model Elements
In late 1988, Obama entered Harvard Law School.
Text Segmentation: (subject,predicate,object)
Named Entities
Refer to entities for which one or more rigid
designators stand for the referent
Rigid designators: people, places, organizations, biological
species, substances, etc.
Co-Referential Elements
Two types of co-reference: pronominal and non-pronominal
Co-references can be intra-sentence or inter-sentence
Substituting the co-referent term with the named entity
can corrupt the semantics of the representation
Context Elements
A semantic interpretation may depend on different contexts
(temporal context)
The main contextual information is intra-sentence
Intra-sentence context ⇒ reification
Quantifiers & Generic Operators
Quantifier: one, two, (cardinal numbers), many (much),
some, all, thousands of, one of, several, only, most of
Negation: not
Modal: could, may, shall, need to, have to, must, maybe,
always, possibly
Comparative: largest, smallest, most, the
same, is equal, like, similar to, more than, less than
SDGs - Semantic Model Elements & Graph Patterns
In late 1988, Obama entered Harvard Law School.
Text segmentation: (Obama,entered,Harvard Law School)
Named entities: Obama, Harvard Law School
Resolve co-references: Barack Obama
Context representation: time
Resolved & Normalized Entities
Resolved entities: a node-substitution in the graph was made
from a co-reference to a named entity
Normalized entities: entities are transformed to a normalized
form (e.g. September 1st of 2010 becomes 01/09/2010)
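The normalization step can be sketched in a few lines of Python. This is only an illustration of the date example above, not the normalizer used by the authors; the function name normalize_date and the accepted input pattern ("Month day[st|nd|rd|th] [of] year") are assumptions.

```python
import re
from datetime import datetime


def normalize_date(text):
    """Turn a date such as 'September 1st of 2010' into 'DD/MM/YYYY'.

    Hypothetical helper: it only handles the 'Month day [of] year' pattern
    with English month names.
    """
    # Drop ordinal suffixes that directly follow a day number (1st -> 1).
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text)
    # Drop the optional 'of' between the day and the year.
    cleaned = cleaned.replace(" of ", " ")
    return datetime.strptime(cleaned, "%B %d %Y").strftime("%d/%m/%Y")


normalized = normalize_date("September 1st of 2010")
```

A real pipeline would need to cover many more surface forms (relative dates, partial dates such as "late 1988"), which this sketch does not attempt.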
Later in 2007, Obama sponsored an amendment to the
Defense Authorization Act to add safeguards for
personality-disorder military discharges.
He served three terms representing the 13th District in the
Illinois Senate from 1997 to 2004.
Non-Named (Generic) Entities
Non-named entities map to non-rigid designators
Are more subject to vocabulary variation
Have more complex compositional patterns
Following high school, Obama moved to Los Angeles in 1979
to attend Occidental College.
Triple Trees
Not all facts extracted can be represented in one triple.
Transformation from the syntactic tree to a set of triples
The sentence subject defines the root node
Interpretation: DFS traversal of the tree
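A minimal Python sketch of a triple tree and its depth-first interpretation follows. The Node class, the dfs_triples function and the tree shape chosen for the example sentence are assumptions made for illustration; the graph actually shown on the slide is not recoverable from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    # Outgoing edges as (predicate, child node) pairs, kept in sentence order.
    edges: list = field(default_factory=list)

    def add(self, predicate, child):
        """Attach a child node under the given predicate and return the child."""
        self.edges.append((predicate, child))
        return child


def dfs_triples(node):
    """Yield (subject, predicate, object) triples in depth-first order."""
    for predicate, child in node.edges:
        yield (node.label, predicate, child.label)
        yield from dfs_triples(child)


# The sentence subject is the root node (hypothetical tree shape).
root = Node("Obama")
los_angeles = root.add("moved to", Node("Los Angeles"))
los_angeles.add("to attend", Node("Occidental College"))

triples = list(dfs_triples(root))
```

Here the temporal expression "in 1979" is left out on purpose, since in the SDG model it is contextual information attached to a triple rather than a node of the tree.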
He won election to the U.S. Senate in Illinois in November
2004.
Pronominal Co-Reference
On August 23, Obama announced his selection of Delaware
Senator Joe Biden as his vice presidential running mate.
Pronominal Co-Reference
As a member of the Senate Foreign Relations Committee,
Obama made official trips to Eastern Europe, the Middle
East, Central Asia and Africa.
Conjunctive Co-Reference
Structured Discourse Graphs - SDGs
Semantic Model - Formalization
Semantic Model - Formalization
Graph pattern: atomic graph structure which maps to a
discourse structure
Named and Non-named Entities: [[ne]], [[∼ne]] ∈ U,
where U is a set of IRIs
Basic Triple: tr = (e_s, p, e_o), where e_s, e_o represent
named/non-named entities associated with the subject (s) and
object (o), and p represents a relation between e_s and e_o,
interpreted by [[tr]] = ([[e_s]], [[p]], [[e_o]]) ∈ U × U × U
Core Triple: tr_c = (ne_s, p, ne_o)
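The distinction between a basic triple and a core triple can be sketched as follows. The Entity and Triple classes, the is_core method and the http://example.org/ base IRI are all hypothetical names introduced for this illustration; reading a core triple as one whose subject and object are both named entities follows the notation on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    iri: str     # interpretation of the entity: an IRI in U
    named: bool  # True for a named entity, False for a non-named one


@dataclass(frozen=True)
class Triple:
    subject: Entity
    predicate: str
    obj: Entity

    def is_core(self):
        """A core triple links two named entities."""
        return self.subject.named and self.obj.named


BASE = "http://example.org/"  # hypothetical namespace, not from the slides
obama = Entity(BASE + "Barack_Obama", named=True)
harvard = Entity(BASE + "Harvard_Law_School", named=True)
terms = Entity(BASE + "terms", named=False)

entered = Triple(obama, BASE + "entered", harvard)  # basic and core
served = Triple(obama, BASE + "served", terms)      # basic, not core
```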
Reification Triple: tr_rei = (tr, rei_link, rei_obj)
tr - basic triples
rei_link - relation
rei_obj - reification object
Temporal Reification: special reification triple
rei_link: time
rei_obj: explicit or implicit data references
Interpretation:
[[tr_rei]] = ([[tr]], [[rei_link]], [[rei_obj]]) ∈ U × U × U
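As a rough illustration, a temporal reification for the running example "In late 1988, Obama entered Harvard Law School" could be built as below. The class names, the temporal_reification helper and the use of plain strings instead of IRIs are simplifying assumptions, not the authors' implementation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class ReificationTriple(NamedTuple):
    statement: Triple  # the basic triple being qualified
    rei_link: str      # relation between the statement and the object
    rei_obj: str       # reification object


def temporal_reification(statement, when):
    """Build the special reification triple whose rei_link is 'time'."""
    return ReificationTriple(statement, "time", when)


tr = Triple("Obama", "entered", "Harvard Law School")
tr_rei = temporal_reification(tr, "late 1988")
```

The whole basic triple sits in the subject position, which is what lets the temporal context qualify the statement rather than one of its entities.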
Quantifier Operators & Generic Operators:
op_t = (e_o, op_link, op)
e_o - object element in a basic triple tr
op - specific operator of e_o wrt op_link
Interpretation:
[[op_t]] = ([[proj_3(tr)]], [[op_link]], [[op]]) = ([[e_o]], [[op_link]], [[op]])
where proj_3(tr) = proj_3((e_s, p, e_o)) = e_o
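The operator construction can be sketched with plain tuples. The function names and the "quantifier" link label are assumptions for illustration; the example loosely follows the earlier sentence "He served three terms ...".

```python
def proj3(tr):
    """Third projection of a triple: its object element."""
    return tr[2]


def operator_triple(tr, op_link, op):
    """Attach an operator to the object element of a basic triple."""
    return (proj3(tr), op_link, op)


tr = ("Barack Obama", "served", "terms")
# 'quantifier' is a hypothetical label for op_link.
op_t = operator_triple(tr, "quantifier", "three")
```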
Conjunctive Co-Reference: ccr = ⋃_{i=0}^{n} {(e, conjlink_i, ne_i)}
Interpretation:
[[ccr]] = {([[e]], [[conjlink_i]], [[ne_i]]) ∈ U × U × U such that
[[e]] = [[proj_3(tr)]] and (⋀_{i=0}^{n} [[ne_i]] sameas [[e]])}
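A small sketch of the conjunctive co-reference pattern, using the earlier sentence about Obama's official trips. The function name, the "_:places" placeholder node and the "conj_i" link labels are hypothetical; the sameas condition of the interpretation is not modelled here.

```python
def conjunctive_coreference(tr, named_entities, link="conj"):
    """Link the object element of tr to each named entity of a conjunction.

    Returns the set {(e, conjlink_i, ne_i)} with e = proj_3(tr).
    """
    e = tr[2]
    return {(e, f"{link}_{i}", ne) for i, ne in enumerate(named_entities)}


# '_:places' is a placeholder node standing for the whole conjunction.
tr = ("Obama", "made official trips to", "_:places")
ccr = conjunctive_coreference(
    tr, ["Eastern Europe", "the Middle East", "Central Asia", "Africa"]
)
```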
Possessive/Reflexive/Demonstrative Co-Reference:
pcr = {(∼ne_i, coref_link, pr), (pr, coref_link, e_j)}
Interpretation:
[[pcr]] = {([[proj_1(tr)]], [[coref_link]], [[pr]]), ([[pr]], [[coref_link]], [[proj_3(tr)]]) ∈
U × U × U such that tr = (∼ne_i, p, e_j)}
Extracted Graph: set of basic and reified triples and generic,
quantifier and co-reference operators.
Paths
Basic: sequence of basic triples
Reified: basic paths with some reified triples associated
Operational: basic paths with some operators associated
Complex: contains both reified and operational paths
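The four path types can be told apart with a simple check, sketched below under the assumption that a path is a list of basic triples, that reified triples are tracked in a set, and that operators are tracked by the object element they attach to. The function and variable names are hypothetical and the data is a toy example.

```python
def classify_path(path, reified, operated):
    """Classify a path as 'basic', 'reified', 'operational' or 'complex'.

    path:     list of basic triples (subject, predicate, object)
    reified:  set of basic triples that have a reification triple attached
    operated: set of object elements that have an operator attached
    """
    has_reification = any(tr in reified for tr in path)
    has_operator = any(tr[2] in operated for tr in path)
    if has_reification and has_operator:
        return "complex"
    if has_reification:
        return "reified"
    if has_operator:
        return "operational"
    return "basic"


entered = ("Obama", "entered", "Harvard Law School")
served = ("Obama", "served", "terms")
moved = ("Obama", "moved to", "Los Angeles")

reified = {entered}   # a temporal reification is attached to this triple
operated = {"terms"}  # a quantifier operator is attached to this object

kinds = [
    classify_path([moved], reified, operated),
    classify_path([entered], reified, operated),
    classify_path([served], reified, operated),
    classify_path([entered, served], reified, operated),
]
```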
Context Triples: (tr, context_link, ct), which indicates that a
basic triple tr can be associated with a specific context ct
[[context]] = ([[tr]], [[context_link]], [[ct]]) ∈ U^3 × U × U
Multi-Context Graphs: an extracted graph with more
than one context associated with its triples
if all basic triples in a path belong to the same
context, the path is a unique-context (basic, reified,
operational or complex) path
otherwise, we call this path a multi-context path
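The unique-context versus multi-context distinction reduces to counting the distinct contexts along a path. In the sketch below, the context_of mapping and the context labels are made up for illustration, and every triple is assumed to have exactly one context.

```python
def context_kind(path, context_of):
    """Return 'unique-context' if all triples in the path share one context."""
    contexts = {context_of[tr] for tr in path}
    return "unique-context" if len(contexts) == 1 else "multi-context"


moved = ("Obama", "moved to", "Los Angeles")
attend = ("Los Angeles", "to attend", "Occidental College")
won = ("Obama", "won", "election")

# Hypothetical context labels, one per basic triple.
context_of = {
    moved: "ctx:college-years",
    attend: "ctx:college-years",
    won: "ctx:senate-campaign",
}

same = context_kind([moved, attend], context_of)
mixed = context_kind([moved, won], context_of)
```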
Extraction
Graphia
http://graphia.dcc.ufrj.br
Graphia is an information extraction pipeline
Takes factual text as input and produces SDGs as output
Graphia’s modules combine state-of-the-art NLP tools with an
efficient set of heuristics
Can build graphs for sentences and
entire documents.
Graphia - Extraction Pipeline Architecture
Graphia - Preliminary Evaluation
1033 relations (triples) from 150 sentences from 5 randomly
selected Wikipedia articles
Manually classified the graphs
Correct Graphs: fully consistent with the semantic model
Complete Graphs: correct graph which maps all the
information of a sentence
Interpretable Graphs: graph fragment which has the correct
semantics of its basic triple paths
Graphia - Extraction Examples
In 2002 GE acquired the wind power assets of Enron during its
bankruptcy proceedings.
In 1935, GE was one of the top 30 companies traded
at the London Stock Exchange.
The Radio Corporation of America (RCA) was founded by GE in
1919 to further international radio.
It acquired ScanWind in 2009.
The new company, named GXS, is based in Gaithersburg,
Maryland.
Conclusion & Future Work
Conclusion
SDGs provide a discourse representation as a set of
contextualized relationships
Supports entity-centric integration between graphs
Conceptual model independence
It is worth investing effort in enumerable patterns (timestamps,
operators)
Future Work
More complex sentence patterns
Freitas, Silva, Carvalho, O’Riain, Curry 53/ 54
,
Representing Texts as Contextualized
Entity-Centric Linked Data Graphs
Andr´e Freitas1 Jo˜ao C. P. da Silva2 Danilo S. Carvalho2
Se´an O’Riain1 Edward Curry1
1DERI - Ireland
2Universidade Federal Rio de Janeiro - Brazil
August 27, 2013
Freitas, Silva, Carvalho, O’Riain, Curry 54/ 54

More Related Content

Viewers also liked

Ontology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific LiteratureOntology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific Literature
eXascale Infolab
 
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Giannis Tsakonas
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Andre Freitas
 

Viewers also liked (17)

Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?
 
Semantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked DataSemantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked Data
 
Open domain Question Answering System - Research project in NLP
Open domain  Question Answering System - Research project in NLPOpen domain  Question Answering System - Research project in NLP
Open domain Question Answering System - Research project in NLP
 
Ontology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific LiteratureOntology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific Literature
 
Semantic Relation Classification: Task Formalisation and Refinement
Semantic Relation Classification: Task Formalisation and RefinementSemantic Relation Classification: Task Formalisation and Refinement
Semantic Relation Classification: Task Formalisation and Refinement
 
Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)
 
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
 
Enhancing Entity Linking by Combining NER Models
Enhancing Entity Linking by Combining NER ModelsEnhancing Entity Linking by Combining NER Models
Enhancing Entity Linking by Combining NER Models
 
SPARQL - Basic and Federated Queries
SPARQL - Basic and Federated QueriesSPARQL - Basic and Federated Queries
SPARQL - Basic and Federated Queries
 
Understanding Queries through Entities
Understanding Queries through EntitiesUnderstanding Queries through Entities
Understanding Queries through Entities
 
Semantic Perspectives for Contemporary Question Answering Systems
Semantic Perspectives for Contemporary Question Answering SystemsSemantic Perspectives for Contemporary Question Answering Systems
Semantic Perspectives for Contemporary Question Answering Systems
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems
 
Presentation of Domain Specific Question Answering System Using N-gram Approach.
Presentation of Domain Specific Question Answering System Using N-gram Approach.Presentation of Domain Specific Question Answering System Using N-gram Approach.
Presentation of Domain Specific Question Answering System Using N-gram Approach.
 
Instant Question Answering System
Instant Question Answering SystemInstant Question Answering System
Instant Question Answering System
 
Deep Learning Models for Question Answering
Deep Learning Models for Question AnsweringDeep Learning Models for Question Answering
Deep Learning Models for Question Answering
 
Ranking algorithms
Ranking algorithmsRanking algorithms
Ranking algorithms
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
 

More from Andre Freitas

AI & Scientific Discovery in Oncology: Opportunities, Challenges & Trends
AI & Scientific Discovery in Oncology: Opportunities, Challenges & TrendsAI & Scientific Discovery in Oncology: Opportunities, Challenges & Trends
AI & Scientific Discovery in Oncology: Opportunities, Challenges & Trends
Andre Freitas
 
Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...
Andre Freitas
 
Introduction to Distributional Semantics
Introduction to Distributional SemanticsIntroduction to Distributional Semantics
Introduction to Distributional Semantics
Andre Freitas
 

More from Andre Freitas (20)

AI & Scientific Discovery in Oncology: Opportunities, Challenges & Trends
AI & Scientific Discovery in Oncology: Opportunities, Challenges & TrendsAI & Scientific Discovery in Oncology: Opportunities, Challenges & Trends
AI & Scientific Discovery in Oncology: Opportunities, Challenges & Trends
 
AI Systems @ Manchester
AI Systems @ ManchesterAI Systems @ Manchester
AI Systems @ Manchester
 
AI Beyond Deep Learning
AI Beyond Deep LearningAI Beyond Deep Learning
AI Beyond Deep Learning
 
Building AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsBuilding AI Applications using Knowledge Graphs
Building AI Applications using Knowledge Graphs
 
Open IE tutorial 2018
Open IE tutorial 2018Open IE tutorial 2018
Open IE tutorial 2018
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP Systems
 
SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs ...
SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs ...SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs ...
SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs ...
 
Categorization of Semantic Roles for Dictionary Definitions
Categorization of Semantic Roles for Dictionary DefinitionsCategorization of Semantic Roles for Dictionary Definitions
Categorization of Semantic Roles for Dictionary Definitions
 
Word Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology ClassesWord Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology Classes
 
Different Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering SystemsDifferent Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering Systems
 
How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ...
How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ...How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ...
How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ...
 
Semantics at Scale: A Distributional Approach
Semantics at Scale: A Distributional ApproachSemantics at Scale: A Distributional Approach
Semantics at Scale: A Distributional Approach
 
Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...
 
A Semantic Web Platform for Automating the Interpretation of Finite Element ...
A Semantic Web Platform for Automating the Interpretation of Finite Element ...A Semantic Web Platform for Automating the Interpretation of Finite Element ...
A Semantic Web Platform for Automating the Interpretation of Finite Element ...
 
How Semantic Technologies can help to cure Hearing Loss?
How Semantic Technologies can help to cure Hearing Loss?How Semantic Technologies can help to cure Hearing Loss?
How Semantic Technologies can help to cure Hearing Loss?
 
Towards a Distributional Semantic Web Stack
Towards a Distributional Semantic Web StackTowards a Distributional Semantic Web Stack
Towards a Distributional Semantic Web Stack
 
Talking to your Data: Natural Language Interfaces for a schema-less world (Ke...
Talking to your Data: Natural Language Interfaces for a schema-less world (Ke...Talking to your Data: Natural Language Interfaces for a schema-less world (Ke...
Talking to your Data: Natural Language Interfaces for a schema-less world (Ke...
 
Introduction to Distributional Semantics
Introduction to Distributional SemanticsIntroduction to Distributional Semantics
Introduction to Distributional Semantics
 
On the Semantic Representation and Extraction of Complex Category Descriptors
On the Semantic Representation and Extraction of Complex Category DescriptorsOn the Semantic Representation and Extraction of Complex Category Descriptors
On the Semantic Representation and Extraction of Complex Category Descriptors
 
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Recently uploaded (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Representing Texts as contextualized Entity Centric Linked Data Graphs

  • 1. , Representing Texts as Contextualized Entity-Centric Linked Data Graphs Andr´e Freitas1 Jo˜ao C. P. da Silva2 Danilo S. Carvalho2 Se´an O’Riain1 Edward Curry1 1DERI - Ireland 2Universidade Federal Rio de Janeiro - Brazil August 27, 2013 Freitas, Silva, Carvalho, O’Riain, Curry 1/ 54
  • 2. Outline: Motivation & Objective; Existing Approaches; Structured Discourse Graphs (SDGs); Representation Requirements; Semantic Model Elements & Graph Patterns; Semantic Model - Formalization; Extraction; Graphia Extractor; Preliminary Evaluation; Extraction Examples; Conclusion & Future Work.
  • 3. Motivation & Objective
  • 4. Motivation
  • 5. Motivation
  • 6. Motivation: Integration of unstructured information into the Linked Data Web. Natural language texts: terminological variation, complex context patterns, ambiguous sentences. Linked Data: terminological and structural regularity; consensual semantics (agreement between data consumers).
  • 7. Motivation: Traditional information extraction (triples) has as primary concerns accuracy, consistency, and lexical and structural normalization (schema/ontology). It is complemented by best-effort information extraction: domain independence, context capture, maximization of the text semantics representation, wider extraction scope.
  • 8. Motivation: What is the relationship between Barack Obama and Indonesia?
  • 9. Objective: Structured Discourse Graph (SDG) Model. Improve semantic integration, representation and interpretation of unstructured text within the context of Linked Data.
  • 10. Objective: Features. Entity-centric data model; ontology/vocabulary-agnostic representation; context representation; graph extraction; interpretation (navigational) model.
  • 11. Existing Approaches
  • 12. Existing Approaches: simple relations (triples); Discourse Representation Structure (DRS); ontology-based.
  • 13. Our Work Focuses on: providing a principled description of a representation model that is complementary to existing approaches and supports context representation; defining a graph representation approach for texts which can be easily integrated into the Linked Data Web.
  • 14. Structured Discourse Graphs - SDGs: Representation Requirements
  • 15. SDGs - Representation Requirements. Entity-centric graph model: the entity pivot is a named entity (subject or object); document-centric ⇒ entity-centric. Maximized representation of text semantics: information in a sentence ⇒ graph representation; the resulting graph should support an algorithmic interpretation of the extracted SDG. Conceptual model independence: ontology-agnostic.
  • 16. SDGs - Representation Requirements. Context capture & representation: contextual information related to a triple statement should be represented in the extracted graph; contextual statements (such as temporality) define the context in which another statement holds; dependencies between sentences in the text should also be made explicit.
  • 17. SDGs - Representation Requirements. Pay-as-you-go semantic reference: unstructured text may contain complex semantic dependencies; the representation should support the evolution and refinement of the semantic model. Standardized representation compatibility: maximize compatibility with a standards-based data representation format, facilitating graph integration and interoperability on the Web.
  • 18. Structured Discourse Graphs - SDGs: Semantic Model Elements & Graph Patterns
  • 19. SDGs - Semantic Model Elements. "In late 1988, Obama entered Harvard Law School." Text segmentation: (subject, predicate, object). Named entities: refer to the description of entities for which one or many rigid designators stand for the referent. Rigid designators: people, places, organizations, biological species, substances, etc.
  • 22. SDGs - Semantic Model Elements. "In late 1988, Obama entered Harvard Law School." Co-referential elements: there are two types of co-reference, pronominal and non-pronominal; co-references can be intra-sentence or inter-sentence; substituting the co-referent term by the named entity can corrupt the semantics of the representation.
  • 23. SDGs - Semantic Model Elements. "In late 1988, Obama entered Harvard Law School." Context elements: a semantic interpretation may depend on different contexts (e.g. temporal context); the main contextual information is intra-sentence; intra-sentence context ⇒ reification.
  • 24. SDGs - Semantic Model Elements. "In late 1988, Obama entered Harvard Law School." Quantifiers & generic operators. Quantifier: one, two (cardinal numbers), many (much), some, all, thousands of, one of, several, only, most of. Negation: not. Modal: could, may, shall, need to, have to, must, maybe, always, possibly. Comparative: largest, smallest, most, the same, is equal, like, similar to, more than, less than.
  • 25. SDGs - Semantic Model Elements & Graph Patterns. "In late 1988, Obama entered Harvard Law School." Text segmentation: (Obama, entered, Harvard Law School). Named entities: Obama, Harvard Law School. Resolve co-references: Barack Obama. Context representation: time.
  • 26. SDGs - Semantic Model Elements & Graph Patterns. "In late 1988, Obama entered Harvard Law School." Resolved & normalized entities. Resolved entities: a node substitution in the graph was made from a co-reference to a named entity. Normalized entities: entities are transformed to a normalized form (e.g. "September 1st of 2010" to "01/09/2010").
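The normalization step on slide 26 can be illustrated with a small sketch. This is not Graphia's code: the function name `normalize_date` and the regular expressions are assumptions made for illustration, and the only thing taken from the slide is the example mapping to the DD/MM/YYYY form.

```python
import re
from datetime import datetime


def normalize_date(text: str) -> str:
    """Normalize a date such as 'September 1st of 2010' to DD/MM/YYYY.

    Hypothetical helper: assumes English month names and an
    English/C locale for strptime's %B directive.
    """
    # Drop ordinal suffixes: '1st' -> '1', '23rd' -> '23'.
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text)
    # Drop the filler word 'of' and collapse repeated whitespace.
    cleaned = re.sub(r"\bof\b", "", cleaned)
    cleaned = " ".join(cleaned.split())
    parsed = datetime.strptime(cleaned, "%B %d %Y")
    return parsed.strftime("%d/%m/%Y")


print(normalize_date("September 1st of 2010"))
```

A real extractor would need to handle many more surface forms (partial dates such as "late 1988", ranges, relative dates); the sketch covers only the pattern shown on the slide.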
  • 27. SDGs - Semantic Model Elements & Graph Patterns. "Later in 2007, Obama sponsored an amendment to the Defense Authorization Act to add safeguards for personality-disorder military discharges."
  • 28. SDGs - Semantic Model Elements & Graph Patterns
  • 29. SDGs - Semantic Model Elements & Graph Patterns. "He served three terms representing the 13th District in the Illinois Senate from 1997 to 2004." Non-named (generic) entities: map to non-rigid designators; are more subject to vocabulary variation; have more complex compositional patterns.
  • 30. SDGs - Semantic Model Elements & Graph Patterns. "Following high school, Obama moved to Los Angeles in 1979 to attend Occidental College." Triple trees: not all extracted facts can be represented in one triple; the syntactic tree is transformed into a set of triples; the sentence subject defines the root node; interpretation: DFS traversal of the tree.
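The triple-tree idea on slide 30 (subject as root, interpretation by depth-first traversal) can be sketched as follows. The adjacency-dictionary shape, the predicate labels, and the function name `dfs_triples` are assumptions made for illustration; the slides do not show how Graphia stores the tree.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]
# Hypothetical structure: node -> list of (predicate, child) edges.
TripleTree = Dict[str, List[Tuple[str, str]]]


def dfs_triples(tree: TripleTree, root: str) -> Iterator[Triple]:
    """Yield the triples of a triple tree in depth-first order.

    Assumes the structure is a tree (no cycles), rooted at the
    sentence subject.
    """
    for predicate, child in tree.get(root, []):
        yield (root, predicate, child)
        # Descend into the child before visiting the next sibling.
        yield from dfs_triples(tree, child)


# Illustrative segmentation of the slide's sentence (labels are assumed).
tree: TripleTree = {
    "Obama": [("moved to", "Los Angeles")],
    "Los Angeles": [("to attend", "Occidental College")],
}

for triple in dfs_triples(tree, "Obama"):
    print(triple)
```

Reading the yielded triples in order reconstructs the chain "Obama moved to Los Angeles to attend Occidental College"; the temporal expression "in 1979" would be attached separately as context.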
  • 31. SDGs - Semantic Model Elements & Graph Patterns. "He won election to the U.S. Senate in Illinois in November 2004." Pronominal co-reference.
  • 32. SDGs - Semantic Model Elements & Graph Patterns. "On August 23, Obama announced his election of Delaware Senator Joe Biden as his vice presidential running mate." Pronominal co-reference.
  • 33. SDGs - Semantic Model Elements & Graph Patterns. "As a member of the Senate Foreign Relations Committee, Obama made official trips to Eastern Europe, the Middle East, Central Asia and Africa." Conjunctive co-reference.
  • 34. SDGs - Semantic Model Elements & Graph Patterns
  • 35. Structured Discourse Graphs - SDGs: Semantic Model - Formalization
  • 36. Semantic Model - Formalization. Graph pattern: atomic graph structure which maps to a discourse structure. Named and non-named entities: $[[ne]], [[\sim ne]] \in U$, where $U$ is a set of IRIs. Basic triple: $tr = (e_s, p, e_o)$, where $e_s, e_o$ represent named/non-named entities associated with the subject ($s$) and object ($o$), and $p$ represents a relation between $e_s$ and $e_o$, interpreted by $[[tr]] = ([[e_s]], [[p]], [[e_o]]) \in U \times U \times U$. Core triple: $tr_c = (ne_s, p, ne_o)$.
  • 37. Semantic Model - Formalization. Reification triple: $tr_{rei} = (tr, rei_{link}, rei_{obj})$, where $tr$ is a basic triple, $rei_{link}$ is a relation and $rei_{obj}$ is the reification object. Temporal reification: a special reification triple with $rei_{link}$: time and $rei_{obj}$: explicit or implicit data references. Interpretation: $[[tr_{rei}]] = ([[tr]], [[rei_{link}]], [[rei_{obj}]]) \in U \times U \times U$.
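The basic triple and reification triple defined on slides 36 and 37 map naturally onto two small record types. The sketch below is one possible encoding, not the authors' implementation: the class names, the `to_rdf_reification` helper, the `ex:stmt1` identifier, and the choice of the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) are all assumptions, since the slides do not state which serialization Graphia uses.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BasicTriple:
    """tr = (e_s, p, e_o)."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class ReificationTriple:
    """tr_rei = (tr, rei_link, rei_obj)."""
    triple: BasicTriple
    rei_link: str
    rei_obj: str


def to_rdf_reification(rt: ReificationTriple, stmt_id: str) -> List[Tuple[str, str, str]]:
    """Flatten a reification triple into plain triples.

    Assumption: uses the standard RDF reification vocabulary, with
    stmt_id as a caller-supplied identifier for the statement node.
    """
    t = rt.triple
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", t.subject),
        (stmt_id, "rdf:predicate", t.predicate),
        (stmt_id, "rdf:object", t.obj),
        (stmt_id, rt.rei_link, rt.rei_obj),
    ]


# Temporal reification of the running example sentence.
entered = BasicTriple("Barack Obama", "entered", "Harvard Law School")
when = ReificationTriple(entered, "time", "late 1988")

for row in to_rdf_reification(when, "ex:stmt1"):
    print(row)
```

Keeping the reified statement as a nested value preserves the $(tr, rei_{link}, rei_{obj})$ shape of the formal definition, while the flattening step shows how the same information could be carried by ordinary triples.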
  • 38. Semantic Model - Formalization. Quantifier operators & generic operators: $op_t = (e_o, op_{link}, op)$, where $e_o$ is the object element in a basic triple $tr$ and $op$ is the specific operator of $e_o$ with respect to $op_{link}$. Interpretation: $[[op_t]] = ([[proj_3(tr)]], [[op_{link}]], [[op]]) = ([[e_o]], [[op_{link}]], [[op]])$, where $proj_3(tr) = proj_3((e_s, p, e_o)) = e_o$.
  • 39. Semantic Model - Formalization. Conjunctive co-reference: $ccr = \bigcup_{i=0}^{n} \{(e, conjlink_i, ne_i)\}$. Interpretation: $[[ccr]] = \{([[e]], [[conjlink_i]], [[ne_i]]) \in U \times U \times U \text{ such that } [[e]] = [[proj_3(tr)]] \text{ and } (\bigwedge_{i=0}^{n} [[ne_i]] \; sameas \; [[e]])\}$.
  • 40. Semantic Model - Formalization. Possessive/reflexive/demonstrative co-reference: $pcr = \{(\sim ne_i, coref_{link}, pr), (pr, coref_{link}, e_j)\}$. Interpretation: $[[pcr]] = \{([[proj_1(tr)]], [[coref_{link}]], [[pr]]), ([[pr]], [[coref_{link}]], [[proj_3(tr)]]) \in U \times U \times U \text{ such that } tr = (\sim ne_i, p, e_j)\}$.
  • 41. Semantic Model - Formalization. Extracted graph: set of basic and reified triples and generic, quantifier and co-reference operators. Paths. Basic: sequence of basic triples. Reified: basic path with some reified triples associated. Operational: basic path with some operators associated. Complex: contains both reified and operational paths.
  • 42. Semantic Model - Formalization. Context triples: $(tr, context_{link}, ct)$, which indicates that a basic triple $tr$ can be associated with a specific context $ct$; $[[context]] = ([[tr]], [[context_{link}]], [[ct]]) \in U^3 \times U \times U$. Multi-context graph: an extracted graph with more than one context associated with its triples. If all basic triples in a path belong to a unique (same) context, the path is a unique-context (basic, reified, operational or complex) path; otherwise, we call it a multi-context path.
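The unique-context versus multi-context distinction on slide 42 reduces to checking how many distinct contexts the basic triples of a path carry. A minimal sketch follows, assuming each basic triple has exactly one associated context; the function name `classify_path`, the `context_of` mapping, and the context labels are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def classify_path(path: List[Triple], context_of: Dict[Triple, str]) -> str:
    """Label a path as 'unique-context' or 'multi-context'.

    Assumption: context_of maps every basic triple in the path to the
    single context ct it is associated with via a context triple.
    """
    if not path:
        raise ValueError("a path must contain at least one basic triple")
    contexts = {context_of[triple] for triple in path}
    return "unique-context" if len(contexts) == 1 else "multi-context"


t1 = ("Obama", "moved to", "Los Angeles")
t2 = ("Los Angeles", "to attend", "Occidental College")

same = {t1: "sentence-1", t2: "sentence-1"}
mixed = {t1: "sentence-1", t2: "sentence-2"}

print(classify_path([t1, t2], same))
print(classify_path([t1, t2], mixed))
```

The same check applies unchanged to reified, operational, and complex paths, because the definition only looks at the contexts of the underlying basic triples.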
  • 44. Graphia (http://graphia.dcc.ufrj.br). Graphia is an information extraction pipeline that takes factual text as input and produces SDGs as output. Its modules combine state-of-the-art NLP tools with an efficient set of heuristics. It can build graphs for sentences and entire documents.
  • 45. Graphia - Extraction Pipeline Architecture
  • 46. Graphia - Preliminary Evaluation. 1033 relations (triples) from 150 sentences from 5 randomly selected Wikipedia articles; the graphs were manually classified. Correct graphs: fully consistent with the semantic model. Complete graphs: correct graphs which map all the information of a sentence. Interpretable graphs: graph fragments which have the correct semantics of their basic triple paths.
  • 47. Graphia - Extraction Examples. "In 2002 GE acquired the wind power assets of Enron during its bankruptcy proceedings."
  • 48. Graphia - Extraction Examples. "In 1935, GE was one of the top 30 companies traded at the London Stock Exchange."
  • 49. Graphia - Extraction Examples. "The Radio Corporation of America (RCA) was founded by GE in 1919 to further international radio."
  • 50. Graphia - Extraction Examples. "It acquired ScanWind in 2009."
  • 51. Graphia - Extraction Examples. "The new company, named GXS, is based in Gaithersburg, Maryland."
  • 52. Conclusion & Future Work
  • 53. Conclusion & Future Work. Conclusion: SDGs provide a discourse representation as a set of contextualized relationships; they support entity-centric integration between graphs; conceptual model independence; it is worth putting effort into enumerable patterns (timestamps, operators). Future work: more complex sentence patterns.