NED with two-stage
coherence optimization
Filip Ilievski, Marieke van Erp, Piek Vossen,
Wouter Beek & Stefan Schlobach
or
How I am teaching my bottle of Jack Daniel’s not to turn into a
168-year-old person with a net income of $120.000.000
Context
... is persistently
avoided when machines
process language. No wonder: context
is hard to quantify.
But context lies at
the basis of human
communication!
The burden of context in language
● Language is context-dependent
● Verbal context
○ Ford fell from a tree.
■ What is “Ford” ?
● Social context
○ What is “2+2” ?
■ In mathematics it is 4
■ In the car domain it is a car configuration: 2 front + 2
back seats
■ In psychology it is a family with 2 parents and 2 children
Lincoln increased the annual vehicle
sales to 300.000.
y was born in Lincoln.
Lincoln fell from a tree.
Lincoln was standing on the shelf. It
was covered in leather.
Shallow processing
Motivation
The shallow approaches can do only this much.
Claim #1: we need to deepen the processing.
Claim #2: context is a limitless source of inspiration
- verbal
- social
- domain
- spatial
- temporal
- discourse
- (you-name-it)
Shall we go a step further?
How to go about it
Combine many pieces (algorithms) into a puzzle (solution)
Use knowledge that is as extensive and global as possible:
Semantic Web
Natural Language
Processing
Lexical resources
Approach
Optimize the semantic coherence of the disambiguated
entities, while still excluding the verbally incorrect
options and skewing towards the domain and the
popularity of the entities.
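A minimal sketch of how this objective could be read: a hard verbal filter first, then a search for the most coherent assignment with soft skews for domain and popularity. The function name `disambiguate`, the weights, the entity identifiers and every score below are made-up assumptions for illustration, not the system's actual components or values.

```python
from itertools import combinations, product

def disambiguate(candidates, verbally_ok, coherence, domain, popularity,
                 w_domain=0.5, w_pop=0.5):
    """Pick one entity per mention (toy, exhaustive search).

    candidates: dict mention -> list of entity ids
    verbally_ok(mention, entity) -> bool    hard filter
    coherence(e1, e2) -> float              pairwise relatedness
    domain(e), popularity(e) -> float       soft skews
    """
    # Step 1: exclude verbally incorrect options, but never empty a mention.
    filtered = {}
    for mention, ents in candidates.items():
        kept = [e for e in ents if verbally_ok(mention, e)]
        filtered[mention] = kept or list(ents)

    # Step 2: score every joint assignment and keep the best one.
    mentions = list(filtered)
    best, best_score = None, float("-inf")
    for combo in product(*(filtered[m] for m in mentions)):
        score = sum(coherence(a, b) for a, b in combinations(combo, 2))
        score += sum(w_domain * domain(e) + w_pop * popularity(e) for e in combo)
        if score > best_score:
            best, best_score = dict(zip(mentions, combo)), score
    return best

# Toy data (all values invented for the example).
CANDIDATES = {
    "Lincoln": ["Abraham_Lincoln", "Lincoln_Motor_Company"],
    "Ford": ["Ford_Motor_Company", "Gerald_Ford"],
}
RELATED = {
    frozenset({"Lincoln_Motor_Company", "Ford_Motor_Company"}),
    frozenset({"Abraham_Lincoln", "Gerald_Ford"}),
}
CAR_DOMAIN = {"Lincoln_Motor_Company", "Ford_Motor_Company"}
POPULARITY = {"Abraham_Lincoln": 0.9, "Lincoln_Motor_Company": 0.3,
              "Ford_Motor_Company": 0.8, "Gerald_Ford": 0.6}

result = disambiguate(
    CANDIDATES,
    verbally_ok=lambda mention, entity: True,  # no verbal evidence in this toy
    coherence=lambda a, b: 1.0 if frozenset({a, b}) in RELATED else 0.0,
    domain=lambda e: 1.0 if e in CAR_DOMAIN else 0.0,
    popularity=lambda e: POPULARITY[e],
)
print(result)
```

In this toy both the two presidents and the two car makers are equally coherent pairs, so it is the domain skew that tips the choice towards the car makers, even though Abraham Lincoln is the more popular candidate.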
Components
- Verb-based knowledge from NLP, VerbNet, FrameNet
and a domain ontology
- Domain skew (based on corpus analysis)
- Popularity of the candidates (from DBpedia)
- Semantic connectivity and similarity (based on DBpedia
information)
No module or knowledge source is perfect,
but combining more than one of each helps!
System design
The background knowledge
Data
Annotated WikiNews articles
3 subcorpora:
- Airbus Boeing (30)
- General Motors (30)
- Stock Market (30)
Results
FrameNet + domain ontology filter
                              Airbus   GM   Stock market
# links filtered                   3   21             22
# incorrect links filtered         3   13             19
# correct links filtered           0    0              3
# not in GS filtered               0    8              0
“Trading on Russia’s stock markets ...”
predicate: markets, Commerce_sell@Seller: Russia
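The counts in the filter table can be checked with a few lines of arithmetic. This sketch assumes "GS" stands for the gold standard and defines an ad hoc measure, the share of judged filtered links that were indeed incorrect; that measure and the name `filter_precision` are assumptions made here, not figures reported on the slides.

```python
# Counts copied from the filter table.
filtered = {"Airbus": 3, "GM": 21, "Stock market": 22}
incorrect = {"Airbus": 3, "GM": 13, "Stock market": 19}
correct = {"Airbus": 0, "GM": 0, "Stock market": 3}
not_in_gs = {"Airbus": 0, "GM": 8, "Stock market": 0}

def filter_precision(corpus):
    """Share of filtered links present in the gold standard that were incorrect."""
    judged = incorrect[corpus] + correct[corpus]
    return incorrect[corpus] / judged if judged else None

for corpus in filtered:
    # Sanity check: the three categories add up to the total per subcorpus.
    assert filtered[corpus] == incorrect[corpus] + correct[corpus] + not_in_gs[corpus]
    print(corpus, round(filter_precision(corpus), 3))
```

Under this reading the filter only removes correct links in the Stock Market subcorpus (3 of the 22 filtered), while the 8 filtered GM links that are not in the gold standard cannot be judged either way.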
Combinations
Conclusions
Context is useful
Semantic Web can help to
model background knowledge
We are still finding new
puzzle pieces
Thank you!
Appendices
Future
Get rid of the boring pipeline approach.
Use a full-blown optimization system!
Resources
Grammatical structure
and meaning of words
Background knowledge
Structured linguistic
information
Semantic Web
Natural Language
Processing
Lexical resources
Example
“The United States transferred six detainees
from the Guantánamo Bay prison to Uruguay
this weekend, the Defense Department
announced early Sunday.”
State-of-the-art:
United States              Guantanamo Bay     Uruguay              Defence Department
Geographical region        GB detention camp  Geographical region  US Dept. of Defence
Fed. Government            Place              Football team        Ministry of Defence of Rep. of Korea
Men’s soccer team          The naval base     River
Women’s soccer team        Battle of GB       Rugby union team
Rugby union team                              U20 football team
Men’s ice hockey team                         U17 football team
Men’s basketball team
Secondary education in US
VN: send-11.1
Predicate: transferred
A0: United States
A1: from Guantanamo Bay
A2: to Uruguay
A0 is Animate or Organization
A1 is Location
A2 is Location
United States is Animate or Organization
Guantanamo Bay is a Location
Uruguay is a Location
VN: say-37.7
Predicate: announced
A0: the Defence Department
A0 is Animate or Organization
The Defence Department is Animate or Organization
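The role restrictions above can be turned into a simple candidate filter. In this sketch the restrictions are the ones shown on the slides, while the type labels attached to each candidate, the dictionary names and the function `filter_candidates` are hypothetical and chosen only to make the example run.

```python
# Role restrictions as stated on the slides for the two verb classes.
ROLE_RESTRICTIONS = {
    "send-11.1": {
        "A0": {"Animate", "Organization"},
        "A1": {"Location"},
        "A2": {"Location"},
    },
    "say-37.7": {"A0": {"Animate", "Organization"}},
}

# Hypothetical coarse types for the "Uruguay" candidates (assumed, not from the slides).
URUGUAY_CANDIDATES = {
    "Geographical region": {"Location"},
    "Football team": {"Organization"},
    "River": {"Location"},
    "Rugby union team": {"Organization"},
}

def filter_candidates(verb_class, role, candidate_types):
    """Keep candidates whose types overlap the restriction on the given role."""
    allowed = ROLE_RESTRICTIONS[verb_class][role]
    return [name for name, types in candidate_types.items() if types & allowed]

# "Uruguay" fills A2 of "transferred" (send-11.1), so it must be a Location.
kept = filter_candidates("send-11.1", "A2", URUGUAY_CANDIDATES)
print(kept)
```

With these assumed types the sports teams are excluded for "Uruguay", while the region and the river both survive, so a later step still has to choose between them.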
After VerbNet
United States              Guantanamo Bay     Uruguay              Defence Department
Geographical region        GB detention camp  Geographical region  US Dept. of Defence
Fed. Government            Place              Football team        Ministry of Defence of Rep. of Korea
Men’s soccer team          The naval base     River
Women’s soccer team        Battle of GB       Rugby union team
Rugby union team                              U20 football team
Men’s ice hockey team                         U17 football team
Men’s basketball team
Secondary education in US
Results

CLiN 25: NED with two-stage coherence optimization
