Entity Typing and Event Extraction
Marieke van Erp

http://mariekevanerp.com
VU WP3 team
Isa Maks, Antske Fokkens, Marieke van Erp, Piek Vossen
Entity typing
What is entity typing?
• Entity typing is the task of classifying an entity mention
• An entity mention is a recognised name in a text that
refers to a real-world person, location, organisation or
other interesting ‘thing’
What is the added value of entity typing?
• It allows you to query for fine-grained entity types: give
me all electricians in the dataset, give me all historic
buildings
• Entity typing often includes linking an entity to
background knowledge
• The background knowledge provides additional filters:
give me all politicians born after 1900 in the dataset
• Caveat: the background knowledge is not complete
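The kind of query described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the `Entity` record, the `query` helper and all records in `ENTITIES` are invented for the example. The entity with an unknown birth year shows the caveat: it is silently missed because the background knowledge is incomplete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    mention: str                      # name as recognised in the text
    types: frozenset                  # fine-grained types assigned by entity typing
    birth_year: Optional[int] = None  # from background knowledge; None if unknown


# Invented toy records, for illustration only.
ENTITIES = [
    Entity("Person A", frozenset({"Person", "Politician"}), 1925),
    Entity("Person B", frozenset({"Person", "Politician"}), 1880),
    Entity("Person C", frozenset({"Person", "Politician"}), None),
    Entity("Person D", frozenset({"Person", "Electrician"}), 1950),
]


def query(entities, entity_type, born_after=None):
    """Return mentions of the given type, optionally filtered on birth year."""
    hits = []
    for entity in entities:
        if entity_type not in entity.types:
            continue
        if born_after is not None:
            # Entities without a known birth year cannot pass the filter.
            if entity.birth_year is None or entity.birth_year <= born_after:
                continue
        hits.append(entity.mention)
    return hits


politicians_after_1900 = query(ENTITIES, "Politician", born_after=1900)
electricians = query(ENTITIES, "Electrician")
```

Person C may well have been born after 1900, but the filter cannot tell, so recall depends on how complete the linked resource is.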
New synonym/concept lists are easy to plug in
Brouwers:
concept-100350 (ponstypiste) isRelatedTo class-Schrijfkunst
concept-100343 (tachygraaf) isRelatedTo class-Schrijfkunst
concept-100313 (schrijver) isRelatedTo class-Schrijfkunst
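To make "easy to plug in" concrete, here is a hedged sketch of loading such a concept list and using it to tag tokens. It assumes the list is serialised one relation per line, in the form shown on the slide; the real Brouwers export may look different. The `load_concepts` and `tag` helpers and the sample sentence are hypothetical.

```python
import re

# Sample in the form shown on the slide; the one-relation-per-line
# serialisation is an assumption made for this sketch.
BROUWERS_SAMPLE = """\
concept-100350 (ponstypiste) isRelatedTo class-Schrijfkunst
concept-100343 (tachygraaf) isRelatedTo class-Schrijfkunst
concept-100313 (schrijver) isRelatedTo class-Schrijfkunst
"""

LINE = re.compile(r"^(concept-\d+) \((.+?)\) isRelatedTo (class-\S+)$")


def load_concepts(text):
    """Build a lookup from lower-cased label to (concept id, class)."""
    lexicon = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            concept_id, label, concept_class = match.groups()
            lexicon[label.lower()] = (concept_id, concept_class)
    return lexicon


def tag(tokens, lexicon):
    """Pair every token with its concept entry, or None if it is not listed."""
    return [(token, lexicon.get(token.lower())) for token in tokens]


lexicon = load_concepts(BROUWERS_SAMPLE)
tagged = tag("De schrijver en de tachygraaf".split(), lexicon)
```

Swapping in another list then only means passing different text to `load_concepts`. Matching on exact lower-cased tokens is a simplification; lemmatisation and multi-word labels are left out.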
Named Entity Recognition & Linking
• We are creating links between HISCO and Brouwers
• We are building on entity and concept linkers that can recognise
concepts from HISCO and Brouwers in texts
• We are developing a new general-purpose entity linker that allows for
use of datasets other than DBpedia and is less sensitive to general
entity popularity
• Discovering more about Dark and NIL entities is also ongoing work
(cf. Van Erp & Vossen (2016) Entity Typing using Distributional
Semantics and DBpedia. To appear in: Proceedings of the 4th
NLP&DBpedia workshop. Kobe, Japan 18 October 2016)
Event Extraction
• Event Extraction is the task of recognising and classifying
mentions of ‘things that happen’ in text
• Events are multifaceted: they take place at a certain time
and place and have participants involved
• By recognising participants, times and places, we can
generate event descriptions and compare events
From words to concepts
• Linking terms to synonyms to obtain a higher level of
abstraction
• Word-sense disambiguation + WordNet + Multilingual
Central Repository + FrameNet + PropBank
• Stop, quit, leave, relinquish, bow out -> all linked to the
concept wn:leave_office
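The abstraction step can be illustrated with a small lookup that maps the lemmas listed above to the shared concept. This is a sketch under a strong assumption: it skips word-sense disambiguation entirely, so every occurrence of "stop" or "leave" is treated as the leave-office sense, which a real pipeline would have to decide from context. The `to_concepts` helper is hypothetical.

```python
from collections import Counter

LEAVE_OFFICE = "wn:leave_office"

# Lemmas taken from the slide; all are mapped to the same concept.
LEMMA_TO_CONCEPT = {
    lemma: LEAVE_OFFICE
    for lemma in ("stop", "quit", "leave", "relinquish", "bow out")
}


def to_concepts(lemmas):
    """Map each lemma to its concept, or None if the lemma is not listed."""
    return [LEMMA_TO_CONCEPT.get(lemma.lower()) for lemma in lemmas]


concepts = to_concepts(["quit", "relinquish", "sell", "Bow out"])
# Count concept occurrences, ignoring lemmas without a concept.
counts = Counter(concept for concept in concepts if concept is not None)
```

Counting at the concept level groups three differently worded mentions under one label, which is the higher level of abstraction the slide refers to.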
Why link to WordNet/ConceptNet/etc?
• It allows you to query for types rather than instances: give
me all lawsuits in the dataset
• In the context of CLARIAH, we are converting various
diachronic lexicons to Linked Data
• integrate resources
• tag interesting concepts in text
• query expansion
Semantic Role Labelling
• Detecting the agent, patient, recipient and theme of a
sentence
• Mary sold the book to John
• Agent: Mary
• Recipient: John
• Theme: the book
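One possible way to hold the output of semantic role labelling for this sentence is sketched below. The `LabelledPredicate` class and the `ROLE_INVENTORY` tuple are hypothetical names chosen for the example; they are not the format used by the actual modules.

```python
from dataclasses import dataclass, field

# The four roles named on the slide.
ROLE_INVENTORY = ("Agent", "Patient", "Recipient", "Theme")


@dataclass
class LabelledPredicate:
    predicate: str                             # the word that evokes the event
    roles: dict = field(default_factory=dict)  # role name -> text span

    def unfilled(self):
        """Roles from the inventory that have no filler in this sentence."""
        return [role for role in ROLE_INVENTORY if role not in self.roles]


# "Mary sold the book to John"
sold = LabelledPredicate(
    "sold",
    {"Agent": "Mary", "Recipient": "John", "Theme": "the book"},
)
missing = sold.unfilled()
```

Not every role is filled in every sentence: here `missing` contains only the Patient role, since the slide's analysis assigns none.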
[Figure: event graph for Event12 (type buy/sell, fn:Commerce_money_transfer) with the roles fn:Seller, fn:Buyer, fn:Goods and fn:Money, the entities dbp:QatarHolding, dbp:Porsche_family and ?Entity23, the value "10% stake", and sem:hasTime 2013-06-17. The event is derived from two headlines dated 2013-06-17: "Qatar Holding sells 10% stake in Porsche to founding families" (http://english.alarabiya.net) and "Porsche family buys back 10pc stake from Qatar" (http://www.telegraph.co.uk).]
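The idea that two differently worded reports describe one event can be sketched as a comparison of frame, time and role fillers. This is an illustration only: the role assignments below are read off the two headlines and are an assumption about what the figure encodes, "10pc" is assumed to be normalised to "10% stake", and the `compatible` and `merge` helpers are hypothetical.

```python
def compatible(a, b):
    """Two mentions may describe the same event if frame and time agree
    and no shared role has conflicting fillers."""
    if a["frame"] != b["frame"] or a["time"] != b["time"]:
        return False
    shared_roles = set(a["roles"]) & set(b["roles"])
    return all(a["roles"][role] == b["roles"][role] for role in shared_roles)


def merge(a, b):
    """Combine two compatible mentions into one event description."""
    return {
        "frame": a["frame"],
        "time": a["time"],
        "roles": {**a["roles"], **b["roles"]},
        "sources": a["sources"] + b["sources"],
    }


# "Qatar Holding sells 10% stake in Porsche to founding families"
sell_mention = {
    "frame": "fn:Commerce_money_transfer",
    "time": "2013-06-17",
    "roles": {
        "fn:Seller": "dbp:QatarHolding",
        "fn:Buyer": "dbp:Porsche_family",
        "fn:Goods": "10% stake",
    },
    "sources": ["http://english.alarabiya.net"],
}

# "Porsche family buys back 10pc stake from Qatar"
buy_mention = {
    "frame": "fn:Commerce_money_transfer",
    "time": "2013-06-17",
    "roles": {
        "fn:Buyer": "dbp:Porsche_family",
        "fn:Seller": "dbp:QatarHolding",
        "fn:Goods": "10% stake",
    },
    "sources": ["http://www.telegraph.co.uk"],
}

same_event = compatible(sell_mention, buy_mention)
event = merge(sell_mention, buy_mention) if same_event else None
```

Because "sell" and "buy" are mapped to the same frame with the same role fillers, the two headlines collapse into a single event that keeps both sources.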
Event abstractions
• Enable searches such as: Give me all lawsuits in which a
politician was involved between 1990 and 2000.
• Current developments: expand resources to the historic
domain, devise new crystallisation strategies for
aggregating event information
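A search of this kind combines an event type, a participant type and a time range. The sketch below uses invented toy events and a hypothetical `search` helper; it only illustrates how the three conditions combine once events and participants carry types.

```python
from datetime import date

# Invented toy events, for illustration only.
EVENTS = [
    {"id": "ev1", "type": "Lawsuit", "date": date(1995, 3, 1),
     "participant_types": {"Politician", "Company"}},
    {"id": "ev2", "type": "Lawsuit", "date": date(2005, 6, 9),
     "participant_types": {"Politician"}},
    {"id": "ev3", "type": "Lawsuit", "date": date(1992, 1, 20),
     "participant_types": {"Company"}},
    {"id": "ev4", "type": "Election", "date": date(1998, 5, 6),
     "participant_types": {"Politician"}},
]


def search(events, event_type, participant_type, first_year, last_year):
    """Ids of events of the given type that involve a participant of the
    given type and fall within the (inclusive) range of years."""
    return [
        event["id"]
        for event in events
        if event["type"] == event_type
        and participant_type in event["participant_types"]
        and first_year <= event["date"].year <= last_year
    ]


lawsuits = search(EVENTS, "Lawsuit", "Politician", 1990, 2000)
```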
Find out more
• All modules and evaluations are described in: http://kyoto.let.vu.nl/newsreader_deliverables/NWR-D4-2-3.pdf
(158 pages!)
• Selection to be adapted within CLARIAH: https://github.com/CLARIAH/wp3-semantic-parsing-Dutch
• New developments: http://www.clariah.nl & https://github.com/clariah
Discussion
• It’s research software (no fancy interface)
• Currently not adapted to deal with old spelling variants/OCR/etc.
• NLP isn’t perfect (but humans don’t always agree either!)
• What would it take for you to start using such tools?
• What types of analyses are most interesting to the community?
• What use cases are most useful to the community at this point
in time?
