Multilingual Event Extraction and Semi-automatic acquisition of related resources
How to create a multilingual event extraction system

Transcript

  • 1. Multilingual Event Extraction and Semi-automatic Acquisition of Related Resources. Hristo Tanev, Joint Research Centre, Ispra, Italy
  • 2. NEXUS: News Event eXtraction Using language Structures
  • 3. Event Extraction
    • Event extraction was introduced as a language processing task at MUC-2 in 1989
    • An event is something that happens; an event description is a template that describes an event
    • The goal of automatic event extraction is to fill an event description template automatically from a text or a set of texts
    • Event description usually includes:
      • Event type
      • Time and place of the event
      • Participating entities, which have specific roles that depend on the event type, e.g. perpetrator, victim, instrument, etc.
      • Cause
  • 4. Event Extraction in the Context of EMM
    • The purpose of automatic event extraction from online news is to facilitate the crisis-management efforts of the European Commission and other related political institutions
    • NEXUS detects security-related events and disasters
    • NEXUS monitors online news in nearly real time in English, French, Spanish, Italian, Russian, Portuguese, and Arabic (Arabic after automatic translation into English)
    • Medical NEXUS detects news about disease outbreaks in English (soon to be deployed in French)
  • 5. EMM Event Extraction from Online News
    • News cluster:
      • Car bomb kills 50 in Iraq (HindustanTimes, Wednesday, June 18, 2008 5:07:00 AM CEST): A car bomb blast in northern Baghdad left more than 50 people dead and 80 wounded on Tuesday, a police source said…
      • Biggest blast in months leaves at least 50 dead in Iraq (reliefWeb, Wednesday, June 18, 2008 5:05:00 AM CEST): A car bomb blast in northern Baghdad, the largest in months, left more than 50 people dead and 80 wounded on Tuesday, a police source said…
  • 6. EMM Event Extraction from Online News
    • Event Description
    • Date: 18 June 2008
    • Place: Baghdad, Iraq
    • Event type: terrorist attack
    • Number killed: 50
    • Number wounded: 80
    • Number kidnapped: 0
    • Perpetrators: not reported
    • Weapons: car bomb
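The filled template above can be pictured as a small record type. The following is only a minimal sketch: the class name `EventDescription` and its field names are hypothetical and chosen to mirror the slots on the slide; they are not taken from NEXUS itself.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class EventDescription:
    """Hypothetical template mirroring the slots listed on the slide."""
    event_date: date
    place: str
    event_type: str
    killed: int = 0
    wounded: int = 0
    kidnapped: int = 0
    perpetrators: Optional[str] = None  # None stands for "not reported"
    weapons: List[str] = field(default_factory=list)


# The Baghdad example from the slide, entered by hand for illustration.
baghdad = EventDescription(
    event_date=date(2008, 6, 18),
    place="Baghdad, Iraq",
    event_type="terrorist attack",
    killed=50,
    wounded=80,
    weapons=["car bomb"],
)
print(baghdad)
```

Using `None` for unreported perpetrators keeps "not reported" distinct from an empty string, which is one possible design choice among several.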
  • 7. EMM Event Extraction Architecture
    • Diagram labels: NEXUS, News, Entity Match, Geo-Tagging, Clustering, Text Processing (NER, Parsing, Pattern Matching), Information Aggregation, Visualization, Events
  • 8. Partial Parsing
    • Example of a multilingual rule that recognizes NPs such as "a French volunteer and an Italian military":
    • coordination_rule :> ( person_group & [NAME: #name1, AMOUNT: "1" #amount1]
    • (token & [SURFACE: ","]?
    • person_group & [NAME: #name2, AMOUNT: "1" #amount2])?
    • (token & [SURFACE: ","]?
    • person_group & [NAME: #name3, AMOUNT: "1" #amount3])?
    • conjunction
    • person_group & [NAME: #name4, AMOUNT: "1" #amount4]):c
    • -> c: person_group & [NAME: #final, AMOUNT: #amount, NUMBER: "p"]
    • & #final := ConcForSum(#name1, #name2, #name3, #name4)
    • & #amount := ConcForSum(#amount1, #amount2, #amount3, #amount4).
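The effect of the coordination rule can be approximated in a few lines of Python. This is a sketch under stated assumptions: the slide does not define `ConcForSum`, so the code assumes that names are concatenated and amounts are added; `PersonGroup` and `merge_coordinated` are hypothetical names, and the "s"/"p" number encoding is inferred from the `NUMBER: "p"` slot.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PersonGroup:
    name: str
    amount: int
    number: str = "s"  # "s" singular, "p" plural (assumed encoding)


def merge_coordinated(groups: List[PersonGroup]) -> PersonGroup:
    """Combine coordinated singular groups into one plural group.

    Assumption: names are joined and amounts are summed; the real
    ConcForSum operator may behave differently.
    """
    if len(groups) < 2:
        raise ValueError("coordination needs at least two person groups")
    name = " and ".join(g.name for g in groups)
    amount = sum(g.amount for g in groups)
    return PersonGroup(name=name, amount=amount, number="p")


merged = merge_coordinated([
    PersonGroup("French volunteer", 1),
    PersonGroup("Italian military", 1),
])
print(merged)
```

The merged group can then be treated by later rules as a single plural participant with a known head count.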
  • 9. Annotating Participating Entities
    • This is one of the most important tasks: labelling person groups and other phrases with event-specific semantic roles, e.g. Perpetrator, Dead victim, Displaced people, Weapons used
    • Linear patterns work well for English
    • We also use linear patterns for Russian
    • More elaborate event extraction grammars are used for Arabic, Italian, French, Spanish and Portuguese
  • 10. Event-specific Grammars
    • Rule: <person-group> [introduce-passive] Verb[baseform: rimanere]? Adv? Verb[sem: injured-obj, passive-voice] -> <person-group>: injured
    • Cinque persone sono state ferite (Five people were injured)
    • Cinque persone sono state gravemente ferite (Five people were seriously injured)
    • Cinque persone sono rimaste ferite (Five people were left injured)
    • For details see [Zavarella et al., Event Extraction for Italian Using a Cascade of Finite State Grammars, FSMNLP 2008]
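As a rough illustration of what the Italian rule accepts, the three example sentences can be matched with a single regular expression. This is only a stand-in: the actual system uses a cascade of finite-state grammars over morphological and semantic features, whereas the sketch below works on surface strings, and the names `INJURED` and `injured_group` are hypothetical.

```python
import re

# Surface-level approximation of the rule (not the real grammar formalism).
INJURED = re.compile(
    r"(?P<group>\w+ persone) "   # person group, e.g. "Cinque persone"
    r"sono (?:state|rimaste) "   # passive auxiliary, or a form of "rimanere"
    r"(?:\w+mente )?"            # optional adverb such as "gravemente"
    r"ferite",                   # passive participle meaning "injured"
    re.IGNORECASE,
)


def injured_group(sentence):
    """Return the person group labelled as injured, or None if no match."""
    match = INJURED.search(sentence)
    return match.group("group") if match else None


for example in (
    "Cinque persone sono state ferite",
    "Cinque persone sono state gravemente ferite",
    "Cinque persone sono rimaste ferite",
):
    print(example, "->", injured_group(example))
```

A feature-based grammar generalizes over verb forms and word classes, which a fixed string pattern like this one cannot do.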
  • 11.  
  • 12. Multilingual Lexical Acquisition
  • 13. Multilingual Lexical Acquisition
    • Automatic learning of language-specific lexical resources
    • Weakly supervised statistical approaches that make use of large quantities of unannotated news
    • Learning of patterns, keywords and keyphrases, which can be manually validated, rather than statistical models such as SVMs
    • Pattern learning
    • Learning domain-specific lexica
    • Learning semantic classes
  • 14. Linear Pattern Learning
    • For English we use the linear patterns as the algorithm learns them
    • We learned more than 3000 linear patterns for English
    • For Italian and other languages, linear patterns are a starting point for grammar development
  • 15. Learning Semantic Classes
    • Sometimes it is necessary to learn specific semantic classes, e.g. vehicles, disasters, weapons, facilities
    • We built a statistical system for automatic acquisition of semantic classes
    • The system is language-independent; only a list of language-specific stop words is used
  • 16. Ontopopulis
    • INPUT:
      • feelings: hatred, love, fear, sadness
      • contrasting classes: taste, (style, outlook), character, thoughts
  • 17. Extracting New Terms
    • Newly learnt terms are ranked and then given to the user for evaluation
    • Top 20 terms from the category feelings: grief, sorrow, sadness, condolences, fear, disappointment, regret, sympathy, shock, hatred, gratitude, frustration, anger, deep sorrow, profound dismay, condolence, satisfaction, profound grief, deep grief
  • 18. Using Learnt Semantic Classes for Event Extraction
    • We use Ontopopulis to learn terms, which we then add to our domain-specific dictionaries
    • Some rules which require a domain-specific dictionary:
      • Rules for parsing person reference noun phrases, such as two engineers
      • Rules which detect weapons used: killed with a [WEAPON] (killed with a gun)
      • Detection of vehicles used: [PEOPLE] in a [VEHICLE] were stopped (three men in a boat were stopped)
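The dictionary-backed rules above can be sketched with ordinary regular expressions built from a lexicon. This is a minimal sketch, not the NEXUS rule formalism: the lexica `WEAPONS` and `VEHICLES` are tiny hand-made stand-ins for learnt dictionaries, the helper names are hypothetical, and the person group is crudely assumed to be two words long.

```python
import re

# Tiny illustrative lexica; real ones would come from the learnt dictionaries.
WEAPONS = {"gun", "knife", "car bomb"}
VEHICLES = {"boat", "car", "truck"}


def _alternation(terms):
    # Longest terms first so multiword entries win over shorter ones.
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


WEAPON_RULE = re.compile(
    rf"killed with an? (?P<weapon>{_alternation(WEAPONS)})\b"
)
VEHICLE_RULE = re.compile(
    rf"(?P<people>\w+ \w+) in an? (?P<vehicle>{_alternation(VEHICLES)}) were stopped"
)


def extract_slots(text):
    """Return whichever of the weapon / people / vehicle slots the rules fill."""
    slots = {}
    weapon_match = WEAPON_RULE.search(text)
    if weapon_match:
        slots["weapon"] = weapon_match.group("weapon")
    vehicle_match = VEHICLE_RULE.search(text)
    if vehicle_match:
        slots["people"] = vehicle_match.group("people")
        slots["vehicle"] = vehicle_match.group("vehicle")
    return slots


print(extract_slots("The guard was killed with a gun"))
print(extract_slots("three men in a boat were stopped"))
```

Adding a newly validated term to a lexicon extends the rule's coverage without touching the pattern itself, which is the point of keeping dictionaries separate from rules.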
  • 19. NEXUS Evaluation for English (accuracy by detection task)
    • Dead counting: 70%
    • Injured counting: 57%
    • Event classification: 80%
    • Geo-tagging (country): 90%
    • Geo-tagging (place name): 61%
  • 20. NEXUS Multilingual Evaluation (F1 measure)
    • Italian: Dead 0.87, Wounded 0.62, Kidnapped -, Arrested 0.67
    • Portuguese: Dead 0.69, Wounded 0.51, Kidnapped 0.67, Arrested 0.47
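For reference, the F1 measure reported here is the harmonic mean of precision and recall. The sketch below shows the standard computation; the function name `f1_score` and the counts in the usage line are invented for illustration and are not taken from the NEXUS evaluation.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Standard F1: harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 8 correct slot fills, 2 spurious, 2 missed.
print(round(f1_score(8, 2, 2), 2))
```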
  • 21. Evaluation of Ontopopulis (accuracy in %, top 20 terms)
    • Portuguese: Person 90, Weapon 60, Politician 75, Vehicle 85, Watercraft 70, Edged weapon 20, Crime 85, Building 75
    • Spanish: Person 95, Weapon 60, all other classes -
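The slide does not spell out how top-20 accuracy is computed. One plausible reading, assumed here, is the share of the 20 highest-ranked learnt terms that a human judge accepts as members of the class. The function name `accuracy_at_k` and the example terms are hypothetical.

```python
def accuracy_at_k(ranked_terms, correct_terms, k=20):
    """Percentage of the top-k ranked terms that were judged correct."""
    top = ranked_terms[:k]
    if not top:
        return 0.0
    hits = sum(1 for term in top if term in correct_terms)
    return 100.0 * hits / len(top)


# Invented example: four ranked candidates, three accepted by the judge.
ranked = ["gun", "rifle", "table", "knife"]
accepted = {"gun", "rifle", "knife"}
print(accuracy_at_k(ranked, accepted, k=4))
```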
  • 22.