Multilingual Event Extraction and Semi-automatic acquisition of related resources

How to create a multilingual event extraction system

  1. Multilingual Event Extraction and Semi-automatic Acquisition of Related Resources
     Hristo Tanev, Joint Research Centre, Ispra, Italy
  2. NEXUS: News Event eXtraction Using language Structures
  3. Event Extraction
     • Event extraction was introduced as a language processing task at MUC-2 in 1989.
     • An event is something that happens; an event description is a template that describes an event.
     • The goal of automatic event extraction is to fill an event description template automatically from a text or a set of texts.
     • An event description usually includes:
         • Event type
         • Time and place of the event
         • Participating entities, which have specific roles that depend on the event type, e.g. perpetrator, victim, instrument, etc.
         • Cause
  4. Event Extraction in the Context of EMM
     • The purpose of automatic event extraction from online news is to facilitate the crisis-management efforts of the European Commission and other related political institutions.
     • NEXUS detects security-related events and disasters.
     • NEXUS monitors online news in nearly real time in English, French, Spanish, Italian, Russian, Portuguese, and Arabic (after automatic translation into English).
     • Medical NEXUS detects news about disease outbreaks in English (soon to be deployed in French).
  5. EMM Event Extraction from Online News
     News cluster:
     • "Car bomb kills 50 in Iraq", HindustanTimes, Wednesday, June 18, 2008 5:07:00 AM CEST: A car bomb blast in northern Baghdad left more than 50 people dead and 80 wounded on Tuesday, a police source said…
     • "Biggest blast in months leaves at least 50 dead in Iraq", reliefWeb, Wednesday, June 18, 2008 5:05:00 AM CEST: A car bomb blast in northern Baghdad, the largest in months, left more than 50 people dead and 80 wounded on Tuesday, a police source said…
  6. EMM Event Extraction from Online News
     Event Description
     • Date: 18 June 2008
     • Place: Baghdad, Iraq
     • Event type: terrorist attack
     • Number killed: 50
     • Number wounded: 80
     • Number kidnapped: 0
     • Perpetrators: not reported
     • Weapons: car bomb
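The event description filled for the Baghdad news cluster can be pictured as a plain record. The following is only a minimal sketch: the class name `EventDescription` and its field names are hypothetical and are not taken from NEXUS; only the slot values come from the slide.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class EventDescription:
    """Hypothetical container mirroring the slots of the event template."""
    event_date: date
    place: str
    event_type: str
    killed: int = 0
    wounded: int = 0
    kidnapped: int = 0
    perpetrators: Optional[str] = None  # None stands for "not reported"
    weapons: List[str] = field(default_factory=list)


# Slot values copied from the example event description.
baghdad_event = EventDescription(
    event_date=date(2008, 6, 18),
    place="Baghdad, Iraq",
    event_type="terrorist attack",
    killed=50,
    wounded=80,
    kidnapped=0,
    weapons=["car bomb"],
)
```

Using `None` for the perpetrator slot keeps "not reported" distinct from an empty string, which is one possible design choice among several.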
  7. EMM Event Extraction Architecture
     (Architecture diagram; components shown: NEXUS, News, Entity Match, Geo-Tagging, Clustering, Text Processing (NER, Parsing, Pattern Matching), Information Aggregation, Visualization, Events)
  8. Partial Parsing
     • Example of a multilingual rule that recognizes noun phrases such as "a French volunteer and an Italian military":
         coordination_rule :> ( person_group & [NAME: #name1, AMOUNT: "1" #amount1]
                                (token & [SURFACE: ","]?
                                 person_group & [NAME: #name2, AMOUNT: "1" #amount2])?
                                (token & [SURFACE: ","]?
                                 person_group & [NAME: #name3, AMOUNT: "1" #amount3])?
                                conjunction
                                person_group & [NAME: #name4, AMOUNT: "1" #amount4]):c
           c: person_group & [NAME: #final, AMOUNT: #amount, NUMBER: "p"]
              & #final := ConcForSum(#name1, #name2, #name3, #name4)
              & #amount := ConcForSum(#amount1, #amount2, #amount3, #amount4).
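The effect of the coordination rule can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the grammar engine itself: the slides do not say what `ConcForSum` does, so the code assumes that names are joined and amounts are added. `PersonGroup` and `merge_coordinated` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PersonGroup:
    name: str
    amount: int
    number: str = "s"  # assumed encoding: "s" singular, "p" plural


def merge_coordinated(groups: List[PersonGroup]) -> PersonGroup:
    """Combine coordinated person groups into a single plural group.

    Assumption: names are concatenated and amounts are summed; the
    actual behaviour of ConcForSum is not documented in the slides.
    """
    if len(groups) < 2:
        raise ValueError("coordination needs at least two person groups")
    name = " and ".join(g.name for g in groups)
    amount = sum(g.amount for g in groups)
    return PersonGroup(name=name, amount=amount, number="p")


merged = merge_coordinated(
    [PersonGroup("French volunteer", 1), PersonGroup("Italian military", 1)]
)
```

Here `merged` describes one plural group of two people, which is the kind of structure later rules could label with a role such as victim.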
  9. Annotating Participating Entities
     • One of the most important tasks is to label person groups and other phrases with event-specific semantic roles, e.g. Perpetrator, Dead victim, Displaced people, Weapons used, etc.
     • Linear patterns work well for English.
     • We also use linear patterns for Russian.
     • More elaborate event extraction grammars are used for Arabic, Italian, French, Spanish and Portuguese.
  10. Event-specific Grammars
     • Rule: <person-group> [introduce-passive] Verb[baseform: rimanere]? Adv? Verb[sem: injured-obj, passive-voice]  <person-group> : injured
     • Sentences covered by the rule:
         • Cinque persone sono state ferite
         • Cinque persone sono state gravemente ferite
         • Cinque persone sono rimaste ferite
     • For details see [Zavarella et al., Event Extraction for Italian, Using a Cascade of Finite State Grammars, FSMNLP 2008]
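The three Italian sentences accepted by the rule can be reproduced with a rough regular-expression approximation. This is only an illustration: the real system uses a cascade of finite-state grammars over linguistic annotations, whereas the sketch below works on raw strings, hard-codes the word forms from the examples, and treats any "<word> persone" as a person group. `INJURED_PASSIVE` and `extract_injured` are hypothetical names.

```python
import re
from typing import Optional

# Rough string-level stand-in for the rule:
#   person group, passive auxiliary, optional adverb, "ferite" (injured).
INJURED_PASSIVE = re.compile(
    r"(?P<group>\w+ persone) sono (?:state|rimaste)(?: \w+mente)? ferite",
    re.IGNORECASE,
)


def extract_injured(sentence: str) -> Optional[str]:
    """Return the person group labelled as injured, or None if no match."""
    match = INJURED_PASSIVE.search(sentence)
    return match.group("group") if match else None


examples = [
    "Cinque persone sono state ferite",
    "Cinque persone sono state gravemente ferite",
    "Cinque persone sono rimaste ferite",
]
injured = [extract_injured(s) for s in examples]
```

A grammar over lemmas and semantic classes generalizes far better than this surface pattern, which is presumably why the slides describe dedicated grammars for Italian.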
  11. Multilingual Lexical Acquisition
  12. Multilingual Lexical Acquisition
     • Automatic learning of language-specific lexical resources
     • Weakly supervised statistical approaches that make use of large quantities of unannotated news
     • Learning of patterns, keywords and keyphrases, which can be manually validated, rather than statistical models such as SVM
     • Pattern learning
     • Learning domain-specific lexica
     • Learning semantic classes
  13. Linear Pattern Learning
     • For English, we use the linear patterns as the algorithm learns them.
     • We learned more than 3000 linear patterns for English.
     • For Italian and other languages, linear patterns are the starting point for grammar development.
  14. Learning Semantic Classes
     • Sometimes it is necessary to learn specific semantic classes, e.g. vehicles, disasters, weapons, facilities.
     • We built a statistical system for automatic acquisition of semantic classes.
     • The system is language-independent; only a list of language-specific stop words is used.
  15. Ontopopulis
     • INPUT:
         • feelings: hatred, love, fear, sadness
         • contrasting classes: taste, (style, outlook), character, thoughts
  16. Extracting New Terms
     • Newly learnt terms are ranked and then given to the user for evaluation.
     • Top 20 terms from the category feelings: grief sorrow sadness condolences fear disappointment regret sympathy shock hatred gratitude frustration anger deep sorrow profound dismay condolence satisfaction profound grief deep grief
  17. Using Learnt Semantic Classes for Event Extraction
     • We use Ontopopulis to learn terms, which we then add to our domain-specific dictionaries.
     • Some rules that require a domain-specific dictionary:
         • Rules for parsing noun phrases that refer to persons, such as "two engineers"
         • Rules that detect weapons used: killed with a [WEAPON] ("killed with a gun")
         • Detection of vehicles used: [PEOPLE] in a [VEHICLE] were stopped ("three men in a boat were stopped")
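How a learnt dictionary plugs into rules such as "killed with a [WEAPON]" can be sketched as slot expansion. This is an assumption-laden illustration rather than the NEXUS implementation: the `LEXICON` entries are made-up samples, `compile_pattern` is a hypothetical helper, and the [PEOPLE] slot is replaced by a crude two-word placeholder instead of a real person-group parser.

```python
import re
from typing import Dict, List

# Made-up sample entries; in the talk such terms are learnt with Ontopopulis
# and validated by a user before being added to the dictionaries.
LEXICON: Dict[str, List[str]] = {
    "WEAPON": ["gun", "knife", "car bomb"],
    "VEHICLE": ["boat", "car", "truck"],
}


def compile_pattern(template: str):
    """Replace each [CLASS] slot with an alternation over that class's terms."""
    def expand(match):
        cls = match.group(1)
        # Longest terms first so multi-word entries are preferred.
        terms = sorted(LEXICON[cls], key=len, reverse=True)
        return "(?P<%s>%s)" % (cls, "|".join(re.escape(t) for t in terms))

    return re.compile(re.sub(r"\[\s*([A-Z]+)\s*\]", expand, template),
                      re.IGNORECASE)


weapon_rule = compile_pattern("killed with a [WEAPON]")
# Two-word placeholder standing in for a parsed person group.
vehicle_rule = compile_pattern(r"(?P<PEOPLE>\w+ \w+) in a [VEHICLE] were stopped")

weapon_hit = weapon_rule.search("He was killed with a gun")
vehicle_hit = vehicle_rule.search("three men in a boat were stopped")
```

Adding a newly validated term to `LEXICON` extends every rule that uses the class without touching the rule text, which is the practical benefit of keeping dictionaries separate from patterns.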
  18. NEXUS Evaluation for English
     Detection task              Accuracy
     Dead counting               70%
     Injured counting            57%
     Event classification        80%
     Geo-tagging (country)       90%
     Geo-tagging (place name)    61%
  19. NEXUS Multilingual Evaluation
     F1 measure    Dead    Wounded    Kidnapped    Arrested
     Italian       0.87    0.62       -            0.67
     Portuguese    0.69    0.51       0.67         0.47
  20. Evaluation of Ontopopulis
     Accuracy (%), top 20    Person    Weapon    Politician    Vehicle    Watercraft    Edged weapon    Crime    Building
     Portuguese              90        60        75            85         70            20              85       75
     Spanish                 95        60        -             -          -             -               -        -
