TETI: a TimeML Compliant TimEx Tagger for Italian

Presentation and demo held at the Computational Linguistics Applications Workshop @ IMCSIT, Mragowo, Poland

  1. 1. TETI: a TimeML Compliant TimEx Tagger for Italian
      Tommaso Caselli, Felice Dell'Orletta and Irina Prodanof
      Istituto di Linguistica Computazionale “A. Zampolli” - ILC-CNR Pisa
      {firstName.secondName@ilc.cnr.it}
      IMCSIT 2009 – CL-A09, Mragowo, October 13
  2. 2. Outline:
      • Motivations
      • Extracting temporal expressions and the TIMEX3 tag
      • TETI:
        − System architecture
        − Demo
      • Evaluation
      • Conclusions & Future Work
  3. 3. Motivations
      • Recovering temporal relations in text/discourse is essential to improve the performance of many NLP systems (Open-Domain Q.A., Text Mining, Summarization, Reasoning)
      • Most temporal information in text/discourse is only IMPLICITLY stated
      • Need to develop procedures to maximize the role of the various sources of information
      • Temporal expressions represent a source of explicit temporal knowledge which can:
        − Locate an eventuality in time, and thus be used to infer temporal relations between eventualities
        − Measure the duration of an eventuality
  4. 4. Extracting Temporal Expressions
      • The extraction of timexes can be divided into 4 subtasks:
        − Recognizing and bracketing the timex
        − Feature extraction (type of time unit, referential status, presence of modifiers)
        − Computing the interval of reference on the time line
        − Resolving the timex, i.e. normalizing the value to a standard output format
  6. 6. Temporal Expressions in TimeML: The TIMEX3 tag
      • The TIMEX3 tag extends and improves previous tags for this task, namely TIMEX and TIDES TIMEX2
      • The TIMEX3 tag is used to mark any time word, i.e. both absolute and relative timexes, such as times of day (midnight, ...), dates of different granularity (yesterday, last spring, ...), calendar dates (01/12/1980, ...), durations (three hours, two years, ...), sets of times (yearly, every day, ...)
      • The annotation process is based on:
        − the constituent structure (NP, AdjP, AdvP, Time/Date Pattern)
        − the granularity of the time units
        − the relations between the timexes
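
To make the tag concrete, here is a minimal Python sketch that serializes a few of the example expressions above as TIMEX3 elements. The attribute names `tid`, `type` and `value` come from TimeML; the `value` strings are ISO 8601-style normalizations supplied here as an assumption for illustration (including the day/month/year reading of the calendar date), and the helper `timex3` is hypothetical, not part of TETI.

```python
import xml.etree.ElementTree as ET


def timex3(tid: str, text: str, timex_type: str, value: str) -> str:
    """Serialize one TIMEX3 element wrapping the expression `text`."""
    element = ET.Element("TIMEX3", {"tid": tid, "type": timex_type, "value": value})
    element.text = text
    return ET.tostring(element, encoding="unicode")


# Illustrative values only: they are assumptions, not output of TETI.
examples = [
    ("t1", "01/12/1980", "DATE", "1980-12-01"),  # day/month/year reading assumed
    ("t2", "three hours", "DURATION", "PT3H"),
    ("t3", "two years", "DURATION", "P2Y"),
]

for args in examples:
    print(timex3(*args))
```

Each printed line has the shape `<TIMEX3 tid="..." type="..." value="...">expression</TIMEX3>`.
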
  7. 7. TETI: Temporal Expression Tagger for Italian
      • Rule-based system
      • Input: chunked text
      • Main components: TIMEX DETECTOR & TIMEX TAGGER
      • Two external resources: a TimEx Trigger Dictionary and a Modifier Dictionary
  8. 8. TETI: Temporal Expression Tagger for Italian (2) Chunked text
  9. 9. TETI: Temporal Expression Tagger for Italian (2)
  10. 10. TETI: Temporal Expression Tagger for Italian (2)
      • The chunker output approximates the TIMEX3 tag extent
      • The extent of timexes corresponds to regular patterns of chunk combinations
  11. 11. TETI: Temporal Expression Tagger for Italian (3)
      • Analysis of the chunked text
      • Lookup in the TimEx Trigger Dictionary
      • Extraction of the necessary features for the bracketing
  12. 12. TETI: Temporal Expression Tagger for Italian (3)
  13. 13. TETI: Temporal Expression Tagger for Italian (4)
      • Core element of the tagger
      • A general condition + a set of local conditions
      • If the conditions are true, the tagger activates the related rules and brackets the timex with a TIMEX3 tag
  14. 14. TETI: Temporal Expression Tagger for Italian (4)
      COND (and
             (or (POTGOV_CHUNK equals N_C)
                 (POTGOV_CHUNK equals ADV_C)
                 (POTGOV_CHUNK equals ADJ_C))
             (not (POTGOV_CHUNK has PREMODIF))
             (not (POTGOV_lemma CHUNK-1 equals modiftrigger))
             (or (not (POTGOV_lemma CHUNK+1 equals lextrigger))
                 (not (POTGOV_lemma CHUNK+1 equals modiftrigger))))
      then CREATE TIMEX3_tag (and (BEGIN_AT B_CHUNK) (END_AT E_CHUNK))
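
The rule on this slide can be read as a predicate over a chunk and its two neighbours. The Python sketch below is one possible transcription, not TETI's implementation: the `Chunk` class, the two small trigger sets and the character offsets are hypothetical, and the first check (the chunk head must itself be a lexical trigger) is an assumed stand-in for the "general condition", which the slides do not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-ins for the TimEx Trigger and Modifier dictionaries.
LEX_TRIGGERS = {"ieri", "oggi", "giorno", "mese", "anno", "mattina"}
MODIF_TRIGGERS = {"scorso", "prossimo", "circa"}


@dataclass
class Chunk:
    kind: str           # chunk type, e.g. "N_C", "ADV_C", "ADJ_C", "FV_C"
    head_lemma: str     # lemma of the potential governor (POTGOV)
    has_premodif: bool  # True if the chunk contains a premodifier
    begin: int          # character offset where the chunk starts
    end: int            # character offset where the chunk ends


def lemma_at(chunks: List[Chunk], i: int) -> Optional[str]:
    """Head lemma of chunk i, or None when i is outside the sentence."""
    return chunks[i].head_lemma if 0 <= i < len(chunks) else None


def simple_rule(chunks: List[Chunk], i: int) -> Optional[Tuple[int, int]]:
    """Return the (begin, end) extent of a TIMEX3 for chunk i, or None."""
    chunk = chunks[i]
    prev_lemma = lemma_at(chunks, i - 1)
    next_lemma = lemma_at(chunks, i + 1)
    # Assumed general condition: the chunk head is a lexical trigger.
    if chunk.head_lemma not in LEX_TRIGGERS:
        return None
    # Local conditions, transcribed clause by clause from the slide.
    fires = (
        chunk.kind in {"N_C", "ADV_C", "ADJ_C"}
        and not chunk.has_premodif
        and prev_lemma not in MODIF_TRIGGERS
        and (next_lemma not in LEX_TRIGGERS or next_lemma not in MODIF_TRIGGERS)
    )
    return (chunk.begin, chunk.end) if fires else None


# "Ieri Maria è partita" (Yesterday Maria left), chunked by hand.
sentence = [
    Chunk("ADV_C", "ieri", False, 0, 4),
    Chunk("N_C", "Maria", False, 5, 10),
    Chunk("FV_C", "partire", False, 11, 20),
]
print(simple_rule(sentence, 0))
print(simple_rule(sentence, 1))
```

In this toy sentence only the first chunk is bracketed; "Maria" is rejected because its head is not in the trigger set.
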
  15. 15. TETI: Temporal Expression Tagger for Italian (4)
  16. 16. TETI: Temporal Expression Tagger for Italian (4)
  17. 17. TETI: Temporal Expression Tagger for Italian (5)
      • More complex timexes require a further lookup in the TimEx Trigger Dictionary to extract further features (semantic relations) for the correct bracketing
  18. 18. TETI: Temporal Expression Tagger for Italian (5)
  19. 19. Evaluation
      • 42 newspaper articles, manually annotated
      • 367 timexes

      TAG                 TOT  CORR.  MISSING  INCORR.  P      R      F
      TIMEX3              367  321    35       66       82.95  90.17  86.41
      TIMEX3: modifiers    90   55    12       23       82.09  70.51  75.86
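
As a quick check on how the scores relate to the counts, the sketch below recomputes precision, recall and F-measure. The definitions used (precision = correct / (correct + incorrect), recall = correct / (correct + missing), F = harmonic mean) are an assumption; the slide does not state them, but they reproduce the TIMEX3 row. The function name `prf` is hypothetical.

```python
from typing import Tuple


def prf(correct: int, missing: int, incorrect: int) -> Tuple[float, ...]:
    """Return (precision, recall, F-measure) as percentages with two decimals."""
    precision = correct / (correct + incorrect)
    recall = correct / (correct + missing)
    f_measure = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * score, 2) for score in (precision, recall, f_measure))


print(prf(321, 35, 66))  # TIMEX3 row: (82.95, 90.17, 86.41)
```

Under the same definitions, the scores reported for the modifier row are obtained with `prf(55, 23, 12)`, that is, with the two error counts read in the opposite order from the one in which they appear in that row.
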
  20. 20. Conclusion & Future Work
      • Reduction of the number of false positives
      • Implementation of the normalization phase → rule based
      • Rewriting of the rules to be compliant with the KAF format (KYOTO Project)
      • Release of the tool via web service
  21. 21. Acknowledgments
      Thanks to Roberto Bartolini for his help in the development of the demo
  22. 22. Thank You!
  23. 23. Complex Rule 1
      COND (and
             (not (POTGOV_lemma CHUNK-1 equals modiftrigger))
             ((POTGOV_lemma CHUNK+1 equals lextrigger)
              then (GET GRAN GET DEFAULT TYPE))
             (COND ((PREMODIF_POTGOV_CHUNK equals modiftrigger)
                    then (GET INFO_NORMALIZATION
                          GET TIMEML_MOD_ATTRIBUTE
                          GET TIMEML_BEGINPOINT_ATTRIBUTE
                          GET TIMEML_ENDPOINT_ATTRIBUTE
                          GET TR_RESPECT_TO ANCHOR))
                   T)
             (or (POTGOV_CHUNK+1 equals N_C)
                 (POTGOV_CHUNK+1 equals ADV_C)
                 (POTGOV_lemma CHUNK+1 equals DATE PATTERN))
             (not (POTGOV_CHUNK+1 has PREMODIF))
             (POTGOV_CHUNK equals N_C)
  24. 24. Complex Rule 1b
      (COND
        1 ((and (equals (SEM_RELATION POTGOV_CHUNK) (has_as_part (LEXTRIG_CIBLE POTGOV_CHUNK+1))
                (equals (DEFAULT_TYPE POTGOV_CHUNK) DATE))
                (or (equals (DEFAULT_TYPE POTGOV_CHUNK+1) DATE))
                (equals (DEFAULT TYPE POTGOV_CHUNK+1) TIME)))
           then CREATE TIMEX3 (and (BEGIN_AT B_POTGOV_CHUNK) (END_AT E_POTGOV_CHUNK+1)))
        2 ((and (CREATE TIMEX3 (and (BEGIN_AT B_POTGOV_CHUNK) (END_AT E_POTGOV_CHUNK))
                               (and (BEGIN_AT B_POTGOV_CHUNK+1) (END_AT E_POTGOV_CHUNK+1)))))
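
One way to read Complex Rule 1b is as a merge-or-split decision over two adjacent trigger chunks: branch 1 creates a single TIMEX3 spanning both chunks when the first trigger stands in a has_as_part relation to the second, the first has default type DATE and the second has default type DATE or TIME; branch 2 otherwise creates one TIMEX3 per chunk. The Python sketch below encodes that reading only. It is an interpretation of the slide, and the `TriggerChunk` class, its fields and the example entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class TriggerChunk:
    lemma: str         # head lemma found in the trigger dictionary
    default_type: str  # default TIMEX3 type, e.g. "DATE", "TIME", "DURATION"
    begin: int         # character offset where the chunk starts
    end: int           # character offset where the chunk ends
    # Lemmas this trigger can contain as a part (assumed dictionary feature).
    has_as_part: Set[str] = field(default_factory=set)


def bracket_pair(first: TriggerChunk, second: TriggerChunk) -> List[Tuple[int, int]]:
    """Return one merged extent (branch 1) or two separate extents (branch 2)."""
    merge = (
        second.lemma in first.has_as_part
        and first.default_type == "DATE"
        and second.default_type in {"DATE", "TIME"}
    )
    if merge:
        return [(first.begin, second.end)]
    return [(first.begin, first.end), (second.begin, second.end)]


# "giovedì mattina" (Thursday morning): a day that has the morning as a part.
thursday = TriggerChunk("giovedì", "DATE", 0, 7, {"mattina", "sera"})
morning = TriggerChunk("mattina", "TIME", 8, 15)
print(bracket_pair(thursday, morning))

# No part-of relation between the two triggers: two separate timexes.
year = TriggerChunk("anno", "DATE", 8, 12)
print(bracket_pair(thursday, year))
```

The first call yields a single extent covering both chunks, the second yields two extents.
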