TIMEN: An Open Temporal Expression Normalisation Resource

We present TIMEN, a resource for building and sharing knowledge and rules for the TimeML temporal expression normalization subtask, that is, the generation of a TIMEX3 annotation from a linguistic temporal expression. It provides a strong basis built from the current best approaches and is independent of the other temporal expression processing subtasks. Therefore, it can be easily integrated as a module in temporal information processing systems.

Since it is open, it can be used, improved and extended by the community, in contrast to closed tools, which must be replicated from scratch as the field advances. Furthermore, TIMEN eases the development of normalization knowledge and rules for low-resourced languages, since the normalization process is partially shared between languages.

  1. TIMEN: An Open Temporal Expression Normalisation Resource
     H. Llorens, L. Derczynski, R. Gaizauskas, E. Saquete
  2. Outline
     ● Introduction: Timex normalisation
     ● Related work
     ● Problem: reinventing the wheel over and over again
     ● Proposal: TIMEN
     ● Evaluation
     ● Conclusions
     ● Further Work
  3. Timex Normalisation
     Temporal information extraction subtask.
     Timex: linguistic expression of a time point or interval.
     Normalisation: semantic interpretation of timexes.
     Temporal Expression (TIMEX)           Timex normalization
     Linguistics/Variability/Relativity    ISO 8601/Invariable interpretation
     June 2012, next month, 06/2012        2012-06
     this morning 7 a.m.                   2012-05-24T07:00
     3 days and 3 hours                    PT3D3H
     weekly                                XXXX-XX-WXX
  4. Timex Normalisation (II)
     Useful for a variety of NLP applications: IR, QA, Summarization, etc.
     Example: "I went to the cinema yesterday." (event: went; timex: yesterday; value: 2012-05-23)
     Question: "When did he go to the cinema?" Answer: 2012-05-23
     The main advantage of normalisation is having timexes in standard time representations (e.g., Gregorian calendar).
  5. Related Work
     There are many approaches to timex normalisation
     ● Pre TempEval-2
       ○ TempEx (2000), GUTime (2005), Chronos (2004), TERSEO (2005), TimexTag (2005), TEA (2006), DANTE (2007)...
     ● TempEval-2 (2010)
       ○ HeidelTime, TRIPS/TRIOS, TIPSem/TIPSemB...
  6. Similarities and differences
     ● Approaches have slightly different architectures and show slightly different performance on tests.
     ● But all the approaches are rule-based and in general they use the same normalization strategies.
     ● They also require the same parameters to perform the task.
       ○ DCT: document creation time (deictic) (2 days ago: 2012-05-22)
       ○ Reference time: time talked about (anaphoric) (2 days before: 2012-05-20)
       ○ Tense: resolution direction (October): Past (2011-10), Present/Future (2012-10)
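The role of tense as a resolution direction can be illustrated with a minimal Python sketch. It is not TIMEN code: the function name `resolve_month`, the tense labels, and the treatment of the DCT's own month (past looks back a full year, present/future keeps the current year) are assumptions made for illustration only.

```python
from datetime import date


def resolve_month(month: int, dct: date, tense: str) -> str:
    """Resolve a bare month (1-12) to YYYY-MM relative to the DCT.

    Hypothetical helper: tense "past" resolves backwards from the DCT,
    any other value ("present", "future") resolves forwards.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    if tense == "past":
        # Latest occurrence of the month strictly before the DCT's month
        # (assumption: the DCT's own month counts as last year's).
        year = dct.year if month < dct.month else dct.year - 1
    else:
        # Earliest occurrence of the month at or after the DCT's month.
        year = dct.year if month >= dct.month else dct.year + 1
    return f"{year:04d}-{month:02d}"


dct = date(2012, 5, 24)
past_value = resolve_month(10, dct, "past")
future_value = resolve_month(10, dct, "future")
```

With the DCT of 2012-05-24 used on the slides, "October" resolves to 2011-10 under a past reading and 2012-10 under a present/future reading, as in the slide's example.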
  7. The problem
     Reinventing the wheel over and over again
     ● Implementing high-performance approaches is costly, and it is redone from scratch every time.
     ● All the approaches are similar: rule-based, with similar normalization rules and strategies.
     ● None is meant to be reused and refined by others.
  8. Proposal: TIMEN
     Characteristics:
     ● Open philosophy: meant to be reused and refined (even across languages)
     ● Not only meant for computer scientists:
       ○ the algorithms (source code) and normalisation rules (db of user-friendly rules with a documented syntax) are separated.
     ● Independent from other timex processing tasks
     ● Multi-platform and easy integration
  9. TIMEN Library Architecture
     Example:
       timex: three days ago
       DCT: 2012-05-24
       normtext: 3_day_ago
       pattern: Num_TUnit_ago
       only 1 rule matches.
       normalized value: 2012-05-21
     Example 2:
       timex: October 20
       2 rules matching
       disambiguation: 20 is probably a day rather than a year because it is < 32
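The pipeline in the first example (timex, normtext, pattern, rule, value) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TIMEN implementation or its rule syntax: the tables `NUMBER_WORDS` and `UNIT_DAYS`, the functions `normtext`, `pattern`, `normalise` and `reading_of_number`, and the rule dictionary are all hypothetical names, and only the single pattern `Num_TUnit_ago` from the slide is covered.

```python
from datetime import date, timedelta

# Hypothetical lookup tables; a real rule base would be far larger.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
UNIT_DAYS = {"day": 1, "week": 7}


def normtext(timex: str) -> str:
    """Lower-case, map number words to digits, singularise units."""
    tokens = []
    for tok in timex.lower().split():
        if tok in NUMBER_WORDS:
            tokens.append(str(NUMBER_WORDS[tok]))
        elif tok.endswith("s") and tok[:-1] in UNIT_DAYS:
            tokens.append(tok[:-1])
        else:
            tokens.append(tok)
    return "_".join(tokens)


def pattern(norm: str) -> str:
    """Abstract a normalised text into a pattern such as Num_TUnit_ago."""
    out = []
    for tok in norm.split("_"):
        if tok.isdigit():
            out.append("Num")
        elif tok in UNIT_DAYS:
            out.append("TUnit")
        else:
            out.append(tok)
    return "_".join(out)


def rule_num_tunit_ago(norm: str, dct: date) -> str:
    """Subtract the stated amount of time from the DCT."""
    num, unit, _ago = norm.split("_")
    return (dct - timedelta(days=int(num) * UNIT_DAYS[unit])).isoformat()


# One rule per pattern in this sketch.
RULES = {"Num_TUnit_ago": rule_num_tunit_ago}


def normalise(timex: str, dct: date):
    """Return the normalised value, or None if no rule matches."""
    norm = normtext(timex)
    rule = RULES.get(pattern(norm))
    return rule(norm, dct) if rule else None


def reading_of_number(n: int) -> str:
    """Disambiguation heuristic from Example 2: values below 32 read as a day."""
    return "day" if 1 <= n < 32 else "year"


value = normalise("three days ago", date(2012, 5, 24))
```

Keeping the rules in a table keyed by pattern mirrors the separation between algorithms and rule base described on the previous slide: new patterns can be added without touching `normalise`.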
  10. Rule base sample (English)
  11. TIMEN integration
  12. TIMEN community
     ● Open-source software: http://code.google.com/p/timen/
     ● Crowd extension of the rule set (interactive web interface to upload and check new rules): http://timen.org
     * New rules are only accepted if they improve performance on the current dataset or on new examples (human reviewed), e.g. New Year's Eve.
  13. Evaluation
     Experiments:
     ● Normalization accuracy of TIMEN
     ● Performance gain in state-of-the-art approaches by integrating TIMEN
     Datasets:
     ● TempEval-2 test set (already known to the approaches; mainly common dates and durations)
     ● TimenEval dataset (new, unknown to the approaches; balanced among different timex types)
  14. Normalisation accuracy
     gold timexes    TIMEN normalisation    result
     yesterday       2012-05-23             correct
     2012            2012                   correct
     October         2012-10                incorrect
     daily           xxxx-xx-xx             correct
     morning         2011                   incorrect
     ...             ...                    ...
     e.g. TOTAL: 100 timexes to normalise
     e.g. TOTAL: 90 correct normalizations
     RESULT: 90/100 --> 90% ACCURACY
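The accuracy measure on this slide is simply the fraction of timexes whose normalised value matches the gold value. A minimal Python sketch follows; the function name `normalisation_accuracy` and the sample data are made up for illustration and are not taken from the TIMEN evaluation scripts.

```python
def normalisation_accuracy(gold: list, predicted: list) -> float:
    """Fraction of predicted values that exactly match the gold values."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Illustrative data only: 100 timexes, 90 of them normalised correctly.
gold_values = ["2012-05-23"] * 100
system_values = ["2012-05-23"] * 90 + ["2011"] * 10
accuracy = normalisation_accuracy(gold_values, system_values)
```

Exact string match is the strictest possible comparison; whether the original evaluation allowed any partial credit is not stated on the slide.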
  15. Normalisation accuracy
     TEST SET      NORMALISATION ACC
     TempEval-2    0.90
     TimenEval     0.68
     ● TIMEN shows high performance even in this first version (only 76 rules).
     ● TimenEval accuracy is lower. This corpus is more heterogeneous (times/sets) and normalization is more difficult.
  16. Performance gain
     Timexes recognized by Approach X --> built-in normalisation of Approach X --> Original normalisation
     Timexes recognized by Approach X --> TIMEN --> New normalisation
     Performance gain = New accuracy - Original accuracy
  17. Performance gain (TempEval-2) "known data"
     System        built-in norm.    TIMEN norm.    Err. Reduction
     TIPSemB       0.83              0.89           35%
     HeidelTime    0.94              0.94           0%
     TERNIP        0.76              0.92           66%
     ● Replacing the systems' built-in normalization approaches with TIMEN generally improves their performance on the TE2 test set.
     ● The tested (current) versions of the systems may have been developed or updated with knowledge of this data. What happens with data that is new to them?
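The slides define performance gain as a difference of accuracies but do not spell out how "Err. Reduction" is computed. The Python sketch below assumes the common definition, the relative reduction of the error rate (1 - accuracy); both function names are hypothetical. Under that assumption the TIPSemB row (0.83 to 0.89) gives roughly 0.35 and the TERNIP row (0.76 to 0.92) roughly 0.67, in line with the 35% and 66% shown in the table.

```python
def performance_gain(original_acc: float, new_acc: float) -> float:
    """Absolute gain, as defined on the slide: new minus original accuracy."""
    return new_acc - original_acc


def error_reduction(original_acc: float, new_acc: float) -> float:
    """Relative error reduction (assumed definition, not stated on the slide).

    Returns the share of the original error (1 - accuracy) that the new
    normalisation removes.
    """
    original_error = 1.0 - original_acc
    if original_error <= 0.0:
        raise ValueError("original accuracy is already perfect")
    return performance_gain(original_acc, new_acc) / original_error


# Accuracy pairs from the TempEval-2 table above.
tipsemb = error_reduction(0.83, 0.89)
heideltime = error_reduction(0.94, 0.94)
ternip = error_reduction(0.76, 0.92)
```

Relative error reduction is more informative than the absolute gain when the baselines differ: the same +0.02 matters more for a system that is already at 0.94 than for one at 0.57.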
  18. Performance gain (TimenEval) "new data"
     System        built-in norm.    TIMEN norm.    Err. Reduction
     TIPSemB       0.57              0.67           23%
     HeidelTime    0.72              0.74           7%
     TERNIP        0.70              0.72           66%
     ● On new data, the performance of the built-in approaches generally decreases.
     ● TIMEN improves normalization performance for all the systems.
  19. Conclusions
     ● We presented an open tool for timex normalisation: TIMEN.
     ● ADVANTAGES:
       ○ High performance (above recent approaches).
       ○ Easily integrated in any timex recognition approach.
       ○ Can be improved by the community (open philosophy), and avoids re-development from scratch.
       ○ Available: http://timen.org and Google code
  20. Further Work
     ● Community-based extension and refinement of TIMEN (rulebase).
     ● Extensive evaluation of TIMEN in various languages (Spanish, Chinese, Italian and Danish).
  21. TIMEN: An Open TIMEX Normalisation Resource
     THANK YOU! QUESTIONS?
     http://timen.org
     H. Llorens, L. Derczynski, R. Gaizauskas, E. Saquete
