We present TIMEN, a resource for building and sharing knowledge and rules for the TimeML temporal expression normalization subtask, that is, the generation of a TIMEX3 annotation from a linguistic temporal expression. It provides a strong basis, built from the current best approaches, that is independent of the other temporal expression processing subtasks, so it can easily be integrated as a module in temporal information processing systems.
Since it is open, it can be used, improved and extended by the community, in contrast to closed tools, which must be replicated from scratch as the field advances. Furthermore, TIMEN eases the development of normalization knowledge and rules for low-resourced languages, since part of the normalization process is shared across languages.
TIMEN: An Open Temporal Expression Normalisation Resource
1. TIMEN: An Open Temporal Expression Normalisation Resource
H.Llorens, L.Derczynski, R.Gaizauskas, E. Saquete
2. Outline
● Introduction: Timex normalisation
● Related work
● Problem: reinventing the wheel again and again
● Proposal: TIMEN
● Evaluation
● Conclusions
● Further Work
3. Timex Normalisation
A subtask of temporal information extraction.
Timex: linguistic expression of a time point or interval.
Normalisation: semantic interpretation of timexes.
Examples (DCT: 2012-05-24):
Temporal expression (TIMEX)            → Timex normalisation (value)
linguistic, variable, relative         → ISO 8601, invariable interpretation
June 2012, next month, 06/2012         → 2012-06
7 a.m. this morning                    → 2012-05-24T07:00
3 days and 3 hours                     → P3DT3H
weekly                                 → XXXX-XX-WXX
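The duration row is where a strict ISO 8601 reading matters: the designator T separates the date components (Y, M, D) from the time components (H, M, S), so "3 days and 3 hours" is P3DT3H. A minimal Python sketch of that formatting rule; the helper name `iso_duration` is hypothetical and not part of TIMEN:

```python
def iso_duration(years=0, months=0, days=0, hours=0, minutes=0, seconds=0):
    """Format a duration as an ISO 8601 string of the kind used in TIMEX3 values."""
    # Date components come directly after the leading "P".
    date_part = "".join(
        f"{n}{unit}" for n, unit in ((years, "Y"), (months, "M"), (days, "D")) if n
    )
    # Time components are only allowed after the "T" designator.
    time_part = "".join(
        f"{n}{unit}" for n, unit in ((hours, "H"), (minutes, "M"), (seconds, "S")) if n
    )
    if not date_part and not time_part:
        return "PT0S"  # zero-length duration
    return "P" + date_part + ("T" + time_part if time_part else "")


print(iso_duration(days=3, hours=3))  # P3DT3H
```

The T designator is also what disambiguates the letter M: P1M is one month, PT1M is one minute.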
4. Timex Normalisation (II)
Useful for a variety of NLP applications: IR, QA, summarization, etc.
Example (DCT: 2012-05-24): I went [event] to the cinema yesterday [timex].
Value of "yesterday": 2012-05-23
When did he go to the cinema? 2012-05-23
The main advantage of normalisation is having timexes in
standard time representations (e.g., the Gregorian calendar).
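Once "yesterday" has a normalised value, a "When...?" question reduces to a lookup. A toy Python sketch, assuming a hand-built event-to-value index and a DCT of 2012-05-24; the names `event_times` and `when` are hypothetical:

```python
from datetime import date, timedelta

DCT = date(2012, 5, 24)  # assumed document creation time

# Toy index: event -> normalised value of the timex anchoring it.
event_times = {
    "go to the cinema": (DCT - timedelta(days=1)).isoformat(),  # "yesterday"
}


def when(event):
    """Answer a 'When did ...?' question from the normalised index (None if unknown)."""
    return event_times.get(event)


print(when("go to the cinema"))  # 2012-05-23
```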
5. Related Work
There are many approaches to timex normalisation:
● Pre TempEval-2
○ TempEx (2000), GUTime (2005), Chronos (2004),
TERSEO (2005), TimexTag (2005), TEA (2006),
DANTE (2007)...
● TempEval-2 (2010)
○ HeidelTime, TRIPS/TRIOS, TIPSem/TIPSemB...
6. Similarities and differences
● Approaches have slightly different architectures and
show slightly different performance on tests.
● But all the approaches are rule-based and, in general,
they use the same normalization strategies,
● and they also require the same parameters to perform the task:
○ DCT: document creation time, for deictic timexes (2 days ago: 2012-05-22)
○ Reference time: the time being talked about, for anaphoric timexes
(2 days before: 2012-05-20)
○ Tense: resolution direction (October):
past (2011-10), present/future (2012-10)
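The three parameters can be illustrated with a small Python sketch that reproduces the examples above. It assumes a reference time of 2012-05-22 (the example "2 days before: 2012-05-20" implies such an anchor, e.g. a date mentioned earlier in the text), and the function names are hypothetical, not TIMEN's API:

```python
from datetime import date, timedelta


def days_before(anchor: date, n: int) -> str:
    """Shift an anchor date n days into the past and return a YYYY-MM-DD value."""
    return (anchor - timedelta(days=n)).isoformat()


def resolve_month(month: int, dct: date, tense: str) -> str:
    """Resolve a bare month name (e.g. "October") to YYYY-MM using the verb tense."""
    if tense == "past":
        # Past tense: the latest such month strictly before the DCT's month.
        year = dct.year if month < dct.month else dct.year - 1
    else:
        # Present/future tense: the next such month from the DCT's month onwards.
        year = dct.year if month >= dct.month else dct.year + 1
    return f"{year:04d}-{month:02d}"


dct = date(2012, 5, 24)        # document creation time
reference = date(2012, 5, 22)  # assumed time talked about in the preceding text

print(days_before(dct, 2))               # "2 days ago"    -> 2012-05-22 (deictic)
print(days_before(reference, 2))         # "2 days before" -> 2012-05-20 (anaphoric)
print(resolve_month(10, dct, "past"))    # "October", past tense   -> 2011-10
print(resolve_month(10, dct, "future"))  # "October", future tense -> 2012-10
```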
7. The problem
Reinventing the wheel again and again
● Implementing high-performance approaches is
costly, and it is done from scratch every time.
● All the approaches are similar: rule-based, with similar
normalization rules and strategies.
● None is designed to be reused and refined by others.
8. Proposal: TIMEN
Characteristics:
● Open philosophy: meant to be reused and refined (even
across languages)
● Not only meant for computer scientists:
○ the algorithms (source code) and the normalisation rules (a database of
user-friendly rules with a documented syntax) are kept separate.
● Independent from other timex processing tasks
● Multi-platform and easy integration
9. TIMEN Library Architecture
● Example 1: timex "three days ago", DCT 2012-05-24
○ normtext: 3_day_ago
○ pattern: Num_TUnit_ago
○ only 1 rule matches
○ normalised value: 2012-05-21
● Example 2: timex "October 20"
○ 2 rules match, so disambiguation is needed
○ 20 is probably a day rather than a year, because it is < 32
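The two examples can be mimicked with a toy pipeline: normalise the text, abstract it into a pattern, look up the rules for that pattern and, when more than one matches, disambiguate. This is a minimal Python sketch under stated assumptions, not TIMEN's actual code or rule syntax: the lexicons, function names and rule table are hypothetical, and resolving "October 20" to the DCT's year ignores tense.

```python
import calendar
from datetime import date, timedelta

# Hypothetical, tiny lexicons; a real rule base would be much larger.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}
UNITS = {"day": "day", "days": "day", "week": "week", "weeks": "week"}
MONTHS = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}


def normtext(timex):
    """Lower-case, map number words to digits and units to singular ('3_day_ago')."""
    return "_".join(UNITS.get(t, NUMBER_WORDS.get(t, t)) for t in timex.lower().split())


def pattern(norm):
    """Abstract normalised text into a pattern such as 'Num_TUnit_ago'."""
    def category(tok):
        if tok.isdigit():
            return "Num"
        if tok in UNITS.values():
            return "TUnit"
        if tok in MONTHS:
            return "Month"
        return tok
    return "_".join(category(t) for t in norm.split("_"))


def ago(norm, dct):
    n, unit, _ = norm.split("_")
    step = 7 if unit == "week" else 1
    return (dct - timedelta(days=int(n) * step)).isoformat()


def month_day(norm, dct):
    month, day = norm.split("_")
    return f"{dct.year:04d}-{MONTHS[month]:02d}-{int(day):02d}"


def month_year(norm, dct):
    month, year = norm.split("_")
    return f"{int(year):04d}-{MONTHS[month]:02d}"


# Rules are plain data keyed by pattern; several rules may share a pattern,
# so each carries a condition used for disambiguation.
RULES = {
    "Num_TUnit_ago": [(lambda norm: True, ago)],
    "Month_Num": [
        (lambda norm: int(norm.split("_")[1]) < 32, month_day),    # "October 20"
        (lambda norm: int(norm.split("_")[1]) >= 32, month_year),  # "October 1995"
    ],
}


def normalise(timex, dct):
    norm = normtext(timex)
    candidates = RULES[pattern(norm)]
    # Keep only the matching rules whose disambiguation condition holds.
    applicable = [rule for condition, rule in candidates if condition(norm)]
    return applicable[0](norm, dct)


dct = date(2012, 5, 24)
print(normalise("three days ago", dct))  # 2012-05-21
print(normalise("October 20", dct))      # 2012-10-20
```

Keeping `RULES` as data, apart from the matching code, mirrors the separation of rules and algorithms described for TIMEN.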
12. TIMEN community
● Open-source software:
http://code.google.com/p/timen/
● Crowd extension of the rule set (interactive
web interface to upload and check new
rules): http://timen.org
* New rules are only accepted if they improve performance on the current
dataset or on new, human-reviewed examples (e.g., New Year's Eve).
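One way to read the acceptance criterion is as a regression test on accuracy. A minimal Python sketch, assuming the reviewed examples are simply added to the evaluation data and a rule set is accepted only if accuracy strictly improves; `accuracy`, `accept_rule` and the toy normalisers and gold values are hypothetical illustrations, not TIMEN's review process:

```python
def accuracy(normalise, examples):
    """Fraction of (timex, dct, gold_value) examples normalised correctly."""
    correct = sum(1 for timex, dct, gold in examples if normalise(timex, dct) == gold)
    return correct / len(examples)


def accept_rule(current, candidate, dataset, reviewed=()):
    """Accept the candidate rule set only if it beats the current one."""
    examples = list(dataset) + list(reviewed)
    return accuracy(candidate, examples) > accuracy(current, examples)


# Toy data: (timex, DCT, gold value).
dataset = [("tomorrow", "2012-05-24", "2012-05-25")]
reviewed = [("New Year's Eve", "2012-05-24", "2012-12-31")]


def current(timex, dct):
    return "2012-05-25" if timex == "tomorrow" else None


def candidate(timex, dct):
    # New rule: New Year's Eve is 31 December of the DCT's year.
    if timex == "New Year's Eve":
        return dct[:4] + "-12-31"
    return current(timex, dct)


print(accept_rule(current, candidate, dataset, reviewed))  # True
```

Without the reviewed example both rule sets score 1.0 on the toy dataset, so the new rule would not be accepted on that evidence alone.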
13. Evaluation
Experiments:
● Normalization accuracy of TIMEN
● Performance gain in state-of-the-art approaches from
integrating TIMEN
Datasets:
● TempEval-2 test set
(already known to existing approaches; mainly common dates and durations)
● TimenEval dataset
(new and unseen by existing approaches; balanced across different timex types)
15. Normalisation accuracy
Test set       Normalisation accuracy
TempEval-2     0.90
TimenEval      0.68
● TIMEN shows high performance even in this first
version (only 76 rules).
● Accuracy on TimenEval is lower: this corpus is more
heterogeneous (it includes times and sets), so normalization is more
difficult.
16. Performance gain
● The timexes recognised by approach X are normalised twice:
○ by X's built-in normalisation (original normalisation)
○ by TIMEN (new normalisation)
Performance gain = New accuracy - Original accuracy
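The error-reduction figures in the tables that follow are consistent with the usual relative error reduction, i.e. the gain divided by the built-in system's error rate (the formula is inferred from the reported numbers, not stated on the slides):

```latex
\text{Gain} = \text{Acc}_{\text{new}} - \text{Acc}_{\text{orig}}, \qquad
\text{Error reduction} = \frac{\text{Acc}_{\text{new}} - \text{Acc}_{\text{orig}}}{1 - \text{Acc}_{\text{orig}}}
```

For TIPSemB on TempEval-2, the gain is 0.89 - 0.83 = 0.06 and the error reduction is 0.06 / 0.17 ≈ 0.35.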
17. Performance gain
(TempEval-2) "known data"
System        Built-in norm.   TIMEN norm.   Error reduction
TIPSemB       0.83             0.89          35%
HeidelTime    0.94             0.94          0%
TERNIP        0.76             0.92          66%
● Replacing the systems' built-in normalization with
TIMEN generally improves their performance on the
TempEval-2 (TE2) test set.
● The tested (current) versions of the systems may have
been developed or updated with knowledge of this data. What
happens with data that is new to them?
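The error-reduction column can be recomputed from the two accuracy columns of the TempEval-2 table above. A short Python check, assuming relative error reduction is the intended measure; `error_reduction` is a hypothetical helper name:

```python
def error_reduction(built_in, timen):
    """Share of the built-in normaliser's errors removed by switching to TIMEN."""
    return (timen - built_in) / (1.0 - built_in)


# Accuracies from the TempEval-2 table: (built-in, TIMEN).
TEMPEVAL2 = {
    "TIPSemB": (0.83, 0.89),
    "HeidelTime": (0.94, 0.94),
    "TERNIP": (0.76, 0.92),
}

for system, (built_in, timen) in TEMPEVAL2.items():
    gain = timen - built_in
    print(f"{system}: gain {gain:+.2f}, error reduction {error_reduction(built_in, timen):.1%}")
```

This gives about 35%, 0% and 66.7%, matching the reported column (the last value truncated to 66%).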
18. Performance gain
(TimenEval) "new data"
System        Built-in norm.   TIMEN norm.   Error reduction
TIPSemB       0.57             0.67          23%
HeidelTime    0.72             0.74          7%
TERNIP        0.70             0.72          7%
● On new data, the performance of the built-in approaches
generally decreases.
● TIMEN improves normalization performance for all the
systems.
19. Conclusions
● We presented an open tool for timex normalisation:
TIMEN.
● ADVANTAGES:
○ High performance (above recent approaches).
○ Easily integrated into any timex recognition
approach.
○ Can be improved by the community (open philosophy),
avoiding re-development from scratch.
○ Available: http://timen.org and Google Code
20. Further Work
● Community-based extension and refinement
of TIMEN (rulebase).
● Extensive evaluation of TIMEN in various
languages (Spanish, Chinese, Italian and Danish).
21. TIMEN: An Open TIMEX Normalisation Resource
THANK YOU!
QUESTIONS?
http://timen.org