On Fractional Fourier Transform Moments Based On Ambiguity Function — CSCJournals
The fractional Fourier transform can be viewed as a rotated version of the standard Fourier transform, and its usefulness in signal processing is becoming increasingly well known. Noise removal is one application the fractional Fourier transform handles well when the signal dilation is known exactly. In this paper, we compute the first- and second-order moments of the fractional Fourier transform exactly, in terms of the ambiguity function. In addition, we derive relations between the time and spectral moments and those obtained in the fractional domain. We prove that the first moment in the fractional Fourier domain can likewise be viewed as a rotation of the time and frequency centers of gravity. To illustrate the results, we choose five different types of signals and derive analytically their fractional Fourier transforms along with the first- and second-order moments in the time, frequency, and fractional domains.
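As a sketch of the rotation property this abstract refers to, the first moment of a signal in the fractional domain at angle α can be written as a rotation of its time and frequency centers of gravity (the notation here is ours, not necessarily the paper's):

```latex
% First-order moment in the fractional Fourier domain at angle \alpha,
% expressed as a rotation of the time and frequency centres of gravity:
\langle u \rangle_{\alpha} \;=\; \langle t \rangle \cos\alpha \;+\; \langle \omega \rangle \sin\alpha
```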
The document discusses the Cilk programming language and its runtime system for parallel programming. Cilk extends C with keywords like spawn and sync to express parallelism. It provides performance guarantees and automatically manages scheduling across processors. The runtime system uses work-stealing to map Cilk threads to processors with near-optimal efficiency. Cilk allows expressing parallelism while hiding low-level details like load balancing.
The document discusses procedure activations and lifetimes. It provides an example of an activation tree for a quicksort program, showing the nested calls to procedures like partition and quicksort. It describes how activation records are used to store state and pass parameters during procedure calls, including the use of control links and access links to manage nested procedures and nonlocal data.
The document presents a new approach called FPERT (Fuzzy PERT) for project network analysis that accounts for uncertainty in activity times. It begins with an overview of FPERT and its advantages over conventional PERT. It then discusses key concepts needed for FPERT like fuzzy sets, membership functions, and α-cuts. The document outlines the steps of the proposed FPERT method and provides an example calculation. It concludes by introducing notation that will be used to calculate earliest start, earliest finish, latest start and latest finish times for activities.
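As a rough illustration of the fuzzy arithmetic an FPERT-style forward pass relies on, the sketch below represents activity durations as triangular fuzzy numbers and computes earliest-finish times. The function names, the component-wise "fuzzy max" simplification, and the toy network are our own assumptions, not the document's method.

```python
# Minimal sketch of fuzzy forward-pass arithmetic for an FPERT-style analysis.
# Durations are triangular fuzzy numbers (a, m, b).

def fadd(x, y):
    """Add two triangular fuzzy numbers component-wise."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def fmax(x, y):
    """Approximate fuzzy max component-wise (a common FPERT simplification)."""
    return tuple(max(xi, yi) for xi, yi in zip(x, y))

def alpha_cut(tfn, alpha):
    """Interval of a triangular fuzzy number (a, m, b) at a given alpha level."""
    a, m, b = tfn
    return (a + alpha * (m - a), b - alpha * (b - m))

# Two parallel activities followed by one successor.
d1, d2, d3 = (2, 3, 5), (1, 4, 6), (2, 2, 3)
start = (0, 0, 0)
ef1, ef2 = fadd(start, d1), fadd(start, d2)   # earliest finishes
es3 = fmax(ef1, ef2)                          # earliest start of the successor
ef3 = fadd(es3, d3)                           # fuzzy project completion time
print(ef3)                   # (4, 6, 9)
print(alpha_cut(ef3, 1.0))   # (6.0, 6.0): the modal completion time
```

The α-cut at level 1 collapses the triangle to its modal value, which is how an α-cut turns a fuzzy completion time back into a crisp interval.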
Symbolic Execution as DPLL Modulo Theories — Quoc-Sang Phan
The document discusses symbolic execution, which is a program analysis technique that executes programs with symbolic inputs instead of concrete inputs. It describes symbolic execution as an approach for solving satisfiability modulo theories (SMT) problems, by viewing symbolic execution as an SMT solver. It presents an implementation of symbolic execution based on a Boolean executor that performs a depth-first search, combined with an SMT solver to check satisfiability of path conditions.
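A toy sketch of the architecture described above: a depth-first Boolean executor enumerates branch decisions, and a satisfiability check prunes infeasible path conditions. The brute-force `sat()` below stands in for a real SMT solver, and all names are illustrative rather than from the paper.

```python
# Toy symbolic execution: DFS over branch decisions plus a satisfiability
# check on each partial path condition.  A "program" is a list of branch
# conditions over one symbolic integer input x.

def sat(conds, domain=range(-10, 11)):
    """Check whether some concrete x satisfies every condition (mock SMT)."""
    return any(all(c(x) for c in conds) for x in domain)

def explore(branches, pc=()):
    """Depth-first enumeration of feasible path conditions."""
    if not branches:
        return [pc] if sat(pc) else []
    cond, rest = branches[0], branches[1:]
    paths = []
    true_pc = pc + (cond,)
    false_pc = pc + ((lambda x, c=cond: not c(x)),)
    if sat(true_pc):                 # prune infeasible branches early
        paths += explore(rest, true_pc)
    if sat(false_pc):
        paths += explore(rest, false_pc)
    return paths

# Two sequential branches: (x > 5) and (x < 3).  The path taking both
# "true" edges is infeasible, so only 3 of the 4 paths survive.
branches = [lambda x: x > 5, lambda x: x < 3]
feasible = explore(branches)
print(len(feasible))  # 3
```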
XLNet, RoBERTa, and Reformer are state-of-the-art language models. XLNet improves on BERT by capturing dependencies between prediction targets. RoBERTa further improves pre-training by removing the next-sentence-prediction objective and training on longer sequences with bigger batches. Reformer introduces efficient attention and feedforward mechanisms, such as reversible layers and locality-sensitive hashing, to process long sequences with less memory.
This document provides an overview of building a simple one-pass compiler to generate bytecode for the Java Virtual Machine (JVM). It discusses defining a programming language syntax, developing a parser, implementing syntax-directed translation to generate intermediate code targeting the JVM, and generating Java bytecode. The structure of the compiler includes a lexical analyzer, syntax-directed translator, and code generator to produce JVM bytecode from a grammar and language definition.
This document discusses the class P of computational problems that can be solved in polynomial time on a deterministic Turing machine. It begins with reviewing homework on big O notation and time complexity. It then provides examples comparing the run times of problems that are polynomial, exponential, and factorial in time. It defines the class P as problems decidable in polynomial time by a Turing machine. It discusses some example problems in P, like the PATH problem.
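The PATH problem mentioned above is the standard first example of a problem in P: deciding whether a directed path exists from s to t takes time polynomial (in fact linear) in the size of the graph. A minimal breadth-first-search sketch:

```python
# PATH is in P: breadth-first search decides s->t reachability in
# time linear in the number of vertices and edges.
from collections import deque

def path(graph, s, t):
    """Return True iff t is reachable from s in the directed graph."""
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

g = {'a': ['b', 'c'], 'b': ['d'], 'c': [], 'd': []}
print(path(g, 'a', 'd'))  # True
print(path(g, 'c', 'a'))  # False
```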
Inversion Theorem for Generalized Fractional Hilbert Transform — inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of Engineering, Science and Technology, including new teaching methods, assessment, validation, and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in the journal can be accessed online.
Optimal control of multi delay systems via orthogonal functions — iaemedu
This document discusses a unified approach for computing the optimal control of linear time-invariant/time-varying systems with time delays using orthogonal functions like block-pulse functions (BPFs) and shifted Legendre polynomials (SLPs). It reviews previous work on optimal control of time-delay systems and presents a new approach that directly expresses the unknown state x(t) in terms of orthogonal functions to obtain the state feedback control law u(t). The approach also handles the final cost term in the performance index differently than previous methods. Numerical examples are provided to demonstrate the applicability of the unified approach.
The document discusses run-time environments and activation records. It explains that activation records are used to manage information for each procedure call and are allocated on the stack. Activation records contain fields for return values, parameters, local variables, and more. When a procedure is called, its activation record is pushed onto the stack and popped off when it returns. Activation records allow recursive calls by creating a new record each time a procedure is activated.
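The push/pop discipline described above can be made concrete with a small simulation: each call allocates a record for its parameters and return-value slot, and recursion works because every activation gets its own record. The field names are illustrative, not from the document.

```python
# Toy sketch of stack-allocated activation records: each call pushes a
# record holding its parameter and return-value slot, and pops it on return.

stack = []       # the control stack of live activation records
max_depth = 0    # deepest the stack ever grows

def fact(n):
    """Recursive factorial with explicit activation-record bookkeeping."""
    global max_depth
    record = {"proc": "fact", "param_n": n, "return_value": None}
    stack.append(record)                  # push on call
    max_depth = max(max_depth, len(stack))
    record["return_value"] = 1 if n <= 1 else n * fact(n - 1)
    return stack.pop()["return_value"]    # pop on return

result = fact(5)
print(result, max_depth, len(stack))  # 120 5 0
```

Five nested activations exist at the deepest point, and the stack is empty again once the outermost call returns.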
The price density function, a tool for measuring investment risk, volatility a... — Tinashe Mangoro
In this paper I derive a density function for describing the distribution of an investment's price. From that function I then go on to show how we can use it to calculate volatility, interest rate averages, and also hedging risk against interest rate movements.
1) Complexity classes categorize problems based on the time and space complexity of their solutions. P represents problems solvable in polynomial time, while NP includes problems verifiable in polynomial time.
2) NP-hard problems are at least as hard as any problem in NP, and NP-complete problems are both in NP and NP-hard - they are the most difficult problems in NP.
3) Reducibility is used to prove problems are NP-complete - if problem A can be reduced to problem B in polynomial time, and B is NP-complete, then A is also NP-complete. 3-SAT is reduced to the clique problem by creating graph vertices for each literal.
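The 3-SAT-to-clique construction in point 3 can be sketched directly: one vertex per literal occurrence, edges between non-contradictory literals in different clauses, and the formula is satisfiable exactly when the graph has a k-clique for k clauses. The brute-force clique check below is only for illustration; the point of the reduction is that the *construction* is polynomial.

```python
# Textbook 3-SAT -> CLIQUE reduction, with a brute-force clique check.
from itertools import combinations

def reduce_to_clique(clauses):
    """Literals are signed ints: 1 means x1, -2 means NOT x2."""
    vertices = [(i, lit) for i, cl in enumerate(clauses) for lit in cl]
    # Edge iff the literals are in different clauses and not complementary.
    edges = {frozenset([u, v])
             for u, v in combinations(vertices, 2)
             if u[0] != v[0] and u[1] != -v[1]}
    return vertices, edges

def has_clique(vertices, edges, k):
    return any(all(frozenset([u, v]) in edges for u, v in combinations(c, 2))
               for c in combinations(vertices, k))

# (x1 v x1 v x2) ^ (~x1 v ~x1 v ~x2) ^ (~x1 v x2 v x2) is satisfiable
# (x1 = False, x2 = True), so a 3-clique must exist.
clauses = [(1, 1, 2), (-1, -1, -2), (-1, 2, 2)]
v, e = reduce_to_clique(clauses)
print(has_clique(v, e, len(clauses)))  # True
```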
This document compares several methods for fundamental frequency estimation and voicing decision from speech signals. It presents four methods: a SIFT-based method, a Frobenius norm method, and two bilinear time-frequency representation methods using different kernels. The document describes the processing steps common to the methods and evaluates their performance on a database of speech signals using metrics like gross error rates for fundamental frequency estimation and glottal closure instant detection accuracy. The bilinear time-frequency method using a Born-Jordan kernel achieved the best glottal closure instant detection, while the SIFT method was more robust to inter-speaker variability.
The document discusses for loops in Python. It explains that for loops are used to iterate over sequences like lists, tuples, and strings. There are two types of for loops: 1) Getting each element of the sequence, and 2) Using the range() function to generate a sequence of numbers to use as indexes. The document provides examples of iterating over lists and strings using for loops, and using break and continue statements to control loop behavior. It also explains how to use the range() function to generate a sequence of numbers for iteration.
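The two loop styles and the break/continue statements described above fit in a few lines:

```python
# The two for-loop styles from the summary, plus break and continue.

fruits = ["apple", "banana", "cherry"]

# Style 1: iterate over the elements directly.
upper = [f.upper() for f in fruits]

# Style 2: use range() to generate a sequence of indexes.
indexed = []
for i in range(len(fruits)):
    indexed.append(f"{i}:{fruits[i]}")

# break stops the loop; continue skips to the next iteration.
found = None
for f in fruits:
    if f.startswith("x"):
        continue          # skip items that don't interest us
    if f == "banana":
        found = f
        break             # stop at the first match
print(upper, indexed, found)
```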
Introduction on Prolog - Programming in Logic — Vishal Tandel
Prolog (here referring to the construction project management software, not the logic programming language) allows organizations to automate tasks and processes, streamline project delivery, control costs through real-time budget tracking, increase productivity through remote collaboration, reduce legal risks with audited access to documents, monitor project performance with dashboards, and integrate construction data with other systems. Over 6,000 organizations have used Prolog as the industry standard to manage construction projects and provide transparency to stakeholders.
Introduction to return oriented programming. Explanation of how to use instruction sequences already existing in an executable's memory space to manipulate control flow without injecting external payload.
The document discusses various programming concepts in C# including namespaces, data conversion, relational operators, Boolean expressions, and conditional control structures like if, else if, and switch statements. Namespaces are used to organize code elements and create unique types. The Convert class contains methods for converting between data types like strings and numbers. Conditional statements like if/else and switch/case allow for executing different blocks of code depending on conditional expressions being true or false.
The component computes backward probability density functions (pdfs) of residence time, travel time, and evapotranspiration time given actual time based on a water budget equation. It solves the equation to obtain the pdfs in matrices with injection time and current time as dimensions. The mean travel and evapotranspiration times can be computed by integrating the pdfs over injection time. Examples of input files and parameter settings are provided.
Añotador is a temporal tagger for Spanish created by researchers at the Universidad Politécnica de Madrid. It detects and normalizes temporal expressions like dates, times, durations, and sets in Spanish text. The researchers built a corpus called Hourglass to evaluate temporal taggers for Spanish, as existing resources were limited. Añotador achieved the best performance on the Hourglass corpus compared to other taggers. While Añotador performed well, the researchers note there is still work to be done, such as improving handling of challenging temporal expressions.
This document describes a new mechanism for predicting stale queries in a search engine's result cache. It uses timestamps to track changes to documents and terms in the index. When a cached query is requested, the timestamps are used to check if the query results may be stale due to document deletions, updates, or new documents matching the query terms. Experiments show this approach reduces redundant query executions compared to a simple time-to-live approach, while achieving a prediction accuracy comparable to prior work with lower overhead.
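The core bookkeeping is simple to sketch: record when each cached result was produced and when each index term last changed, and flag an entry as possibly stale only if a term it touches changed after caching. The class and method names below are our own, a minimal stand-in for the mechanism the paper describes.

```python
# Minimal sketch of timestamp-based staleness prediction for a result cache.
# Unlike a plain TTL, an entry is only flagged when a term it depends on
# actually changed in the index after the entry was cached.

class ResultCache:
    def __init__(self):
        self.entries = {}        # query -> (results, cached_at)
        self.term_updated = {}   # term  -> timestamp of last index change

    def put(self, query, results, now):
        self.entries[query] = (results, now)

    def index_change(self, term, now):
        """Record that a document affecting this term changed at time now."""
        self.term_updated[term] = now

    def possibly_stale(self, query):
        results, cached_at = self.entries[query]
        return any(self.term_updated.get(t, -1) > cached_at
                   for t in query.split())

cache = ResultCache()
cache.put("cheap flights", ["doc1", "doc2"], now=10)
cache.put("hotel rome", ["doc3"], now=10)
cache.index_change("flights", now=15)  # a document matching "flights" changed
print(cache.possibly_stale("cheap flights"))  # True: re-execute this one
print(cache.possibly_stale("hotel rome"))     # False: serve from cache
```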
GEOframe-NewAge: documentation for ProbabilitiesBackward component — Marialaura Bancheri
This document provides information about the ProbabilitiesBackward component in OMS 3, which computes backward probability density functions (pdfs) of residence time, travel time, and evapotranspiration time given actual time and input data. The component solves an ordinary differential equation to obtain the pdfs as tridimensional matrices. It also calculates the mean travel and evapotranspiration times by integrating the output matrices over injection time. Details are provided on the component's inputs like rainfall, storage, evapotranspiration, and outputs including the various pdfs and mean times.
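The final step described above, obtaining a mean time from a pdf, is a first-moment integral. The sketch below does it numerically with the trapezoidal rule; the exponential test pdf is our own stand-in for one slice of the component's output matrices.

```python
# Mean travel time as the first moment of a discretized travel-time pdf,
# integrated with the trapezoidal rule.
import math

def trapz(ys, xs):
    """Trapezoidal-rule integral of samples ys over grid xs."""
    return sum((ys[i] + ys[i + 1]) / 2 * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

# Exponential travel-time pdf with rate 0.5 -> true mean is 2.0.
taus = [i * 0.01 for i in range(4001)]
pdf = [0.5 * math.exp(-0.5 * t) for t in taus]
mean_travel_time = trapz([t * p for t, p in zip(taus, pdf)], taus)
print(round(mean_travel_time, 2))  # 2.0
```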
A bitemporal nested query language, BTN-SQL, is proposed in this paper. BTN-SQL attempts to fill some gaps present in currently available SQL standards. It extends the well-known SQL syntax in two directions: user-friendly support for nested relations and effective support for bitemporal data. Because the schema of a bitemporal nested database is inherently complicated and difficult to understand, an extension of the Entity-Relationship model, the BTN-ER model, is also proposed for modelling complex bitemporal nested data.
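BTN-SQL syntax itself is not shown in this summary, so as a language-neutral sketch, here is what a bitemporal "as of" lookup computes: each row carries a valid-time interval (when the fact held in the real world) and a transaction-time interval (when the database believed it). All names and data are illustrative.

```python
# Bitemporal "as of" lookup over rows tagged with valid-time and
# transaction-time intervals (half-open, INF = still current).

INF = float("inf")

rows = [
    # (employee, salary, valid_from, valid_to, tx_from, tx_to)
    ("ann", 50_000, 2018, 2020, 2018, INF),
    ("ann", 60_000, 2020, INF,  2020, INF),
    ("bob", 40_000, 2019, INF,  2019, 2021),  # belief retracted in 2021
]

def as_of(rows, valid_at, tx_at):
    """Rows true at time valid_at, as believed at time tx_at."""
    return [(name, sal) for name, sal, vf, vt, tf, tt in rows
            if vf <= valid_at < vt and tf <= tx_at < tt]

# Ann's 2019 salary, as the database believes it today: the bob row is
# excluded because that belief was retracted in 2021.
print(as_of(rows, valid_at=2019, tx_at=2022))  # [('ann', 50000)]
```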
The GO4IT project aims to:
1) Raise awareness and prepare users for the transition to IPv6.
2) Expand the IPv6 user community.
The project provides a free IPv6 validation environment including test tools, test suites, and related services. BUPT's tasks include designing abstract test suites in TTCN-3 for conformance and interoperability testing of technologies like mobile IPv6. BUPT will also work on software components for test development and execution like the TTCN-3 compiler and test adapters.
1. TTCN was originally developed to test telecommunication systems but has expanded to other industries like automotive. It aims to make testing more efficient, automated, and reproducible.
2. TTCN-3 introduced new capabilities like procedure-based communication and dynamic test configuration control to broaden the scope of testable applications. It also standardized target adaptation interfaces.
3. TTCN is used for black box testing where tests stimulate interfaces and check responses without knowledge of internal implementation. It uses abstract test cases, parallel test components, and standardized interfaces to connect executable test suites to the system under test.
Scott Bailey
Few things we model in our databases are as complicated as time. The major database vendors have struggled for years with implementing the base data types to represent time. And the capabilities and functionality vary wildly among databases. Fortunately PostgreSQL has one of the best implementations out there. We will look at PostgreSQL's core functionality, discuss temporal extensions, modeling temporal data, time travel and bitemporal data.
The document presents a method for clustering and exploring search results using timelines. It describes annotating documents with temporal metadata, constructing time outlines from the metadata to organize search results chronologically, and clustering documents based on time granularity. An evaluation using Amazon Mechanical Turk found the method improved search result relevance by adding temporal context and snippets.
Chronological Decomposition Heuristic: A Temporal Divide-and-Conquer Strateg... — Alkis Vazacopoulos
This document summarizes the chronological decomposition heuristic (CDH), a temporal divide-and-conquer strategy for solving production scheduling problems. The CDH decomposes the scheduling time horizon into smaller time chunks that are solved sequentially using MILP. It uses a depth-first search with backtracking to find feasible solutions. The document provides an example application of the CDH to a small crude oil blending scheduling problem, showing it finds optimal or near-optimal solutions faster than solving the full problem at once.
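The decomposition idea can be sketched without any MILP machinery: split the horizon into chunks, schedule each chunk in turn, and carry the machine's busy time forward as a fixed boundary condition. The greedy single-machine dispatch below is our stand-in for the MILP solved per chunk in the paper, so it illustrates the decomposition, not the solution quality.

```python
# Toy chronological decomposition: solve the horizon chunk by chunk,
# committing decisions inside each chunk and passing the machine's
# busy time forward to the next chunk.

def schedule_chunked(jobs, chunk_len, horizon):
    """jobs: {name: (release_time, duration)}; returns start times."""
    starts, machine_free = {}, 0
    for chunk_start in range(0, horizon, chunk_len):
        chunk_end = chunk_start + chunk_len
        ready = sorted((r, d, n) for n, (r, d) in jobs.items()
                       if n not in starts and r < chunk_end)
        for release, dur, name in ready:   # earliest-release dispatch
            start = max(release, machine_free)
            if start < chunk_end:          # only commit inside this chunk
                starts[name] = start
                machine_free = start + dur
    return starts

jobs = {"A": (0, 3), "B": (1, 2), "C": (6, 2)}
print(schedule_chunked(jobs, chunk_len=5, horizon=15))
# A and B are fixed in the first chunk; C is deferred to the second.
```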
This document provides information about the CS213 Programming Languages Concepts course taught by Prof. Taymoor Mohamed Nazmy in the computer science department at Ain Shams University in Cairo, Egypt. It describes the syntax and semantics of programming languages, discusses different programming language paradigms like imperative, functional, and object-oriented, and explains concepts like lexical analysis, parsing, semantic analysis, symbol tables, intermediate code generation, optimization, and code generation which are parts of the compiler design process.
This document provides an overview of how compilers work by summarizing their main components and processes. It explains that a compiler translates a program written in a high-level language into an equivalent program in a lower-level language. The compilation process involves two main stages - analysis and synthesis. Analysis breaks down the source code and generates an intermediate representation, while synthesis constructs the target program from that representation. Key phases in each stage, such as lexical analysis, parsing, code generation and optimization, are also outlined.
Transaction Timestamping in Temporal Databases — Gera Shegalov
This document summarizes techniques for timestamping transactions in temporal databases to ensure consistency across distributed systems. It discusses using timestamps to serialize transactions and maintain valid transaction histories when transactions occur concurrently. Key techniques include assigning timestamps at commit time to establish order, using a read timestamp table to synchronize reads and writes, and coordinating timestamps across databases in distributed transactions using two-phase commit.
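A minimal sketch of the timestamp bookkeeping such techniques build on: each item tracks the timestamps of its youngest reader and writer, and an operation whose transaction timestamp is "too old" is rejected (the transaction would restart with a fresh timestamp). This is classic timestamp ordering, simplified; the names are ours, and distributed two-phase-commit coordination is out of scope here.

```python
# Basic timestamp-ordering bookkeeping for one data item.

class Item:
    def __init__(self):
        self.read_ts = 0    # timestamp of the youngest reader
        self.write_ts = 0   # timestamp of the youngest writer
        self.value = None

    def read(self, ts):
        if ts < self.write_ts:   # would read a value from its "future"
            return False         # abort/restart the transaction
        self.read_ts = max(self.read_ts, ts)
        return True

    def write(self, ts, value):
        if ts < self.read_ts or ts < self.write_ts:
            return False         # a younger transaction got there first
        self.write_ts, self.value = ts, value
        return True

x = Item()
print(x.write(ts=2, value="v2"))   # True
print(x.read(ts=3))                # True: read_ts becomes 3
print(x.write(ts=1, value="v1"))   # False: the too-late write is rejected
```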
The document provides an overview of time-triggered architecture (TTA) and communication protocols. TTA treats physical time as fundamental and provides a fault-tolerant global time base. It decomposes applications into clusters, nodes, and their interfaces. Communication is specified via global time and time-triggered protocols like TTP/C and FlexRay are used. TTA architecture consists of nodes with host and communication subsystems connected via a time-triggered bus.
A fast-paced introduction to TensorFlow 2 about some important new features (such as generators and the @tf.function decorator) and TF 1.x functionality that's been removed from TF 2 (yes, tf.Session() has retired).
Some concise code samples are presented to illustrate how to use new features of TensorFlow 2.
The document analyzes how the lexicon (identifiers) and structure of programs evolve over multiple versions of three software systems: Eclipse, Mozilla, and CERN/Alice. It finds that the lexicon is generally more stable than structure and that renaming of identifiers is rare. Some reasons why the lexicon is reluctant to change include the cognitive burden of changes and lack of dedicated renaming tools. The study concludes that more research is needed on tools to help preserve and improve a program's lexicon over time.
This document discusses a proposed look-ahead finite automata (LaFA) system for improving regular expression (RE) detection speed in network intrusion detection and prevention systems (NIDPS). The LaFA approach aims to address scalability issues with existing RE detection methods by optimizing the detection sequence, sharing states among automata for different REs, and using specialized buffered lookup modules for detection. These modules include a timestamps lookup module, character lookup module, and repetition detection module that can perform "look ahead" operations to more efficiently detect variable string patterns in REs. The proposed LaFA architecture and detection modules are described and compared to existing deterministic and nondeterministic finite automata approaches.
Data Structure and Algorithm chapter two, This material is for Data Structure...bekidea
The document discusses algorithm analysis and different searching and sorting algorithms. It introduces sequential search and binary search as simple searching algorithms. Sequential search, also called linear search, examines each element of a list sequentially until a match is found. It has average time complexity of O(n) as it may need to examine all n elements in the worst case.
This document summarizes a final project report for a parallel text mining framework that analyzes tweets in real-time. The project crawls tweets from major news outlets using Twitter's API and analyzes each tweet using hidden Markov models for part-of-speech tagging. The analysis is performed in parallel across 30 machines with 16 cores each using MPI. Bottlenecks include Twitter's API rate limits and scraping news articles, which are addressed through multi-threading and batch processing tweets.
Similar to TETI: a TimeML Compliant TimEx Tagger for Italian (20)
1. TETI: a TimeML Compliant TimEx
Tagger for Italian
Tommaso Caselli, Felice dell'Orletta and Irina Prodanof
Istituto di Linguistica Computazionale “A. Zampolli” - ILC-CNR Pisa
{firstName.secondName@ilc.cnr.it}
IMCSIT 2009 – CL-A09, Mrągowo, October 13
2. Outline:
Motivations
Extracting Temporal Expressions and the TIMEX3 tag
TETI:
− System architecture
− Demo
Evaluation
Conclusions & Future Work
3. Motivations
Recovering temporal relations in text/discourse is essential to
improve the performance of many NLP systems (O.D-Q.A., Text
Mining, Summarization, Reasoning)
Most temporal information in text/discourse is only IMPLICITLY
stated
Need to develop procedures to maximize the role of the various
sources of information
Temporal expressions represent a source of explicit temporal
knowledge which can:
− Locate an eventuality in time, and thus be used to infer
temporal relations between eventualities
− Measure the duration of an eventuality
4. Extracting Temporal Expressions
The extraction of timexes can be divided into 4 subtasks:
− Recognizing and bracketing the timex
− Feature extraction (type of time unit, referential status, presence of modifiers)
− Computing the interval of reference on the time line
− Resolving the timex, i.e. normalizing the value to a standard output format
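The four subtasks above can be sketched for one simple timex class. The following is a minimal, hypothetical illustration (the regex, the dictionary keys, and the DD/MM/YYYY-to-ISO normalization are our own assumptions; the actual TETI system is rule-based over chunked text and covers far more patterns):

```python
import re

# Hypothetical sketch of the 4 subtasks for calendar dates like "01/12/1980"
# (Italian DD/MM/YYYY order). Step 3 (placing the interval of reference on
# the time line) is omitted here for brevity.
DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")

def extract_timexes(text):
    timexes = []
    for m in DATE_RE.finditer(text):       # 1. recognize and bracket
        day, month, year = m.groups()      # 2. feature extraction
        value = f"{year}-{month}-{day}"    # 4. normalize to a standard format
        timexes.append({"extent": m.group(0),
                        "type": "DATE",    # TIMEX3 type attribute
                        "value": value})
    return timexes

print(extract_timexes("Born on 01/12/1980 in Pisa."))
# → [{'extent': '01/12/1980', 'type': 'DATE', 'value': '1980-12-01'}]
```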
6. Temporal Expressions in TimeML: the TIMEX3 tag
The TIMEX3 tag extends and improves previous tags for this task, namely TIMEX and TIDES TIMEX2
The TIMEX3 tag is used to mark any time expression, i.e. both absolute and relative timexes such as times of day (midnight..), dates of different granularity (yesterday, last spring..), calendar dates (01/12/1980..), durations (three hours, two years..), and sets of times (yearly, every day..)
The annotation process is based on:
− the constituent structure (NP, AdjP, AdvP, Time/Date Pattern)
− the granularity of the time units
− the relations between the timexes
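An illustrative TimeML fragment covering some of the timex classes above (attribute names follow the TIMEX3 specification; the `tid` values and the surrounding text are invented for the example):

```xml
<!-- calendar date -->
Rome, <TIMEX3 tid="t1" type="DATE" value="1980-12-01">01/12/1980</TIMEX3>

<!-- duration -->
it lasted <TIMEX3 tid="t2" type="DURATION" value="PT3H">three hours</TIMEX3>

<!-- set of times -->
they meet <TIMEX3 tid="t3" type="SET" value="P1D" quant="EVERY">every day</TIMEX3>
```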
7. TETI: Temporal Expression Tagger for Italian
Rule-based system operating on chunked text
Main components: TIMEX DETECTOR & TIMEX TAGGER
Two external resources: a TimEx Trigger Dictionary and a Modifier Dictionary
10. TETI: Temporal Expression Tagger for Italian (2)
The chunker output approximates the TIMEX3 tag extent
The extent of timexes corresponds to regular patterns of combinations of chunks
11. TETI: Temporal Expression Tagger for Italian (3)
Analysis of the chunked text:
− Lookup in the TimEx Trigger Dictionary
− Extraction of the necessary features for the bracketing
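The trigger-dictionary lookup can be pictured as a simple table keyed on chunk heads, each entry carrying the features needed for bracketing. The entry structure and field names below are our own illustration, not TETI's actual dictionary format:

```python
# Hypothetical TimEx Trigger Dictionary: chunk head → features for bracketing
# (time unit, referential status). Entries and field names are illustrative.
TRIGGER_DICT = {
    "ieri":      {"unit": "day",    "referential": True},   # "yesterday"
    "primavera": {"unit": "season", "referential": False},  # "spring"
}

def lookup_features(chunk_head):
    # Return the trigger entry for this chunk head, or None if it is
    # not a temporal trigger.
    return TRIGGER_DICT.get(chunk_head.lower())

print(lookup_features("Ieri"))  # → {'unit': 'day', 'referential': True}
```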
13. TETI: Temporal Expression Tagger for Italian (4)
Core element of the tagger, operating on the chunked text
A general condition + a set of local conditions
If the conditions are true, the tagger activates the related rules and brackets the timex with TIMEX3
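The "general condition + local conditions" scheme can be sketched as follows. A rule fires only if the general condition holds and all of its local conditions hold, and the chunk is then bracketed with TIMEX3. The condition and rule contents here are invented for illustration, not TETI's actual rules:

```python
# Hypothetical sketch of condition-driven rule activation over chunks.

def is_temporal_chunk(chunk):               # general condition
    return chunk.get("has_trigger", False)

RULES = [                                   # illustrative rule set
    {"name": "bare_date",
     "local": [lambda c: c["chunk_type"] == "NP"],  # local conditions
     "timex_type": "DATE"},
]

def apply_rules(chunk):
    if not is_temporal_chunk(chunk):
        return None                         # general condition failed
    for rule in RULES:
        if all(cond(chunk) for cond in rule["local"]):
            # all conditions true: bracket the timex with TIMEX3
            return f'<TIMEX3 type="{rule["timex_type"]}">{chunk["text"]}</TIMEX3>'
    return None

print(apply_rules({"has_trigger": True, "chunk_type": "NP", "text": "ieri"}))
# → <TIMEX3 type="DATE">ieri</TIMEX3>
```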
17. TETI: Temporal Expression Tagger for Italian (5)
More complex timexes require a further lookup in the TimEx Trigger Dictionary to extract further features (semantic relations) for the correct bracketing
19. Evaluation
42 newspaper articles manually annotated
367 timexes

TAG                TOT  CORR.  MISSING  INCORR.  P      R      F
TIMEX3             367  321    35       66       82.95  90.17  86.41
TIMEX3: modifiers  90   55     12       23       82.09  70.51  75.86
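The P/R/F figures in the first row follow the standard definitions, with incorrect tags counting against precision and missing tags against recall; a quick check against the table:

```python
# Recompute P, R, F1 for the TIMEX3 row: 321 correct, 35 missing, 66 incorrect
# (correct + incorrect = system output; correct + missing = gold timexes).

def prf(correct, missing, incorrect):
    p = correct / (correct + incorrect)   # precision
    r = correct / (correct + missing)     # recall
    f = 2 * p * r / (p + r)               # balanced F-measure
    return round(100 * p, 2), round(100 * r, 2), round(100 * f, 2)

print(prf(321, 35, 66))  # → (82.95, 90.17, 86.41)
```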
20. Conclusions & Future Work
• Reduction of the number of false positives
• Implementation of the normalization phase → rule-based
• Re-writing of the rules to be compliant with the KAF format (KYOTO Project)
• Release of the tool via web service
21. Acknowledgments
Thanks to Roberto Bartolini for his help in the development of the demo