Temporal Relations with Signals: the Case of Italian Temporal Prepositions
Tommaso Caselli, Felice dell’Orletta and Irina Prodanof
{firstname.lastname@ilc.cnr.it}
ILC-CNR, Pisa
16th International Symposium on Temporal Representation and Reasoning (TIME 2009)
Bressanone/Brixen, July 24, 2009

Introduction
A different approach to application-oriented NLP techniques.
Focus on the intuitions, knowledge and strategies people use in order to:
  place events in time
  order events (encoding and decoding)
Query texts (corpora), NOT structured knowledge.

Outline
Motivations
Temporal Signals in Italian:
  Theoretical background
  Methodology
  Corpus Study
A Maximum Entropy Model:
  Feature Identification
  Evaluation and Results
Conclusion and Future Work

Motivations
Recovering temporal relations in text/discourse is essential to improve the performance of many NLP systems (open-domain question answering, text mining, summarization, reasoning).
Most temporal information in text/discourse is stated only IMPLICITLY.
We need procedures that make the most of the various sources of temporal information.
Temporal prepositions are a partially explicit source of information. Determining their meaning is part of a strategy to improve the extraction of temporal information.

Motivations (2)

Theoretical Background
SIGNAL = cover term for a homogeneous class of words that express relations between textual entities.
EXPLICIT signals: self-evident, stable meaning; Rel(X, Y)
IMPLICIT signals: abstract meaning that gets specialized by the co-text; Rel(λ(X), λ(Y))
Temporal signals express temporal relations. They can occur in 3 types of constructions involving temporal entities:
  temporal expression – temporal expression
  eventuality – temporal expression
  eventuality – eventuality

Corpus Study: Data
To identify a large set of temporal signals realized by prepositions, we conducted a corpus study:
  a 5-million-word shallow-parsed corpus (from the PAROLE corpus);
  all PP chunks, with their left and right contexts, were automatically extracted and imported into a database;
  the automatically generated DB was augmented with ontological information from the SIMPLE/CLIPS Ontology, by associating the head noun of each PP chunk with its ontological type;
  extraction of the head nouns of type TIME + post-processing to exclude false positives (e.g. incubation, school …).

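The extraction and filtering steps can be sketched as follows, assuming each PP chunk has already been reduced to its preposition, head-noun lemma and contexts. The lexicon `ONTO_TYPE` is a hypothetical stand-in for SIMPLE/CLIPS lookups and `FALSE_POSITIVES` is a hypothetical version of the post-processing step; neither reproduces the actual resources.

```python
from dataclasses import dataclass


@dataclass
class PPChunk:
    prep: str           # preposition lemma, i.e. the candidate temporal signal
    head_noun: str      # lemma of the head noun of the PP chunk
    left_context: str   # text to the left of the chunk
    right_context: str  # text to the right of the chunk


# Hypothetical lexicon standing in for the SIMPLE/CLIPS head noun -> type mapping.
ONTO_TYPE = {
    "anno": "TIME",         # year
    "mattina": "TIME",      # morning
    "incubazione": "TIME",  # incubation: assumed typed TIME, a false positive here
    "scuola": "TIME",       # school: assumed typed TIME, a false positive here
    "casa": "BUILDING",     # house
}

# Hypothetical post-processing stoplist of false positives.
FALSE_POSITIVES = {"incubazione", "scuola"}


def temporal_pp_chunks(chunks, onto_type=ONTO_TYPE, stoplist=FALSE_POSITIVES):
    """Keep the PP chunks whose head noun is of type TIME and is not a known false positive."""
    return [
        chunk for chunk in chunks
        if onto_type.get(chunk.head_noun) == "TIME" and chunk.head_noun not in stoplist
    ]


chunks = [
    PPChunk("per", "anno", "sono stato sposato", ""),
    PPChunk("a", "scuola", "vado", ""),
    PPChunk("in", "casa", "sono", ""),
]
print([c.prep for c in temporal_pp_chunks(chunks)])  # ['per']
```

Keeping the stoplist separate from the lexicon mirrors the two stages on the slide: a type-based extraction followed by manual clean-up.
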
Corpus Study (2)
Temporal relations coded by implicit signals were annotated by means of paraphrase tests, e.g.
  [sono stato sposato] per [4 anni] (I’ve been married for four years): the state of “being married” EQUALS four years.
499 occurrences of constructions of the type “eventuality + signal + temporal expression”.
9 temporal relations (compliant with TimeML and ISO-TimeML): overlap, simultaneous, before, after, no tlink, begin, end, before_ending, equals.
The most frequent temporal relation for each implicit signal is assumed to be the prototypical meaning of that signal.

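Choosing the prototypical meaning amounts to taking, for each signal, the most frequent relation among its annotated occurrences. A minimal sketch, assuming the annotations are available as (signal, relation) pairs; the toy data below is invented and does not reflect the study's counts.

```python
from collections import Counter, defaultdict


def prototypical_meanings(annotations):
    """Return, for each signal, its most frequent annotated temporal relation.

    `annotations` is an iterable of (signal, relation) pairs, one per annotated
    "eventuality + signal + temporal expression" occurrence. Ties are broken in
    favour of the relation seen first (Counter.most_common ordering).
    """
    counts = defaultdict(Counter)
    for signal, relation in annotations:
        counts[signal][relation] += 1
    return {signal: rel_counts.most_common(1)[0][0] for signal, rel_counts in counts.items()}


# Toy annotations, invented for illustration only.
toy = [
    ("per", "equals"), ("per", "equals"), ("per", "overlap"),
    ("da", "begin"), ("da", "begin"), ("da", "after"),
]
print(prototypical_meanings(toy))  # {'per': 'equals', 'da': 'begin'}
```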
Feature Identification
The corpus study, together with theoretical considerations, led to the identification of 16 features:
PREP: the signal lemma
3 sets of co-textual features:
  information about the temporal expression
  information about the eventuality
  local contextual information

Feature Identification – Temporal Expressions
Temporal expression features:
Ontological status: INSTANT, INTERVAL
Type of temporal expression (TIMEX):
  DATE: August 3; 1968; 01/12/1980 …
  DURATION: 3 hours; the last quarter …
  SET: once every year …
  TIME: 3 o’clock; (in) the morning …
Presence of a quantifier: QUANTIFIER

Feature Identification – Eventuality
Eventuality features:
Lemma of the eventuality (POTGOV_head)
POS of the eventuality: VERB, NOUN
Presence of negation (NEGATION)
Verb diathesis, i.e. voice (DIATESIS)
Tense: PRESENT, IMPERFECT, FUTURE, PAST, INFINITIVE
(Viewpoint) Aspect: IMPERFECTIVE, PERFECTIVE, PROGRESSIVE, NONE
Lexical aspect (AKTIONSAART): TRANSITION, PROCESS, STATE

Feature Identification – Local context
Local context features account for the presence of further signals in the local context that influence the identification of the Rel value of the signal under analysis:
  FOLLOWED_SIGNAL+TIMEX
  PRECEED_SIGNAL+TIMEX
  FOLLOWED_SIGNAL+EVENT

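The 16 features (PREP plus the three co-textual sets) can be collected into one attribute record per instance and encoded as "NAME=value" indicators, the usual input of a maximum entropy classifier. A sketch under that assumption; the attribute values for the example sentence are our own illustrative guesses, not the study's annotation, and `to_indicators` is a hypothetical helper.

```python
# The 16 features, grouped as on the slides.
FEATURE_NAMES = [
    "PREP",                                                 # the signal lemma
    "INSTANT", "INTERVAL", "TIMEX", "QUANTIFIER",           # temporal expression
    "POTGOV_head", "VERB", "NOUN", "NEGATION", "DIATESIS",  # eventuality
    "TENSE", "ASPECT", "AKTIONSAART",
    "FOLLOWED_SIGNAL+TIMEX", "PRECEED_SIGNAL+TIMEX", "FOLLOWED_SIGNAL+EVENT",  # local context
]


def to_indicators(instance):
    """Encode an attribute dict as a set of 'NAME=value' indicator strings.

    Attributes that are missing or not among the 16 features are skipped.
    """
    return {f"{name}={instance[name]}" for name in FEATURE_NAMES if name in instance}


# Illustrative values assumed for "[sono stato sposato] per [4 anni]".
example = {
    "PREP": "per",
    "INTERVAL": True,
    "INSTANT": False,
    "TIMEX": "DURATION",
    "VERB": True,
    "NEGATION": False,
    "TENSE": "PAST",
    "AKTIONSAART": "STATE",
}
print(sorted(to_indicators(example)))
```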
Building a M.E. Model
Feature annotation: carried out manually by one annotator and one of the authors.
1000 instances of constructions of the type “eventuality + signal + timex”, selected according to two interlinked criteria: semantic transparency of the signal + relative frequency of the signal in the 5-million-word shallow-parsed corpus.
Assigning the right temporal relation is, in essence, a tagging task.
Maximum Entropy algorithm: it identifies the set of possible values for each signal on the basis of the conditional probability distribution. No a priori constraints must be met other than those related to a set of features f_i(a, c) of a context c, whose distribution is derived from the training data.

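A conditional maximum entropy model is equivalent to multinomial logistic regression: p(a | c) = exp(Σ_i λ_i f_i(a, c)) / Z(c), where each binary feature f_i(a, c) fires when a context predicate is present and the label is a. The sketch below trains such a model by plain gradient ascent on the conditional log-likelihood with a small L2 penalty. The slides do not say which training procedure or toolkit was used, so the optimizer, the hyperparameters, the class name `MaxEnt` and the toy data are all assumptions.

```python
import math
from collections import defaultdict


class MaxEnt:
    """Conditional maximum entropy classifier p(a | c) ∝ exp(Σ_i λ_i f_i(a, c)).

    Feature f_i(a, c) = 1 iff context predicate p is in c and the label is a;
    its weight λ_i is stored as weights[(p, a)].
    """

    def __init__(self, lr=0.5, epochs=100, l2=0.01):
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        self.weights = defaultdict(float)
        self.labels = []

    def prob(self, context):
        """Return p(a | context) for every label a (softmax of the summed weights)."""
        scores = {a: sum(self.weights[(p, a)] for p in context) for a in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {a: math.exp(s - top) for a, s in scores.items()}
        z = sum(exps.values())
        return {a: e / z for a, e in exps.items()}

    def fit(self, data):
        """data: list of (context, label); a context is a set of predicate strings."""
        self.labels = sorted({label for _, label in data})
        n = len(data)
        for _ in range(self.epochs):
            grad = defaultdict(float)
            for context, gold in data:
                probs = self.prob(context)
                for p in context:
                    grad[(p, gold)] += 1.0   # empirical feature count
                    for a, pa in probs.items():
                        grad[(p, a)] -= pa   # expected feature count under the model
            for key in set(grad) | set(self.weights):
                g = grad.get(key, 0.0) / n - self.l2 * self.weights[key]
                self.weights[key] += self.lr * g
        return self

    def predict(self, context):
        probs = self.prob(context)
        return max(probs, key=probs.get)


# Toy training data, invented for illustration.
train = [
    ({"PREP=per", "TIMEX=DURATION"}, "equals"),
    ({"PREP=per", "TIMEX=DURATION"}, "equals"),
    ({"PREP=da", "TIMEX=DATE"}, "begin"),
    ({"PREP=da", "TIMEX=DATE"}, "begin"),
    ({"PREP=in", "TIMEX=TIME"}, "overlap"),
]
model = MaxEnt().fit(train)
print(model.predict({"PREP=per", "TIMEX=DURATION"}))  # equals
```

Plain gradient ascent keeps the sketch short; iterative scaling or a quasi-Newton optimizer would reach the same maximum-likelihood solution more efficiently on real data.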
Evaluation
The data set was split into test (100) and training (900) data.
8 different models were created to discover the most salient features.
10-fold cross-validation for each model.
All models outperform the baseline, which indicates that the features are relevant.

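The cross-validation protocol can be sketched as below. The slides do not say which baseline was used, so a majority-class baseline (always predict the most frequent training relation) is assumed here; the function names and the shuffling seed are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation over n items."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent label seen in training (assumed baseline)."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(label == majority for label in test_labels) / len(test_labels)


splits = list(k_fold_indices(900, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 810 90
```

Each of the 900 training instances appears in exactly one test fold, so every model is scored on all of them once.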
Evaluation (2)
All 16 features: PREP, INTERVAL, INSTANT, POTGOV_head, VERB, NOUN, DIATESIS, NEGATION, AKTIONSAART, FOLLOWED_SIGNAL+TIMEX, PRECEED_SIGNAL+TIMEX, FOLLOWED_SIGNAL+EVENT, TENSE, ASPECT, TIMEX, QUANTIFIER
10-feature model: performance = 90%
Surface-based features: good performance is achieved without the AKTIONSAART feature.

Evaluation (3)
Model | Features | Performance
8 features | PREP, INTERVAL, INSTANT, FOLLOWED_SIGNAL+TIMEX, PRECEED_SIGNAL+TIMEX, FOLLOWED_SIGNAL+EVENT, TIMEX, QUANTIFIER | 89.8%
9 features | PREP, INTERVAL, INSTANT, AKTIONSAART, FOLLOWED_SIGNAL+TIMEX, PRECEED_SIGNAL+TIMEX, FOLLOWED_SIGNAL+EVENT, TIMEX, QUANTIFIER | 89.8%

Evaluation (4)
Model | Features | Performance
5 features | PREP, INTERVAL, INSTANT, TIMEX, QUANTIFIER | 87.6%
7 features | PREP, INTERVAL, INSTANT, FOLLOWED_SIGNAL+TIMEX, PRECEED_SIGNAL+TIMEX, FOLLOWED_SIGNAL+EVENT, TIMEX, QUANTIFIER | 86.8%
8 features (no QUANTIFIER) | PREP, INTERVAL, INSTANT, AKTIONSAART, FOLLOWED_SIGNAL+TIMEX, PRECEED_SIGNAL+TIMEX, FOLLOWED_SIGNAL+EVENT, TIMEX | 85%

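The model comparison is a feature ablation: the same instances are scored once per feature set. A minimal sketch, with four of the feature sets copied from the tables; `run_ablation` and its `evaluate` callback are hypothetical names, and `evaluate` stands for whatever scoring procedure (e.g. cross-validated accuracy of a MaxEnt model) one plugs in.

```python
# Feature sets of four of the models in the tables (lists copied from the slides).
MODELS = {
    "5 features": ["PREP", "INTERVAL", "INSTANT", "TIMEX", "QUANTIFIER"],
    "8 features": ["PREP", "INTERVAL", "INSTANT", "FOLLOWED_SIGNAL+TIMEX",
                   "PRECEED_SIGNAL+TIMEX", "FOLLOWED_SIGNAL+EVENT", "TIMEX", "QUANTIFIER"],
    "9 features": ["PREP", "INTERVAL", "INSTANT", "AKTIONSAART", "FOLLOWED_SIGNAL+TIMEX",
                   "PRECEED_SIGNAL+TIMEX", "FOLLOWED_SIGNAL+EVENT", "TIMEX", "QUANTIFIER"],
    "8 features (no QUANTIFIER)": ["PREP", "INTERVAL", "INSTANT", "AKTIONSAART",
                                   "FOLLOWED_SIGNAL+TIMEX", "PRECEED_SIGNAL+TIMEX",
                                   "FOLLOWED_SIGNAL+EVENT", "TIMEX"],
}


def restrict(instance, feature_names):
    """Drop every attribute that the given model does not use."""
    allowed = set(feature_names)
    return {name: value for name, value in instance.items() if name in allowed}


def run_ablation(dataset, evaluate):
    """Score each model on the same data restricted to its own feature set.

    `dataset` is a list of (attribute_dict, relation) pairs; `evaluate` is a
    hypothetical callback that takes such a list and returns a score.
    """
    return {
        model: evaluate([(restrict(x, feats), y) for x, y in dataset])
        for model, feats in MODELS.items()
    }


# Usage with a dummy scorer that just counts the attributes each model keeps.
data = [({"PREP": "per", "AKTIONSAART": "STATE", "QUANTIFIER": False}, "equals")]
print(run_ablation(data, lambda ds: len(ds[0][0])))
# {'5 features': 2, '8 features': 2, '9 features': 3, '8 features (no QUANTIFIER)': 2}
```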
Conclusion & Future Work
Mismatch between linguistic theory and feature salience.
Observations on the features:
  5 core features: PREP, INSTANT, INTERVAL, TIMEX, QUANTIFIER (5-feature model).
  The influence of AKTIONSAART on this task is almost null. It could be replaced by a set of more surface-based features, e.g. presence of a direct object (D.O.), definiteness, cardinality, type of subject …
  The remaining features could be activated in particular linguistic contexts and with particular signals, e.g. TENSE, ASPECT and AKTIONSAART (or its substitutes) with the signal IN; the local context features with the signals DA, A and TRA.
Integration of the M.E. model into a complete automatic system for the temporal processing of text/discourse.

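The proposal to activate extra features only for particular signals could be realized as a signal-dependent feature set on top of the five core features. This is only a sketch of the future-work idea under our own assumptions: the mapping `EXTRA_BY_SIGNAL` follows the two examples on the slide, and AKTIONSAART is kept as is rather than replaced by surface-based substitutes.

```python
CORE = ["PREP", "INSTANT", "INTERVAL", "TIMEX", "QUANTIFIER"]
LOCAL = ["FOLLOWED_SIGNAL+TIMEX", "PRECEED_SIGNAL+TIMEX", "FOLLOWED_SIGNAL+EVENT"]

# Hypothetical signal -> extra features map, following the two examples on the slide.
EXTRA_BY_SIGNAL = {
    "in": ["TENSE", "ASPECT", "AKTIONSAART"],
    "da": LOCAL,
    "a": LOCAL,
    "tra": LOCAL,
}


def active_features(signal):
    """Core features for every signal, plus the extra features activated for this one."""
    return CORE + EXTRA_BY_SIGNAL.get(signal.lower(), [])


print(active_features("IN"))
print(active_features("per") == CORE)  # True
```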
Thanks
