TIMEN
An Open Temporal Expression
   Normalisation Resource




H.Llorens, L.Derczynski, R.Gaizauskas, E. Saquete
Outline
●   Introduction: Timex normalisation
●   Related work
●   Problem: reinventing the wheel again and again

●   Proposal: TIMEN
●   Evaluation
●   Conclusions
●   Further Work
Timex Normalisation
Temporal information extraction subtask.

Timex: linguistic expression of a time point or interval.

Normalisation: semantic interpretation of timexes.
Temporal Expression (TIMEX)          Timex normalization
Linguistics/Variability/Relativity   ISO 8601/Invariable interpretation
June 2012, next month, 06/2012       2012-06
this morning 7 a.m.                  2012-05-24T07:00
3 days and 3 hours                   P3DT3H
weekly                               XXXX-XX-WXX
Timex Normalisation (II)
Useful for a variety of NLP applications: IR, QA,
Summarization, etc.

           I went to the cinema yesterday.
             event                    timex
                                 Value: 2012-05-23

     When did he go to the cinema? 2012-05-23

The main advantage of normalisation is having timexes in
standard time representations (e.g., Gregorian calendar).
Related Work
There are many approaches to timex normalisation

● Pre TempEval-2
  ○ TempEx (2000), GUTime (2005), Chronos (2004),
     TERSEO (2005), TimexTag (2005), TEA (2006),
     DANTE (2007)...
● TempEval-2 (2010)
  ○ HeidelTime, TRIPS/TRIOS, TIPSem/TIPSemB...
Similarities and differences
● Approaches have slightly different architectures and
   show slightly different performances on tests.

● But all the approaches are rule-based and in general
   they use the same normalization strategies.

● They also require the same parameters to perform the task.
   ○   DCT: document creation time (deictic) (2 days ago: 2012-05-22)
   ○   Reference time: time talked about (anaphoric)
       (2 days before: 2012-05-20)
   ○   Tense: Resolution direction (October)
       Past (2011-10), Present/Future (2012-10)
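The three parameters above can be illustrated with a minimal Python sketch. It is not TIMEN code: the function names are hypothetical, the reference time of 2012-05-22 is an assumed value chosen so that the anaphoric example yields the date shown on the slide, and the tense convention (past picks the most recent matching month, present/future the next one) is an assumption consistent with the October example.

```python
from datetime import date, timedelta


def resolve_offset(anchor: date, days: int) -> str:
    """Shift an anchor date by a signed number of days; return an ISO 8601 date."""
    return (anchor + timedelta(days=days)).isoformat()


def resolve_month(month: int, dct: date, tense: str) -> str:
    """Resolve a bare month to YYYY-MM, using tense as the resolution direction.

    Assumed convention: "past" picks the most recent such month before the
    DCT's month; anything else picks the next one at or after the DCT's month.
    """
    if tense == "past":
        year = dct.year if month < dct.month else dct.year - 1
    else:
        year = dct.year if month >= dct.month else dct.year + 1
    return f"{year:04d}-{month:02d}"


dct = date(2012, 5, 24)             # document creation time
reference_time = date(2012, 5, 22)  # assumed "time talked about"

deictic = resolve_offset(dct, -2)                  # "2 days ago"
anaphoric = resolve_offset(reference_time, -2)     # "2 days before"
october_past = resolve_month(10, dct, "past")      # "October", past tense
october_future = resolve_month(10, dct, "future")  # "October", future tense
```

The same surface number ("2 days") resolves to different dates depending only on which anchor is used, which is why a normaliser needs both the DCT and the reference time as inputs.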
The problem
Reinventing the wheel again and again
● Implementing high-performance approaches is
  costly, and each time it is done from scratch.
● All the approaches are similar: rule-based, with similar
  normalization rules and strategies.
● None is meant to be reused and refined by others.
Proposal: TIMEN
Characteristics:
 ● Open philosophy: meant to be reused and refined (even
   across languages)

 ●   Not only meant for computer scientists:
      ○   the algorithms (source code) and normalisation rules (db of user-
          friendly rules with a documented syntax) are separated.

 ●   Independent from other timex processing tasks

 ●   Multi-platform and easy integration
TIMEN Library Architecture
Example:
timex: three days ago
DCT: 2012-05-24
normtext: 3_day_ago
pattern: Num_TUnit_ago
only 1 rule matches
normalized value: 2012-05-21

Example 2:
timex: October 20
2 rules match
disambiguation: 20 is probably a day rather than a year because it is < 32
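The two examples can be traced with a small Python sketch of the same steps (normtext, pattern, rule lookup, disambiguation). This is an illustration only, not the TIMEN library or its rule syntax: every function name, the tiny lexicons, and the choice to take the year of "October 20" from the DCT are assumptions made for the example.

```python
from datetime import date, timedelta

# Deliberately tiny, hypothetical lexicons.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
TIME_UNITS = {"day": 1, "week": 7}  # unit length in days
MONTHS = {"october": 10}


def normtext(timex: str) -> str:
    """Lower-case, map number words to digits, singularise units, join with '_'."""
    tokens = []
    for tok in timex.lower().split():
        if tok in NUMBER_WORDS:
            tok = str(NUMBER_WORDS[tok])
        elif tok.endswith("s") and tok[:-1] in TIME_UNITS:
            tok = tok[:-1]
        tokens.append(tok)
    return "_".join(tokens)


def pattern(norm: str) -> str:
    """Replace each token by its class (Num, TUnit, Month) where one applies."""
    classes = []
    for tok in norm.split("_"):
        if tok.isdigit():
            classes.append("Num")
        elif tok in TIME_UNITS:
            classes.append("TUnit")
        elif tok in MONTHS:
            classes.append("Month")
        else:
            classes.append(tok)
    return "_".join(classes)


def rule_offset_ago(tokens, dct):
    """Num_TUnit_ago: subtract the offset from the DCT."""
    days = int(tokens[0]) * TIME_UNITS[tokens[1]]
    return (dct - timedelta(days=days)).isoformat()


def rule_month_day(tokens, dct):
    """Month_Num read as month + day; the year is assumed to be the DCT's."""
    return f"{dct.year:04d}-{MONTHS[tokens[0]]:02d}-{int(tokens[1]):02d}"


def rule_month_year(tokens, dct):
    """Month_Num read as month + year."""
    return f"{int(tokens[1]):04d}-{MONTHS[tokens[0]]:02d}"


RULES = {
    "Num_TUnit_ago": [rule_offset_ago],
    "Month_Num": [rule_month_day, rule_month_year],
}


def normalise(timex: str, dct: date):
    norm = normtext(timex)
    tokens = norm.split("_")
    candidates = RULES.get(pattern(norm), [])
    if not candidates:
        return None
    if len(candidates) > 1:
        # Heuristic from the slide: a number below 32 is probably a day.
        rule = rule_month_day if int(tokens[1]) < 32 else rule_month_year
    else:
        rule = candidates[0]
    return rule(tokens, dct)


first = normalise("three days ago", date(2012, 5, 24))
second = normalise("October 20", date(2012, 5, 24))
```

Keeping the rule table (`RULES`) apart from the matching code mirrors the separation of algorithms and rule base described earlier, although the real rule base is a documented database rather than Python functions.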
Rule base sample (English)
TIMEN integration
TIMEN community
● Open-source software:
    http://code.google.com/p/timen/



● Crowd extension of the rule set (interactive
  web interface to upload and check new
  rules): http://timen.org

* New rules are only accepted if they improve performance on the current
dataset or on new examples (human reviewed). E.g.: New Year's Eve
Evaluation
Experiments:
● Normalization accuracy of TIMEN
● Performance gain in state-of-the-art approaches by
  integrating TIMEN
Datasets:
● TempEval-2 test set
  (already known to the approaches; mainly common dates and durations)
● TimenEval dataset
  (new, unknown to the approaches; balanced among different timex types)
Normalisation accuracy

        gold timexes     TIMEN normalisation    result
        yesterday        2012-05-23             correct
        2012             2012                   correct
        October          2012-10                incorrect
        daily            xxxx-xx-xx             correct
        morning          2011                   incorrect
        ...              ...                    ...


e.g. TOTAL: 100 timexes to normalise   e.g. TOTAL: 90 correct normalizations


         RESULT: 90/100 --> 90% ACCURACY
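The accuracy figure above is simply the share of timexes whose normalised value matches the gold value. A minimal Python sketch follows; exact string equality as the matching criterion and the made-up value lists are assumptions for illustration, not the evaluation script used for the reported results.

```python
def accuracy(system_values, gold_values):
    """Fraction of timexes whose normalised value equals the gold value."""
    if len(system_values) != len(gold_values):
        raise ValueError("system and gold lists must have the same length")
    if not gold_values:
        raise ValueError("cannot compute accuracy over an empty set")
    correct = sum(s == g for s, g in zip(system_values, gold_values))
    return correct / len(gold_values)


# Toy data: 100 gold timexes, of which the system gets 90 right.
gold = ["2012-05-23"] * 100
system = ["2012-05-23"] * 90 + ["2011"] * 10
result = accuracy(system, gold)
```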
Normalisation accuracy
         TEST SET          NORMALISATION ACC
         TempEval-2               0.90
         TimenEval                0.68


● TIMEN shows high performance even in this first
  version (only 76 rules).

● TimenEval accuracy is lower: this corpus is more
  heterogeneous (times/sets), so normalization is more
  difficult.
Performance gain
[Diagram: the timexes recognized by Approach X are normalised in two ways]
recognized timexes → built-in normalisation of Approach X → Original normalisation
recognized timexes → TIMEN → New normalisation

Performance gain = New accuracy - Original accuracy
Performance gain
(TempEval-2) "known data"
   System       built-in norm.   TIMEN norm.   Err. Reduction
   TIPSemB           0.83            0.89           35%
   HeidelTime        0.94            0.94           0%
   TERNIP            0.76            0.92           66%
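The slides define performance gain as an accuracy difference, while the table reports an error reduction. The reported percentages look consistent with relative error reduction, (original error - new error) / original error, up to rounding. That reading is an inference from the numbers rather than something the slides state, and the Python sketch below uses it under that assumption.

```python
def performance_gain(original_acc: float, new_acc: float) -> float:
    """Absolute gain: new accuracy minus original accuracy."""
    return new_acc - original_acc


def error_reduction(original_acc: float, new_acc: float) -> float:
    """Relative error reduction (assumed meaning of the 'Err. Reduction' column)."""
    original_error = 1.0 - original_acc
    if original_error == 0:
        return 0.0  # nothing left to reduce
    new_error = 1.0 - new_acc
    return (original_error - new_error) / original_error


# (built-in accuracy, TIMEN accuracy) from the TempEval-2 table above.
tempeval2 = {
    "TIPSemB": (0.83, 0.89),
    "HeidelTime": (0.94, 0.94),
    "TERNIP": (0.76, 0.92),
}
reductions = {name: error_reduction(o, n) for name, (o, n) in tempeval2.items()}

for name, value in reductions.items():
    print(f"{name}: {value:.1%}")
```

Relative error reduction makes systems with different starting accuracies comparable: a 6-point gain from 0.83 removes about a third of the remaining errors.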

● Replacing the systems' built-in normalization with
  TIMEN generally improves their performance on the
  TempEval-2 test set.
● The tested (current) versions of the systems may have
  been developed/updated with this data in mind. What
  happens with data that is new to them?
Performance gain
(TimenEval) "new data"
   System       built-in norm.   TIMEN norm.   Err. Reduction
   TIPSemB           0.57            0.67           23%
   HeidelTime        0.72            0.74           7%
   TERNIP            0.70            0.72           66%



● On new data, the performance of the built-in approaches
  generally decreases.
● TIMEN improves normalization performance for all the
  systems.
Conclusions
● We presented an open tool for timex normalisation:
  TIMEN.

● ADVANTAGES:
  ○ High performance (above recent approaches).
  ○ Easily integrated in any timex recognition
    approach.
  ○ Can be improved by the community (open philosophy),
    and avoids re-development from scratch.
  ○ Available: http://timen.org and Google Code
Further Work

● Community-based extension and refinement
  of TIMEN (rulebase).

● Extensive evaluation of TIMEN in various
  languages (Spanish, Chinese, Italian and Danish).
TIMEN: An Open TIMEX Normalisation Resource

              THANK YOU!
                   QUESTIONS?

                   http://timen.org

       H.Llorens, L.Derczynski, R.Gaizauskas, E. Saquete

Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 

TIMEN: An Open Temporal Expression Normalisation Resource

  • 1. TIMEN: An Open Temporal Expression Normalisation Resource. H. Llorens, L. Derczynski, R. Gaizauskas, E. Saquete
  • 2. Outline ● Introduction: Timex normalisation ● Related work ● Problem: reinventing the wheel again and again ● Proposal: TIMEN ● Evaluation ● Conclusions ● Further Work
  • 3. Timex Normalisation: a temporal information extraction subtask. Timex: linguistic expression of a time point or interval. Normalisation: semantic interpretation of timexes. Temporal expression (timex; linguistic, variable, relative) -> timex normalisation (ISO 8601, invariable interpretation): June 2012, next month, 06/2012 -> 2012-06; this morning 7 a.m. -> 2012-05-24T07:00; 3 days and 3 hours -> P3DT3H; weekly -> XXXX-XX-WXX
  • 4. Timex Normalisation (II) Useful for a variety of NLP applications: IR, QA, summarization, etc. Example: "I went to the cinema yesterday." (event: went; timex: yesterday; value: 2012-05-23). "When did he go to the cinema?" -> 2012-05-23. The main advantage of normalisation is having timexes in standard time representations (e.g., Gregorian calendar).
  • 5. Related Work There are many approaches to timex normalisation ● Pre TempEval-2 ○ TempEx (2000), GUTime (2005), Chronos (2004), TERSEO (2005), TimexTag (2005), TEA (2006), DANTE (2007)... ● TempEval-2 (2010) ○ HeidelTime, TRIPS/TRIOS, TIPSem/TIPSemB...
  • 6. Similarities and differences ● Approaches have slightly different architectures and show slightly different performance on tests. ● But all the approaches are rule-based and in general they use the same normalisation strategies. ● They also require the same parameters to perform the task. ○ DCT: document creation time (deictic) (2 days ago: 2012-05-22) ○ Reference time: time talked about (anaphoric) (2 days before: 2012-05-20) ○ Tense: resolution direction (October): past (2011-10), present/future (2012-10)
  • 7. The problem: reinventing the wheel again and again ● Implementing a high-performance approach is costly, and it is done from scratch every time. ● All the approaches are similar: rule-based, with similar normalisation rules and strategies. ● None is meant to be reused and refined by others.
  • 8. Proposal: TIMEN Characteristics: ● Open philosophy: meant to be reused and refined (even across languages) ● Not only meant for computer scientists: ○ the algorithms (source code) and normalisation rules (db of user-friendly rules with a documented syntax) are separated. ● Independent from other timex processing tasks ● Multi-platform and easy to integrate
  • 9. TIMEN Library Architecture. Example 1: timex: three days ago; DCT: 2012-05-24; normtext: 3_day_ago; pattern: Num_TUnit_ago; only 1 rule matches; normalized value: 2012-05-21. Example 2: timex: October 20; 2 rules match; disambiguation: 20 is probably a day rather than a year because it is < 32.
  • 10. Rule base sample (English)
  • 12. TIMEN community ● Open-source software: http://code.google.com/p/timen/ ● Crowd extension of the rule set (interactive web interface to upload and check new rules): http://timen.org * New rules are only accepted if they improve the performance on the current dataset or on new examples (human reviewed). E.g.: New Year's Eve
  • 13. Evaluation Experiments: ● Normalisation accuracy of TIMEN ● Performance gain in state-of-the-art approaches by integrating TIMEN Datasets: ● TempEval-2 test set (already known to the approaches; mainly common dates and durations) ● TimenEval dataset (new, unknown to the approaches; balanced among different timex types)
  • 14. Normalisation accuracy: each gold timex (e.g. yesterday, 2012, October, daily, morning) is normalised by TIMEN, and the output is marked correct if it matches the gold value (e.g. yesterday -> 2012-05-23, October -> 2012-10) and incorrect otherwise. E.g. TOTAL: 100 timexes to normalise, 90 correct normalisations. RESULT: 90/100 --> 90% ACCURACY
  • 15. Normalisation accuracy by test set: TempEval-2: 0.90; TimenEval: 0.68. ● TIMEN shows high performance even in this first version (only 76 rules). ● TimenEval accuracy is lower. This corpus is more heterogeneous (times/sets) and normalisation is more difficult.
  • 16. Performance gain: Approach X recognizes timexes and produces the original normalisation with its built-in normaliser; the new normalisation is obtained by applying TIMEN to the same recognized timexes. Performance gain = New accuracy - Original accuracy
  • 17. Performance gain (TempEval-2), "known data". System: built-in norm. / TIMEN norm. / Err. reduction. TIPSemB: 0.83 / 0.89 / 35%. HeidelTime: 0.94 / 0.94 / 0%. TERNIP: 0.76 / 0.92 / 66%. ● Replacing the systems' built-in normalisation with TIMEN generally improves their performance on the TE2 test set. ● The tested (current) versions of the systems may have been developed or updated with knowledge of this data. What happens with data that is new to them?
  • 18. Performance gain (TimenEval), "new data". System: built-in norm. / TIMEN norm. / Err. reduction. TIPSemB: 0.57 / 0.67 / 23%. HeidelTime: 0.72 / 0.74 / 7%. TERNIP: 0.70 / 0.72 / 7%. ● On new data, the performance of the built-in approaches generally decreases. ● TIMEN improves the normalisation performance of all the systems.
  • 19. Conclusions ● We presented an open tool for timex normalisation: TIMEN. ● ADVANTAGES: ○ High performance (above recent approaches). ○ Easily integrated into any timex recognition approach. ○ Can be improved by the community (open philosophy), and avoids re-development from scratch. ○ Available: http://timen.org and Google Code
  • 20. Further Work ● Community-based extension and refinement of TIMEN (rulebase). ● Extensive evaluation of TIMEN in various languages (Spanish, Chinese, Italian and Danish).
  • 21. TIMEN: An Open TIMEX Normalisation Resource THANK YOU! QUESTIONS? http://timen.org H.Llorens, L.Derczynski, R.Gaizauskas, E. Saquete
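
The walk-through on slide 9 (timex -> normtext -> pattern -> rule -> value) can be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the tiny word lists and the single hard-coded rule are hypothetical and are not TIMEN's actual API or rule syntax.

```python
from datetime import date, timedelta

# Hypothetical lookup tables; a real rule base would be far larger.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
UNIT_DAYS = {"day": 1, "week": 7}


def to_normtext(timex: str) -> str:
    """Lowercase, map number words to digits, singularise units, join with '_'."""
    tokens = []
    for tok in timex.lower().split():
        if tok in NUMBER_WORDS:
            tok = str(NUMBER_WORDS[tok])
        elif tok.endswith("s") and tok[:-1] in UNIT_DAYS:
            tok = tok[:-1]
        tokens.append(tok)
    return "_".join(tokens)


def to_pattern(normtext: str) -> str:
    """Abstract each token to a class: digits -> Num, time units -> TUnit."""
    classes = []
    for tok in normtext.split("_"):
        if tok.isdigit():
            classes.append("Num")
        elif tok in UNIT_DAYS:
            classes.append("TUnit")
        else:
            classes.append(tok)
    return "_".join(classes)


def normalise(timex: str, dct: date) -> str:
    """Resolve a deictic timex against the document creation time (DCT)."""
    normtext = to_normtext(timex)
    pattern = to_pattern(normtext)
    if pattern == "Num_TUnit_ago":  # the only rule in this sketch
        amount, unit, _ = normtext.split("_")
        delta = timedelta(days=int(amount) * UNIT_DAYS[unit])
        return (dct - delta).isoformat()
    raise ValueError(f"no rule for pattern {pattern!r}")


def day_or_year(number: int) -> str:
    """Slide 9 heuristic: a number below 32 is more likely a day than a year."""
    return "day" if number < 32 else "year"


print(normalise("three days ago", date(2012, 5, 24)))
```

Keeping the pattern abstraction separate from the rule lookup mirrors the separation of algorithms and rules described on slide 8, although how TIMEN stores and matches its rules is not shown in the transcript.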
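
The slides do not say how the "Err. reduction" column is computed. A minimal sketch, assuming the usual relative error reduction (the share of the original errors that the new normaliser removes), reproduces the slide 17 figures to within a percentage point. The function name is an assumption; the accuracy values are those listed on slide 17.

```python
def error_reduction(original_acc: float, new_acc: float) -> float:
    """Fraction of the original errors removed: (new - orig) / (1 - orig)."""
    original_err = 1.0 - original_acc
    if original_err == 0:
        return 0.0  # nothing left to reduce
    return (new_acc - original_acc) / original_err


# Accuracies from slide 17 (TempEval-2): built-in vs. TIMEN normalisation.
slide_17 = {
    "TIPSemB": (0.83, 0.89),
    "HeidelTime": (0.94, 0.94),
    "TERNIP": (0.76, 0.92),
}

for system, (orig, new) in slide_17.items():
    gain = new - orig  # "performance gain" as defined on slide 16
    print(f"{system}: gain={gain:+.2f}, "
          f"error reduction={error_reduction(orig, new):.1%}")
```

Error reduction is a more informative figure than the raw gain when the baseline accuracies differ, since the same absolute gain removes a larger share of the errors for a system that starts with fewer of them.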