Recognising and Interpreting Named Temporal Expressions

Paper: http://derczynski.com/sheffield/papers/named_timex.pdf

This paper introduces a new class of temporal expression – named temporal expressions – and methods for recognising and interpreting its members. The commonest temporal expressions typically contain date and time words, like April or hours. Research into recognising and interpreting these typical expressions is mature in many languages. However, there is a class of expressions that are less typical, very varied, and difficult to automatically interpret. These indicate dates and times, but are harder to detect because they often do not contain time words and are not used frequently enough to appear in conventional temporally-annotated corpora – for example Michaelmas or Vasant Panchami.

Usage rights: CC Attribution-NoDerivs License

Recognising and Interpreting Named Temporal Expressions Presentation Transcript

  • 1. Recognising and Interpreting Named Temporal Expressions Matteo Brucato Leon Derczynski Hector Llorens Kalina Bontcheva Christian S. Jensen
  • 2. How do we talk about times? ● Calendar ● Closed class of terms – tomorrow | today | yesterday – [next | last ] [ week | month | year] – [1 - 31] [January – December] ● Really deterministic
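A minimal sketch of how the closed class on slide 2 might be matched with a single regular expression. The pattern, the name `CLOSED_CLASS`, and the helper `find_structured_timexes` are illustrative assumptions, not part of the presented system; they only cover the three term families listed on the slide.

```python
import re

# Month names from the slide's "[1 - 31] [January - December]" family.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Hypothetical pattern covering only the three families named on the slide.
CLOSED_CLASS = re.compile(
    r"\b(?:tomorrow|today|yesterday"
    r"|(?:next|last)\s+(?:week|month|year)"
    rf"|(?:3[01]|[12]\d|[1-9])\s+(?:{MONTHS}))\b",
    re.IGNORECASE,
)


def find_structured_timexes(text):
    """Return every closed-class time expression found in text, in order."""
    return [m.group(0) for m in CLOSED_CLASS.finditer(text)]
```

On a sentence such as "We met on 25 December and again last week; Michaelmas is tomorrow." this finds the three structured expressions but not "Michaelmas", which is the gap the rest of the talk addresses.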
  • 3. Wow, it's super-deterministic!
  • 4. Wow, it's super-deterministic! Credit: Kevin Knight
  • 5. … sometimes ● TempEval-2 timex recall: 66 – 88 % ● TempEval-2 normalisation: 55 – 85 % ● ~150 rules needed to get to 81% (Angeli & Uszkoreit '13) ● We can get the structured expressions OK ● But what about the rest?
  • 6. Unstructured time mentions – Christmas – Michaelmas – Halloween – Easter ● Can we learn how to recognise these?
  • 7. Time expression diversity ● Current corpora too small to hold much linguistic variation ● Note characteristic knee in distribution (cf. Montemurro)
  • 8. Named Temporal Expressions ● New class of timexes – Doesn't look like a timex – Doesn't sound like a timex – … is, in fact, a timex X
  • 9. How can we mine and extract NTEs? ● Expensive to annotate text and hope that NTEs appear ● Prefer an automated approach → Let's mine Wikipedia! ● 432 English NTEs found
  • 10. NTEs in Wikipedia ● Gives term and text description ● Problem: no good as a gazetteer, some entries are polysemous (e.g. Carnival) ● Problem: recall limited with gazetteers ● Solution: build statistical tagger
  • 11. Building statistical NTE tagger ● Use list of NTEs to annotate sentences – CoNLL format, I/O binary labels ● Only use monosemous expressions ● Visit linked data searching for expressions ● If many entities found, expression is polysemous – SELECT DISTINCT ?r {?r rdfs:label "carnival"@en} – Not monosemous
  • 12. Building statistical NTE tagger ● If a sentence contains a monosemous NTE, also annotate any polysemous NTEs ● Assume that they will occur in temporal sense While it might not have the retail significance of Christmas, Halloween or Secretary's Day, Groundhog Day remains perhaps the weirdest American holiday.
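The annotation heuristic from slides 11 and 12 could be sketched as follows. This is an assumption-laden illustration: the two gazetteers are tiny hand-written stand-ins (in the talk, monosemy is decided by counting entities returned from a linked-data label query), and `label_sentence` is a hypothetical helper that emits binary I/O labels per token.

```python
# Illustrative stand-ins; the real lists come from Wikipedia plus a
# linked-data check of how many entities share each label.
MONOSEMOUS = {("groundhog", "day"), ("halloween",)}
POLYSEMOUS = {("carnival",)}


def label_sentence(tokens):
    """Return (token, label) pairs using binary I/O labels.

    Polysemous entries are only labelled when the sentence also contains
    a monosemous NTE, on the assumption that they are then being used in
    their temporal sense.
    """
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)

    def mark(gazetteer):
        found = False
        for entry in gazetteer:
            n = len(entry)
            for i in range(len(lowered) - n + 1):
                if tuple(lowered[i:i + n]) == entry:
                    labels[i:i + n] = ["I"] * n
                    found = True
        return found

    if mark(MONOSEMOUS):
        mark(POLYSEMOUS)
    return list(zip(tokens, labels))
```

Writing each pair on its own line, with a blank line between sentences, would give CoNLL-style training data for a sequence tagger.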
  • 13. NTE recognition results ● Baseline: gazetteer of timexes in existing resources ● 2:1 train:eval split, strict matching evaluation ● Also found new NTEs! – European Cup – Dayton Peace Agreement
  • 14. How do we normalise NTEs? ● Target representation: TIMEX3 – January 2nd, 1980 → 1980-01-02 – Summer 2012 → 2012-SU – now → PRESENT_REF ● Statistical learning won't manage ● Use dedicated tool, TIMEN – Open normalisation toolkit – Anyone can contribute – SotA normalisation performance – Takes a document with entity boundaries marked
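To make the target representation on slide 14 concrete, here is a small sketch that produces the three TIMEX3 values shown. It is not TIMEN's API; `timex3_date`, `timex3_season`, and the constant `PRESENT_REF` are names invented for illustration, and the season codes follow the usual TimeML convention (SP, SU, FA, WI).

```python
from datetime import date

# TimeML season codes; "fall" and "autumn" share one code.
SEASON_CODES = {"spring": "SP", "summer": "SU",
                "autumn": "FA", "fall": "FA", "winter": "WI"}

# Value used for deictic expressions such as "now".
PRESENT_REF = "PRESENT_REF"


def timex3_date(d):
    """Fully specified calendar date, e.g. 1980-01-02."""
    return d.isoformat()


def timex3_season(year, season):
    """Season of a given year, e.g. 2012-SU."""
    return f"{year:04d}-{SEASON_CODES[season.lower()]}"
```

For example, `timex3_date(date(1980, 1, 2))` yields `1980-01-02` and `timex3_season(2012, "Summer")` yields `2012-SU`, matching the slide.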
  • 15. Using NTE descriptions ● We have semi-structured descriptions – “six weeks after Easter” – “last Friday in June” – “end of week 17” – “tenth day of Tishrei” ● How to convert these to rules?
  • 16. NTE normalisation rule extraction ● Create simple parser to cover majority of NTEs – “June 25th” – “Last Sunday in March” ● Covers 70.3% of NTE descriptions ● Remainder of rules may be added manually
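The "simple parser" of slide 16 is not shown in the talk; the following is one possible sketch covering only the two description shapes the slide quotes, a fixed date ("June 25th") and an nth weekday in a month ("Last Sunday in March"). The function `resolve`, the two regular expressions, and the lookup tables are all assumptions made for this example. Descriptions it cannot handle return `None`, which corresponds to the remainder that would need manual rules.

```python
import calendar
import re
from datetime import date

MONTHS = {m: i + 1 for i, m in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"])}
# Monday is 0, matching date.weekday().
WEEKDAYS = {d: i for i, d in enumerate(
    ["monday", "tuesday", "wednesday", "thursday",
     "friday", "saturday", "sunday"])}
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "last": -1}

FIXED = re.compile(r"^(\w+) (\d{1,2})(?:st|nd|rd|th)?$")
NTH = re.compile(r"^(\w+) (\w+) (?:in|of) (\w+)$")


def resolve(description, year):
    """Resolve a description to a date in the given year, or None."""
    text = description.strip().lower()
    m = FIXED.match(text)
    if m and m.group(1) in MONTHS:
        try:
            return date(year, MONTHS[m.group(1)], int(m.group(2)))
        except ValueError:  # e.g. "June 31st"
            return None
    m = NTH.match(text)
    if m:
        ordinal, weekday, month = m.groups()
        if ordinal in ORDINALS and weekday in WEEKDAYS and month in MONTHS:
            # All days of the month that fall on the requested weekday.
            days = [d for d in calendar.Calendar().itermonthdates(
                        year, MONTHS[month])
                    if d.month == MONTHS[month]
                    and d.weekday() == WEEKDAYS[weekday]]
            return days[ORDINALS[ordinal]]
    return None
```

Descriptions such as "six weeks after Easter" or "tenth day of Tishrei" fall outside these two patterns and return `None`.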
  • 17. Normalisation + NTEs ● Evaluation ● Two corpora: – SotA (TempEval-3) – Purpose built to be hard to normalise (TimenEval) ● On TempEval-3 (restricted newswire): 0.7% error reduction ● On TimenEval (varied genre): 4.3% error reduction
  • 18. Outstanding issues: Spatial variation ● Labo[u]r Day – May 1 in much of the world – first Monday in May in Australia's QLD and NT ● Summer – Official vs. informal – North vs. south
  • 19. Outstanding issues: Easter ● Commonly used as an offset ● Non-trivial to determine ● “Computus”
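As an illustration of why Easter is non-trivial, the sketch below uses the well-known anonymous Gregorian algorithm (often attributed to Meeus, Jones and Butcher) to compute Western Easter Sunday, then applies an offset of the kind seen in NTE descriptions. The slides do not say which computus the authors use, so this choice and the helper `weeks_after_easter` are assumptions; Orthodox Easter follows a different calculation.

```python
from datetime import date, timedelta


def gregorian_easter(year):
    """Western Easter Sunday for a Gregorian year (anonymous algorithm)."""
    a = year % 19                      # position in the 19-year lunar cycle
    b, c = divmod(year, 100)           # century and year within century
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def weeks_after_easter(year, weeks):
    """Resolve an offset description such as 'six weeks after Easter'."""
    return gregorian_easter(year) + timedelta(weeks=weeks)
```

A rule like "six weeks after Easter" then reduces to `weeks_after_easter(year, 6)` once the anchor date is known.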
  • 20. Outstanding issues: Multiple calendars ● Gregorian (quite popular) – Not particularly rational in the first place ● Lunar (China) ● Astrological ● Hebrew ● … and so on
  • 21. Outstanding issues: Forms of expression ● Orthographic variation: – Martin Luther King Day – MLK Day ● Regional variation: – autumn – fall
  • 22. Resources provided ● Corpus of NTEs ● Rules integrated into TIMEN in next release – around November 2013
  • 23. Thank you for your time! Do you have any questions?