My research taster project

Published in: Technology, Education

  1. CS-GN-TEAM: internal presentation
     research taster project: temporal expressions extraction
     Michele Filannino + You
     Manchester, 15/02/2012
  2. cdt?
     ■ 4-year PhD course
     ■ funded by EPSRC
     ■ industrial partners
     ■ multi-disciplinary
     ■ new model for all PhD training within the UK
  3. cdt?
     ■ 6 months of foundation period
       ● 3 postgraduate courses
         ▶ Machine Learning and Data Mining, Modelling and visualisation of high-dimensional data, Semi-structured data and the web
       ● 3 scientific methods courses
       ● 1 short taster project [6 weeks]
       ● creativity workshops
     ■ 3.5 years of PhD research
  4. where we are
     ■ Computer science
       ● natural language processing
         ▶ information retrieval
           ★ information extraction
             ✦ temporal expressions extraction
  5. or...
     ■ Computer science
       ● data mining
         ▶ text mining
           ★ information extraction
             ✦ temporal expressions extraction
  6. temporal expression
     ■ natural language phrase that denotes a temporal entity: an interval or an instant [1]
       ● fully-qualified: no reference to any other temporal entity
         ▶ March 15, 2001
       ● deictic: reference to the time of utterance
         ▶ today, yesterday, three weeks ago, last Thursday
       ● anaphoric: reference to a timex [2] previously evoked in the text
         ▶ March 15, the next week, Saturday, at that time
     [1] L. Ferro, I. Mani, B. Sundheim, and G. Wilson, “Tides temporal annotation guidelines, v.1.0.2,” MITRE, 2001
     [2] timex: temporal expression
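A minimal sketch of what "reference to the time of utterance" means in practice: the deictic examples on this slide can only be turned into calendar dates once an utterance date is supplied. The function name `resolve_deictic`, the tiny vocabulary and the reading of "last Thursday" as the most recent Thursday strictly before the utterance date are assumptions made for illustration, not part of the presentation.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def resolve_deictic(expression, utterance_date):
    """Resolve a few deictic expressions against the utterance date.

    Hypothetical helper: returns a date, or None if the expression
    is outside the small vocabulary handled here.
    """
    text = expression.lower().strip()
    if text == "today":
        return utterance_date
    if text == "yesterday":
        return utterance_date - timedelta(days=1)

    words = text.split()
    # "<number word> days|weeks ago"
    if len(words) == 3 and words[2] == "ago" and words[0] in NUMBER_WORDS:
        amount = NUMBER_WORDS[words[0]]
        unit = words[1].rstrip("s")
        if unit == "day":
            return utterance_date - timedelta(days=amount)
        if unit == "week":
            return utterance_date - timedelta(weeks=amount)

    # "last <weekday>": assumed to mean the most recent such weekday
    # strictly before the utterance date (1 to 7 days back).
    if len(words) == 2 and words[0] == "last" and words[1] in WEEKDAYS:
        target = WEEKDAYS.index(words[1])
        days_back = (utterance_date.weekday() - target - 1) % 7 + 1
        return utterance_date - timedelta(days=days_back)

    return None
```

With the date of this presentation as utterance time, `resolve_deictic("three weeks ago", date(2012, 2, 15))` gives 25 January 2012. An anaphoric expression such as "the next week" would instead need a previously mentioned timex as its anchor.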
  7. why?
     ■ user’s perspective
       ● temporal aspects of events and entities provide a natural mechanism for organising information
     ■ machine’s perspective
       ● improvements in
         ▶ question answering, summarisation, browsing
  8. how?
     ■ annotation
       ● recognition
         ▶ automatically detect and delimit expressions
         ▶ mostly machine-learning techniques
       ● normalisation
         ▶ assign attribute values to all the recognised expressions
         ▶ using a shared and formal format (standard?)
         ▶ mostly rule-based techniques
     ■ reasoning or searching
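The two annotation steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: recognition is done here with a hand-written regular expression for dates of the form "March 15, 2001" (the slide notes that real recognisers are mostly machine-learned), and normalisation maps the matched string to an ISO 8601 date. The names `recognise`, `normalise` and `annotate` are hypothetical.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Matches fully-qualified dates such as "March 15, 2001".
DATE_PATTERN = re.compile(
    r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b"
)


def recognise(text):
    """Recognition: detect and delimit expressions as (start, end) offsets."""
    return [(m.start(), m.end()) for m in DATE_PATTERN.finditer(text)]


def normalise(surface):
    """Normalisation: map a recognised string to an ISO 8601 date value.

    No calendar validation is attempted (e.g. "February 31" is not rejected).
    """
    match = DATE_PATTERN.fullmatch(surface)
    month = MONTHS.index(match.group(1)) + 1
    day = int(match.group(2))
    year = int(match.group(3))
    return f"{year:04d}-{month:02d}-{day:02d}"


def annotate(text):
    """Run both steps and return one record per recognised expression."""
    records = []
    for start, end in recognise(text):
        surface = text[start:end]
        records.append({"start": start, "end": end,
                        "text": surface, "value": normalise(surface)})
    return records
```

Keeping the two steps as separate functions mirrors the slide: the recogniser could later be swapped for a trained model without touching the rule-based normaliser.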
  9. timex forms [1]
     ■ time or date references
       ● 11pm, February 14th, 2005
     ■ time references that anchor on another time
       ● one hour after midnight, two weeks before Christmas
     ■ durations
       ● few months, two days, five years
     ■ recurring times
       ● every third month, twice in the hour
     [1] J. Poveda, M. Surdeanu, and J. Turmo, “An analysis of Bootstrapping for the Recognition of Temporal Expressions”, 2009
  10. timex forms [1]
      ■ context-dependent times
        ● today, last year
      ■ vague references
        ● somewhere in the middle of June, the near future
      ■ times indicated by an event
        ● the day S. Berlusconi resigned
          ▶ an event is considered a cover term for situations that happen or occur
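Of the forms listed above, durations are the easiest to normalise by rule. The sketch below maps expressions such as "two days" and "five years" to ISO 8601-style period strings, the notation that TIMEX3 values build on. The function name `duration_value` and the small word list are assumptions for illustration; vague quantities such as "few months" are deliberately left unresolved.

```python
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
UNIT_CODES = {"day": "D", "week": "W", "month": "M", "year": "Y"}


def duration_value(expression):
    """Map '<quantity> <unit>' to a period string such as 'P2D'.

    Hypothetical helper: returns None when the quantity or the unit
    is not in the small vocabulary above.
    """
    words = expression.lower().split()
    if len(words) != 2:
        return None
    quantity_word, unit_word = words

    # Strip a plural "s" so that "days" and "day" share one entry.
    unit = unit_word[:-1] if unit_word.endswith("s") else unit_word
    if unit not in UNIT_CODES:
        return None

    if quantity_word.isdigit():
        amount = int(quantity_word)
    else:
        amount = NUMBER_WORDS.get(quantity_word)
    if amount is None:
        return None

    return f"P{amount}{UNIT_CODES[unit]}"
```

Recurring times and event-anchored times need more context than a lookup table, which is one reason the forms are worth separating.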
  11. timeline (2000-2013)
      ■ standards, corpora and evaluation campaigns
        ● TimeML (standard)
        ● TimeBank (corpus)
        ● ACE-2004 dev & eval (TERN2004 corpus)
        ● TempEval Task#15 (in SemEval07)
        ● TempEval-2 Task#13 (in SemEval10)
        ● TempEval-3 Task#1 (in SemEval13)
      ■ approaches
        ● Hand grammar approach (rule-based)
        ● SVM (machine learning)
        ● Maximum Entropy Class. (machine learning)
        ● Conditional Random Fields (machine learning)
        ● Markov logic network (machine learning)
      ■ performance [1]: 85%, 87.8%, 90.7%
      [1] TERN2004 corpus
  12. standards
      ■ “the nice thing about standards is, there are so many to choose from” by Andrew S. Tanenbaum
        ● TimeML
        ● DAML-Time
        ● TIDES
        ● ACE-TERN
  13. standards
      ■ there’s a tension between
        ● flexibility and efficiency
        ● usability and flexibility
        ● complexity and spreadability
        ● flexibility and agreement
  14. about the spreadability
  15. about the agreement
      TimeML tag   agreement
      TIMEX3       0.83
      SIGNAL       0.77
      EVENT        0.78
      ALINK        0.81
      SLINK        0.85
      TLINK        0.55
      Source: http://timeml.org/site/timebank/documentation-1.2.html
  16. example: raw text
      That means Unisys must pay about $100 million in interest every quarter, on top of $27 million in dividends on preferred stock.
      Source: TRIOS TimeBank v.0.1
  17. example: recognition
      That means Unisys must <ev>pay</ev> about $100 million in interest <te>every quarter</te>, on top of $27 million in dividends on preferred stock.
      Source: TRIOS TimeBank v.0.1
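When recognition is treated as a machine-learning problem, inline spans like the `<te>` tags above are commonly converted into per-token labels for a sequence model. The sketch below uses the widespread BIO scheme (B = begin, I = inside, O = outside). The scheme, the label names `B-TIMEX` and `I-TIMEX`, the whitespace tokenisation and the function name `to_bio` are all assumptions for illustration; the slides do not say which encoding was used.

```python
import re

TIMEX_SPAN = re.compile(r"<te>(.*?)</te>")


def to_bio(annotated):
    """Convert text with <te>...</te> spans into (token, label) pairs.

    Event tags (<ev>) are dropped because only temporal expressions
    are labelled here. Tokens are split on whitespace.
    """
    text = re.sub(r"</?ev>", "", annotated)
    pairs = []
    position = 0
    for match in TIMEX_SPAN.finditer(text):
        # Everything before the span is outside any timex.
        for token in text[position:match.start()].split():
            pairs.append((token, "O"))
        # First token of the span begins it; the rest are inside it.
        for index, token in enumerate(match.group(1).split()):
            pairs.append((token, "B-TIMEX" if index == 0 else "I-TIMEX"))
        position = match.end()
    for token in text[position:].split():
        pairs.append((token, "O"))
    return pairs
```

Applied to the sentence on this slide, "every" would receive `B-TIMEX`, "quarter" `I-TIMEX`, and every other token `O`.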
  18. example: normalisation
      That means Unisys must
      <EVENT eid="e110" mainevent="YES" class="OCCURRENCE" stem="pay" tense="NONE" aspect="NONE" polarity="POS" pos="VERB">pay</EVENT>
      about $100 million in interest
      <TIMEX3 tid="t256" type="SET" value="P1Q" temporalFunction="false" functionInDocument="NONE" quant="every">every quarter</TIMEX3>,
      on top of $27 million in dividends on preferred stock.
      <TLINK lid="l32" relType="BEFORE" relatedToEvent="e110" eventID="e107"/>
      <TLINK lid="l26" relType="OVERLAP" eventID="e110" relatedToTime="t256"/>
      Source: TRIOS TimeBank v.0.1
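Because the normalised annotation is XML, it can be read back with a standard parser. The sketch below uses Python's `xml.etree.ElementTree` on an abbreviated copy of the fragment above (some attributes are omitted for brevity). Wrapping the fragment in a `<TimeML>` root element and the function name `read_annotations` are assumptions made so that the snippet parses on its own.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the annotated sentence; not the full attribute set.
FRAGMENT = (
    'That means Unisys must '
    '<EVENT eid="e110" class="OCCURRENCE">pay</EVENT> '
    'about $100 million in interest '
    '<TIMEX3 tid="t256" type="SET" value="P1Q" quant="every">'
    'every quarter</TIMEX3>, on top of $27 million in dividends '
    'on preferred stock. '
    '<TLINK lid="l32" relType="BEFORE" relatedToEvent="e110" eventID="e107"/>'
    '<TLINK lid="l26" relType="OVERLAP" eventID="e110" relatedToTime="t256"/>'
)


def read_annotations(fragment):
    """Return (timexes, events, links) found in an annotated fragment."""
    # A single root element is needed for the fragment to be well-formed.
    root = ET.fromstring(f"<TimeML>{fragment}</TimeML>")

    timexes = {
        t.get("tid"): {"text": t.text, "type": t.get("type"),
                       "value": t.get("value")}
        for t in root.iter("TIMEX3")
    }
    events = {e.get("eid"): e.text for e in root.iter("EVENT")}

    # Keep only event-to-time links whose both ends are in this fragment.
    links = [
        (events[l.get("eventID")], l.get("relType"),
         timexes[l.get("relatedToTime")]["text"])
        for l in root.iter("TLINK")
        if l.get("eventID") in events and l.get("relatedToTime") in timexes
    ]
    return timexes, events, links
```

For this fragment the only link that survives the filter relates "pay" to "every quarter" with `OVERLAP`; link `l32` points to event `e107`, which lies outside the sentence.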
  19. considerations
      ■ specialised linguistic approaches do not pay off
        ● machine learning techniques usually perform better
      ■ scarcity of pre-annotated corpora
        ● manual corpus annotation is very tricky
        ● partially solved with TempEval-3 (2013)
          ▶ 1M-word corpus automatically annotated by TRIOS
      ■ vibrant area in the bio-medical domain
  20. [bar chart] results per year, 2000-2012, for the queries “temporal expressions” and “temporal expressions” AND “clinical”; vertical axis from 0 to 500
      Source: Google Scholar (last update 09/02/2012)
  21. [100% stacked bar chart] share of each query per year, 2000-2012, for “temporal expressions” and “temporal expressions” AND “clinical”; data labels range from 79% to 95% and from 5% to 21%
      Source: Google Scholar (last update 09/02/2012)
  22. considerations
      ■ rule-based approaches will never die
        ● CRF and MLN hybridise machine learning with rule-based techniques
      ■ better performance means clever decomposition
        ● how to divide the general problem into sub-problems
  23. my to-do list
      ■ collect a corpus in the clinical field
      ■ study novel machine learning approaches
        ● maximum likelihood, logistic regression, CRF, MLN
      ■ implement a prototype
        ● Python or MATLAB
      progress (scale of 0 to 30 days): 12 days elapsed, 18 days remaining
  24. Thank you.
