My research taster project


  1. CS-GN-TEAM: internal presentation
     research taster project: temporal expressions extraction
     Michele Filannino + You
     Manchester, 15/02/2012
  2. CDT?
     ■ 4-year PhD course
     ■ funded by EPSRC
     ■ industrial partners
     ■ multi-disciplinary
     ■ new model for all PhD training within the UK
     presentation: my research taster project, 15/02/2012, Michele Filannino
  3. CDT?
     ■ 6-month foundation period
       ● 3 postgraduate courses
         ▶ Machine Learning and Data Mining, Modelling and visualisation of high-dimensional data, Semi-structured data and the web
       ● 3 scientific methods courses
       ● 1 short taster project [6 weeks]
       ● creativity workshops
     ■ 3.5 years of PhD research
  4. where we are
     ■ Computer science
       ● natural language processing
         ▶ information retrieval
           ★ information extraction
             ✦ temporal expressions extraction
  5. or...
     ■ Computer science
       ● data mining
         ▶ text mining
           ★ information extraction
             ✦ temporal expressions extraction
  6. temporal expression
     ■ natural language phrase that denotes a temporal entity: an interval or an instant¹
       ● fully-qualified: no reference to any other temporal entity
         ▶ March 15, 2001
       ● deictic: reference to the time of utterance
         ▶ today, yesterday, three weeks ago, last Thursday
       ● anaphoric: reference to a timex² previously evoked in the text
         ▶ March 15, the next week, Saturday, at that time
     ¹ L. Ferro, I. Mani, B. Sundheim, and G. Wilson, “Tides temporal annotation guidelines, v. 1.0.2,” MITRE, 2001
     ² timex: temporal expression
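The deictic case can be illustrated with a minimal Python sketch that resolves a few expressions against the time of utterance. The function name `resolve_deictic`, the small number lexicon and the choice of covered expressions are assumptions made for illustration; they are not taken from the TIDES guidelines or from the project itself.

```python
import re
from datetime import date, timedelta

# Hypothetical number words; a real system would need a fuller lexicon.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
UNIT_DAYS = {"day": 1, "week": 7}


def resolve_deictic(expression, utterance_date):
    """Resolve a small set of deictic expressions against the utterance date.

    Returns a datetime.date, or None when the expression is not covered.
    """
    text = expression.lower().strip()
    if text == "today":
        return utterance_date
    if text == "yesterday":
        return utterance_date - timedelta(days=1)
    # Patterns such as "three weeks ago" or "2 days ago".
    match = re.fullmatch(r"(\w+) (day|week)s? ago", text)
    if match:
        quantity = match.group(1)
        count = int(quantity) if quantity.isdigit() else NUMBER_WORDS.get(quantity)
        if count is None:
            return None
        return utterance_date - timedelta(days=count * UNIT_DAYS[match.group(2)])
    return None


utterance = date(2012, 2, 15)  # the date of this presentation
print(resolve_deictic("three weeks ago", utterance))  # 2012-01-25
```

Expressions such as "last Thursday" are deliberately left unresolved here (the function returns `None`), since they need weekday arithmetic that this sketch does not attempt.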
  7. why?
     ■ user’s perspective
       ● temporal aspects of events and entities provide a natural mechanism for organising information
     ■ machine’s perspective
       ● improvements in
         ▶ question answering, summarisation, browsing
  8. how?
     ■ annotation
       ● recognition
         ▶ automatically detect and delimit expressions
         ▶ mostly machine-learning techniques
       ● normalisation
         ▶ assign attribute values to all the recognised expressions
         ▶ using a shared and formal format (standard?)
         ▶ mostly rule-based techniques
     ■ reasoning or searching
  9. timex forms¹
     ■ time or date references
       ● 11pm, February 14th, 2005
     ■ time references that anchor on another time
       ● one hour after midnight, two weeks before Christmas
     ■ durations
       ● few months, two days, five years
     ■ recurring times
       ● every third month, twice in the hour
     ¹ J. Poveda, M. Surdeanu, and J. Turmo, “An analysis of Bootstrapping for the Recognition of Temporal Expressions”, 2009
  10. timex forms¹
      ■ context-dependent times
        ● today, last year
      ■ vague references
        ● somewhere in the middle of June, the near future
      ■ times indicated by an event
        ● the day S. Berlusconi resigned
          ▶ an event is considered a cover term for situations that happen or occur
      ¹ J. Poveda, M. Surdeanu, and J. Turmo, “An analysis of Bootstrapping for the Recognition of Temporal Expressions”, 2009
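To make the timex forms more concrete, here is a rough Python sketch that assigns a form label to an already-recognised expression using a handful of ordered keyword rules. The rule set, the labels and the name `classify_form` are illustrative assumptions; the cited analysis does not prescribe these rules, and the sketch covers only some of the forms listed.

```python
import re

# Ordered, hand-written rules (illustrative only); the first match wins.
FORM_RULES = [
    ("recurring time", re.compile(r"\b(every|each|twice|once)\b")),
    ("anchored time", re.compile(r"\b(before|after)\b")),
    ("context-dependent time", re.compile(r"\b(today|tomorrow|yesterday|last|next)\b")),
    ("duration", re.compile(r"^(a few|few|one|two|three|four|five|\d+) (day|week|month|year)s?$")),
]


def classify_form(expression):
    """Return the label of the first rule that matches, or "other"."""
    text = expression.lower().strip()
    for label, pattern in FORM_RULES:
        if pattern.search(text):
            return label
    return "other"


for example in ["every third month", "two weeks before Christmas", "five years", "last year"]:
    print(example, "->", classify_form(example))
```

The rule order matters: "two weeks before Christmas" contains a duration-like prefix, so the anchoring rule is checked before the duration rule. Vague and event-based forms fall through to "other" in this sketch.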
  11. timeline [figure, axis 2000 to 2013]
      ■ resources and evaluations
        ● TimeML (standard)
        ● TimeBank (corpus)
        ● ACE-2004 dev & eval (TERN2004 corpus)
        ● TempEval Task#15 (in SemEval07)
        ● TempEval-2 Task#13 (in SemEval10)
        ● TempEval-3 Task#1 (in SemEval13)
      ■ approaches
        ● Hand grammar approach (rule-based)
        ● SVM (machine learning)
        ● Maximum Entropy Class. (machine learning)
        ● Conditional Random Fields (machine learning)
        ● Markov logic network (machine learning)
      ■ reported scores¹: 85%, 87.8%, 90.7%
      ¹ TERN2004 corpus
  12. standards
      ■ “the nice thing about standards is, there are so many to choose from” (Andrew S. Tanenbaum)
        ● TimeML
        ● DAML-Time
        ● TIDES
        ● ACE-TERN
  13. standards
      ■ there’s a tension between
        ● flexibility and efficiency
        ● usability and flexibility
        ● complexity and spreadability
        ● flexibility and agreement
  14. about the spreadability
  15. about the agreement
      TimeML tag    agreement
      TIMEX3        0.83
      SIGNAL        0.77
      EVENT         0.78
      ALINK         0.81
      SLINK         0.85
      TLINK         0.55
      Source: http://timeml.org/site/timebank/documentation-1.2.html
  16. example: raw text
      That means Unisys must pay about $100 million in interest every quarter, on top of $27 million in dividends on preferred stock.
      Source: TRIOS TimeBank v.0.1
  17. example: recognition
      That means Unisys must <ev>pay</ev> about $100 million in interest <te>every quarter</te>, on top of $27 million in dividends on preferred stock.
      Source: TRIOS TimeBank v.0.1
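The recognition step in this example can be mimicked with a very small Python sketch that wraps matched temporal expressions in `<te>` tags. It uses a single hand-written regular expression as a stand-in for the machine-learning recognisers discussed in the talk; the pattern and the name `tag_timexes` are assumptions for illustration, and event tagging (`<ev>`) is not attempted.

```python
import re

# Illustrative pattern only: it covers "every <unit>" and nothing else.
TIMEX_PATTERN = re.compile(r"\bevery (?:day|week|month|quarter|year)\b", re.IGNORECASE)


def tag_timexes(text):
    """Wrap each matched temporal expression in <te>...</te> tags."""
    return TIMEX_PATTERN.sub(lambda m: "<te>" + m.group(0) + "</te>", text)


raw = ("That means Unisys must pay about $100 million in interest every quarter, "
       "on top of $27 million in dividends on preferred stock.")
print(tag_timexes(raw))
```

On the raw sentence from the previous slide this tags "every quarter" and leaves the rest of the text unchanged.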
  18. example: normalisation
      That means Unisys must <EVENT eid="e110" mainevent="YES" class="OCCURRENCE" stem="pay" tense="NONE" aspect="NONE" polarity="POS" pos="VERB">pay</EVENT> about $100 million in interest <TIMEX3 tid="t256" type="SET" value="P1Q" temporalFunction="false" functionInDocument="NONE" quant="every">every quarter</TIMEX3>, on top of $27 million in dividends on preferred stock.
      <TLINK lid="l32" relType="BEFORE" relatedToEvent="e110" eventID="e107"/>
      <TLINK lid="l26" relType="OVERLAP" eventID="e110" relatedToTime="t256"/>
      Source: TRIOS TimeBank v.0.1
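As a rough illustration of how such normalised annotations can be consumed, the Python sketch below parses the annotated sentence with the standard library's `xml.etree.ElementTree` and reads back the TIMEX3 attributes and the event-to-time link. The enclosing `<s>` element is a hypothetical wrapper added only so the fragment parses as one XML document, and only the event-to-time TLINK is included.

```python
import xml.etree.ElementTree as ET

# The annotated sentence from the slide, wrapped in a hypothetical <s> root
# element so that it parses as a well-formed XML document.
annotated = (
    '<s>That means Unisys must '
    '<EVENT eid="e110" mainevent="YES" class="OCCURRENCE" stem="pay" tense="NONE" '
    'aspect="NONE" polarity="POS" pos="VERB">pay</EVENT> '
    'about $100 million in interest '
    '<TIMEX3 tid="t256" type="SET" value="P1Q" temporalFunction="false" '
    'functionInDocument="NONE" quant="every">every quarter</TIMEX3>, '
    'on top of $27 million in dividends on preferred stock. '
    '<TLINK lid="l26" relType="OVERLAP" eventID="e110" relatedToTime="t256"/></s>'
)

root = ET.fromstring(annotated)

# Index temporal expressions and events by their identifiers.
timexes = {t.get("tid"): (t.text, t.get("type"), t.get("value")) for t in root.iter("TIMEX3")}
events = {e.get("eid"): e.text for e in root.iter("EVENT")}

# Follow each event-to-time link and print it in readable form.
for link in root.iter("TLINK"):
    time_id = link.get("relatedToTime")
    if time_id in timexes:
        print(events[link.get("eventID")], link.get("relType"), timexes[time_id][0])
# prints: pay OVERLAP every quarter
```

The `value="P1Q"` attribute is kept as an opaque string here; interpreting it as a one-quarter period would be a further normalisation step.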
  19. considerations
      ■ specialised linguistic approaches do not pay off
        ● machine learning techniques usually perform better
      ■ scarcity of pre-annotated corpora
        ● manual corpus annotation is very tricky
        ● partially solved with TempEval-3 (2013)
          ▶ 1M-word corpus automatically annotated by TRIOS
      ■ vibrant area in the biomedical domain
  20. [figure: bar chart of yearly counts, 2000 to 2012, for the queries “temporal expressions” and “temporal expressions” AND “clinical”; vertical axis 0 to 500]
      Source: Google Scholar (last update 09/02/2012)
  21. [figure: yearly percentage shares, 2000 to 2012, for the queries “temporal expressions” and “temporal expressions” AND “clinical”; vertical axis 0% to 100%]
      Source: Google Scholar (last update 09/02/2012)
  22. considerations
      ■ rule-based approach will never die
        ● CRF and MLN are machine-learning hybridisations
      ■ better performance means clever decomposition
        ● how to divide the general problem into sub-problems
  23. my to-do list
      ■ collect a corpus in the clinical domain
      ■ study novel machine learning approaches
        ● maximum likelihood, logistic regression, CRF, MLN
      ■ implement a prototype
        ● Python or MATLAB
      progress: 12 days elapsed, 18 days remaining (scale 0 to 30 days)
  24. Thank you.
