Mining temporal footprints from Wikipedia

Discovery of temporal information is key to organising knowledge, and therefore the task of extracting and representing temporal information from texts has received increasing interest. In this paper we focus on the discovery of temporal footprints from encyclopaedic descriptions. Temporal footprints are time-line periods that are associated with the existence of specific concepts. Our approach relies on the extraction of date mentions and the prediction of the lower and upper boundaries that define temporal footprints. We report on several experiments on persons' pages from Wikipedia to illustrate the feasibility of the proposed methods.


  1. Mining temporal footprints from Wikipedia
      Michele Filannino, Goran Nenadic
      School of Computer Science, filannim@cs.man.ac.uk
      1st AHA! Workshop, COLING 2014, Dublin, 23/08/2014
  2. Introduction
      ■ Temporal information is crucial for organising structured and unstructured data.
      ■ Several temporal information extraction (TIE) systems are now available,
        ● thanks to the TempEval challenge series.
  3. ManTIME
      URL: http://www.cs.man.ac.uk/~filannim/mantime.html
  4. Test with long text
  5. Temporal footprint
      A temporal footprint is a continuous period on the time-line that temporally defines the existence of a particular concept.
      Reference: Immanuel Kant, Paul Guyer, and Allen W. Wood. 1998. Critique of Pure Reason. Cambridge University Press.
  6. Problem
      Can we predict temporal footprints from encyclopaedic descriptions of concepts?
      ■ input: textual description of a concept
      ■ output: prediction of a temporal interval
  7. [Figure: Examples of temporal footprints on a time-line from 1000 to 2000, grouped into objects, persons and historical periods: Web, Cellphone, Computer, Car, Richard Feynman, Bicycle, Carl Friedrich Gauss, French Revolution, Age of Enlightenment, Galileo Galilei, Leonardo da Vinci, Christopher Columbus, Renaissance, Arming sword, High Middle Ages, Genghis Khan.]
  12. Methodology
      1. date mention extraction
      2. outlier filtering
      3. normal distribution fitting
      4. prediction
  13. Date mentions extraction
      [Figure: histogram of date-mention frequency (freq, 0.000 to 0.050) over time (in years, 1360 to 1810).]
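The extraction step can be approximated with a regular expression, which is the basis of the RegEx strategies. The sketch below is a minimal illustration, assuming that date mentions are bare four-digit years between 1000 and 2099; the pattern, the function names `extract_years` and `relative_frequencies`, and the sample sentence are hypothetical, not the authors' code. The relative frequencies correspond to the "freq" axis of the histogram.

```python
import re
from collections import Counter

# Hypothetical pattern: a bare four-digit year from 1000 to 2099 that is not
# part of a longer number. The pattern used in the paper is not given.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})\b")


def extract_years(text: str) -> list[int]:
    """Return every year-like mention in text, in order of appearance."""
    return [int(match) for match in YEAR_PATTERN.findall(text)]


def relative_frequencies(years: list[int]) -> dict[int, float]:
    """Map each mentioned year to its share of all mentions (the 'freq' axis)."""
    counts = Counter(years)
    total = len(years)
    return {year: count / total for year, count in sorted(counts.items())}


if __name__ == "__main__":
    sample = ("Galileo Galilei (15 February 1564 - 8 January 1642) published "
              "Sidereus Nuncius in 1610 and was tried in 1633.")
    years = extract_years(sample)
    print(years)  # [1564, 1642, 1610, 1633]
    print(relative_frequencies(years))
```

A year-only pattern misses full dates written in other forms (e.g. "the 1560s") and counts numbers that merely look like years; a temporal tagger such as HeidelTime addresses exactly these cases.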
  14. Outlier filtering
      [Figure: the date-mention histogram with the outlier region delimited by the γ parameter.]
      The γ (gamma) parameter controls the boundaries of the outlier region.
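The slides do not specify how γ defines the outlier region. As a hedged illustration, the sketch below assumes a symmetric region around the mean: mentions farther than γ population standard deviations from the mean year are discarded. The function name `filter_outliers` and the default γ = 2 are hypothetical.

```python
from statistics import mean, pstdev


def filter_outliers(years: list[int], gamma: float = 2.0) -> list[int]:
    """Drop mentions farther than gamma standard deviations from the mean year.

    Assumption: the mean +/- gamma * sigma region is illustrative; the slides
    only say that gamma controls the outlier region's boundaries.
    """
    if len(years) < 2:
        return list(years)
    mu = mean(years)
    sigma = pstdev(years)  # population standard deviation of the mentions
    if sigma == 0:
        return list(years)  # all mentions are the same year
    return [y for y in years if abs(y - mu) <= gamma * sigma]


if __name__ == "__main__":
    print(filter_outliers([1564, 1600, 1610, 1633, 1642, 2014]))
    # [1564, 1600, 1610, 1633, 1642]
```

Because the mean and standard deviation are themselves pulled towards extreme years, a median-based region would be more robust; which rule the authors used is not stated on the slides.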
  15. Normal distribution fitting (α parameter)
      [Figure: the filtered histogram of date-mention frequency over time (in years) with a fitted Gaussian bell.]
      The α (alpha) and β (beta) parameters control the size and offset of the Gaussian bell.
  17. Normal distribution fitting (β parameter)
      [Figure: the same fitted Gaussian bell over the date-mention histogram, illustrating the β parameter.]
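The slides describe α and β only as controlling the "size and offset" of the Gaussian bell. The sketch below assumes the predicted footprint is the interval [μ - α·σ + β, μ + α·σ + β], where μ and σ are the mean and sample standard deviation of the filtered date mentions. The function name `predict_footprint` and its default values are hypothetical; the exact formula in the paper may differ.

```python
from statistics import NormalDist


def predict_footprint(years: list[int], alpha: float = 1.0,
                      beta: float = 0.0) -> tuple[int, int]:
    """Fit a normal distribution to date mentions and return (lower, upper) years.

    Assumption: alpha scales the half-width in units of sigma and beta shifts
    the whole interval by beta years; needs at least two mentions.
    """
    dist = NormalDist.from_samples(years)  # mean and sample standard deviation
    half_width = alpha * dist.stdev
    lower = dist.mean - half_width + beta
    upper = dist.mean + half_width + beta
    return round(lower), round(upper)


if __name__ == "__main__":
    print(predict_footprint([1600, 1610, 1620, 1630, 1640], alpha=2.0))  # (1588, 1652)
```

In this form α and β would be tuned on data with known footprints, for example by minimising the mean error over a training set of people.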
  19. Error measure
      [Figure: gold and predicted intervals on the time-line, with their overlap and their union marked.]
      Reference: Fatima De Carvalho. 1996. Histogrammes et indices de proximité en analyse données symboliques. Actes de l'école d'été sur l'analyse des données symboliques. LISE-CEREMADE, Université de Paris IX Dauphine, pages 101-127.
  20. Error measure
      [Figure: gold and predicted intervals on the time-line, with their union marked.]
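The slides show the error measure only as a figure of overlap and union. A form consistent with all three E values reported on the result slides (Galileo Galilei 0.204, Robin Williams 0.159, Christopher Columbus 0.92) is one minus the ratio of the overlap length to the union length. The sketch below implements that inferred form; it is not quoted from the paper, and the function name is hypothetical.

```python
def footprint_error(gold: tuple[int, int], prediction: tuple[int, int]) -> float:
    """Return 1 - |overlap| / |union| for two (start, end) year intervals.

    Assumption: overlap-over-union form inferred from the slides; 'union' is
    the span from the earliest start to the latest end, so disjoint intervals
    score 1.0 regardless of the gap between them.
    """
    g_start, g_end = gold
    p_start, p_end = prediction
    overlap = max(0, min(g_end, p_end) - max(g_start, p_start))
    union = max(g_end, p_end) - min(g_start, p_start)
    if union == 0:
        return 0.0  # both intervals are the same single year
    return 1 - overlap / union


if __name__ == "__main__":
    print(round(footprint_error((1564, 1642), (1556, 1654)), 3))  # 0.204 (Galileo Galilei)
```

The measure lies between 0 (perfect match) and 1 (no overlap). How the original measure treats disjoint intervals may differ from this sketch.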
  21. Strategies
      A. RegEx
      B. RegEx + Filtering
      C. RegEx + Filtering + Gaussian fitting
      D. HeidelTime + Filtering + Gaussian fitting
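The strategies combine the same building blocks in increasing order. The self-contained sketch below reproduces A to C under explicitly stated assumptions: a year-only regular expression, outlier filtering at mean ± γ standard deviations, and a predicted interval of μ ± α·σ shifted by β. It also assumes that strategies A and B predict the earliest and latest remaining year. Strategy D replaces the regular expression with HeidelTime, a separate temporal tagger, and is not reproduced here. All names and default parameter values are hypothetical.

```python
import re
from statistics import NormalDist, mean, pstdev

# Hypothetical building blocks; every name and default value is illustrative.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})\b")  # bare years 1000-2099


def extract_years(text: str) -> list[int]:
    """Year-like mentions found by the regular expression."""
    return [int(m) for m in YEAR_PATTERN.findall(text)]


def filter_outliers(years: list[int], gamma: float) -> list[int]:
    """Keep mentions within gamma population standard deviations of the mean."""
    if len(years) < 2:
        return list(years)
    mu, sigma = mean(years), pstdev(years)
    if sigma == 0:
        return list(years)
    return [y for y in years if abs(y - mu) <= gamma * sigma]


def gaussian_interval(years: list[int], alpha: float, beta: float) -> tuple[int, int]:
    """Interval mu +/- alpha * sigma, shifted by beta years (sample sigma)."""
    dist = NormalDist.from_samples(years)
    return (round(dist.mean - alpha * dist.stdev + beta),
            round(dist.mean + alpha * dist.stdev + beta))


def strategy_a(text: str) -> tuple[int, int]:
    """A. RegEx: earliest and latest mentioned year."""
    years = extract_years(text)
    return min(years), max(years)


def strategy_b(text: str, gamma: float = 2.0) -> tuple[int, int]:
    """B. RegEx + Filtering: earliest and latest year that survives filtering."""
    years = filter_outliers(extract_years(text), gamma)
    return min(years), max(years)


def strategy_c(text: str, gamma: float = 2.0,
               alpha: float = 1.0, beta: float = 0.0) -> tuple[int, int]:
    """C. RegEx + Filtering + Gaussian fitting."""
    years = filter_outliers(extract_years(text), gamma)
    return gaussian_interval(years, alpha, beta)


if __name__ == "__main__":
    text = ("Born 1564. Work in 1600, 1610 and 1633. Died 1642. "
            "Museum opened 2014.")
    print(strategy_a(text))  # (1564, 2014)
    print(strategy_b(text))  # (1564, 1642)
    print(strategy_c(text, alpha=2.0))  # (1548, 1671)
```

In the hypothetical example, the late mention (2014) stretches strategy A's upper bound, and filtering removes it. Strategy C needs at least two surviving mentions, and A and B fail on a text with no year mentions at all.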
  22. Evaluation
      ■ subject: people who lived between 1000 AD and 2014
        ● text from Wikipedia web pages
        ● years of birth and death from DBpedia
      ■ 228,824 people collected
      ■ simple definition of temporal footprint
        ● birth and death dates
  23. [Figure: People per textual length; number of people (#people, 0 to 500) by page length (#words, 0 to 3,750).]
  24. Aggregate results
      Strategy                                     Mean Distance Error   Standard Deviation
      RegEx                                        0.2636                0.3409
      RegEx + Filtering                            0.2596                0.3090
      RegEx + Filtering + Gaussian fitting         0.3503                0.2430
      HeidelTime + Filtering + Gaussian fitting    0.5980                0.2470
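The table reports, per strategy, the mean and standard deviation of the per-person error. A minimal sketch of that aggregation follows, assuming the overlap-over-union error (an inferred form) and the population standard deviation. Its input is only the three (gold, prediction) pairs shown on the result slides, so its output is not comparable with the table above; the function names are hypothetical.

```python
from statistics import mean, pstdev


def footprint_error(gold: tuple[int, int], prediction: tuple[int, int]) -> float:
    """1 - |overlap| / |union| for two (start, end) year intervals (inferred form)."""
    overlap = max(0, min(gold[1], prediction[1]) - max(gold[0], prediction[0]))
    union = max(gold[1], prediction[1]) - min(gold[0], prediction[0])
    return 0.0 if union == 0 else 1 - overlap / union


def aggregate(pairs: list[tuple[tuple[int, int], tuple[int, int]]]) -> tuple[float, float]:
    """Mean distance error and (population) standard deviation over all pairs."""
    errors = [footprint_error(gold, pred) for gold, pred in pairs]
    return mean(errors), pstdev(errors)


# Only the three (gold, prediction) pairs shown on the result slides.
examples = [
    ((1564, 1642), (1556, 1654)),  # Galileo Galilei
    ((1951, 2014), (1953, 2006)),  # Robin Williams
    ((1451, 1506), (1366, 2057)),  # Christopher Columbus
]
mde, sd = aggregate(examples)
print(f"MDE = {mde:.4f}, SD = {sd:.4f}")
```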
  25. Results
      [Figure: MDE (0.0 to 1.0) by page length (#words, about 1,112 to 27,804) for the four strategies: RegEx; RegEx + Filtering; RegEx + Filtering + Gaussian fitting; HeidelTime + Filtering + Gaussian fitting.]
  27. Results
      ■ Galileo Galilei (1564-1642), prediction: 1556-1654, E: 0.204
  28. Results
      ■ Robin Williams (1951-2014), prediction: 1953-2006, E: 0.159
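Assuming the error is one minus the ratio between the length of the overlap and the length of the union of the gold and predicted intervals (an inferred form, not quoted from the slides), both reported values can be checked by hand:

```latex
\begin{aligned}
E_{\text{Galileo}}  &= 1 - \frac{|[1564, 1642]|}{|[1556, 1654]|} = 1 - \frac{78}{98} \approx 0.204 \\
E_{\text{Williams}} &= 1 - \frac{|[1953, 2006]|}{|[1951, 2014]|} = 1 - \frac{53}{63} \approx 0.159
\end{aligned}
```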
  29. Other types of temporal footprint?
      ■ Christopher Columbus will die in 2057?!
      ■ Prediction: 1366-2057 (gold: 1451-1506), E: 0.92
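Assuming the error is one minus the ratio of overlap to union (an inferred form), the high Columbus value follows from the prediction being far wider than the actual lifespan: the overlap is the whole lifespan, yet it covers only a small fraction of the union.

```latex
E_{\text{Columbus}} = 1 - \frac{|[1451, 1506]|}{|[1366, 2057]|} = 1 - \frac{55}{691} \approx 0.920
```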
  33. Physical existence vs. social coverage
      ■ Anne Frank's footprint is shifted into the future, later than her lifetime.
  36. Conclusions
      ■ How does the methodology behave on different languages? And on different sources?
      ■ Oracle-like side effect:
        • Apple Inc. will be closed down this year
        • Stanford University will be closed down in 2029
      ■ Future work
        • mixture of normal distributions
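The slides list a mixture of normal distributions as future work without further detail. The sketch below only shows how a two-component one-dimensional Gaussian mixture could be fitted to date mentions with expectation-maximisation; the function names, the initialisation on the two halves of the sorted mentions and the `min_sigma` floor are all hypothetical choices, not the authors' design.

```python
import math
from statistics import mean, pstdev


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_two_gaussians(years: list[int], iterations: int = 100,
                      min_sigma: float = 1.0) -> list[tuple[float, float, float]]:
    """Fit a two-component 1-D Gaussian mixture by expectation-maximisation.

    Returns (weight, mean, standard deviation) per component. Needs at least
    two mentions; min_sigma stops a component collapsing onto a single year.
    """
    data = sorted(years)
    half = len(data) // 2
    # Initialise one component on each half of the sorted mentions.
    params = [[0.5, mean(part), max(pstdev(part), min_sigma)]
              for part in (data[:half], data[half:])]
    for _ in range(iterations):
        # E-step: responsibility of each component for each mention.
        resp = []
        for x in data:
            weighted = [w * normal_pdf(x, mu, s) for w, mu, s in params]
            total = sum(weighted) or 1e-300  # guard against underflow to 0
            resp.append([v / total for v in weighted])
        # M-step: re-estimate weight, mean and standard deviation.
        for k in range(2):
            r_k = [r[k] for r in resp]
            n_k = sum(r_k)
            if n_k == 0:
                continue
            mu_k = sum(r * x for r, x in zip(r_k, data)) / n_k
            var_k = sum(r * (x - mu_k) ** 2 for r, x in zip(r_k, data)) / n_k
            params[k] = [n_k / len(data), mu_k, max(math.sqrt(var_k), min_sigma)]
    return [tuple(p) for p in params]
```

On mentions that form two well-separated clusters, such as 1450 to 1490 and 1900 to 1940 in steps of ten, the two components settle near 1470 and 1920 with equal weights. Turning such components into a single footprint would need a further rule that the slides do not describe.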
  37. Thank you.
  38. Questions?
      Contact: filannim@cs.man.ac.uk
      Visit: tinyurl.com/temporal-footprints
