Mining temporal footprints from Wikipedia
Discovering temporal information is key to organising knowledge, and the task of extracting and representing temporal information from text has therefore received increasing interest. In this paper we focus on discovering temporal footprints from encyclopaedic descriptions. Temporal footprints are timeline periods associated with the existence of specific concepts. Our approach relies on extracting date mentions and predicting the lower and upper boundaries that define a temporal footprint. We report on several experiments on people's pages from Wikipedia to illustrate the feasibility of the proposed methods.
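The approach summarised in the abstract (extract date mentions, then predict lower and upper boundaries) follows the four methodology steps listed on slide 12 of the transcript, and strategies A to C on slide 21 combine them. The Python sketch below illustrates that pipeline using only the standard library. It is not the authors' code: the year RegEx, the reading of γ as a mean ± γ standard deviations window, the way α and β set the size and offset of the interval, the earliest/latest fallback for strategies without fitting, and the default parameter values are all hypothetical. HeidelTime (strategy D) is an external temporal tagger and is not modelled.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional, Tuple

# HYPOTHETICAL year pattern: a standalone four-digit year from 1000 to 2014,
# the range used in the evaluation. The authors' actual RegEx is not given.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|200\d|201[0-4])\b")


def extract_years(text: str) -> List[int]:
    """Step 1: date mention extraction (strategy A, RegEx)."""
    return [int(year) for year in YEAR_PATTERN.findall(text)]


def filter_outliers(years: List[int], gamma: float) -> List[int]:
    """Step 2: drop mentions more than gamma standard deviations from the mean.

    ASSUMPTION: the slides only say that gamma controls the outlier region's
    boundaries; a mean +/- gamma * std window is one plausible reading.
    """
    if len(years) < 2:
        return list(years)  # too few mentions to estimate a spread
    mu, sigma = mean(years), pstdev(years)
    return [y for y in years if abs(y - mu) <= gamma * sigma]


def fit_and_predict(years: List[int], alpha: float, beta: float) -> Tuple[int, int]:
    """Steps 3-4: fit a normal distribution and read off an interval.

    ASSUMPTION: alpha scales the width (in standard deviations) and beta
    shifts the centre (in years), matching "size and offset" on the slides.
    """
    centre = mean(years) + beta          # maximum-likelihood mean, shifted by beta
    half_width = alpha * pstdev(years)   # maximum-likelihood std, scaled by alpha
    return round(centre - half_width), round(centre + half_width)


def predict_footprint(text: str, filtering: bool, gaussian: bool,
                      gamma: float = 2.0, alpha: float = 1.0,
                      beta: float = 0.0) -> Optional[Tuple[int, int]]:
    """Strategies A-C; the default gamma, alpha and beta are hypothetical."""
    years = extract_years(text)
    if filtering:
        years = filter_outliers(years, gamma)
    if not years:
        return None  # no usable date mentions
    if gaussian:
        return fit_and_predict(years, alpha, beta)
    # ASSUMPTION: without fitting, the footprint spans the earliest and latest mention.
    return min(years), max(years)
```

With these definitions, `predict_footprint(text, filtering=True, gaussian=True)` plays the role of strategy C; in a real setting γ, α and β would need to be tuned on held-out pages rather than left at the illustrative defaults.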

Published in: Science

Transcript

  • 1. Mining temporal footprints from Wikipedia. Michele Filannino, Goran Nenadic, School of Computer Science (filannim@cs.man.ac.uk). Presentation at the 1st AHA! Workshop, COLING 2014, Dublin, 23/08/2014.
  • 2. Introduction: ■ Temporal information is crucial for organising structured and unstructured data. ■ Several temporal information extraction (TIE) systems are now available ● thanks to the TempEval challenge series.
  • 3. ManTIME. URL: http://www.cs.man.ac.uk/~filannim/mantime.html
  • 4. Test with long text
  • 5. Temporal footprint: A temporal footprint is a continuous period on the timeline that temporally defines the existence of a particular concept. (Immanuel Kant, Paul Guyer, and Allen W. Wood. 1998. Critique of Pure Reason. Cambridge University Press.)
  • 6. Problem: Can we predict temporal footprints from encyclopaedic descriptions of concepts? ■ Input: a textual description of a concept. ■ Output: a predicted temporal interval.
  • 7. Examples of temporal footprints: [Figure: timeline from 1000 to 2000 with footprints of objects, persons and historical periods: Web, Cellphone, Computer, Car, Richard Feynman, Bicycle, Carl Friedrich Gauss, French Revolution, Age of Enlightenment, Galileo Galilei, Leonardo da Vinci, Christopher Columbus, Renaissance, Arming sword, High Middle Ages, Genghis Khan.]
  • 12. Methodology: 1. date mention extraction; 2. outlier filtering; 3. normal distribution fitting; 4. prediction.
  • 13. Date mention extraction: [Figure: relative frequency (freq) of date mentions over time (in years).]
  • 14. Outlier filtering (γ parameter): [Figure: relative frequency (freq) of date mentions over time (in years).] The γ parameter controls the boundaries of the outlier region.
  • 15. Normal distribution fitting (α parameter): [Figure: relative frequency (freq) of date mentions over time (in years) with a fitted Gaussian.] The α and β parameters control the size and offset of the Gaussian bell.
  • 17. Normal distribution fitting (β parameter): [Figure: relative frequency (freq) of date mentions over time (in years) with a fitted Gaussian.] The α and β parameters control the size and offset of the Gaussian bell.
  • 19. Error measure: [Figure: gold and predicted intervals, with their overlap and union.] Reference: Fatima De Carvalho. 1996. Histogrammes et indices de proximité en analyse données symboliques. Actes de l'école d'été sur l'analyse des données symboliques. LISE-CEREMADE, Université de Paris IX Dauphine, pages 101–127.
  • 20. Error measure (continued): [Figure: gold and predicted intervals, with their union.]
  • 21. Strategies: A. RegEx; B. RegEx + Filtering; C. RegEx + Filtering + Gaussian fitting; D. HeidelTime + Filtering + Gaussian fitting.
  • 22. Evaluation: ■ Subject: people who lived between 1000 AD and 2014 ● text from Wikipedia web pages ● years of birth and death from DBpedia ■ 228,824 people collected ■ Simple definition of temporal footprint ● birth and death dates
  • 23. People per textual length: [Figure: histogram of the number of people (#people) by text length (#words).]
  • 24. Aggregate results:
      Strategy                                   | Mean Distance Error | Standard Deviation
      RegEx                                      | 0.2636              | 0.3409
      RegEx + Filtering                          | 0.2596              | 0.3090
      RegEx + Filtering + Gaussian fitting       | 0.3503              | 0.2430
      HeidelTime + Filtering + Gaussian fitting  | 0.5980              | 0.2470
  • 25. Results: [Figure: mean distance error (MDE) against text length (#words), one series per strategy: RegEx; RegEx + Filtering; RegEx + Filtering + Gaussian fitting; HeidelTime + Filtering + Gaussian fitting.]
  • 27. Results: ■ Galileo Galilei (1564-1642), prediction: 1556-1654, E: 0.204.
  • 28. Results: ■ Robin Williams (1951-2014), prediction: 1953-2006, E: 0.159.
  • 29. Other types of temporal footprint? ■ Christopher Columbus will die in 2057?! Prediction: 1366-2057 (actual: 1451-1506), E: 0.92. AHA!
  • 33. Physical existence vs. social coverage: ■ Anne Frank's footprint is shifted into the future.
  • 36. Conclusions: ■ How does the methodology behave on different languages? And on different sources? ■ Oracle-like side effect: • Apple Inc. will be closed down this year • Stanford University will be closed down in 2029 ■ Future work: • mixture of normal distributions
  • 37. Thank you.
  • 38. Questions? Contact: filannim@cs.man.ac.uk. Visit: tinyurl.com/temporal-footprints
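The slides attribute the error measure (slides 19-20) to De Carvalho (1996) but do not spell out its formula. A minimal sketch, assuming E = 1 - |overlap| / |union| with interval lengths measured in years and the union taken as the span from the earliest start to the latest end: this reading reproduces every E value quoted on slides 27-29 (for Galileo, 1 - 78/98 ≈ 0.204). Scoring disjoint intervals as E = 1 and using the population standard deviation in the hypothetical `aggregate` helper are further assumptions; the slides do not say which standard deviation the aggregate table reports.

```python
from statistics import mean, pstdev
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (start year, end year)


def footprint_error(gold: Interval, pred: Interval) -> float:
    """ASSUMED form of the error: E = 1 - overlap / union, lengths in years.

    The union is the span from the earliest start to the latest end,
    so two disjoint intervals score E = 1.
    """
    overlap = max(0, min(gold[1], pred[1]) - max(gold[0], pred[0]))
    union = max(gold[1], pred[1]) - min(gold[0], pred[0])
    return 0.0 if union == 0 else 1 - overlap / union  # union == 0: identical single years


def aggregate(pairs: Iterable[Tuple[Interval, Interval]]) -> Tuple[float, float]:
    """Mean distance error and population standard deviation over people."""
    errors = [footprint_error(gold, pred) for gold, pred in pairs]
    return mean(errors), pstdev(errors)


# Actual lifespans and predictions quoted on slides 27-29.
examples = [
    ((1564, 1642), (1556, 1654)),  # Galileo Galilei
    ((1951, 2014), (1953, 2006)),  # Robin Williams
    ((1451, 1506), (1366, 2057)),  # Christopher Columbus
]
for gold, pred in examples:
    print(gold, pred, round(footprint_error(gold, pred), 3))
```

Running the loop prints 0.204, 0.159 and 0.92 as the last value on each line, matching the E values on the slides; applying `aggregate` to a whole test set would give figures comparable to the mean distance error and standard deviation columns of slide 24.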