Analysing Word Meaning over Time by Exploiting Temporal Random Indexing

This work proposes an approach to the construction of WordSpaces that takes temporal information into account. The proposed method builds a geometrical space that covers several periods of time, which makes it possible to analyse how the meaning of a word evolves over time. Exploiting this approach, we build a framework, called Temporal Random Indexing (TRI), that provides all the tools needed to build WordSpaces and perform such linguistic analysis. We give some examples of how the tool can be used by analysing word meanings in two corpora: a collection of Italian books and a collection of English scientific papers about computational linguistics.
http://clic.humnet.unipi.it/proceedings/Proceedings-CLICit-2014.pdf


  1. Analysing Word Meaning over Time by Exploiting Temporal Random Indexing
     Pierpaolo Basile, Annalina Caputo and Giovanni Semeraro
     Department of Computer Science, University of Bari Aldo Moro
     pierpaolo.basile@uniba.it
     First Italian Conference on Computational Linguistics (Prima conferenza italiana di Linguistica Computazionale), Pisa, 9-10 December 2014
  2. Marty, in 2015 people will surf on the web!!!
     Surf!?!?! On the web!?!?!?
  3. Distributional Semantic Models (DSM)
     • Analysis of word-usage statistics over huge corpora
     • Geometrical space of concepts
     • Similar words are represented close together in the space
  4. DSM issue…
     Corpus → DSM → Word Space
     A Word Space is a snapshot of word co-occurrences over a linguistic corpus
  5. …DSM issue…
     Corpus 1 → Word Space 1
     Corpus 2 → Word Space 2
     Corpus 3 → Word Space 3
     Corpus 4 → Word Space 4
     It is difficult to compare Word Spaces built on different corpora
  6. Temporal DSM
     Corpus 1900 → Word Space 1
     Corpus 1920 → Word Space 2
     Corpus 1930 → Word Space 3
     Corpus 1940 → Word Space 4
     Each corpus contains documents of a specific time period…
     …Word Spaces are still not comparable
  7. Temporal Random Indexing (TRI)
     Corpus 1900 → RI Space 1
     Corpus 1920 → RI Space 2
     Corpus 1930 → RI Space 3
     Corpus 1940 → RI Space 4
     Words (vectors) in different Word Spaces are comparable…
     …comparison on different time periods is possible!
  8. Random Indexing
     Corpus → Vocabulary → RI Space
     Assign a Random Vector $c_i$ to each term, e.g. $c_i = \langle 1, 0, 0, 0, -1, 0, 1, 0, -1, 0, 0, 0, 0, 0, 0 \rangle$
     A word vector $sv_i$ is the sum of the random vectors assigned to the co-occurring words:
     $$sv_i = \sum_{d \in C} \sum_{-m < i < +m} c_i$$
     Co-occurring words are defined as the set of $m$ words that precede and follow $w_i$
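
The Random Indexing step on this slide can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code of the TRI tool: the function names, the vector dimension, the number of non-zero entries, the fixed seed and the choice to take `m` words on each side of the target word are all assumptions made for the example.

```python
import random


def make_index_vector(dim, nonzero, rng):
    """Sparse ternary random vector: `nonzero` entries are +1 or -1, the rest 0."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((1, -1))
    return vec


def random_indexing(docs, dim=100, nonzero=4, m=2, seed=42):
    """Build index vectors c_i and semantic vectors sv_i from tokenised documents.

    Assumption: the context of a word is the m tokens before and after it.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: make_index_vector(dim, nonzero, rng) for w in vocab}
    semantic = {w: [0] * dim for w in vocab}
    for doc in docs:
        for i, word in enumerate(doc):
            sv = semantic[word]
            for j in range(max(0, i - m), min(len(doc), i + m + 1)):
                if j == i:
                    continue  # a word is not its own context
                for k, value in enumerate(index[doc[j]]):
                    sv[k] += value  # add the neighbour's random vector
    return index, semantic


# Toy corpus (made up for illustration).
docs = [["people", "surf", "the", "web"], ["people", "surf", "the", "waves"]]
index, semantic = random_indexing(docs)
```

In this toy corpus "web" and "waves" occur in the same context ("surf the"), so they end up with identical semantic vectors, which is the distributional intuition the slide describes.
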
  9. Temporal Random Indexing (TRI)…
     Corpus T1-T2, Corpus T2-T3, Corpus T3-T4, …, Corpus Tn-1-Tn → Vocabulary
     Assign a Random Vector $c_i$ to each term
     RI Space T1-T2, RI Space T2-T3, RI Space T3-T4, …, RI Space Tn-1-Tn
  10. …Temporal Random Indexing
      • For each Word Space $T_k$-$T_{k+1}$
        – sum random vectors taking into account only documents $d_k$ in the period $T_k$-$T_{k+1}$:
          $$sv_{i,T_k} = \sum_{d_k \in C} \sum_{-m < i < +m} c_i$$
      • A word $w_i$ has several semantic vectors (sv): one for each time period
      • Vectors are comparable
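
A minimal sketch of the temporal variant follows, under the same assumptions as any toy Random Indexing example (vector dimension, sparsity, window of `m` tokens per side, fixed seed). The function name and the `(year, tokens)` input format are hypothetical. The point it illustrates is the one made on the slide: the random vectors are assigned once for the whole vocabulary, and only the summation is restricted to the documents of each period, so vectors from different periods live in the same space.

```python
import random


def temporal_random_indexing(dated_docs, periods, dim=100, nonzero=4, m=2, seed=42):
    """dated_docs: list of (year, tokens).
    periods: list of (start, end) tuples, a document belongs to a period
    when start <= year < end (an assumption of this sketch).
    Returns (index_vectors, {period: {word: semantic_vector}}).
    """
    rng = random.Random(seed)
    vocab = sorted({w for _, doc in dated_docs for w in doc})
    # One random vector per term, shared by every period.
    index = {}
    for w in vocab:
        vec = [0] * dim
        for pos in rng.sample(range(dim), nonzero):
            vec[pos] = rng.choice((1, -1))
        index[w] = vec
    spaces = {period: {} for period in periods}
    for year, doc in dated_docs:
        for start, end in periods:
            if not (start <= year < end):
                continue
            space = spaces[(start, end)]
            for i, word in enumerate(doc):
                sv = space.setdefault(word, [0] * dim)
                for j in range(max(0, i - m), min(len(doc), i + m + 1)):
                    if j != i:
                        for k, value in enumerate(index[doc[j]]):
                            sv[k] += value
    return index, spaces


# Toy dated corpus (made up for illustration).
dated_docs = [(1895, ["surf", "the", "waves"]), (2015, ["surf", "the", "web"])]
periods = [(1800, 1900), (1900, 2100)]
index, spaces = temporal_random_indexing(dated_docs, periods)
```

Here "surf" gets one semantic vector per period, each built from the same index vectors, so the two can be compared directly.
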
  11. TRI System
      • The TRI system (https://github.com/pippokill/tri) performs Temporal Random Indexing
        1. Build a RI space for each year
        2. Merge RI spaces to create new time periods
        3. Load RI space and fetch vectors
        4. Combine vectors
        5. Retrieve similar vectors
        6. Extract and compare neighbourhood of words
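
Because a semantic vector is defined as a sum over documents, the space for a longer period can be obtained by adding the per-year vectors element by element. The sketch below illustrates step 2 under that assumption; `merge_spaces` is a hypothetical helper, not a function of the TRI tool, and the small integer vectors are made up.

```python
def merge_spaces(*spaces):
    """Merge per-year spaces ({word: vector}) by summing vectors word by word."""
    merged = {}
    for space in spaces:
        for word, vec in space.items():
            if word in merged:
                merged[word] = [a + b for a, b in zip(merged[word], vec)]
            else:
                merged[word] = list(vec)  # copy, so the inputs stay untouched
    return merged


# Two toy yearly spaces with 4-dimensional vectors.
space_1900 = {"surf": [1, 0, -1, 0], "waves": [0, 1, 0, 0]}
space_1901 = {"surf": [0, 2, 1, 0], "web": [1, 0, 0, -1]}
merged = merge_spaces(space_1900, space_1901)
```

Words that occur in only one of the years simply keep the vector they had in that year.
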
  12. Evaluation
      • Two case studies
        1. Project Gutenberg (PG): 349 Italian books from 1810 to 1922
        2. ACL Anthology Network dataset (ANN): 21,212 papers published by ACL from 1960 to 2014
      • Goals
        – Neighbourhood: analyze the neighbourhood of a word
        – Semantic shift: analyze words that clearly change their semantics
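
The neighbourhood of a word can be read as its nearest vectors in a given space. The sketch below assumes cosine similarity as the measure and uses made-up three-dimensional vectors; `cosine` and `neighbourhood` are hypothetical helper names chosen for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighbourhood(space, word, k=10):
    """Return the k words most similar to `word`, with their similarity."""
    target = space[word]
    scored = [(other, cosine(target, vec))
              for other, vec in space.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy space (vectors made up for illustration).
toy_space = {
    "semantics": [1, 1, 0],
    "syntax": [1, 1, 0],
    "theory": [1, 0, 0],
    "parser": [0, 0, 1],
}
top = neighbourhood(toy_space, "semantics", k=2)
```

Running the same query against the space of each period gives one neighbour list per period, which is what the following slides compare.
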
  13. PG Dataset
      • Dataset split in two time periods
        – Pre 1900
        – Post 1900
      • Analyze the neighbourhood: “patria”
      • Semantic shift: “cinematografo”
  14. PG: neighbourhood of “patria” (words in the order shown on the slide)
      Pre 1900: Libertà, Opera, Pari, Comune, Gloria, Nostra, Causa, Italia, Giustizia, Guerra
      Post 1900: Libertà, Gloria, Giustizia, Comune, Legge, Pari, Virtù, Onore, Opera, Popolo
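
One simple way to compare the two neighbour lists is to look at which words are shared, lost and gained between the periods. The sketch below does that for the “patria” lists, read row by row from the slide. The Jaccard overlap is an extra summary number added here for illustration and does not appear in the slides; `compare_neighbourhoods` is a hypothetical helper name.

```python
pre_1900 = ["Libertà", "Opera", "Pari", "Comune", "Gloria",
            "Nostra", "Causa", "Italia", "Giustizia", "Guerra"]
post_1900 = ["Libertà", "Gloria", "Giustizia", "Comune", "Legge",
             "Pari", "Virtù", "Onore", "Opera", "Popolo"]


def compare_neighbourhoods(before, after):
    """Return (shared, lost, gained, jaccard) for two neighbour lists."""
    before_set, after_set = set(before), set(after)
    shared = [w for w in before if w in after_set]
    lost = [w for w in before if w not in after_set]
    gained = [w for w in after if w not in before_set]
    jaccard = len(shared) / len(before_set | after_set)
    return shared, lost, gained, jaccard


shared, lost, gained, jaccard = compare_neighbourhoods(pre_1900, post_1900)
print("shared:", shared)
print("lost:  ", lost)
print("gained:", gained)
```

On these lists six words are shared, while Nostra, Causa, Italia and Guerra drop out and Legge, Virtù, Onore and Popolo appear after 1900.
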
  15. PG: semantic shift of “cinematografo”
      In $T_{post1900}$, “cinematografo” is strongly related to “sonoro”
      $$sim(sv_{cinematografo, T_{pre1900}}, sv_{cinematografo, T_{post1900}}) = 0.4$$
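
Since the vectors of one word in different periods are comparable, a semantic shift can be scored by comparing the word with itself across two periods. The sketch below assumes that `sim` is cosine similarity and uses made-up vectors, so it does not reproduce the 0.4 value on the slide; `semantic_shift` and the `"pre"`/`"post"` keys are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_shift(spaces, word, period_a, period_b):
    """Similarity of a word with itself across two periods (low = large shift)."""
    return cosine(spaces[period_a][word], spaces[period_b][word])


# Toy spaces: the same word has a different vector in each period.
spaces = {
    "pre": {"w": [1, 0, 1, 0]},
    "post": {"w": [1, 1, 0, 0]},
}
score = semantic_shift(spaces, "w", "pre", "post")
```

A score near 1 means the word keeps roughly the same contexts in both periods, while a low score flags a candidate for closer inspection.
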
  16. ANN Dataset
      • Dataset split in decades
      • Analyze the neighbourhood: “semantics”
      • Semantic shift: “bioscience”, “unsupervised”
  17. ANN: neighbourhood of “semantics” (words in the order shown on the slide)
      1960-1969: linguistic, theory, semantic, syntactic, natural, linguistic, distributional, process, computational, syntax
      1970-1979: natural, linguistic, semantic, theory, syntax, language, processing, syntactic, description, analysis
      1980-1989: syntax, natural, general, theory, semantic, syntactic, linguistic, interpretation, model, description
      1990-1999: syntax, theory, interpretation, general, linguistic, description, complex, natural, representation, logical
      2000-2009: syntax, theory, interpretation, description, meaning, linguistic, logical, complex, representation, structures
      2010-2014: syntax, theory, interpretation, description, complex, meaning, linguistic, logical, structures, representation
  21. …ANN: semantic shift “bioscience”
      before 1990: extraterrestrial, extrasolar
      nowadays: medline, bionlp, biomedi
  22. …ANN: semantic shift “unsupervised”
      before 1990: observe, partition, selective
      nowadays: supervised, disambiguation, probabilistic, algorithms, statistical
  23. Conclusions
      • Temporal Random Indexing
        – builds Word Spaces taking into account information about time
        – is developed and published as an open-source framework
      • Potential of our framework
        – capture word usage changes over time
  24. That’s all folks!
      https://github.com/pippokill/tri
  • puria1

    Dec. 10, 2014
