Ekaterina Vylomova/Brown Bag seminar presentation

Associative thesauri, Russian Associative Thesauri


Transcript

  • 1. Introduction Associative Experiments Associative Thesauri Russian Associative Thesaurus'98 Associative Network(Graph) Modelling of Associative Network Future work Associative thesauri: structure and analysis Brown bag seminar Ekaterina Vylomova Fulbright scholar at Montclair State University February 21, 2014 E. Vylomova Associative thesauri
  • 2. Brief bio. 2011: MSc, Bauman Moscow State Technical University
  • 3. Brief bio (continued). 2009: BSc, Bauman Moscow State Technical University
  • 4. Brief bio (continued). 2009: Yandex School of Data Analysis (Moscow Institute of Physics & Technology)
  • 5. Associative Experiments: What's AE? The associative experiment is one of the methods of psycholinguistics. It is based on the method of free association. Sir Francis Galton conducted the first experiment in 1879.
  • 6. Associative Experiments: types of AE. Single Free Association; Multiple Free Associations; Single Controlled Association (synonym, noun, verb, hyponym, etc.); Multiple Controlled Associations.
  • 7. Associative Thesauri (subsections: What's an associative thesaurus?; Example of data; AT for different languages; Slavic Associative Thesauri). What's an associative thesaurus?
  • 8. Example of data: EAT word associations. CAT stimulated the following associations (response, count, proportion of responses):
      DOG      49  0.52
      MOUSE     8  0.08
      BLACK     4  0.04
      MAT       3  0.03
      ANIMAL    2  0.02
      EYES      2  0.02
      GUT       2  0.02
      KITTEN    2  0.02
  • 9. AT for different languages. English: The Structure of Associations in Language and Thought (Deese, 1965); Word association (Cramer, 1968); An associative thesaurus of English and its computer analysis (Kiss et al., 1973); Word association, rhyme and fragment norms (Nelson, McEvoy & Schreiber, 1999).
  • 10. AT for different languages. Dutch: Word association norms with response times (De Groot, 1988); Word associations: Norms for 1,424 Dutch words in a continuous task (De Deyne & Storms, 2008). Swedish: A Swedish Associative Thesaurus (Lonngren, 1998).
  • 11. AT for different languages. Japanese: Construction of associative concept dictionary with distance information, and comparison with electronic concept dictionary (Okamoto & Ishizaki, 2001); Building a word association database for basic Japanese vocabulary (Joyce, 2005). Korean: Network analysis of Korean Word Associations (Jung et al., 2010).
  • 12. AT for different languages. Czech: Volne slovni parove asociace v cestine (Novak, 1988). Hebrew: Free association norms in the Hebrew language (Rubinsten, 2005).
  • 13. Slavic Associative Thesauri. Dictionary of associative norms in Russian (Leontiev, 1973); Russian Associative Thesaurus (Karaulov et al., 2002); Slavic Associative Thesaurus (Russian, Belorussian, Bulgarian, Ukrainian) (Ufimtseva et al., 2004); Normas asociativas del espanol y del ruso (Sanchez Puig, Karaulov, Cherkasova, 2000).
  • 14. Russian Associative Thesaurus'98 (subsections: Data; Research). Russian associative experiment description. Time frame: 1988-1998. Participants: 11,000 1st-3rd year students; 34 specialities. Stimuli: 6,624 (initial list: 1,277). Associative pairs: 1,032,522 (different: 462,500). Reactions: 102,926. Subset used for analysis: stimuli: 6,577; reactions: 21,312; associative pairs: 102,516. Dataset: a set of triplets $\langle c_i, r_j, w_{ij} \rangle$, where $w_{ij} = \frac{freq_{ij}}{\sum_{j=1}^{n} freq_{ij}}$.
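The association-strength formula on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: c is read as the cue (stimulus) and r as the reaction, the function name `association_strengths` is hypothetical, and the toy observations are invented rather than taken from RAT.

```python
from collections import Counter, defaultdict

def association_strengths(pairs):
    """Turn raw (stimulus, reaction) observations into (c_i, r_j, w_ij) triplets.

    w_ij = freq_ij / sum_j freq_ij, i.e. the share of all responses to
    stimulus i that were reaction j.
    """
    freq = defaultdict(Counter)
    for stimulus, reaction in pairs:
        freq[stimulus][reaction] += 1
    triplets = []
    for stimulus, reactions in freq.items():
        total = sum(reactions.values())
        for reaction, count in reactions.items():
            triplets.append((stimulus, reaction, count / total))
    return triplets

# Hypothetical toy data (not from RAT): four responses to one stimulus.
observations = [("cat", "dog")] * 3 + [("cat", "mouse")]
triplets = association_strengths(observations)
```

By construction the weights of each stimulus sum to 1, so they can be read as empirical response probabilities.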
  • 15. Comparison with frequency dictionary of Russian language. Frequency dictionary: Frequency dictionary of modern Russian language (Lyashevskaya, Sharov, 2009). It is based on texts from the Russian National Corpus (www.ruscorpora.ru) and includes information about the 20,000 most common words in Russian. RAT lemmatisation: RAT -> MyStem (Segalovich, 2003) -> lemmas.
  • 16. Comparison with frequency dictionary of Russian language: top-11 nouns.
      RAT           FreqDict
      Human         Year
      Home, House   Human
      Money         Time
      Day           Business
      Friend        Life
      Home          Day
      Male          Hand
      Fool          Work
      Business      Word
      Life          Place
      Illness       Friend
  • 17. Comparison with frequency dictionary of Russian language: semantic primes? Concept "Human": "human", "child", "friend", "male". Concept "Time": "day", "time". Adjectives: "good", "bad", "big". These concepts do not change over time. Positive correlation with semantic primes (Wierzbicka).
  • 18. Associative Network (Graph) (subsections: Description; Associative Network based on RAT'98; Network analysis). Description: nodes correspond to words (lemmas); edges correspond to associations; an edge's weight corresponds to association strength.
  • 19. Associative Network based on RAT'98: main characteristics of the network. Nodes: |V| = 23,195, among them: nodes with only outgoing edges (stimuli): |S| = 1,883; nodes with only incoming edges (reactions): |R| = 16,618; nodes with both types of edges: |SR| = 4,694 (1,883 + 16,618 + 4,694 = 23,195). Edges: |E| = 102,516.
  • 20. Network analysis: table of network characteristics.
      Sign  Description                               Directed  Undirected
      N     Number of nodes                           23,195    23,195
      L     Average shortest path length              3.98      3.83
      D     Diameter                                  9         8
      <k>   Average node degree                       4.42      8.83
      ψ     Degree distribution (P(k)) parameter      2.2       1.85
      Directed to undirected: $w'_{ij} = w'_{ji} = w_{ij} + w_{ji}$. Degree distribution function: $P(k) \approx k^{-\psi}$.
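The directed-to-undirected rule and the average degree on this slide can be illustrated as follows. This is a sketch, not the author's code: the function names and the toy edge weights are hypothetical. The directed value of <k> in the table appears consistent with |E|/N (102,516 / 23,195 ≈ 4.42), which is the convention assumed here.

```python
from collections import defaultdict

def to_undirected(directed):
    """Symmetrise weights: the undirected weight of {i, j} is w_ij + w_ji."""
    undirected = defaultdict(float)
    for (i, j), w in directed.items():
        key = (i, j) if i <= j else (j, i)  # one canonical key per node pair
        undirected[key] += w
    return dict(undirected)

def average_degree(edges, n_nodes, directed):
    """<k>: edges per node; an undirected edge contributes to two nodes."""
    return (1 if directed else 2) * len(edges) / n_nodes

# Hypothetical toy graph with one reciprocal pair (a <-> b).
directed_edges = {("a", "b"): 0.5, ("b", "a"): 0.25, ("a", "c"): 0.5}
undirected_edges = to_undirected(directed_edges)
```

Reciprocal pairs collapse into a single undirected edge, which is one possible reason the undirected <k> in the table (8.83) is slightly below twice the directed value.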
  • 21. Network analysis: small-world networks. Definition: introduced by Milgram, 1967 ("The small world problem"). $L \propto \log(N)$, i.e. the distance L between two randomly chosen nodes grows proportionally to the logarithm of the number of nodes N in the network. Also known as "six degrees of separation". Examples: World Wide Web (WWW; Adamic, 1999; Albert, Jeong, & Barabasi, 1999), networks of scientific collaboration (Newman, 2001), metabolic networks in biology (Jeong, Tombor, Albert, Oltvai, & Barabasi, 2000).
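The quantity L in the small-world definition is the average shortest path length. A minimal way to estimate it on an unweighted graph is breadth-first search from every node, sketched below. The function name and the three-node chain are hypothetical, and unreachable pairs are simply skipped, which is an assumption about how L would be computed.

```python
from collections import deque

def average_shortest_path(adjacency):
    """Mean BFS distance over all ordered pairs of distinct, reachable nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency.get(node, ()):
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0

# Hypothetical toy graph: a - b - c
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
mean_path = average_shortest_path(chain)
```

For a network of the size reported in the talk this all-pairs BFS costs O(N(N + E)), which is feasible but not instantaneous.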
  • 22. Network analysis: scale-free networks. Amaral, Scala et al. (2000) studied small-world networks and compared their degree distribution functions P(k). Two types of distribution: exponential (power grid system in the USA, neural system of C. elegans); power law (WWW, metabolic networks): $P(k) \approx k^{-\psi}$, $\psi \in (2, 4)$. Scale-free networks provide better signal propagation.
  • 23. Network analysis: scale-free networks, other examples. Similar results were obtained for Roget's thesaurus (Roget, 1911), WordNet and associative networks (Steyvers and Tenenbaum, 2005).
  • 24. Modelling of Associative Network (subsections: Models of Associative Network; Concept-based model). Three models of associative network. Concept-based model. Vector-based models: Multidimensional scaling (Torgerson, 1958); Latent Semantic Analysis (Landauer, Dumais, 1997).
  • 25. Concept-based model. Data: core of the network: 4,692 lemmas with 59,392 connections. The structure is similar to the associative network (nodes are lemmas, edges are associations). Activity accumulation: 1. Initial state: random activity. 2. Spreading of activation: $S_i^t = S_i^{t-1} + \sum_j w_{ij} S_j^{t-1}$, where $S_i^t$ is the activity of neuron i at the moment t. 3. Activation exceeds the threshold => produce the reaction and reset: $S_i^t = 0$.
  • 26. Concept-based model: pros and cons. Pros: very simple model; easy to understand; easy to modify (no need to re-evaluate the model). Cons: it is unclear how to choose the threshold value (we ran a series of experiments to find the optimal value); once activation is released, should the neighbouring neurons also be modified?
  • 27. Multidimensional scaling: from concept to vector. Distance matrix: $\Delta = \begin{pmatrix} \delta_{1,1} & \delta_{1,2} & \cdots & \delta_{1,I} \\ \delta_{2,1} & \delta_{2,2} & \cdots & \delta_{2,I} \\ \vdots & \vdots & \ddots & \vdots \\ \delta_{I,1} & \delta_{I,2} & \cdots & \delta_{I,I} \end{pmatrix}$, where I is the number of objects (words). Our goal is to find vectors $x_1, \ldots, x_I \in \mathbb{R}^N$ such that $\lVert x_i - x_j \rVert \approx \delta_{ij}$ for all $i, j \in I$. In other words: $\min_{x_1, \ldots, x_I} \sum_{i<j} \left( \lVert x_i - x_j \rVert - \delta_{ij} \right)^2$.
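The objective minimised by multidimensional scaling can be written directly as a function of a candidate embedding. The sketch below only evaluates that objective (it does not perform the minimisation); the function name `stress` and the 3-4-5 triangle used as data are assumptions for illustration.

```python
import math

def stress(points, delta):
    """Sum over i < j of (||x_i - x_j|| - delta_ij)^2 for a candidate embedding."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += (math.dist(points[i], points[j]) - delta[i][j]) ** 2
    return total

# Hypothetical distance matrix: the side lengths of a 3-4-5 right triangle.
delta = [[0, 3, 4],
         [3, 0, 5],
         [4, 5, 0]]
exact = [(0, 0), (3, 0), (0, 4)]      # reproduces every distance
collapsed = [(0, 0), (0, 0), (0, 0)]  # all points on top of each other
```

An embedding that reproduces all distances has zero stress, while a poor one is penalised by the squared errors; an optimiser would search for the points that make this value as small as possible.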
  • 28. Latent Semantic Analysis: from concept to vector (2). A technique for analysing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. In my case: terms are lemmas; a document is the set of associations for a given stimulus. Inputs: term-document matrix with TF*IDF values. Term frequency: $TF = w_{ij} = \frac{freq_{ij}}{\sum_{j=1}^{N} freq_{ij}}$. Inverse document frequency: $IDF = \log \frac{|S|}{|\{s \in S : r \in s\}|}$, where $|S|$ is the total number of stimuli. Singular Value Decomposition => vector representations.
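The TF*IDF weighting described on this slide can be sketched as follows, treating each stimulus as a document and each reaction as a term. The function name `tfidf`, the nested-dictionary layout and the toy counts are assumptions; the SVD step is not shown.

```python
import math

def tfidf(associations):
    """associations: {stimulus: {reaction: freq}} -> {stimulus: {reaction: tf * idf}}.

    TF  = freq_ij / sum_j freq_ij
    IDF = log(|S| / number of stimuli whose response set contains the reaction)
    """
    n_stimuli = len(associations)
    doc_freq = {}
    for reactions in associations.values():
        for reaction in reactions:
            doc_freq[reaction] = doc_freq.get(reaction, 0) + 1
    matrix = {}
    for stimulus, reactions in associations.items():
        total = sum(reactions.values())
        matrix[stimulus] = {
            reaction: (freq / total) * math.log(n_stimuli / doc_freq[reaction])
            for reaction, freq in reactions.items()
        }
    return matrix

# Hypothetical toy counts (not from RAT).
toy = {"cat": {"dog": 3, "mouse": 1}, "dog": {"cat": 2, "mouse": 2}}
weights = tfidf(toy)
```

A reaction given to every stimulus ("mouse" here) gets an IDF of zero and so carries no weight. The resulting matrix is what a singular value decomposition routine, for example `numpy.linalg.svd`, would then factorise.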
  • 29. Clustering: k-means. So, we've got vectors. What's next? Let's evaluate similarity. First, set a distance metric, e.g. $d_{ij} = \left( \sum_{k=1}^{N} |x_{ik} - x_{jk}|^{r} \right)^{1/r}$. Then use it with k-means clustering: $\min \sum_{i=1}^{k} \sum_{x_j \in S_i} (x_j - \mu_i)^2$, where k is the number of clusters, $S_i$ are the evaluated clusters and $\mu_i$ are the centers of the clusters. So the technique is based on finding the nearest cluster.
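The clustering step can be illustrated with a compact version of Lloyd's algorithm, the usual iterative procedure for the k-means objective. This is a sketch under assumptions: the distance is taken to be a Minkowski distance with exponent r (r = 2 by default), the initial centers are passed in explicitly rather than chosen at random, and the names and toy points are hypothetical.

```python
def minkowski(x, y, r=2):
    """Minkowski distance with exponent r (r=1 Manhattan, r=2 Euclidean)."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1 / r)

def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm: assign points to the nearest center, then update means."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: minkowski(p, centers[i]))
            clusters[nearest].append(p)
        # New center = coordinate-wise mean; an empty cluster keeps its old center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

In practice the result depends on the initial centers and on the chosen k, which is one of the drawbacks listed later in the talk.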
  • 30. Clustering (figure; not reproduced in the transcript).
  • 31. Vector-based models: pros and cons. Pros: easy to operate with vectors (add, multiply, subtract, etc.); possible to set the preferred dimensionality and visualize. Cons: storage (the matrices are huge); complexity (MDS and LSA are based on SVD, which takes $O(n^3)$); choosing the optimal number of clusters and dimensionality.
  • 32. "Tip of the tongue" application: data and method. Data: RAT + Abramov's synonym dictionary. Method: LSA + k-means.
  • 33. "Tip of the tongue" application (poster). Usage of associative thesauri for solving tasks related to the "tip of the tongue" phenomenon. Ekaterina Vylomova, Bauman Moscow State Technical University, Moscow State University of Printing Arts. E-mail: evylomova@gmail.com. RHF #12-04-12039B.
      INTRODUCTION. The tip-of-the-tongue (TOT) phenomenon is the failure to retrieve a word from memory, combined with partial recall and the feeling that retrieval is imminent. People in a tip-of-the-tongue state can often recall one or more features of the target word, such as the first letter, its syllabic stress, and words similar in sound and/or meaning. TOT appears to be universal (Brennen et al. 2007). An occasional tip-of-the-tongue state is normal for people of all ages. TOT becomes more frequent as people age. R. Brown, D. McNeill and A. Luria consider the processes of recalling and naming words to be processes of probabilistic choice of a word from a chain of involuntary associations, and relate them to the construction of human semantic memory.
      DATA. Abramov, Dictionary of Russian synonyms and similar expressions, 1890-1999: 19,297 words & phrases; 18,136 synonym articles. Karaulov Y., Tarasov E., Sorokin Y., Ufimtseva N., Cherkasova G. 1999. Associative thesaurus of modern Russian language. RAS, Moscow: 56,540 associative pairs; 50,923 associative pairs (after lemmatization); 26,803 lemmas. Overall (synonym and associative pairs combined): 316,018.
      METHODOLOGY. RAT and the Abramov dictionary -> lemmatization using the Yandex mystem stemmer -> combined Abramov & RAT lemmas -> LSA & k-NN: apply Latent Semantic Analysis to get vector representations of words and k-nearest neighbours for clustering -> clusters containing words similar in meaning and association.
      EXAMPLE. "Hmm... What's the name of that Ukrainian food?" Associative thesauri + Abramov dictionary: комильфо (comme il faut) - приличие (propriety). After clustering:
        не выходить из пределов благопристойности (to stay within the bounds of decency)  0.001
        степенный (staid)  0.591
        чинный (decorous)  0.591
        благочинный (orderly)  0.646
        бонтонный (bon ton)  0.646
        комильфотный (comme il faut, adj.)  0.646
        пристойный (seemly)  0.646
        благонравный (well-behaved)  0.684
        благоприличный (proper)  0.684
        благопристойный (decent)  0.684
        корректный (correct)  0.684
      FUTURE PLANS. 1. Expand synonym and associative thesauri with new ones. 2. Add first letter filtering (see above). 3. Add hyponyms and hyperonyms.
      REFERENCES. 1. Brown, R., and McNeill, D. (1966). The "tip of the tongue" phenomenon. Journal of Verbal Learning and Verbal Behavior 5, 325-337. 2. Karaulov Yu.N., Tarasov E.F., Sorokin Yu.A., Ufimtseva N.V., Cherkasova G.A. (1999). Associative thesaurus of the modern Russian language. RAS. (In Russian.) 3. Luria A.R. (1979). Language and Consciousness. Ed. Khomskaya E.D., MSU, Moscow. 320 pp. (In Russian.)
  • 34. Future work: RAT'10. Time frame: 20 years after the first one (2009-2010). Location: different regions of Russia. Stimuli included the 1,000 most frequent words in the Russian language. Participants: young people aged 17-25.
  • 35. Thank you! Questions?