Neural modeling of verbal
consciousness based on the results
of the associative experiment


Researcher: Katerina Vylomova
Scientific adviser: Yuri Philippovich



                          3/18/2013     1
Goals and tasks
• Relevance of the topic:
   ◦ From syntax to semantics: search engines, machine translation, natural-language text generation
   ◦ Neural system modeling: CMU; University of California, Irvine
• Goal:
   Development of a neural network model of verbal consciousness
• Tasks:
   ◦ Analysis of the associative verbal thesaurus
   ◦ Analysis of the associative verbal network
   ◦ Analysis of formal models of the associative experiment
   ◦ Development of neural network models of verbal consciousness
   ◦ Study of the neural network
   ◦ Practical implementation of the research and visualization of the results



Source data
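Yu. N. Karaulov, E. F. Tarasov, Yu. A. Sorokin, N. V. Ufimtseva, G. A. Cherkasova (1999).
Associative Thesaurus of the Modern Russian Language (Ассоциативный тезаурус современного русского языка). Russian Academy of Sciences (РАН).
Example entry (cue, reaction, frequency): серьезный "serious", человек "person", 25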
Main parameters:
     • Time period: 1988-1998
     • Participants: 11,000 first- to third-year students from 34 specialities
     • Number of stimuli: 6,624
     • Number of cue-reaction pairs: 1,032,522
     • Distinct pairs: 462,500
     • Distinct reactions: 102,926
Current subset:
     • Number of stimuli: 6,577
     • Number of reactions: 21,312
     • Distinct cue-reaction pairs: 102,516
• Conversion to relative frequency (weight; reactions per cue):
  $\mathit{weight}_{ij} = \dfrac{\mathit{freq}_{ij}}{\sum_{j=1}^{n} \mathit{freq}_{ij}}$,
  where $\mathit{freq}_{ij}$ is the frequency of the associative pair (cue $i$, reaction $j$), $n$ is the number of reactions to cue $i$, and $|\{\mathit{freq}_{ij}\}| = 102516$ is the number of associative pairs.
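The conversion to relative frequencies can be sketched in a few lines of Python. This is only an illustration under assumptions: the triple layout (cue, reaction, frequency) and the toy data are hypothetical and are not taken from the thesaurus files.

```python
from collections import defaultdict


def relative_weights(pairs):
    """Turn (cue, reaction, freq) triples into weights that sum to 1 per cue."""
    totals = defaultdict(int)
    for cue, _reaction, freq in pairs:
        totals[cue] += freq
    return {(cue, reaction): freq / totals[cue] for cue, reaction, freq in pairs}


# Hypothetical toy data, not real thesaurus entries.
toy_pairs = [
    ("cue_a", "reaction_x", 25),
    ("cue_a", "reaction_y", 75),
    ("cue_b", "reaction_x", 10),
]
weights = relative_weights(toy_pairs)
print(weights[("cue_a", "reaction_x")])  # 0.25
```

Each weight depends only on the reactions given to the same cue, so cues with very different numbers of respondents become comparable.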



The thesaurus analysis
     • Analogous experiments:
       USA (Jenkins & Palermo, 1964; Deese, 1965; Cramer, 1968; Nelson, 1999),
       Russia (Leontiev, 1977), Belgium (De Groot, 1988; De Deyne & Storms,
       2008), Japan (Okamoto & Ishizaki, 2001; Joyce, 2005), South Korea (Jung et
       al., 2010), Great Britain (Kiss et al., 1973)
     • Part-of-speech analysis of the reactions (lemmatization with the Mystem utility):
       nouns ~77% (~55%), adjectives 16% (~25%), verbs 6% (~18%),
       adverbs 0.4% (~0.9%), others 0.6% (~0.9%)*.
     • Comparison of the most frequent reactions:
     RAT                           Sharov's dict.**             Intersection        RAT full           KorWA
     Друг "friend" (13154.93)      Год "year" (2718.78)         Друг "friend"       Человек "person"   Деньги "money"
     Вода "water" (7402.37)        Человек "person" (2369.34)   Вода "water"        Дом "house"        Любовь "love"
     Дурак "fool" (7309.50)        Время "time" (1662.10)       Дело "matter"       Деньги "money"     Друг "friend"
     Дело "matter" (7062.12)       Дело "matter" (1175.12)      Язык "language"     День "day"         Человек "person"
     Язык "language" (6409.32)     Жизнь "life" (1155.78)       Ребенок "child"     Друг "friend"      Вода "water"
     Ребенок "child" (6373.7)      День "day" (970.49)          Вопрос "question"   Домой "homeward"   Мечта "dream"
     Вопрос "question" (5261.97)   Рука "hand" (969.75)         Стол "table"        Мужчина "man"      Армия "army"
     Стол "table" (5218.45)        Работа "work" (904.43)       Время "time"        Дурак "fool"       Разум "mind"
     Время "time" (5163.06)        Слово "word" (817.80)        Море "sea"          Дело "matter"      Дом "house"
     Свет "light" (4858.42)        Вопрос "question" (751.74)   Ответ "answer"      Жизнь "life"       Слеза "tear"

* Values in brackets are computed over distinct words only.
** Lyashevskaya & Sharov frequency dictionary, based on the Russian National Corpus.
The associative-verbal network analysis
     • Words are vertices, |V| = 23195; connections between them (associations) are edges, |E| = 102516.
     • Three types of vertices:
         outgoing edges only (stimuli), |S| = 1883;
         incoming edges only (reactions), |R| = 16618;
         both incoming and outgoing edges (stimuli-reactions), |SR| = 4694.
     • Graph parameters (Steyvers and Tenenbaum, 2005):

     Symbol   Description                                             Directed    Undirected
     n        Number of vertices                                      23195       23195
     |E|      Number of edges                                         102516      95518
     L        Average shortest-path length between pairs of nodes     3.989461    3.836189
     D        Graph diameter                                          9           8
     γ        Exponent of the node degree distribution (power law)    2.200       1.850
     <k>      Average node degree                                     4.42        8.839
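A minimal Python sketch of how such parameters can be obtained from a directed edge list follows. It is not the author's code: the toy edges are hypothetical, and averaging path lengths over reachable ordered pairs only is an assumption, since the slide does not say how unreachable pairs were handled.

```python
from collections import deque


def vertex_types(edges):
    """Split vertices into stimuli only, reactions only, and stimuli-reactions."""
    sources = {u for u, _ in edges}
    targets = {v for _, v in edges}
    return sources - targets, targets - sources, sources & targets


def average_degree(edges):
    """Directed case: number of edges divided by number of vertices."""
    vertices = {x for edge in edges for x in edge}
    return len(edges) / len(vertices)


def path_stats(edges):
    """Average shortest-path length and diameter over reachable ordered pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])
    total = count = diameter = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for d in dist.values():
            if d > 0:
                total += d
                count += 1
                diameter = max(diameter, d)
    return total / count, diameter


# Hypothetical toy graph: cue -> reaction edges.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
stimuli_only, reactions_only, both = vertex_types(toy_edges)
avg_len, diam = path_stats(toy_edges)
print(stimuli_only, reactions_only, both, average_degree(toy_edges), avg_len, diam)
```

For a graph of the size reported on the slide, one breadth-first search per vertex is still feasible, but a graph library would normally be used instead.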


Characteristics of the associative graph
• "Small-world" networks (Milgram, 1967)
        Six degrees of separation: $L \propto \log N$
        Examples: World Wide Web (WWW; Adamic, 1999; Albert, Jeong, &
        Barabási, 1999), networks of scientific collaboration (Newman,
        2001), metabolic networks in biology (Jeong, Tombor, Albert,
        Oltvai, & Barabási, 2000)
• Scale-free networks (Amaral, Scala, Barthélémy, & Stanley, 2000)
        $P(k) \approx k^{-\gamma}$, where $\gamma \in (2, 4)$
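As an illustration of the scale-free property, the exponent γ can be estimated roughly by a least-squares line fit of log P(k) against log k. The slides do not state which estimator was used, so this fit is an assumption, and the degree sample below is synthetic.

```python
import math
from collections import Counter


def estimate_gamma(degrees):
    """Fit log P(k) = c - gamma * log k by ordinary least squares."""
    counts = Counter(d for d in degrees if d > 0)
    n = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / n) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope


# Synthetic degree sample whose counts follow 400 / k**2 exactly.
toy_degrees = [1] * 400 + [2] * 100 + [4] * 25 + [5] * 16
gamma = estimate_gamma(toy_degrees)
print(round(gamma, 6))  # 2.0
```

On real data this simple fit is sensitive to the sparse tail of the distribution, so it should be read as a rough check rather than a precise estimate.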




Data preprocessing
• Reduction of the data volume
• Transition from concept space to vector space
    ◦ Latent semantic analysis
       Create the TF*IDF matrix
       Find eigenvectors and eigenvalues
       Apply the Lanczos algorithm (for sparse matrices)
    ◦ Multidimensional scaling (Torgerson, 1958)
       $S_{i,j} = S_{j,i} = A_{i,j} + A_{j,i}$, $T_{i,j} = -\log(S_{i,k} S_{k,l} \cdots S_{h,j})$ (Steyvers et al., 2004)
       $\Delta = \begin{pmatrix} \delta_{11} & \dots & \delta_{1I} \\ \vdots & \ddots & \vdots \\ \delta_{I1} & \dots & \delta_{II} \end{pmatrix}$
       $\min_{x_1, \dots, x_I} \sum_{i<j} \left( \lVert x_i - x_j \rVert - \delta_{i,j} \right)^2$, where the points $x_i, x_j$ are the unknowns
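The path-based dissimilarity T can be sketched as a shortest-path search with edge cost -log S. Two assumptions are made here that the slide does not state: T is taken as the minimum over all paths, and every S value lies in (0, 1] so that the costs are non-negative. The association values are hypothetical.

```python
import heapq
import math


def symmetric_strengths(A):
    """S[i][j] = S[j][i] = A[i][j] + A[j][i] for a dict-of-dicts matrix A."""
    S = {}
    for i, row in A.items():
        for j, a in row.items():
            if i == j:
                continue
            s = a + A.get(j, {}).get(i, 0.0)
            S.setdefault(i, {})[j] = s
            S.setdefault(j, {})[i] = s
    return S


def path_distances(S, source):
    """Dijkstra search: T[j] = min over paths of -log(product of S on the path)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, s in S.get(node, {}).items():
            cand = d - math.log(s)
            if cand < dist.get(nxt, math.inf):
                dist[nxt] = cand
                heapq.heappush(heap, (cand, nxt))
    return dist


# Hypothetical association strengths A[cue][reaction].
toy_A = {"a": {"b": 0.5}, "b": {"c": 0.25}, "c": {"a": 0.01}}
T_from_a = path_distances(symmetric_strengths(toy_A), "a")
print(T_from_a["c"])  # the indirect path a-b-c is shorter than the direct edge a-c
```

The resulting T values can serve as the dissimilarities δ in the multidimensional scaling objective.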


Model based on vector space
• Clustering (k-means):
   $\min \sum_{i=1}^{k} \sum_{x_j \in S_i} (x_j - \mu_i)^2$, where $k$ is the number of clusters, $S_i$ are the current
   clusters, and $\mu_i$ are the centers of the clusters, $x_j \in S_i$.
• Distance metric: $d_{ij} = \left( \sum_{k=1}^{n} |x_{ik} - x_{jk}|^{r} \right)^{1/r}$
• Search for the closest center:
   [Figure: cluster centers C1, C2, C3]
• Model advantages:
   ◦ Possibility of creating a non-existent concept;
   ◦ Possibility to visualize and to change the dimensionality.
• Model disadvantages:
   ◦ The optimal number of clusters has to be found;
   ◦ Clustering has to be accurate;
   ◦ The optimal dimensionality has to be found.
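The clustering step can be illustrated with a small k-means sketch that uses the Minkowski distance for the closest-center search. It is written under assumptions: the initialization (first k points), the iteration cap, and the toy points are hypothetical, and the mean is used as the center update, which is the exact minimizer only for r = 2.

```python
def minkowski(a, b, r=2):
    """Minkowski distance of order r between two equal-length vectors."""
    return sum(abs(x - y) ** r for x, y in zip(a, b)) ** (1 / r)


def closest_center(point, centers, r=2):
    """Index of the center nearest to `point`."""
    return min(range(len(centers)), key=lambda i: minkowski(point, centers[i], r))


def k_means(points, k, r=2, iterations=20):
    centers = [points[i] for i in range(k)]  # naive initialization: first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest_center(p, centers, r)].append(p)
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centers[i]  # keep the old center if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers
    return centers, clusters


# Hypothetical 2-D points forming two obvious groups.
toy_points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
centers, clusters = k_means(toy_points, k=2)
print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
```

A new query vector is then assigned with `closest_center`, which corresponds to the "search for the closest center" step on the slide.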


Model based on the concept space
• Core of the network: 4,692 concepts and 59,392 connections
• $S_i^{t} = S_i^{t-1} + \sum_{j} w_{ji} \, S_j^{t-1}$, where $S_i^{t}$ is the activity of the $i$-th neuron at
    time $t$ and $w_{ji}$ is the weight of the connection between neuron $j$ and neuron $i$
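One update step of this activation rule can be sketched as follows. The sketch applies the formula literally, with no decay or normalization because the slide gives none; the concept names and weights are hypothetical toy values.

```python
def step(activity, weights):
    """One synchronous update: S_i(t) = S_i(t-1) + sum_j w_ji * S_j(t-1)."""
    new_activity = dict(activity)
    for (j, i), w in weights.items():
        new_activity[i] = new_activity.get(i, 0.0) + w * activity.get(j, 0.0)
    return new_activity


# Hypothetical weights w[(j, i)] for the connection from neuron j to neuron i.
toy_weights = {
    ("car", "engine"): 0.5,
    ("car", "speed"): 0.25,
    ("engine", "speed"): 0.5,
}
state0 = {"car": 1.0}           # activate a single concept
state1 = step(state0, toy_weights)
state2 = step(state1, toy_weights)
print(state1)  # {'car': 1.0, 'engine': 0.5, 'speed': 0.25}
print(state2)  # {'car': 1.0, 'engine': 1.0, 'speed': 0.75}
```

Because every step only adds activity, repeated updates grow without bound, so in practice the number of steps would be limited or the activities rescaled.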
   [Figure: fragment of the concept network with the nodes РЫБА (fish), РЕКА (river), МАТЬ (mother), ВОЛГА (Volga), МОТОР (motor), АВТОМОБИЛЬ (automobile), МАШИНА (car), ДВИГАТЕЛЬ (engine), СКОРОСТЬ (speed), ТЕЛЕГА (cart), КРАСИВЫЙ (beautiful), ТАЧКА (wheelbarrow; colloquially, car), ЛЕГКОВОЙ (passenger car)]
Experiments with the network
   [Slides 10-13: figures not preserved]
Web application
• Stack: Django, Python, PostgreSQL
• Architecture: Model-View-Controller
   [Slide 15: figure not preserved]
Autocomplete system
   Page request: ../avs/

   urls.py
       (r'^avs/$', avs_noargs)

   views.py
       def avs_noargs(request):
           cues_form = CuesForm()
           ….
           return render_to_response('avs.html', {'cues_form': cues_form})

   avs.html
       {{ cues_form.as_table }}

   views.py
       class CuesForm(forms.Form):
           cues = forms.CharField….
           self.fields['cues'].widget = AutoCompleteWidget(lookup_url='/cues_ac/')

   Autocomplete request: ../cues_ac/

   urls.py
       url(r'^cues_ac/$', cues_ac),

   views.py
       def cues_ac(request):
           query = request.GET.get('query', None)
           if query:
               qargs = [Q(data__startswith=query)]
               cues = Cues.objects.filter(Q(*qargs)).order_by('data')[:limit]
               return JsonResponse(cues)

   widgets.py
       class AutoCompleteWidget(forms.widgets.TextInput)

   json.py
       class JsonResponse(HttpResponse)
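The lookup performed by `cues_ac` (prefix filter, ordering, limit, JSON output) can be illustrated without Django. This is a plain-Python analogue, not the application's code: the function name `autocomplete`, the default `limit` of 10, and the word list are hypothetical.

```python
import json


def autocomplete(cues, query, limit=10):
    """Return up to `limit` cues that start with `query`, in sorted order."""
    if not query:
        return []
    matches = sorted(c for c in cues if c.startswith(query))
    return matches[:limit]


# Hypothetical cue list standing in for the Cues table.
toy_cues = ["water", "wall", "friend", "war"]
suggestions = autocomplete(toy_cues, "wa", limit=2)
payload = json.dumps(suggestions)  # what a JSON response body could carry
print(payload)  # ["wall", "war"]
```

In the web application the same filtering is delegated to the database through the `data__startswith` lookup, which avoids loading all cues into memory.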




Conclusion
• Subject area analysis
• Comparative analysis with the Russian frequency dictionary
• Comparative analysis with the results of other association experiments
• Associative graph analysis
• Data preprocessing
• Network models: two vector-space models and one concept-space model
• Neural network experiments
• Web application for working with the associative network and the model




Thank you for your attention!




           Questions?




