A Vector Field Approach to Lexical Semantics

The very fact that physics can model learning, a cognitive activity, tells us that we do not understand something fundamental about language, humans and, ultimately, the role of information in the universe. Namely, for a set of rules to work to similar ends in two apparently unrelated domains, the domains must be based on the very same principles and are therefore not unrelated at all. Below we depart from this assumption and model an index term vocabulary over the Reuters-21578 document collection as a vector field. We use an emergent self-organizing map with approximately five nodes per index term to interpolate a potential field and study lexical gaps in distributional patterns. Our findings pave the way to modelling this vector field on physical fields, and thereby lexical cohesion on forces.



A Vector Field Approach to Lexical Semantics Presentation Transcript

  • 1. A Vector Field Approach to Lexical Semantics
    Peter Wittek*, Sándor Darányi* and Ying-Hsang Liu**
    * Swedish School of Library and Information Science, University of Borås
    ** School of Information Studies, Charles Sturt University
    QI-14, Filzbach, Switzerland, June 30-July 3, 2014
  • 2. Report on work in progress
    • Ongoing work in building semantic spaces from distributional/compositional semantics:
      – E.g. Padó & Lapata 2007, Erk & Padó 2008, Baroni & Lenci 2010, Blacoe et al 2013, Grefenstette 2013, Socher et al 2012
      – Content representation by vectors and matrices, binding by tensors, HRR
    • Our working hypothesis: word meaning can be expressed as energy (QI-11, QI-12)
      – It is safe to say that located semantic content generates a potential with energy minima on a potential surface
      – Content resides in regions, cf. semantic density (Mihalcea & Moldovan 1998)
      – In ML, the process of learning can be modelled as energy minimization, looking for either a global minimum or local minima
    • Below we depart from this assumption and model the vocabulary of the Reuters-21578 ML test collection as a vector field
      – An emergent self-organizing map with approximately 5 nodes per index term interpolates a potential field, used to study lexical gaps in distributional patterns
      – This paves the way to modelling lexical cohesion on forces
    • Lexical cohesion, also called collocation: two sentence elements sharing a lexical field
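The self-organizing map setup described above can be illustrated in miniature. The sketch below is a toy stand-in for Somoclu, which was used in the actual experiments; the function names and parameters here are illustrative, not the experimental code.

```python
import math
import random

def best_matching_unit(grid, x):
    """Grid coordinates of the node whose weight vector is closest to x."""
    return min(((i, j) for i in range(len(grid)) for j in range(len(grid[0]))),
               key=lambda ij: sum((a - b) ** 2
                                  for a, b in zip(grid[ij[0]][ij[1]], x)))

def train_som(data, rows, cols, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a toy rectangular SOM and return its grid of weight vectors."""
    rnd = random.Random(seed)
    dim = len(data[0])
    grid = [[[rnd.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                 # learning rate decays linearly
        radius = max(radius0 * (1 - t / epochs), 0.5)
        for x in data:
            bi, bj = best_matching_unit(grid, x)
            for i in range(rows):
                for j in range(cols):
                    # Gaussian neighbourhood centred on the BMU
                    d2 = (i - bi) ** 2 + (j - bj) ** 2
                    h = math.exp(-d2 / (2 * radius ** 2))
                    w = grid[i][j]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
    return grid
```

In the experiments an emergent map with approximately five nodes per index term is trained over the Reuters-21578 term vectors; the grid of weight vectors then serves as the sampled potential field.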
  • 3. Aristotle and QI
    • Aristotle and QM: Bohm 1951, Heisenberg 1958, reviewed in Koznjak 2007
    • Existence/reality is the sum total of two components: observable/measurable actuality (energeia) plus conceivable potentiality (dynamis) (Aristotle, Metaphysics)
      – Potentiality vs. actuality: the latent vs. manifest capacity of existents to induce change
      – In the QI frame, Aerts & Gabora (2005): a context-dependent property of a concept lends graded existence to it by weights spanning potential to actual (certain feature combinations are “less real” for subjects)
    • In our current model, existence consists of two layers: potentiality (a continuum) and actuality (a discrete distribution sampling the former). Its field nature may go back to the potentiality layer, which we perceive indirectly through the actualized values of events
      – For localized entities, actuality = one position vs. potentiality = all positions
      – Popping in and out of existence: anything in the present lies in the overlap between the last moment of the future and the first moment of the past
      – Reality is in the state of potentiality before and after observation
    • Energy is manifest in observed events (parole), dynamics latent in fields (langue)
  • 4. The two planes
    • Using geometry/probability as a vehicle of meaning, i.e. building a new medium of language, aims at maximizing similarity between the original and its statistical reconstruction
    • This original, i.e. some mental correlate called mental states/internal states in neuroscience (Elman 2004), is possibly related to the “language of thought” hypothesis in philosophy (Fodor 1975), also called mentalese
    • “Hidden metaphysics” in traditional mentalist and more recent generalist-universalist theories about language: language is but a tool operated by something deeper – thought, reason, logic, cognition – which functions in line with biological and neurological mechanisms common to all human beings (House 2000:69)
    • The same duality pertains to Neo-Humboldtian field theories of word meaning, where lexical/semantic fields based on language use are underpinned by the assumption of conceptual fields in the mind
      – The lexical field of related words is only an outward manifestation of the underlying conceptual field
      – The sum total of conceptual fields describes one’s world view (Trier 1934)
    • Products of the mind are continuous; their mapping to speech (parole) is discrete
      – Hypothetically this can be aligned with potentiality as a continuum vs. actuality as a discrete mapping of some of the options into “real”, objective existence
      – Potentiality can be speculated about but not observed
      – Actuality and the collapse of the wave function are the same
  • 5. Concepts, kinds and latent content
    • Melucci’s kinds (2013): weights, algebra and probabilities
    • Conceptual dynamics by biseriation of the term-document matrix
      – Kinds, due to the 2-d layout, correspond to regions but without interpolation
      – Biseriation plus Gaussian blurring introduce interpolated content
      – The evolution of such field-like content is easy to visualize
      – But since graph local maxima go back to tf-idf, this is not physical enough to account for dynamics
    • Interpolation approximates potentiality from actuality
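The local maxima mentioned above go back to tf-idf term weighting. A minimal sketch of the standard scheme assumed here (raw term frequency times log inverse document frequency; the function name is illustrative):

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document tf-idf weights for tokenized documents (raw tf x log idf)."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
            for doc in docs]
```

A term occurring in every document gets zero weight, so only distributionally distinctive terms produce peaks in the resulting field.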
  • 6. Continuous to discrete
    • “Meaning is interaction between mental state and word as cue” (Elman 2004; cf. Fallisard 2011)
    • In neuroscience, the continuous vehicle of percepts is modelled as a grid by neural networks
    • In linguistics/semiotics, the continuous conceptual field is mapped onto a discrete lexical field by content sampling
    • Mappings yield actual positions for lexemes; transitions between concepts remain in potentiality, called lexical gaps
      – These can be modelled by grid nodes for actual lexemes vs. interpolated values for transitions
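The interpolated values for transitions between grid nodes can be pictured with ordinary bilinear interpolation; a minimal sketch (the function name is illustrative, and the actual ESOM interpolation may differ):

```python
def bilinear(field, x, y):
    """Value of a scalar field, sampled at integer grid nodes, at fractional (x, y)."""
    i, j = int(x), int(y)
    fx, fy = x - i, y - j
    # Weighted mix of the four surrounding nodes
    return ((1 - fx) * (1 - fy) * field[i][j]
            + fx * (1 - fy) * field[i + 1][j]
            + (1 - fx) * fy * field[i][j + 1]
            + fx * fy * field[i + 1][j + 1])
```

Grid nodes hold the actualized values for lexemes; points in between stand for the lexical gaps that remain in potentiality.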
  • 7. Energy, concepts and categorization on the grid
    • In ML, the reason gradient descent and simulated annealing algorithms work is that minima as learning goal states overlap with minima as concepts; that is, the nature of the learning algorithm and of the phenomenon are identical
    • Put another way, the learning function is isomorphic with the semantic substratum it is supposed to identify, both belonging in function space
    • Hence we regard concepts as attractors in a conceptual field, with lexemes as lexical attractors in its respective lexical field mapping
    • This view is supported by neurosemantics, where concrete noun representations are stored in a spectral fashion (Just et al 2010)
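Learning as energy minimization can be made concrete with gradient descent on a double-well potential, whose two minima play the role of attractors (concepts); which one is reached depends on the starting point. The names and the specific potential below are illustrative:

```python
def gradient_descent(grad, x0, lr=0.1, steps=200):
    """Follow the negative gradient to a nearby local minimum of the potential."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Double-well potential V(x) = (x**2 - 1)**2 with minima at x = -1 and x = +1
grad_V = lambda x: 4 * x * (x ** 2 - 1)
```

Starting to the right of the barrier at x = 0, descent settles at +1; starting to the left, at -1: the same dynamics, two different "concepts" reached.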
  • 8. Reuters-21578 ESOM test
    • ML test collection: 21578 docs, 12000 terms (after filtering)
    • Economy news over one year
    • ESOM: source by Ultsch et al 2005
    • Somoclu: fastest open-source SOM algorithm available
    • Visualization: ESOM Tools by Databionics (third-party software)
  • 9. Mapping content by weight vectors to BMUs
    • Weight vectors (z axis) indicate a potential, mapped here to a grid of nodes with best matching units (BMUs) standing for terms
  • 10. Reuters-21578 term space as a vector field
  • 11. A first evaluation by cherry-picking
    • A cropped section of the U-matrix with best matching units and labels, showing a tight cluster
      – The terms in this group, including ones not plotted in the figure, are: bongard, consign, ita, louisvill, occupi, reafffirm (with this misspelling), stabil, stabilis, strength, temporao, tight
      – Some are clearly related; for others, we find justification in the corpus
    • A large gap with BMUs pulled apart, indicating tensions in the field
      – Apart from energet and garrison, these words are frequent, with over twenty appearances each, but were separated from other regions by emerging content in the fault lines
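The U-matrix read off here assigns each map node the mean distance between its weight vector and those of its grid neighbours, so high values mark the gaps ("fault lines") between clusters of BMUs. A minimal sketch (the function name is illustrative):

```python
import math

def u_matrix(weights):
    """Mean distance from each node's weight vector to its grid neighbours."""
    rows, cols = len(weights), len(weights[0])
    u = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # 4-neighbourhood on a rectangular (non-toroidal) grid
            dists = [math.dist(weights[i][j], weights[i + di][j + dj])
                     for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= i + di < rows and 0 <= j + dj < cols]
            u[i][j] = sum(dists) / len(dists)
    return u
```

Tight clusters such as the stabil/stabilis group show up as low-valued basins, while the pulled-apart BMUs sit across high ridges of the matrix.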
  • 12. Evaluation plans
    • Next we plan a detailed evaluation of vector fields as vehicles of word meaning, focusing on two things:
      – A genuine lexical corpus instead of Reuters
      – Measurement of semantic and concept drifts
    • Possible evaluation directions:
      – Change of meaning of specific words under changing context
        • E.g. in Baroni & Lenci 2010, the Distributional Memory approach was evaluated by several well-defined tasks and methods, such as similarity ratings and TOEFL synonym detection
      – Dislocation of the same concepts/terms and how their positions evolve over time
      – Emergence of new words (“lexical gap studies”)
  • 13. Ideas for future research
    • In other words, we plan to look at the dynamics (i.e. the tensions) underlying semantic and concept drifts by specifically looking at:
      – Distributional similarity studies (Weeds & Weir 2005, Rohde et al 2006), generalized to an algebraic form (Clarke 2012) and also applied to image content (Bruni et al 2012)
      – New studies of word meaning in context (e.g. Erk et al 2013)
      – Neural models of lexical semantics, e.g. Ursino et al 2010, who suggest that during processing it is regions, not single locations, that become activated in the brain
    • Hence Elman’s idea that word meaning is interaction between words as cues and mental states – or actuality and potentiality – starts making new sense
  • 14. Acknowledgements
    • The open source software used for this analysis is being developed by Peter Wittek and is downloadable from http://peterwittek.github.io/somoclu/
    • This work was supported by the European Commission Seventh Framework Programme under Grant Agreement Number FP7-601138 PERICLES and by the AWS in Education Machine Learning Grant award
    • (And, before you ask, the point of digital preservation is access to “canned” content…)
  • 15. References
    Baroni, M. & Lenci, A. (2010). Distributional Memory: A General Framework for Corpus-Based Semantics. Computational Linguistics, 36(4), 673-721.
    Blacoe, W., Kashefi, E. & Lapata, M. (2013). A Quantum-Theoretic Approach to Distributional Semantics. Proceedings of NAACL-HLT 2013, 847-857, Atlanta, Georgia, 9-14 June 2013.
    Bruni, E., Uijlings, J., Baroni, M. & Sebe, N. (2012). Distributional Semantics with Eyes: Using Image Analysis to Improve Computational Representations of Word Meaning. Proceedings of the 20th ACM International Conference on Multimedia, 1219-1228.
    Clarke, D. (2012). A Context-Theoretic Framework for Compositionality in Distributional Semantics. Computational Linguistics, 38(1), 41-71.
    Elman, J.L. (2004). An alternative view of the mental lexicon. Trends in Cognitive Sciences, 8(7), 301-306.
    Erk, K., McCarthy, D. & Gaylord, N. (2013). Measuring Word Meaning in Context. Computational Linguistics, 39(3), 511-554.
    Erk, K. & Padó, S. (2008). A Structured Vector Space Model for Word Meaning in Context. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, 897-906, Honolulu, October 2008.
    Fallisard, B. (2011). A thought experiment reconciling neuroscience and psychoanalysis. Journal of Psychology, 105, 201-206.
    Grefenstette, E., Dinu, G., Zhang, Y., Sadrzadeh, M. & Baroni, M. (2013). Multi-Step Regression Learning for Compositional Distributional Semantics. arXiv:1301.6939v2 [cs.CL], 30 Jan 2013.
    Hermann, K.M. & Blunsom, P. (2013). The Role of Syntax in Vector Space Models of Compositional Semantics. Proceedings of the 51st Annual Meeting of the ACL, 894-904, Sofia, Bulgaria, August 4-9, 2013.
    Just, M.A., Cherkassky, V.L., Aryal, S. & Mitchell, T.M. (2010). A neurosemantic theory of concrete noun representation based on the underlying brain codes. PLoS ONE, 5(1), e8622.
    Mihalcea, R. & Moldovan, D. (1998). Word sense disambiguation based on semantic density. Proceedings of the COLING/ACL Workshop on Usage of WordNet in Natural Language Processing Systems, Montreal, Canada, August 1998.
    Padó, S. & Lapata, M. (2007). Dependency-Based Construction of Semantic Space Models. Computational Linguistics, 33(2), 161-199.
    Rohde, D., Gonnerman, L.M. & Plaut, D.C. (2006). An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence. Communications of the ACM, 8, 627-633.
    Socher, R., Huval, B., Manning, C.D. & Ng, A.Y. (2012). Semantic Compositionality through Recursive Matrix-Vector Spaces. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 1201-1211.
    Ultsch, A. & Moerchen, F. (2005). ESOM-Maps: Tools for clustering, visualization, and classification with Emergent SOM. Technical Report No. 46, Dept. of Mathematics and Computer Science, University of Marburg, Germany.
    Ursino, M., Cuppini, C. & Magosso, E. (2010). A computational model of the lexical-semantic system based on a grounded cognition approach. Frontiers in Psychology, 1, 1-19.
    Weeds, J. & Weir, D. (2005). Co-occurrence retrieval: A flexible framework for lexical distributional similarity. Computational Linguistics, 31(4), 439-475.