Random Indexing

On space and meaning.

Published in: Technology, Business

  1. Random indexing: On space and meaning
       Simon Belak
  2. Order of the day
       • Meaning
           • Philosophy
           • Neuroscience
           • Computer science
       • Space
           • Words as points in space
           • On dimensionality
       • Random indexing
  3. What’s the meaning of meaning?
  4. Philosophers say:
       • “Meaning just is use.” – Wittgenstein
  5. Neuroscientists say:
       • Episodic memory → semantic memory
           • (concrete event → abstract concept)
       • Hebbian process
  6. Computer scientists say:
       • LSA
       • semantic networks
       • HAL
       • TLC
       • SAM
       • ACT-R
       • ontology
  7. Projecting meaning into space
  8. Adjacent words closely related
  9. Movement
       • Co-occurrences
       • Hebbian process
           • Self-organisation
           • Clustering
       • Evolution of language
           • Coach (Kocs → carriage → train → car)
  10. Problem: homonyms
       • Table
           1. a. An article of furniture supported by one or more vertical legs and having a flat horizontal surface.
              b. The objects laid out for a meal on this article of furniture.
           2. The food and drink served at meals; fare: kept an excellent table.
           3. The company of people assembled around a table, as for a meal.
           4. A plateau or tableland.
           5. a. A flat facet cut across the top of a precious stone.
              b. A stone or gem cut in this fashion.
           6. Music
              a. The front part of the body of a stringed instrument.
              b. The sounding board of a harp.
           7. Architecture
              a. A raised or sunken rectangular panel on a wall.
              b. A raised horizontal surface or continuous band on an exterior wall; a stringcourse.
           8. A part of the human palm framed by four lines, analyzed in palmistry.
           9. An orderly arrangement of data, especially one in which the data are arranged in columns and rows in an essentially rectangular form.
           10. An abbreviated list, as of contents; a synopsis.
           11. An engraved slab or tablet bearing an inscription or a device.
           12. Anatomy: The inner or outer flat layer of bones of the skull separated by the diploe.
  11. Solution: high dimensionality
       • One dimension per word
       • Table extends into food, furniture, music, ... dimensions
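
The "one dimension per word" picture on the slide above can be illustrated with plain co-occurrence counts. This is only a minimal sketch, not code from the talk: the function name `cooccurrence_vectors` and the toy contexts are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence_vectors(contexts):
    """Map each word to a Counter with one dimension per co-occurring word."""
    vectors = defaultdict(Counter)
    for context in contexts:
        for word in context:
            for other in context:
                if other != word:
                    vectors[word][other] += 1
    return vectors


# Hypothetical toy contexts, invented for this example
contexts = [
    ["table", "food"],
    ["table", "furniture"],
    ["table", "music"],
]
vectors = cooccurrence_vectors(contexts)
print(sorted(vectors["table"]))  # the dimensions that "table" extends into
```

Each sense of a homonym shows up as weight along a different set of dimensions, at the cost of a vector as wide as the vocabulary.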
  12. Problem: synonyms
       • amazing, stupefying, staggering, awesome, awful, awe-inspiring, awing, astonishing, astounding
  13. Solution: latent meaning
       • Reduced dimensionality
       • Closely related words fold into one
       • “Higher-order” meaning
  14. Random indexing
  15. The idea
       • A word is the sum of its contexts
       • A context is the sum of its words
       • Grounding?
  16. The algorithm
       1. Take a context of words
       2. Generate a context index vector
       3. Add the index vector to the vectors of all the words in the context
       4. Go to 1
       • Episodic memory (2) + Hebbian process (3)
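
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's implementation: the names `make_index_vector` and `train`, the values of `DIM` and `NONZERO`, and the choice to add the index vector once per distinct word in a context are all assumptions.

```python
import random
from collections import defaultdict

DIM = 512      # assumed dimensionality of the reduced space
NONZERO = 8    # assumed number of non-zero entries per index vector


def make_index_vector(rng, dim=DIM, nonzero=NONZERO):
    """Sparse ternary vector stored as {position: +1 or -1}."""
    positions = rng.sample(range(dim), nonzero)
    return {pos: rng.choice((1, -1)) for pos in positions}


def train(contexts, dim=DIM, nonzero=NONZERO, seed=0):
    """Accumulate one word vector per word from a sequence of contexts."""
    rng = random.Random(seed)
    word_vectors = defaultdict(lambda: [0] * dim)
    for context in contexts:                           # step 1: take a context
        index = make_index_vector(rng, dim, nonzero)   # step 2: index vector
        for word in set(context):                      # step 3: add to words
            vec = word_vectors[word]
            for pos, sign in index.items():
                vec[pos] += sign
    return dict(word_vectors)                          # step 4 is the loop


# Hypothetical toy contexts, invented for this example
contexts = [
    ["coach", "train", "station"],
    ["coach", "carriage", "horse"],
    ["train", "station", "ticket"],
]
vectors = train(contexts)
print(len(vectors), "word vectors of length", DIM)
```

Words that always appear in the same contexts (here "carriage" and "horse") end up with identical vectors, which is the behaviour that later similarity comparisons rely on.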
  17. Dimensionality reduction
       • Sparse high-dimensional ternary index
         (a small number of randomly distributed +1s and -1s)
       • Nearly orthogonal
           • Distances approximately preserved
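
The "nearly orthogonal" claim above can be checked empirically. The sketch below is an illustration only: the dimensionality (2000), the number of non-zero entries (10), the sample size (200) and the helper names are assumptions chosen for the example, not values from the talk.

```python
import math
import random


def index_vector(rng, dim, nonzero):
    """Sparse ternary vector stored as {position: +1 or -1}."""
    return {pos: rng.choice((1, -1)) for pos in rng.sample(range(dim), nonzero)}


def cosine(a, b):
    """Cosine of two sparse ternary vectors; each has norm sqrt(len)."""
    dot = sum(sign * b.get(pos, 0) for pos, sign in a.items())
    return dot / math.sqrt(len(a) * len(b))


rng = random.Random(42)
vectors = [index_vector(rng, dim=2000, nonzero=10) for _ in range(200)]
pairs = [(i, j) for i in range(200) for j in range(i + 1, 200)]
mean_abs_cos = sum(abs(cosine(vectors[i], vectors[j])) for i, j in pairs) / len(pairs)
print(f"mean |cosine| over {len(pairs)} pairs: {mean_abs_cos:.4f}")
```

Two random index vectors rarely share a non-zero position, so their dot product is almost always zero; that is what lets many more contexts than dimensions coexist with little interference.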
  18. The good
       • Fast, scalable
       • Trivially parallelised
           • Per word
           • Addition is associative, commutative
       • Stable
           • Words are independent
           • Integer arithmetic
       • Incremental
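
Because the update is plain integer addition, partial results can be built independently and summed in any order. The sketch below illustrates this under assumptions that are not from the talk: the corpus is sharded by context rather than per word, and each context's index vector is derived deterministically by seeding a generator with a hypothetical integer context id.

```python
import random

DIM = 256      # assumed dimensionality
NONZERO = 6    # assumed number of non-zero entries per index vector


def index_vector(context_id, dim=DIM, nonzero=NONZERO):
    """Deterministic sparse ternary vector, seeded by the context id."""
    rng = random.Random(context_id)
    return {pos: rng.choice((1, -1)) for pos in rng.sample(range(dim), nonzero)}


def accumulate(shard, dim=DIM):
    """Build partial word vectors from (context_id, words) pairs."""
    partial = {}
    for context_id, words in shard:
        index = index_vector(context_id, dim)
        for word in set(words):
            vec = partial.setdefault(word, [0] * dim)
            for pos, sign in index.items():
                vec[pos] += sign
    return partial


def merge(a, b):
    """Element-wise sum of two partial results."""
    merged = {word: list(vec) for word, vec in a.items()}
    for word, vec in b.items():
        if word in merged:
            merged[word] = [x + y for x, y in zip(merged[word], vec)]
        else:
            merged[word] = list(vec)
    return merged


# Hypothetical toy corpus, invented for this example
corpus = [
    (0, ["coach", "train"]),
    (1, ["coach", "carriage"]),
    (2, ["train", "ticket"]),
    (3, ["table", "food"]),
]
first, second = accumulate(corpus[:2]), accumulate(corpus[2:])
print(merge(first, second) == merge(second, first) == accumulate(corpus))
```

The same property makes the model incremental: a new batch of contexts is just one more partial result to add.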
  19. The bad
       • Memory hungry
           • Caching (Zipf’s law)
  20. Uses
       • Comparing words to words
           • Query expansion
       • Comparing documents to documents
           • Clustering
           • Search
           • Recommendations
       • Comparing documents to words
           • Keyword extraction
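
All three comparisons above reduce to measuring the angle between vectors. The sketch below is a hedged illustration: cosine similarity is a common choice but the slides do not name a measure, representing a document as the sum of its word vectors follows the "context is the sum of its words" idea, and the helper names and the tiny hand-written vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def document_vector(words, word_vectors, dim):
    """Sum the vectors of the known words in a document."""
    total = [0] * dim
    for word in words:
        for i, value in enumerate(word_vectors.get(word, ())):
            total[i] += value
    return total


def nearest_words(query, word_vectors, k=3):
    """Rank the other words by cosine similarity to `query`."""
    target = word_vectors[query]
    scored = [(cosine(target, vec), word)
              for word, vec in word_vectors.items() if word != query]
    return [word for _, word in sorted(scored, reverse=True)[:k]]


# Hypothetical toy word vectors, hand-written for illustration only
word_vectors = {
    "amazing":     [3, -1, 0, 2],
    "astonishing": [2, -1, 0, 2],
    "table":       [0, 2, -3, 0],
    "furniture":   [0, 1, -2, 1],
}
doc = document_vector(["table", "furniture"], word_vectors, dim=4)
print(nearest_words("amazing", word_vectors, k=1))
print(cosine(doc, word_vectors["table"]) > cosine(doc, word_vectors["amazing"]))
```

Query expansion would use `nearest_words` on the query terms, while keyword extraction would rank a document's words by their similarity to its document vector.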
  21. Key points
       • Meaning is use
       • Words in space
       • Multiple meanings, multiple dimensions
       • Random indexing
           • Cognitive rationale
           • Simple
           • Fast, scalable
  22. Questions?
  23. References
       • http://www.sics.se/~mange/papers/KarlgrenSahlgren2001.pdf
       • http://www.kfs.org/~jonathan/witt/tlph.html
       • http://www.mtsu.edu/~sschmidt/Cognitive/semantic/semantic.html
       • http://memory.syr.edu/marc/papers/HowaAddiJingKaha-LSAChap-doc.pdf
       • http://memory.psych.upenn.edu/research/research_episodic_memory.php
