Combining Word Semantics within Complex Hilbert Space for Information Retrieval
Upcoming SlideShare
Loading in...5
×
 

Combining Word Semantics within Complex Hilbert Space for Information Retrieval

on

  • 382 views

Complex numbers are a fundamental aspect of the mathematical formalism of quantum physics. Quantum-like models developed outside physics often overlooked the role of complex numbers. Specifically, ...

Complex numbers are a fundamental aspect of the mathematical formalism of quantum physics. Quantum-like models developed outside physics often overlooked the role of complex numbers. Specifically, previous models in Information Retrieval (IR) ignored complex numbers. We argue that to advance the use of quantum models of IR, one has to lift the constraint of real-valued representations of the information space, and package more information within the representation by means of complex numbers. As a first attempt, we propose a complex-valued representation for IR, which explicitly uses complex valued Hilbert spaces, and thus where terms, documents and queries are represented as complex-valued vectors. The proposal consists of integrating distributional semantics evidence within the real component of a term vector; whereas, ontological information is encoded in the imaginary component. Our proposal has the merit of lifting the role of complex numbers from a computational byproduct of the model to the very mathematical texture that unifies different levels of semantic information. An empirical instantiation of our proposal is tested in the TREC Medical Record task of retrieving cohorts for clinical studies.

Statistics

Views

Total Views
382
Views on SlideShare
382
Embed Views
0

Actions

Likes
0
Downloads
4
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Combining Word Semantics within Complex Hilbert Space for Information Retrieval Combining Word Semantics within Complex Hilbert Space for Information Retrieval Document Transcript

    • CombiningWordSemanticswithin ComplexHilbertSpaceforInformation Retrieval Peter Wittek, Bevan Koopman, Guido Zuccon, and Sándor Darányi KEY POINTS • We combine distributional and concep- tual representation; • Two random indices merge to create a complex space; • Efficiency of information retrieval im- proves; • The phase of the complex vector carries additional information. MOTIVATION Combining term frequency and inverse docu- ment frequency in a complex space provided poor retrieval effectiveness, and the power of representation was questioned [1]. In concept-based indexing, documents are rep- resented by concepts rather than terms, as is in- stead the case for traditional term-based repre- sentations. It is highly efficient in medical in- formation retrieval [2]. We encode both distributional information and higher-level conceptual information in the components of the complex numbers. An L2 space-based representation embedded the two kinds of semantics by a seriation of the feature space [3]. Seriation is slow, the only preprocess- ing we need is mapping the terms to concepts. TOY EXAMPLE Consider the following documents and query: • Document D1 = “kidney stones” • Document D2 = “kidney” • Document D3 = “renal calculi” • Query Q = “kidney stones” A term-based retrieval system processing Q would return only the documents D1, D2 (in this order). However, renal calculi in D3 is actually a synonym of the query kidney stones, while D2 is actually not relevant. Therefore, D3 should be ranked higher than D2. Using our method, when a concept representa- tion is included, the phrases kidney stones and renal calculi both map to the same SNOMED CT concept 155868000. Thus, our ranking approach would retrieve D1 in the first rank position because of the contribution of both term and concept weighting; and D3 above D2 because of the inclusion of the con- cept weighting. CONSTRUCTING THE SPACE Terms ConceptsMetaMap Random Index Terms Random Index Concepts To construct the complex space, we rely on Se- manticVectors to generate to random indices – one based on the term representation, one based on a concept representation. The two random indices are merged in one complex space. RESULTS 11-point precision-recall graph 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Precision Recall Term Concept Complex MAP and P@10 for term retrieval, concept retrieval, and our method Measure Term-based Concept-based Complex MAP 0.0886 0.1084 0.1245 P@10 0.1593 0.1963 0.2235 Distribution of angles between real and imaginary components of complex vectors representing documents from the TREC Medical Records Track collection. Angles were expressed in radians and were normalised in the range [0, 2π[. 90degrees 1.60 1.65 1.70 Angle 100 200 300 400 500 600 700 Frequency Mean value and extrema of phase Mean absolute phase Lowest Phase Highest Phase 6.9520 -39.9163 28.6791 REFERENCES [1] Zuccon, G., Piwowarski, B., Azzopardi, L.: On the use of complex numbers in quantum models for infor- mation retrieval. In: Proceedings of ICTIR-11, 3rd International Conference on the Theory of Information Retrieval, Bertinoro, Italy (September 2011) [2] Koopman, B., Zuccon, G., Bruza, P., Sitbon, L., Lawley, M.: Graph-based concept weighting for medical information retrieval. In: Proceedings of ADCS-12, 17th Australasian Document Computing Symposium, Dunedin, New Zealand (December 2012) 80–87 [3] Wittek, P., Tan, C.L.: Compactly supported basis functions as support vector kernels for classification. Transactions on Pattern Analysis and Machine Intelligence 33(10) (2011) 2039 –2050 MORE INFORMATION Source code and steps to repro- duce the results are available here: http://peterwittek. com/2013/07/ merging-in-complex-space/