Dual Embedding Space Model (DESM)

A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words without being relevant. We investigate neural word embeddings as a source of evidence in document ranking. We train a word2vec embedding model on a large unlabelled query corpus, but in contrast to how the model is commonly used, we retain both the input and the output projections, allowing us to leverage both the embedding spaces to derive richer distributional relationships. During ranking we map the query words into the input space and the document words into the output space, and compute a query-document relevance score by aggregating the cosine similarities across all the query-document word pairs.

We postulate that the proposed Dual Embedding Space Model (DESM) captures evidence on whether a document is about a query term in addition to what is modelled by traditional term-frequency based approaches. Our experiments show that the DESM can re-rank top documents returned by a commercial Web search engine, like Bing, better than a term-matching based signal like TF-IDF. However, when ranking a larger set of candidate documents, we find the embeddings-based approach is prone to false positives, retrieving documents that are only loosely related to the query. We demonstrate that this problem can be solved effectively by ranking based on a linear mixture of the DESM and the word counting features.
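
In code, the scoring described above reduces to something like the following sketch, where `in_vecs` and `out_vecs` stand for illustrative lookups into the retained input and output projections (the names and normalization details are assumptions, not from the paper):

```python
import numpy as np

def desm_score(query_terms, doc_terms, in_vecs, out_vecs):
    """Aggregate cosine similarity over all query-document word pairs:
    query words are mapped into the IN space, document words into the OUT space."""
    def unit(v):
        return v / np.linalg.norm(v)

    sims = [float(np.dot(unit(in_vecs[q]), unit(out_vecs[d])))
            for q in query_terms for d in doc_terms]
    return sum(sims) / len(sims)
```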

  1. Dual Embedding Space Model (DESM). Bhaskar Mitra, Eric Nalisnick, Nick Craswell and Rich Caruana. https://arxiv.org/abs/1602.01137
  2. How do you learn a neural embedding? Set up a prediction task: Source Item → Target Item. A sparse source item is mapped to a dense source embedding, a sparse target item to a dense target embedding, and the two are compared with a distance metric; the bottleneck layers are crucial for generalization. Examples: Word2vec (Mikolov et al., 2013) predicts a neighboring word from a word, with one-hot input/output; DSSM on query-document pairs (Huang et al., 2013; Shen et al., 2014) predicts a document from a query; DSSM on session pairs (Mitra, 2015) predicts a neighboring query in the session; and DSSM as a language model (Mitra and Craswell, 2015) predicts a query suffix from a query prefix, the DSSM variants all using bag-of-trigrams input/output. (A sketch of this setup follows below.)
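
As an illustration of this source → target setup for the word2vec case, here is a minimal PyTorch sketch that keeps both projection matrices explicit; the class, dimensions and loss are an assumed rendering of the idea, not the training code behind the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGram(nn.Module):
    """Source word -> neighboring (target) word, with a dense bottleneck.
    embed_in holds the IN (source) space, embed_out the OUT (target) space;
    their dot product plays the role of the distance metric."""
    def __init__(self, vocab_size, dim=200):
        super().__init__()
        self.embed_in = nn.Embedding(vocab_size, dim)
        self.embed_out = nn.Embedding(vocab_size, dim)

    def forward(self, source_ids, target_ids):
        s = self.embed_in(source_ids)    # (batch, dim)
        t = self.embed_out(target_ids)   # (batch, dim)
        return (s * t).sum(dim=-1)       # similarity score per pair

def loss_fn(model, src, pos, neg):
    # Negative sampling: push observed (src, pos) pairs together,
    # randomly sampled (src, neg) pairs apart.
    return -(F.logsigmoid(model(src, pos)).mean()
             + F.logsigmoid(-model(src, neg)).mean())
```

Discarding `embed_out` after training is exactly the "throwing half the model away" that slide 4 complains about.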
  3. Not all embeddings are created equal. The source-target training pairs strictly dictate what notion of relatedness will be modelled in the embedding space. Is eminem more similar to rihanna or rap? Is yale more similar to harvard or alumni? Is seahawks more similar to broncos or seattle? (Be careful of using pre-trained embeddings as inputs to a different model; one-hot representations or learning an in situ embedding may be better!)
  4. Word2vec: learning word embeddings based on word co-occurrence data. Well known for word analogy tasks, e.g. [king] − [man] + [woman] ≈ [queen]. What if I told you that everyone who uses word2vec is throwing half the model away?
  5. Typical vs. topical relatedness. The IN-IN and OUT-OUT similarities cluster words that occur in the same contexts and are therefore of the same type. The word2vec model as a whole is trained to predict neighboring words, so the IN-OUT similarity instead clusters words that commonly co-occur under the same topic (see the sketch below).
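
As a concrete sketch with gensim: the IN vectors are the ordinary word vectors, and for a model trained with negative sampling the OUT vectors live in the `syn1neg` matrix (these attribute names are an assumption about gensim's internals; verify against your version):

```python
import numpy as np
from gensim.models import Word2Vec

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy corpus purely so the snippet runs; real similarities need a large corpus.
corpus = [["seahawks", "beat", "broncos", "in", "seattle"]] * 100
model = Word2Vec(sentences=corpus, vector_size=50, window=5, negative=5, min_count=1)

def in_vec(w):   # IN (input projection) vector
    return model.wv.vectors[model.wv.key_to_index[w]]

def out_vec(w):  # OUT (output projection) vector, retained rather than discarded
    return model.syn1neg[model.wv.key_to_index[w]]

print(cos(in_vec("seahawks"), in_vec("broncos")))   # typical: same type (IN-IN)
print(cos(in_vec("seahawks"), out_vec("seattle")))  # topical: co-occurrence (IN-OUT)
```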
  6. Typical embeddings for Web search? B. Mitra and N. Craswell. Query auto-completion for rare prefixes. In Proc. CIKM. ACM, 2015.
  7. Which passage is about Albuquerque? Traditionally in search, we look for evidence of a document's relevance to a query in the number of matches of the query terms in the document. But there is useful signal in the non-matching terms too: they indicate whether the document is really about the query terms or simply mentions them. A word co-occurrence model can be used to check whether the other words in the document support the presence of the matching terms. [Slide shows two example passages: one about Albuquerque and one that merely mentions it.]
  8. Dual Embedding Space Model. • All-pairs comparison between query and document terms. • The document embedding can be precomputed as the centroid of the unit vectors of all the words in the document. • DESM IN-OUT uses IN embeddings for query words and OUT embeddings for document words. • DESM IN-IN uses IN embeddings for the document words as well. (A sketch follows below.)
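
A minimal sketch of this slide's scoring, reusing the illustrative `in_vecs`/`out_vecs` lookups from the earlier snippet (again an assumed rendering, not the paper's reference code):

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def doc_centroid(doc_terms, out_vecs):
    # Precomputable once per document: centroid of the unit OUT vectors
    # of all the words in the document.
    return np.mean([unit(out_vecs[w]) for w in doc_terms], axis=0)

def desm_in_out(query_terms, doc_terms, in_vecs, out_vecs):
    # Each query word's IN vector is scored against the document centroid;
    # DESM IN-IN is the same computation with in_vecs on the document side.
    c = unit(doc_centroid(doc_terms, out_vecs))
    return float(np.mean([np.dot(unit(in_vecs[w]), c) for w in query_terms]))
```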
  9. IN-OUT vs. IN-IN
  10. Because Cambridge is not an African mammal. Query: cambridge. [Slide shows three example passages: DESM ✔ / BM25 ✔, DESM ✘ / BM25 ✔, and DESM ✔ / BM25 ✘.]
  11. Telescoping evaluation. As a weak ranking feature, DESM IN-OUT performs better than the BM25, LSA and DESM IN-IN models on a UHRS (Overall) set and on a click-based test set.
  12. Full retrieval evaluation. The DESM models only a specific aspect of document relevance. In the presence of many random documents (distractors), it is susceptible to spurious false positives and needs to be combined with lexical ranking features such as BM25 (a sketch of such a mixture follows below).
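
A sketch of such a combination; the weight name and its default value are illustrative placeholders, not tuned values from the deck:

```python
def mixture_score(desm, bm25, lam=0.1):
    # Linear mixture: the lexical BM25 signal anchors the ranking against
    # loosely related false positives, while the DESM adds aboutness evidence.
    return lam * desm + (1 - lam) * bm25
```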
  13. DESM vs. BM25
  14. Making different mistakes
  15. Questions?
