
Exploring Session Context using Distributed Representations of Queries and Reformulations (SIGIR 2015)

Search logs contain examples of frequently occurring patterns of user reformulations of queries. Intuitively, the reformulation "san francisco" → "san francisco 49ers" is semantically similar to "detroit" → "detroit lions". Likewise, "london" → "things to do in london" and "new york" → "new york tourist attractions" can also be considered similar transitions in intent. The reformulations "movies" → "new movies" and "york" → "new york", however, are clearly different despite their lexical similarities. In this paper, we study the distributed representations of queries learnt by deep neural network models, such as the Convolutional Latent Semantic Model (CLSM), and show that they can be used to represent query reformulations as vectors. These reformulation vectors exhibit favourable properties, such as mapping semantically and syntactically similar query changes to nearby points in the embedding space. Our work is motivated by the success of continuous space language models in capturing relationships between words and their meanings using offset vectors. We demonstrate a way to extend the same intuition to represent query reformulations.
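
To make the offset intuition concrete, here is a minimal sketch of the idea, assuming some query embedding function `embed(query)` that returns a numpy vector (the function name and the cosine comparison are our illustration, not the paper's exact code):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def reformulation_vector(embed, q_from, q_to):
    # A reformulation is represented as the offset between the
    # embeddings of the post- and pre-reformulation queries.
    return embed(q_to) - embed(q_from)

# Hypothetical usage, given some embed(query) -> np.ndarray:
#   r1 = reformulation_vector(embed, "san francisco", "san francisco 49ers")
#   r2 = reformulation_vector(embed, "detroit", "detroit lions")
#   r3 = reformulation_vector(embed, "york", "new york")
#   cosine(r1, r2) should be high; cosine(r1, r3) should be low.
```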

Furthermore, we show that the distributed representations of queries and reformulations are both useful for modelling session context for query prediction tasks, such as query auto-completion (QAC) ranking. Our empirical study demonstrates that short-term (session) history context features based on these two representations improve the mean reciprocal rank (MRR) for the QAC ranking task by more than 10% over a supervised ranker baseline. Our results also show that using features based on both representations together achieves better performance than either of them individually.

Paper: http://research.microsoft.com/apps/pubs/default.aspx?id=244728


  1. Exploring Session Context using Distributed Representations of Queries and Reformulations. Bhaskar Mitra, Microsoft. (Paper: http://research.microsoft.com/apps/pubs/default.aspx?id=244728)
  2. Motivation
  3. Intuitively, the query reformulation (or intent shift) "london" → "things to do in london" is similar to "new york" → "new york tourist attractions"
  4. …and "san francisco" → "san francisco 49ers" is similar to "detroit" → "detroit lions"
  5. …but "movies" → "new movies" is NOT similar to "york" → "new york"
  6. Questions • Can we learn intuitively "meaningful" vector representations for query reformulations? • Can we use them for modelling session context for tasks such as query auto-completion (QAC)?
  7. Background
  8. Session Context for QAC. [Figure: for the prefix "f", context-free auto-completion suggests facebook, fandango, forever 21, fox news; given the previous query "muscle cars", a context-aware ranker instead suggests facebook, ford, ford mustang, fast and furious.]
  9. Session Context. What is the more likely query after "big ben": "big ben height" or "london clock tower"? Topical disambiguation (symmetrical) vs. transition likelihood (asymmetrical).
  10. Distributed Representation. A (low-dimensional) vector representation for items (e.g., words, sentences, images) such that all the values in the vector are necessary to determine the exact item. Also called embeddings. Imaginary example: [6, 3, 0, 4, 1, 7, 2, 8]
  11. As opposed to… the one-hot representation scheme, where all except one of the values of the vector are zeros. Imaginary examples: [0, 1, 0, 0] and [0, 0, 0, 1] (see the one-hot vs. distributed sketch after the slides)
  12. For Neural Networks… Localist Representations • One neuron to represent each item • One-to-one relationship • For few items / classes only. Distributed Representations • Multiple neurons to represent each item • Many-to-many relationship • For many items with shared attributes
  13. Vector Algebra on Word Embeddings. Word2vec linguistic regularities: vector("king") – vector("man") + vector("woman") ≈ vector("queen") (see the analogy sketch after the slides). T. Mikolov, et al. Efficient estimation of word representations in vector space. arXiv preprint, 2013. T. Mikolov, et al. Distributed representations of words and phrases and their compositionality. NIPS, 2013.
  14. Convolutional Latent Semantic Model • DNN trained on clickthrough data • Maximizes cosine similarity • Letter tri-gram hashing over raw terms (see the word-hashing sketch after the slides) • Convolutional-pooling structure. P.-S. Huang, et al. Learning deep structured semantic models for web search using clickthrough data. CIKM, 2013. Y. Shen, et al. Learning semantic representations using convolutional neural networks for web search. WWW, 2014.
  15. Main Contributions
  16. Main Contributions • CLSM models trained on Session Pairs (SP) • Demonstrate semantic regularities in the CLSM query embedding space • Leverage the regularities to explicitly represent query reformulations as vectors • Improved Mean Reciprocal Rank (MRR) for session context-aware QAC ranking by more than 10% using CLSM-based features
  17. Training on Session Pairs • Pairs of consecutive queries (q1 → q2, q2 → q3, q3 → q4) from search sessions (see the extraction sketch after the slides) • Pre-Query and Post-Query models • Symmetric vs. asymmetric models. Advantages: 1. demonstrates higher levels of reformulation regularities (discussed next); 2. trains on a time-stamped query log, with no need for clickthrough data
  18. Vector Algebra on Query Embeddings (SP)
  19. More Examples
  20. Embeddings for Query Reformulations. Explicit vector representation: k-means clustering of 65K in-session query pairs shows intuitive clusters (see the clustering sketch after the slides)
  21. Reformulation Likelihood. How crowded is the neighborhood of a reformulation in the embedding space? (see the density sketch after the slides)
  22. Session Context-Aware QAC • Evaluation setup based on M. Shokouhi, Learning to personalize query auto-completion (SIGIR, 2013) • Temporally separated background, train, validation and test sets • Sample queries and extract all possible prefixes • Submitted query as ground truth • Re-rank top N suggestion candidates using a LambdaMART model • Two testbeds: search logs from AOL & Bing
  23. Features
     • Non-contextual features: prefix length, suggestion length, vowels-to-alphabet ratio, contains numeric, etc.
     • N-gram similarity features: character trigram similarity between previous queries and the suggestion candidate
     • Pairwise frequency feature: pairwise frequency based on popular session pairs in the background data
     • CLSM topical similarity features: CLSM similarity between previous queries and the suggestion candidate
     • CLSM reformulation features: values along each dimension of the reformulation vector based on the previous query and the suggestion candidate
  24. Results: the LambdaMART-based baseline performs better than the Popularity-based baseline (MRR computation sketched after the slides)
  25. Results: the models with CLSM features perform better than the model without CLSM features
  26. Results: Reformulation + Similarity features perform better than Similarity features alone
  27. Results: Session Pairs based CLSMs perform better than the Query-Document based CLSM
  28. Results: Session Pairs based CLSMs win mostly due to the Reformulation-based features
  29. Effect of Prefix Length, History Length, and Vector Dimensions
  30. Example Queries
  31. Summary of Contributions • CLSM models trained on Session Pairs (SP) • Demonstrate semantic regularities in the CLSM query embedding space • Leverage the regularities to explicitly represent query reformulations as vectors • Improved Mean Reciprocal Rank (MRR) for session context-aware QAC ranking by more than 10% using CLSM-based features
  32. Potential Future Work • Studying search trails (White et al.) in the embedding space • Query change retrieval model (Guan et al.) using reformulation embeddings • Generating user embeddings for search personalization • Studying how reformulations vary by user expertise and device
  33. QUESTIONS?
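
The sketches referenced in the slides follow; all are illustrative assumptions, not code from the paper. First, the one-hot vs. distributed contrast from slides 10–12, as a small numpy example (vocabulary, dimensions, and values are made up):

```python
import numpy as np

vocab = ["london", "new york", "detroit", "san francisco"]

# One-hot (localist): one dimension per item, a single non-zero value.
one_hot = np.eye(len(vocab))

# Distributed: a dense, low-dimensional vector per item whose values
# are only meaningful jointly (random here; learnt in practice).
rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(vocab), 8))

i = vocab.index("london")
print(one_hot[i])    # [1. 0. 0. 0.]
print(embedding[i])  # 8 dense values
```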
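Next, the slide-13 analogy, reproducible with gensim's word2vec interface (the vectors file path is a placeholder; any pretrained word2vec-format file works):

```python
from gensim.models import KeyedVectors

# Load pretrained vectors in word2vec format (path is a placeholder).
vectors = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)

# vector("king") - vector("man") + vector("woman") ~ vector("queen")
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Expected top result: ("queen", <similarity>)
```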
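Slide 14's letter tri-gram hashing (the word-hashing step described in Huang et al., CIKM 2013) can be sketched as:

```python
from collections import Counter

def letter_trigrams(term):
    # Pad the term with boundary markers, then count letter tri-grams,
    # as in the word-hashing layer of DSSM/CLSM (Huang et al., 2013).
    padded = "#" + term + "#"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

print(letter_trigrams("good"))
# Counter({'#go': 1, 'goo': 1, 'ood': 1, 'od#': 1})
```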
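For slide 17, consecutive in-session query pairs can be extracted from a time-stamped log alone; a sketch, assuming a 30-minute inactivity boundary between sessions (the threshold is our assumption, not stated in the slides):

```python
from datetime import timedelta

def session_pairs(log, gap=timedelta(minutes=30)):
    # log: iterable of (user_id, timestamp, query), sorted by user and time.
    # Yields (pre_query, post_query) pairs of consecutive session queries.
    prev = None
    for user, ts, query in log:
        if prev is not None:
            p_user, p_ts, p_query = prev
            if user == p_user and ts - p_ts <= gap and query != p_query:
                yield p_query, query
        prev = (user, ts, query)
```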
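Slide 20's clustering of reformulation vectors, sketched with scikit-learn's k-means (the number of clusters is illustrative; the paper clusters 65K in-session pairs):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_reformulations(embed, pairs, n_clusters=20):
    # Represent each (q1, q2) pair as the offset embed(q2) - embed(q1),
    # then cluster the offsets; returns a cluster label per pair.
    offsets = np.stack([embed(q2) - embed(q1) for q1, q2 in pairs])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(offsets)
```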
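Slide 21 asks how crowded a reformulation's neighborhood is. One plausible reading (our assumption, not necessarily the paper's estimator) is the fraction of background reformulation vectors within a cosine-distance radius:

```python
import numpy as np

def neighborhood_crowding(r, background, radius=0.2):
    # r: one reformulation vector; background: matrix of reformulation
    # vectors (one per row). Returns the fraction of background vectors
    # within cosine distance `radius` of r (radius is illustrative).
    r = r / np.linalg.norm(r)
    b = background / np.linalg.norm(background, axis=1, keepdims=True)
    return float(np.mean((1.0 - b @ r) < radius))
```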
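Finally, the MRR metric reported on the results slides (24–28), for reference (inputs are placeholders):

```python
def mean_reciprocal_rank(ranked_lists, ground_truths):
    # Average of 1/rank of the first correct suggestion per query
    # prefix; contributes 0 when the submitted query is not ranked.
    total = 0.0
    for suggestions, truth in zip(ranked_lists, ground_truths):
        for rank, s in enumerate(suggestions, start=1):
            if s == truth:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```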
