Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

"Query Expansion with Locally-Trained Word Embeddings" presented at ACL 2016

  1. 1. Query Expansion with Locally-Trained Word Embeddings. Fernando Diaz, Bhaskar Mitra, Nick Craswell (Microsoft)
  2. 2. p(d) d
  3. 3. p(d) d q p(d|q)
  4. 4. cut
        global       local*
        cutting      tax
        squeeze      deficit
        reduce       vote
        slash        budget
        reduction    reduction
        spend        house
        lower        bill
        halve        plan
        soften       spend
        freeze       billion
        global: trained using full corpus
        local: trained using topically-
        *gas
  5. 5. t-SNE projection (panels: global, local): top words by p̃(d|q) (blue: query; red: top words by p(d|q))
  6. 6.
        • local term clustering [Lesk, 1968; Attar and Fraenkel, 1977]
        • local latent semantic analysis [Hull, 1994; Hull, 1995; Schütze et al., 1995; Singhal et al., 1997]
        • local document clustering [Willett, 1985; Tombros and van Rijsbergen, 2001; Tombros et al., 2002]
        • one sense per discourse [Gale et al., 1992]
  7. 7. target corpus query results
  8. 8. q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …] query = gas tax
  9. 9. q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …] query = gas tax d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]
  10. 10. q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …] query = gas tax
          W = [ …
                gas: petroleum:0.9 indigestion:0.6 …
                tax: tariff:0.7 strain:0.4 …
                … ]
  11. 11. q = [gas:1.0 tax:1.0 petroleum:0.8 tariff:0.6 …] query = gas tax d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]
  12. 12. W = UUᵀ, where U is the m × k embedding matrix
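
Slides 8 to 12 can be read as one computation: the sparse query vector q is multiplied by a term similarity matrix W = UUᵀ so that related terms such as petroleum and tariff receive non-zero weight. The following is a minimal sketch of that idea, not the authors' implementation. The vocabulary, the embedding values in `U`, and the helper names `expand_query` and `top_expansion_terms` are all hypothetical and chosen only for illustration.

```python
# Toy vocabulary; all embedding values below are made up for illustration.
vocab = ["gas", "tax", "petroleum", "tariff", "indigestion"]

# U: m x k embedding matrix (m = 5 terms, k = 2 dimensions), hypothetical values.
U = [
    [1.0, 0.0],   # gas
    [0.0, 1.0],   # tax
    [0.9, 0.1],   # petroleum
    [0.1, 0.8],   # tariff
    [0.3, -0.2],  # indigestion
]


def expand_query(q, U):
    """Return W q with W = U U^T, computed as U (U^T q) so W is never formed."""
    m, k = len(U), len(U[0])
    # U^T q: project the query into the k-dimensional embedding space.
    projected = [sum(U[i][j] * q[i] for i in range(m)) for j in range(k)]
    # U (U^T q): score every vocabulary term against the projected query.
    return [sum(U[i][j] * projected[j] for j in range(k)) for i in range(m)]


def top_expansion_terms(q, U, vocab, n):
    """Pick the n highest-scoring terms that are not already in the query."""
    scores = expand_query(q, U)
    candidates = [(s, t) for s, t, w in zip(scores, vocab, q) if w == 0.0]
    return [t for s, t in sorted(candidates, reverse=True)[:n]]


q = [1.0, 1.0, 0.0, 0.0, 0.0]  # query = gas tax
print(top_expansion_terms(q, U, vocab, 2))
```

Computing U(Uᵀq) instead of materializing W keeps the cost proportional to m × k rather than m × m, which matters once the vocabulary is large. How the selected terms are weighted against the original query is not specified on the slides, so this sketch stops at term selection.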
  13. 13. p(d) d q p(d|q)
  14. 14. p(d) d q p̃(d|q)
  15. 15. target corpus query results external corpus query results
  16. 16. U = one of:
          • uniform p(d) on the target corpus
          • uniform p(d) on an external corpus
          • p(d|q) on the target corpus
          • p(d|q) on an external corpus
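
The difference between the uniform p(d) options and the p(d|q) options on slide 16 is which documents the embedding gets trained on. A rough sketch of the query-conditioned case follows. It assumes retrieval scores are turned into p(d|q) with a softmax, which is a choice made for this example and is not stated on the slides. The document identifiers, the scores, and both function names are hypothetical, and the embedding training step itself is left out.

```python
import math
import random


def query_conditioned_distribution(scores):
    """Turn retrieval scores into p(d|q) via a softmax (an assumption of this sketch)."""
    peak = max(scores.values())  # subtract the max for numerical stability
    weights = {d: math.exp(s - peak) for d, s in scores.items()}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


def sample_training_documents(p_d_given_q, n, seed=0):
    """Draw n document ids with replacement, proportionally to p(d|q)."""
    rng = random.Random(seed)
    docs = list(p_d_given_q)
    probs = [p_d_given_q[d] for d in docs]
    return rng.choices(docs, weights=probs, k=n)


# Hypothetical retrieval scores for the query "gas tax".
scores = {"doc_gas_tax": 2.0, "doc_fuel_duty": 1.5, "doc_indigestion": -1.0}
p = query_conditioned_distribution(scores)
sample = sample_training_documents(p, n=1000)
```

The sampled documents would then be passed to whatever embedding trainer is in use. Sampling uniformly over the corpus instead of by `p` corresponds to the global options on the slide.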
  17. 17.         docs         words        queries
          trec12  469,949      438,338      150
          robust  528,155      665,128      250
          web     50,220,423   90,411,624   200
  18. 18. global                 local
          target                 target
          wikipedia+gigaword*    gigaword†
          google news*           wikipedia†
          *publicly available embedding; †publicly available external corpus
          (diagram labels: target corpus, query, results; external corpus, query, results)
  19. 19. Figure: local vs global. NDCG@10 on trec12, robust, and web; expansion: none, global, local
  20. 20. Figure: local embedding. NDCG@10 on trec12, robust, and web; corpus: target, gigaword, wikipedia
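
Both figures report NDCG@10. For readers unfamiliar with the metric, here is a small sketch of one common formulation, using exponential gain and a log2 rank discount. The slides do not say which variant or evaluation tool was used, so the gain function and the function names here are assumptions.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top k graded relevance labels."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so the discount starts at log2(2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=10):
    """DCG of the ranking divided by the DCG of the same labels in ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Graded relevance labels of a ranked list, in rank order (hypothetical).
print(ndcg_at_k([3, 2, 0, 1, 0]))
```

This version builds the ideal ranking from the labels in the ranked list only. A full evaluation would build it from all judged documents for the query, which can lower the score when relevant documents were not retrieved.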
  21. 21.
          • local embedding provides a stronger representation than global embedding
          • potential impact for other topic-specific natural language processing tasks
          • future work
            • effectiveness improvements
            • efficiency improvements
