Entity extraction for product search


This talk was given during Activate Conference 2018.
A user searching for “awesome smartphone 2018” is most likely after “+review:awesome +category:smartphone +release_date:2018”. A clever use of (e)dismax might get us pretty close to what we want, but it’s not real query understanding. There are other ways, of course, such as training a model that, based on the keyword, guesses which field it is meant for. In this session, we’ll discuss some of these approaches, their pros and cons, and how you’d implement them on top of Solr. We’ll specifically look at existing open-source tools that you can reuse to build such a system.
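To make the target of that rewrite concrete, here is a minimal sketch of the simplest possible approach: a keyword-to-field lookup. The `KEYWORD_FIELDS` table, the `to_fielded_query` helper, and the `text` fallback field are assumptions made for illustration; they are not taken from the talk, and a real system would replace the table with one of the extraction techniques discussed in the session.

```python
# Hypothetical lookup table: keyword -> field it most likely refers to.
# In practice this mapping would come from a dictionary, a tagger, or a model.
KEYWORD_FIELDS = {
    "awesome": "review",
    "smartphone": "category",
    "2018": "release_date",
}


def to_fielded_query(user_query, keyword_fields=KEYWORD_FIELDS, default_field="text"):
    """Turn free-text keywords into mandatory fielded clauses."""
    clauses = []
    for token in user_query.lower().split():
        # Unknown keywords fall back to a catch-all field (an assumption).
        field = keyword_fields.get(token, default_field)
        clauses.append(f"+{field}:{token}")
    return " ".join(clauses)


print(to_fielded_query("awesome smartphone 2018"))
# prints: +review:awesome +category:smartphone +release_date:2018
```

A static table like this breaks down as soon as a keyword is ambiguous or unseen, which is the gap that a tagger or a trained model is meant to fill.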
Published in: Engineering

  1. sony headphones mdr1000x; title^100 cat^10 model; /select defType=edismax; title:sony^100 title:headphones^100 title:mdr1000x^100 cat:sony^10 cat:headphones^10 cat:mdr1000x^10 model:sony model:headphones model:mdr1000x
  2. index; add_field_type: name: tag, postingsFormat: FST50; add_field: name: name_tag, type: tag; {"name_tag":"Thomas Jefferson"}; Thomas Jefferson (April 13, [O.S. April 2] 1743 – July 4, 1826) was an American Founding Father who was the principal author of the Declaration of Independence ...; thomas jefferson: startOffset: 100, endOffset: 116; startOffset: 215, endOffset: 231; /tagger
  3. Let's dive in!
  4. doc, index, query
  5. Let's dive in!
  6. {"Sony headphones", {'entities': [(0,4,'Manufacturer')]}}; {"Beats headphones", {'entities': [(0,5,'Manufacturer')]}}; Text Label Gradient Model Label predict
  7. trained model; spaCy load; sony headphones mdr1000x; manufacturer: sony; q=manufacturer:sony AND (headphones mdr1000x)
  8. Let's dive in!
  9. Let's dive in!
  10. ⇒ MDR1000X is nice; WH1000X is nice; similar
  11. Let's dive in!
  12. expr=significantTerms; sony 0.7981, philips 0.6534, beats 0.5342; features; recalculate scores; q=...&rq={!ltr model=manufacturerTrainingModel reRankDocs=20}; features; model = + weights; Model store
  13. Let's dive in!
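
The query rewrite on slide 7 can be sketched in a few lines. This is a minimal illustration, assuming the entities arrive as `(start, end, label)` character offsets in the same shape as the training examples on slide 6; the `rewrite_query` helper is hypothetical and stands in for whatever code consumes the extractor's output (a trained spaCy model, the Solr tagger, or something else).

```python
def rewrite_query(text, entities):
    """Move recognized entities into fielded filters; keep the rest as keywords.

    `entities` is a list of (start, end, label) character offsets.
    """
    filters = []
    remaining = []
    cursor = 0
    for start, end, label in sorted(entities):
        # Keep the text between the previous entity and this one as keywords.
        remaining.append(text[cursor:start])
        # The lowercased label is assumed to match a field name in the index.
        filters.append(f"{label.lower()}:{text[start:end].lower()}")
        cursor = end
    remaining.append(text[cursor:])

    # Collapse the whitespace left behind where entities were cut out.
    rest = " ".join("".join(remaining).split())
    if not filters:
        return rest
    if not rest:
        return " AND ".join(filters)
    return " AND ".join(filters) + f" AND ({rest})"


print(rewrite_query("sony headphones mdr1000x", [(0, 4, "Manufacturer")]))
# prints: manufacturer:sony AND (headphones mdr1000x)
```

The sketch does not quote or escape values, so a multi-word entity such as a two-word brand name would need to be wrapped in quotes before being sent to Solr.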