Han Xiao from Zalando Research in Berlin gave this presentation, "Towards an End-to-End Product Search System", at the Computer Science, Machine Learning & Statistics Meetup in the Zalando adtech lab office in Hamburg on 6 September 2017.
1. Towards an End-to-End Product Search System
Zalando Research
Han Xiao
Sept 6, 2017
2. NLP Team @ Zalando Research
Alan Akbik,
Duncan Blythe,
Leonidas Lefakis,
Han Xiao
3. About Me
Han Xiao
Senior Research Scientist @ Zalando Research
2.5y engineering experience in Reco and Search teams @ Zalando
Ph.D. & M.Sc. in Computer Science @ TU Munich
Blog: https://hanxiao.github.io
LinkedIn: https://www.linkedin.com/in/hxiao87/
4. Agenda
1. How do most product search systems work?
2. Why do we need an end-to-end product search system?
3. What data do we need to build end-to-end search?
4. Query2Attribute model: character-based LSTM + multi-task learning
5. Discussion
6. Classic product search system with filter query
Indexing (product → Message Queue → Structured string index):
{
"brand": "Miss Selfridge",
"category": "Umhängetasche",
"color": "red",
...
}
Filter query: brand="nike" AND color="orange"
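A minimal sketch of how a filter query can be answered from a structured string index, assuming the index is a simple mapping from (attribute, value) pairs to product ids. The product ids, the extra products, and the function names `build_index` and `filter_query` are hypothetical and only serve the illustration; the slides do not describe the actual index.

```python
from collections import defaultdict

# Hypothetical product documents; only the first one mirrors the slide's example.
PRODUCTS = {
    "p1": {"brand": "Miss Selfridge", "category": "Umhängetasche", "color": "red"},
    "p2": {"brand": "nike", "category": "shoe", "color": "orange"},
    "p3": {"brand": "nike", "category": "shoe", "color": "white"},
}

def build_index(products):
    """Map every (attribute, lowercased value) pair to the ids of products carrying it."""
    index = defaultdict(set)
    for pid, doc in products.items():
        for attr, value in doc.items():
            index[(attr, value.lower())].add(pid)
    return index

def filter_query(index, **conditions):
    """Evaluate attr1=v1 AND attr2=v2 ... by intersecting the matching id sets."""
    postings = [index.get((attr, value.lower()), set())
                for attr, value in conditions.items()]
    return set.intersection(*postings) if postings else set()

index = build_index(PRODUCTS)
print(filter_query(index, brand="nike", color="orange"))  # {'p2'}
```

The lookup is exact string matching, which is why a misspelled or free-text query needs to be parsed into attribute values first.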
7. Parsing a full-text query to a filter query
Indexing (product → Message Queue → Structured string index):
{
"brand": "Miss Selfridge",
"category": "Umhängetasche",
"color": "red",
...
}
Parsing: full-text query → Filter query
8. Query understanding as a pipeline (ideal)
Full-text query: "nikke sport whiteschoe"
Query parsing: normalize, tokenize, lemmatize, spell-correct, recognize synonym & acronym, recognize named-entity, disambiguate, query-builder
Filter query: brand="Nike" AND category=("sportshoe" OR "sport" OR "shoe") AND color="white"
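The query parsing stages can be sketched as a toy pipeline. This is an illustration under strong assumptions, not the system described in the talk: the gazetteer is a hypothetical four-word dictionary, spell-correction uses `difflib.get_close_matches` from the Python standard library, compound splitting is a brute-force search over split points, and the lemmatize, synonym and disambiguate stages are left out, so the `"sportshoe"` synonym from the slide is not produced.

```python
import difflib

# Hypothetical gazetteer: known token -> (attribute, canonical value).
GAZETTEER = {
    "nike": ("brand", "Nike"),
    "sport": ("category", "sport"),
    "shoe": ("category", "shoe"),
    "white": ("color", "white"),
}
VOCAB = list(GAZETTEER)

def spell_correct(token, cutoff=0.75):
    """Return the closest vocabulary word, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(token, VOCAB, n=1, cutoff=cutoff)
    return matches[0] if matches else None

def correct_or_split(token):
    """Correct a token; if that fails, try splitting it into two correctable parts."""
    word = spell_correct(token)
    if word:
        return [word]
    for i in range(2, len(token) - 1):
        left, right = spell_correct(token[:i]), spell_correct(token[i:])
        if left and right:
            return [left, right]
    return []  # unknown tokens are dropped in this toy version

def parse(query):
    tokens = query.lower().split()                             # normalize + tokenize
    words = [w for t in tokens for w in correct_or_split(t)]   # spell-correct
    attrs = {}
    for w in words:                                            # recognize named entities
        attr, value = GAZETTEER[w]
        attrs.setdefault(attr, []).append(value)
    clauses = []
    for attr, values in attrs.items():                         # query-builder
        quoted = [f'"{v}"' for v in values]
        if len(quoted) == 1:
            clauses.append(f"{attr}={quoted[0]}")
        else:
            clauses.append(f"{attr}=({' OR '.join(quoted)})")
    return " AND ".join(clauses)

print(parse("nikke sport whiteschoe"))
# brand="Nike" AND category=("sport" OR "shoe") AND color="white"
```

Every stage depends on the output of the previous one, so a wrong spell-correction silently changes the final filter query.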
9. Query understanding as a DAG (in practice)
Full-text query: "nikke sport whiteschoe"
Query parsing: normalize, tokenize, lemmatize, spell-correct, recognize synonym & acronym, recognize named-entity, disambiguate, query-builder
Filter query: brand="Nike" AND category=("sportshoe" OR "sport" OR "shoe") AND color="white"
10. Pros & cons of a pipeline system
Upside: intuitive, modular, many off-the-shelf packages, easy to collaborate
Downside:
● Fragile
● Complicated dependencies
● Not straightforward to improve the overall search experience
● Difficult to scale out to other languages
11. Question 1:
If finding the right article is the final goal,
then why should we even care about spell-checking?
12. Problems of a symbolic-based system
● Limited interpretability;
● Hard-coding rules to enable acronyms, synonyms, etc. can't scale to different app domains;
● No matter how good the intention is, the overall system will turn into a set of heuristics.
Upside: easy to implement, efficient, very well-studied
13. Question 2:
How can we associate “fur mamas” with “Schwangerschaftsmode” (German for "maternity fashion") without hard-coding for each language?
14. Motivation for building end-to-end product search
Question 1: If finding the right article is the final goal, then why should we even care about spell-checking?
→ eliminate components in the pipeline
Question 2: How can we associate “fur mamas” with “Schwangerschaftsmode” without hard-coding on each domain?
→ find a better representation for query and product
An end-to-end product search system with deep learning:
● more robust
● easier to maintain
● more scalable
● simpler architecture
● smarter
15. Classical system vs end-to-end product search system
Classical system:
Query → Symbolic representation (② parsing)
Product → Symbolic representation (① indexing, offline)
③ matching
End-to-end system:
Query → Latent representation (deep learning)
Product → Latent representation (deep learning, offline)
matching
17. Three types of data sources
● Query2SKU
● Crowdsourcing annotations
● Customer reviews
Diagram labels: User-generated content, Product
18. Extracting Query↔SKU mapping from message queue
User actions over time: type in search-box, see search result page, click a product, see PDP, click on reco
Pages shown: search-result, PDP, PDP
Message Queue events over time:
receive-query: "denim shirt"
retrieval-search-result
click-through: SKU00000-001
retrieve-reco-result
click-through: SKU00000-002
Extracted mapping:
{
query: "denim shirt"
skus: ["SKU00000-001", "SKU00000-002"]
}
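The extraction on this slide can be sketched as a single pass over a time-ordered event stream. This is a minimal sketch that assumes each event is a small record with a `type` and a `value` and that all events belong to one user session; the record layout and the function name `extract_query_sku` are hypothetical, since the slides show only the event names.

```python
# Hypothetical event records for one user session, ordered by time.
EVENTS = [
    {"type": "receive-query", "value": "denim shirt"},
    {"type": "retrieval-search-result", "value": None},
    {"type": "click-through", "value": "SKU00000-001"},
    {"type": "retrieve-reco-result", "value": None},
    {"type": "click-through", "value": "SKU00000-002"},
]

def extract_query_sku(events):
    """Attach every click-through to the most recent query in the same session."""
    mappings, current = [], None
    for event in events:
        if event["type"] == "receive-query":
            # A new query starts a new mapping.
            current = {"query": event["value"], "skus": []}
            mappings.append(current)
        elif event["type"] == "click-through" and current is not None:
            # Clicks on search results and on recommendations both count.
            current["skus"].append(event["value"])
    return mappings

print(extract_query_sku(EVENTS))
# [{'query': 'denim shirt', 'skus': ['SKU00000-001', 'SKU00000-002']}]
```

Clicks that arrive before any query are ignored here; a real log would also need to be split by user and session before this step.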
19. Example of Query → SKU map
{"query":"ananas",
"skus":[
{"id":"CE321D0HP-A11","freq":371},
{"id":"RL651E02D-F11","freq":273},
{"id":"EV411AA0K-T11","freq":243},
{"id":"L1211E001-A11","freq":208},
{"id":"ES121D0ON-C11","freq":180},
...
{"id":"TO226K009-I11","freq":2},
{"id":"BH523F01J-A11","freq":2},
{"id":"MOC83C00C-J11","freq":1},
{"id":"MOC83C001-J11","freq":1},
{"id":"HG223F04A-A11","freq":1}]}
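A map of this shape could be produced by counting clicks per query across many sessions. The sketch below assumes the input is a list of per-session `{"query", "skus"}` records; the sample SKU ids and the function name `aggregate` are hypothetical.

```python
from collections import Counter, defaultdict

def aggregate(mappings):
    """Count how often each SKU was clicked for each query, most frequent first."""
    counts = defaultdict(Counter)
    for m in mappings:
        counts[m["query"]].update(m["skus"])
    return [
        {"query": query,
         "skus": [{"id": sku, "freq": n} for sku, n in counter.most_common()]}
        for query, counter in counts.items()
    ]

# Hypothetical per-session mappings for the same query.
SESSIONS = [
    {"query": "ananas", "skus": ["SKU-A", "SKU-B"]},
    {"query": "ananas", "skus": ["SKU-A"]},
]
print(aggregate(SESSIONS))
# [{'query': 'ananas', 'skus': [{'id': 'SKU-A', 'freq': 2}, {'id': 'SKU-B', 'freq': 1}]}]
```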
23. Classical system vs end-to-end product search system
Classic:
Query → Symbolic representation (② parsing)
SKU → Symbolic representation (① indexing, offline)
③ matching
Query2Attribute:
Query → Symbolic representation (deep learning)
Product → Symbolic representation (offline)
matching
➢ Leverage current indexing and matching system
➢ Good interpretability
Query2SKU:
Query → Latent representation (deep learning)
Product → Latent representation (deep learning, offline)
matching
➢ Completely end-to-end
➢ Require efficient matching algorithm
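To make the matching step in latent space concrete, here is a minimal brute-force sketch that ranks products by cosine similarity to a query vector. The choice of cosine similarity, the three-dimensional vectors, and the names `cosine` and `match` are assumptions for illustration; the slides do not say which similarity measure or matching algorithm is used.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical product vectors, assumed to be computed offline.
PRODUCT_VECS = {
    "SKU-A": [0.9, 0.1, 0.0],
    "SKU-B": [0.1, 0.8, 0.3],
    "SKU-C": [0.7, 0.3, 0.1],
}

def match(query_vec, product_vecs, k=2):
    """Return the k product ids most similar to the query vector."""
    ranked = sorted(product_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [sku for sku, _ in ranked[:k]]

print(match([1.0, 0.2, 0.0], PRODUCT_VECS))  # ['SKU-A', 'SKU-C']
```

This scan compares the query against every product, which is the cost that makes an efficient matching algorithm a requirement for a large catalog.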
24. Translating a full-text query to a filter query
1. input: "sports"
2. output: brand, color, gender, category distributions
3. translating results to a filter query:
"brand=Nike Performance AND color=schwarz AND category=(Sport OR Sportbekleidung)"
Figure panels: predicted distributions for brand, color, gender, category
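Step 3 can be sketched as a thresholding rule over the predicted distributions. The rule itself is an assumption (keep every value whose probability reaches a threshold, skip an attribute when no value does), and the probabilities below are made up so that the result reproduces the filter query on the slide; the talk does not state how the distributions are actually translated.

```python
def to_filter_query(distributions, threshold=0.3):
    """Keep confident values per attribute and join them into a filter query."""
    clauses = []
    for attr, dist in distributions.items():
        ranked = sorted(dist.items(), key=lambda item: -item[1])
        values = [value for value, prob in ranked if prob >= threshold]
        if not values:
            continue  # no confident prediction: do not filter on this attribute
        if len(values) == 1:
            clauses.append(f"{attr}={values[0]}")
        else:
            clauses.append(f"{attr}=({' OR '.join(values)})")
    return " AND ".join(clauses)

# Hypothetical model output for the query "sports".
PREDICTED = {
    "brand": {"Nike Performance": 0.6, "adidas Performance": 0.2, "Puma": 0.2},
    "color": {"schwarz": 0.5, "weiß": 0.25, "blau": 0.25},
    "gender": {"Damen": 0.25, "Herren": 0.25, "Kinder": 0.25, "Unisex": 0.25},
    "category": {"Sport": 0.45, "Sportbekleidung": 0.35, "Schuhe": 0.2},
}
print(to_filter_query(PREDICTED))
# brand=Nike Performance AND color=schwarz AND category=(Sport OR Sportbekleidung)
```

A flat distribution, such as gender here, produces no clause, so the filter query only constrains attributes the model is confident about.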
25. Character-based RNN with multi-task learning
Input characters: q u ... y
character-embedding
encoder: LSTM LSTM LSTM ...
Fully-connected, Fully-connected (one per task)
Output: brand class, color class
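The architecture on this slide can be sketched as a forward pass in plain Python: characters are embedded, an LSTM encodes the sequence, and one fully-connected layer per task turns the final hidden state into a class distribution. Everything concrete here is an assumption for illustration: the alphabet, the layer sizes, the label sets, the random untrained weights, the omitted biases, and the summed cross-entropy as the multi-task loss. A real model would be trained with a deep learning framework.

```python
import math
import random

random.seed(0)
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
EMBED, HIDDEN = 4, 6
BRANDS = ["nike", "adidas", "puma"]            # hypothetical label sets
COLORS = ["white", "black", "red", "blue"]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

embedding = {ch: [random.uniform(-0.5, 0.5) for _ in range(EMBED)] for ch in ALPHABET}
# One weight matrix per LSTM gate, applied to [char embedding ; previous hidden state].
gates = {name: rand_matrix(HIDDEN, EMBED + HIDDEN) for name in "ifog"}
# One fully-connected head per task, sharing the same encoder.
heads = {"brand": rand_matrix(len(BRANDS), HIDDEN),
         "color": rand_matrix(len(COLORS), HIDDEN)}

def encode(query):
    """Run the LSTM over the characters and return the final hidden state."""
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for ch in query.lower():
        if ch not in embedding:
            continue  # unknown characters are skipped in this toy version
        x = embedding[ch] + h
        i = [sigmoid(v) for v in matvec(gates["i"], x)]    # input gate
        f = [sigmoid(v) for v in matvec(gates["f"], x)]    # forget gate
        o = [sigmoid(v) for v in matvec(gates["o"], x)]    # output gate
        g = [math.tanh(v) for v in matvec(gates["g"], x)]  # candidate cell state
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h

def predict(query):
    """Return one class distribution per task from the shared encoding."""
    h = encode(query)
    return {task: softmax(matvec(weights, h)) for task, weights in heads.items()}

def multitask_loss(preds, targets):
    """Sum of per-task cross-entropies; targets maps task -> true class index."""
    return sum(-math.log(preds[task][idx]) for task, idx in targets.items())

preds = predict("nikke sport whiteschoe")
print({task: [round(p, 3) for p in dist] for task, dist in preds.items()})
```

Because the encoder reads characters rather than words, a misspelled query still yields an encoding, and the shared encoder is updated by the loss of every task.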
29. Discussion: pros/cons of an end-to-end product search
Classic:
Query → Symbolic representation (② parsing)
Product → Symbolic representation (① indexing, offline)
③ matching
Deep learning based End2End:
Query → Latent representation (deep learning)
Product → Latent representation (deep learning, offline)
matching
Pros: scalable, maintainable, data-driven
Cons: needs a lot of data and computational resources