Firekqsps

Keyword queries in structured product search
M. Chelliah
Director of Engg., Academic collaboration
Saptarshi Ghosh
Assistant professor, CSE Dept., IIT Kharagpur
Flipkart - introduction

Product search: business problem
Product title may not have all keywords
User query intent is hard to understand
Product search: challenges/opportunities
Limited/dynamic inventory
Long/sparse tail of query distribution
Lack of domain knowledge
Structured data backend
Rich semantics/attribute-value pairs
Monetary impact of purchase transactions

Product search: query understanding
User intent
Classification
Tagging
Facets
Segmentation
Query understanding: agenda
•  Classification
•  Tagging
•  Segmentation
•  Facets

Classification: types
Informational
Navigational
Transactional vs. topic
Classification: customer need
•  Predefined taxonomy
– Ranked list of categories
•  Relevant information
– Even if no exact match

Classification: technical challenges
•  Brevity
–  Not enough clues
–  Few features
•  Ambiguity
–  Belong to multiple categories
Classification: query enrichment
[Shen ’06a, ’06b]
•  Search results
–  True meaning from Web
–  Ensemble learning to remove engine bias
•  Category word match
–  Extended with WordNet
–  (e.g., hardware -> device)
Classification: linguistic analysis
•  [Beitzel ’07]
•  Selectional preference rules
–  Mined from unlabeled query logs
•  Syntactic arguments of words
–  Belonging to semantic classes
–  E.g., the object of “eat” is edible
Classification: training data
[Acero ’08]
•  Click-through log
– Infer class membership of unlabeled queries
– Through proximity to labeled ones
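Inferring class membership from click proximity can be sketched in a few lines of Python. This is a minimal one-step simplification, not the method of [Acero ’08]: it assumes a hypothetical click log of the form query -> {URL: click count}, and lets each unlabeled query collect the classes of labeled queries that share a clicked URL, weighted by the smaller of the two click counts. A fuller treatment would iterate this propagation over the whole click graph.

```python
from collections import defaultdict

def infer_query_classes(clicks, labels):
    """One-step label propagation over a query/URL click graph.

    clicks: hypothetical log format, query -> {url: click count}
    labels: labeled queries, query -> class name
    Returns unlabeled query -> {class: normalized score}.
    """
    # Invert the log so we can find every query that clicked a given URL.
    url_to_queries = defaultdict(dict)
    for q, urls in clicks.items():
        for url, n in urls.items():
            url_to_queries[url][q] = n

    scores = {}
    for q, urls in clicks.items():
        if q in labels:
            continue
        votes = defaultdict(float)
        for url, n in urls.items():
            for other, m in url_to_queries[url].items():
                if other in labels:
                    # Shared clicks on the same URL serve as the proximity weight.
                    votes[labels[other]] += min(n, m)
        total = sum(votes.values())
        if total > 0:
            scores[q] = {c: v / total for c, v in votes.items()}
    return scores
```

A query whose clicks overlap with labeled queries of two different classes ends up with a split distribution rather than a single hard label.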

Classification: product query
1. Enrichment through snippets
2. Expansion through similar queries
3. Translating labeled product names to pseudo queries
Classification: search snippet
•  [Shen ’09]
•  Context: surrounding text
–  E.g., sx430 20.0 megapixel
•  Similar to pseudo-relevance feedback
–  Even if no exact match between query/result

Classification: related query
•  Bipartite graph
–  Query/URL
•  Link weighted by click number
–  Proportional transition probability
•  Random walk for finding similarity [Mei ’08]
–  Hitting time: from one node to another
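The random walk on the query/URL graph can be sketched as follows, assuming the same hypothetical click-log format (query -> {URL: click count}). Transition probabilities are proportional to click counts, and `truncated_hitting_time` estimates, for every node, the expected number of steps to first reach a target query. The cap `steps` is an assumed truncation that keeps the estimate finite for nodes that can never reach the target; the exact formulation in [Mei ’08] may differ.

```python
from collections import defaultdict

def build_transitions(clicks):
    """Row-normalized transition probabilities on the bipartite query/URL click graph.

    clicks: hypothetical log format, query -> {url: click count}.
    Nodes are ("q", query) or ("u", url); edge weight = click count in both directions.
    """
    edges = defaultdict(dict)
    for q, urls in clicks.items():
        for url, n in urls.items():
            edges[("q", q)][("u", url)] = n
            edges[("u", url)][("q", q)] = n
    return {node: {m: w / sum(nbrs.values()) for m, w in nbrs.items()}
            for node, nbrs in edges.items()}

def truncated_hitting_time(trans, target, steps=20):
    """Expected steps for a random walk from each node to first reach `target`,
    computed by `steps` rounds of h(i) = 1 + sum_j P(i, j) h(j), with h(target) = 0."""
    h = {node: 0.0 for node in trans}
    for _ in range(steps):
        h = {node: 0.0 if node == target
             else 1.0 + sum(p * h[m] for m, p in trans[node].items())
             for node in trans}
    return h
```

Queries with a small hitting time to the target are candidate related queries; a query that shares no clicked URL with the target simply saturates at the cap.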
Classification: pseudo query
•  Language difference between product name/query
–  Feature space of training/test data
•  Tuple n-gram based translation model
–  Joint probability of source/target sentence pairs

Classification: incorporating context
•  [Cao ’09]
•  CRF captures session intent
–  Neighboring queries and clicked URLs
•  More flexible than HMM
–  Richer features: Web directory
Classification: local features
•  Query term
–  Weights learned in training phase
–  Too sparse to associate with category labels
•  Feedback from external Web directory
–  Pseudo: top M results
–  Implicit: clicked URL
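Both kinds of directory feedback can be turned into features the same way: map each URL to its category in an external Web directory and take the category distribution. A minimal sketch, assuming a hypothetical `directory` dict (URL -> category); pass the ranked results for pseudo feedback or the clicked URLs for implicit feedback.

```python
from collections import Counter

def directory_features(urls, directory, m=5):
    """Category distribution over the first m URLs, used as extra classification features.

    urls: ranked search results (pseudo feedback) or clicked URLs (implicit feedback)
    directory: hypothetical mapping URL -> category in an external Web directory
    """
    cats = Counter(directory[u] for u in urls[:m] if u in directory)
    total = sum(cats.values())
    # URLs missing from the directory are skipped; an empty dict means no evidence.
    return {c: n / total for c, n in cats.items()} if total else {}
```

Because the categories come from a directory rather than from the query words themselves, these features stay dense even when the query terms are too rare to carry reliable weights.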

Classification: contextual features
•  Direct association
– Adjacent labels
•  Taxonomy-based
– Siblings

Tagging: beyond keyword search
•  Instant answers
–  Directly address user information needs
•  Avoid clicking
–  Into an external document
Tagging: structured search
•  Precise results
–  Within a domain
•  Unstructured queries
–  Free word order
•  Natural language commands
–  Virtual assistant application

Tagging: business problem
•  Crawl data from RDB
–  Index into less ambiguous content
•  Extract information
–  Format consistent with backend
Tagging: task definition
•  Analyze query
–  Structure/semantics
•  Assign label to tokens
– Indicating which field each token belongs to

Tagging: semantic features
•  Intent
–  Head: attribute, Modifier: value
•  Lexicon
–  Structured data sources
–  Query log mining
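A lexicon built from structured data sources is enough for a baseline tagger. The sketch below is an assumption for illustration (greedy longest-match lookup, not a learned model): it labels each token with the field of the longest lexicon phrase starting at that token and falls back to "O" for tokens outside any field. The `lexicon` contents are hypothetical.

```python
def tag_query(query, lexicon, max_len=3):
    """Label each token with the field whose lexicon contains the longest matching phrase.

    lexicon: hypothetical dict mapping lowercase phrase -> field name (e.g. built from catalog values)
    Tokens with no match get the label "O" (outside any field).
    """
    tokens = query.lower().split()
    tags = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, shrinking until a lexicon hit.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                tags.extend((t, lexicon[phrase]) for t in tokens[i:i + n])
                i += n
                break
        else:
            tags.append((tokens[i], "O"))
            i += 1
    return tags
```

For example, `tag_query("Samsung Galaxy S4 black cover", {"samsung": "brand", "galaxy s4": "model", "black": "color"})` labels "galaxy" and "s4" together as the model and leaves "cover" as "O". Exact lookup like this breaks down on ambiguous terms and out-of-dictionary words.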
Lexicon: challenges
•  Enumerate values
–  For each field
•  Query formulation
–  Ambiguous terms
–  Out-of-dictionary words

Lexicon: challenges (contd.)
•  Domain-specific knowledge
–  Reduces manual annotation
•  Human effort still needed
•  Search topics evolve with time
Lexicon: semantic class
•  [Wang ’09]
•  Extraction automated
–  Using HTML lists
•  Precision less important
•  Better criterion is a trade-off
–  Maximizing recall
–  Minimizing confusability

Lexicon: set expansion
•  HTML lists
–  Contain instances of a single concept
•  Phrases of classes
–  Extracted from annotated queries
•  Used as seeds
–  In graph learning for more instances
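Seed-based set expansion over HTML lists can be sketched as a small propagation between lists and items. This is a simple iterative list-trust scheme chosen for illustration, not the graph learning of [Wang ’09]: a list's trust is the mean score of its current items, each item's score is the summed trust of the lists containing it, and seeds stay fully trusted throughout.

```python
from collections import defaultdict

def expand_set(lists, seeds, iterations=2):
    """Rank new instances of a concept by propagating trust between HTML lists and their items.

    lists: candidate lists, each assumed to hold instances of a single concept
    seeds: known instances, e.g. class phrases extracted from annotated queries
    """
    scores = {s: 1.0 for s in seeds}
    for _ in range(iterations):
        new = defaultdict(float)
        for lst in lists:
            if not lst:
                continue
            # A list is trusted in proportion to the mean score of its current items.
            trust = sum(scores.get(x, 0.0) for x in lst) / len(lst)
            for x in lst:
                new[x] += trust
        top = max(new.values(), default=0.0) or 1.0
        scores = {x: v / top for x, v in new.items()}
        scores.update({s: 1.0 for s in seeds})  # seeds stay fully trusted
    # Return non-seed candidates with any support, best first.
    return sorted((x for x, v in scores.items() if v > 0 and x not in seeds),
                  key=lambda x: -scores[x])
```

With seeds "red" and "blue", a list such as ["red", "blue", "green"] pulls "green" up strongly, while a list of fruit names that shares no seed contributes nothing.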
Tagging: derived labels
•  [Li ’09]
•  [query, product title] as click event from log
•  Associate query with product metadata in RDB
–  Through fuzzy match of title for higher coverage
•  Map source to target schema
–  E.g., color -> attribute
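Deriving token labels from click events can be sketched with a fuzzy title match. The sketch assumes a hypothetical catalog schema (dicts with a "title" key plus attribute fields) and uses Python's `difflib.SequenceMatcher` ratio as the fuzzy matcher, which is a stand-in chosen here rather than the matcher of [Li ’09]. Query tokens found in an attribute value of the best-matching record receive that attribute's name as their label.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive fuzzy string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def derive_labels(query, clicked_title, catalog, threshold=0.8):
    """Label query tokens with catalog fields via the record whose title best matches the click.

    catalog: hypothetical list of dicts with a "title" key plus attribute fields
    Returns [(token, field)], with "O" for unmatched tokens, or None if no title is close enough.
    """
    best = max(catalog, key=lambda rec: similarity(rec["title"], clicked_title))
    if similarity(best["title"], clicked_title) < threshold:
        return None
    labels = []
    for token in query.lower().split():
        # First attribute whose value contains the token; the source field name becomes the label.
        field = next((f for f, v in best.items()
                      if f != "title" and token in str(v).lower().split()), "O")
        labels.append((token, field))
    return labels
```

The threshold trades coverage against label noise: lowering it lets more click events produce training labels, at the risk of matching the wrong record.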

Tagging: sequence label
•  [Cheung ’12]
•  Sequence clustering
–  Queries with similar intent
•  Recognize patterns
–  With sequence of semantic concepts/lexical items
•  Automatically annotate new queries
–  With intent summary

Segmentation: semantic units [Bergsma ’07]
1. Identify structural relationships among query keywords
2. Divide tokens into individual phrases
3. Substitution/expansion for recall (e.g., person for man)
4. Phrasal search for precision
Segmentation: decision boundary
Indicator features, e.g., “free online” vs. “sugar free”
Statistical features, e.g., “star wars” vs. “weapons guns”
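A typical statistical feature for the break/no-break decision is pointwise mutual information between adjacent tokens. A minimal sketch, assuming corpus unigram and bigram counts are available; the threshold of 0 is an arbitrary choice here, and in practice PMI would be one feature among several in a learned classifier.

```python
import math

def pmi(w1, w2, unigram, bigram, total):
    """Pointwise mutual information log(P(w1 w2) / (P(w1) P(w2))) from corpus counts."""
    joint = bigram.get((w1, w2), 0) / total
    if joint == 0:
        return float("-inf")  # never seen together: strongest evidence for a break
    return math.log(joint / ((unigram[w1] / total) * (unigram[w2] / total)))

def is_boundary(w1, w2, unigram, bigram, total, threshold=0.0):
    """Place a segment break between w1 and w2 when their association is weak."""
    return pmi(w1, w2, unigram, bigram, total) < threshold
```

With made-up counts in which "star wars" co-occurs far more often than chance and "weapons guns" less often, the first pair stays joined and the second is split.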

Segmentation: noun phrase
•  Context features
– Neighboring tokens critical for discrimination
– E.g., “bank loan” “amortization schedule”, NOT “loan amortization”
•  Dependency features
– A token is more likely to modify a later token
– E.g., female bus driver
Segmentation: external knowledge
•  Corpus
– Frequent patterns [Bergsma ’07]
•  Wikipedia [Tan ’08] [Hagen ’11]
– Well-formed concepts
•  Search log [Li ’11]

Segmentation: language models
•  Vector space model based on bag-of-words
–  Sparsity makes estimating term dependencies harder
–  E.g., “new york times”
•  Query term order important
–  Terms not independent in relevance computation
–  E.g., “NY times” “travel guide”
Segmentation: n-gram scoring
•  Weighted sum of frequencies
•  Apply word-length based normalization
–  Exponential factors rather than length
•  So a longer segment has a better chance
–  E.g., Toronto blue jays
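The n-gram scoring idea can be sketched by enumerating every segmentation and scoring each one. The weight is assumed here to be |s|^|s| · freq(s) for every multi-word segment s, one exponential length normalization of this kind; segmentations containing an unseen multi-word segment are invalidated. The `freq` counts below are hypothetical web n-gram frequencies.

```python
from itertools import combinations

def segmentations(tokens):
    """Yield every way to split tokens into contiguous segments (2**(n-1) of them)."""
    n = len(tokens)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [tuple(tokens[a:b]) for a, b in zip(bounds, bounds[1:])]

def score(segmentation, freq):
    """Sum |s|**|s| * freq(s) over multi-word segments; unseen multi-word segments invalidate it."""
    total = 0
    for seg in segmentation:
        if len(seg) > 1:
            f = freq.get(" ".join(seg), 0)
            if f == 0:
                return -1
            total += len(seg) ** len(seg) * f
    return total

def best_segmentation(query, freq):
    tokens = query.lower().split()
    return max(segmentations(tokens), key=lambda s: score(s, freq))
```

With toy counts {"toronto blue": 50, "blue jays": 800, "toronto blue jays": 300}, the whole phrase scores 27 · 300 = 8100 and beats toronto | "blue jays" at 4 · 800 = 3200. With a linear factor |s| the split would win instead (2 · 800 = 1600 vs. 3 · 300 = 900), which is why the exponential factor favors longer segments.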

Segmentation: modeling ambiguity
•  [Li ’11]
•  Quantify uncertainty
–  Probabilistically through query/clicked document
•  Capture user preference
–  Likelihood of a structure through collective behavior
Segmentation: evaluation framework
•  Decouple accuracy from how segmentation is used for IR [Roy ’12]
–  Generate all possible quoted versions
•  Requires only human relevance judgments for query-URL pairs
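Generating all quoted versions of a query is a small enumeration: each of the n-1 gaps between tokens is either a break or a join, giving 2^(n-1) versions with multi-word segments in double quotes. A minimal sketch; issuing each version to a search engine and scoring it against the query-URL judgments is left out.

```python
from itertools import product

def quoted_versions(query):
    """All 2**(n-1) segmentations of a query, rendered with multi-word segments in double quotes."""
    tokens = query.split()
    versions = []
    # Each gap between adjacent tokens is either a segment break (True) or a join (False).
    for breaks in product([True, False], repeat=len(tokens) - 1):
        segments, current = [], [tokens[0]]
        for token, brk in zip(tokens[1:], breaks):
            if brk:
                segments.append(current)
                current = [token]
            else:
                current.append(token)
        segments.append(current)
        versions.append(" ".join(
            f'"{" ".join(s)}"' if len(s) > 1 else s[0] for s in segments))
    return versions
```

For "new york times" this yields four versions, from the fully unquoted query to "\"new york times\"" as one phrase. The count doubles with every extra token, so the approach suits short queries.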

Segmentation: e-tail challenges
Phrases often permuted
No matching Wikipedia titles
Reformulation for null queries

Segmentation: user-intent score
[Parikh ’13]
•  Multiple training corpora
–  For each product meta category
–  Frequency taken from most relevant corpus
•  Unsupervised metric
–  From click patterns on result set
–  Captures, for the segmented form, closeness between intent and result

Segmentation: improving precision
Segmentation: similar queries
1. Term-based
2. Unit-based

Segmentation: entity seeking
•  [Joshi ’14]
•  Assign a purpose to each segment
–  Base entity, relation type, target entity type, contextual words

Facet: customer need
•  Summarization
–  From multiple perspectives
•  Set of items
–  Which describe a query aspect
•  Similar to entity attributes
Facet: customer need (contd.)
•  Helps restrict results
–  To relevant items
–  By clarifying intent
•  Displayed along with results
– For better experience

Facet: exploratory search
1. Many matches for catalog query
2. Few, popular results shown on screen
3. Facet filter adds structural constraint to query
4. Iterative navigation continues until desired result is found
ADV.1: Free-text query integrated with structured search
ADV.2: Count on selected facet serves as context
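The filter-and-count loop behind faceted navigation can be sketched directly: apply the selected facet values as structural constraints, then count the remaining items per attribute value so the counts can be shown next to each facet. The catalog records and attribute names are hypothetical.

```python
from collections import Counter, defaultdict

def facet_counts(items, selected):
    """Filter items by the selected facet values, then count remaining items per facet value.

    items: hypothetical catalog records, each a dict mapping attribute -> value
    selected: attribute -> required value (the user's facet clicks so far)
    """
    matches = [it for it in items if all(it.get(a) == v for a, v in selected.items())]
    counts = defaultdict(Counter)
    for it in matches:
        for attr, value in it.items():
            counts[attr][value] += 1
    return matches, counts
```

Each additional facet click adds one entry to `selected`, so repeated calls model the iterative narrowing until the desired result is found.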
Facet: discovery-driven analysis
[Dash ’08]
1. How surprising a summary is relative to an expectation
2. Set expectation on interestingness through navigation
3. Measure degree of surprise
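One way to make "degree of surprise" concrete is to compare each facet value's share of the current result set with its share of the whole collection. This observed-to-expected ratio is an assumption chosen for illustration, not the interestingness measure defined in [Dash ’08]. Values well above 1 are over-represented in the results and are candidates to highlight.

```python
def surprise(result_counts, corpus_counts):
    """Score each facet value by how far its share of the result set departs from its corpus share.

    Assumption for illustration: expectation = the value's share in the whole collection;
    surprise = observed share / expected share (values > 1 are over-represented).
    """
    n_res = sum(result_counts.values())
    n_all = sum(corpus_counts.values())
    return {v: (result_counts.get(v, 0) / n_res) / (corpus_counts[v] / n_all)
            for v in corpus_counts}
```

A divergence such as KL between the two distributions could replace the per-value ratio when a single score per facet is preferred.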

Facet: intent taxonomy
•  [Yin ’10]
•  Find phrases from queries
•  Organize into tree
–  Phrases with equivalent meaning on same node
–  Parent-child links represent intent relationship
Facet: intent phrase
•  Represent generic search sense
– Involving entities of a certain class
•  Relationship
– E.g., City -> tourism, government

Facet: query dimension
•  [Dou ’11]
•  Aggregate frequent lists
–  Within top results
•  Extract from free text
–  HTML tags and repeated regions
•  Group into clusters
–  Based on contained items
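Grouping extracted lists into query dimensions can be sketched with a greedy overlap clustering. This is a simplification, not QD Miner's actual clustering or item-weighting scheme: a list joins the first cluster whose items it overlaps with Jaccard similarity of at least `min_overlap` (an assumed parameter), and items within a dimension are ranked by how many of its lists contain them.

```python
from collections import Counter

def cluster_lists(lists, min_overlap=0.5):
    """Group candidate lists into query dimensions and rank items within each.

    A list joins the first cluster whose item union it overlaps with Jaccard >= min_overlap;
    items in a cluster are ranked by how many of its lists contain them (ties alphabetical).
    """
    clusters = []  # each cluster is a list of item sets
    for lst in lists:
        items = set(lst)
        for cluster in clusters:
            union = set().union(*cluster)
            if len(items & union) / len(items | union) >= min_overlap:
                cluster.append(items)
                break
        else:
            clusters.append([items])
    dimensions = []
    for cluster in clusters:
        support = Counter(item for s in cluster for item in s)
        dimensions.append(sorted(support, key=lambda x: (-support[x], x)))
    return dimensions
```

Lists of colors and lists of brands pulled from the same result pages end up in separate dimensions, with the most frequently listed values first.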
Facet: QD Miner
1. By gender, brand, type, …
2. Descending: brand, type, price, …
3. Grouping for dimension
4. Items ranked by importance (e.g., black)

Facet: subtopic mining
•  [Hu ’12]
•  One subtopic per search
– Multiple URLs clicked in a query represent same sense
•  Subtopic clarification by keyword
– Expansion for intent
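The one-subtopic-per-search assumption turns subtopic mining into connected components: URLs clicked together in one search are merged, and merges chain across searches. A minimal union-find sketch, assuming a hypothetical input of per-search click lists for a single query.

```python
def group_urls_by_sense(sessions):
    """One subtopic per search: URLs clicked within the same search are merged into one subtopic.

    sessions: hypothetical input, a list of URL lists clicked in individual searches of one query
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    for clicked in sessions:
        for u in clicked:
            find(u)
        # Link consecutive clicks of the same search into one component.
        for a, b in zip(clicked, clicked[1:]):
            parent[find(a)] = find(b)

    groups = {}
    for u in parent:
        groups.setdefault(find(u), set()).add(u)
    return list(groups.values())
```

For an ambiguous name query, clicks on the researcher's pages and clicks on the actor's pages never co-occur in a search, so they fall into separate groups.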

Facet: clarification by keyword
•  Short queries are simply noun phrases
– E.g., Harry Shum
•  Expansion for intent
– Label of subtopic
– E.g., Microsoft, Jr.

Facet: extraction from search results
•  [Kong ’13]
– Noisy candidates with facet terms
– Terms grouped into facets
Facet: supervised graphical model
•  Labeling for prediction
– Whether list item t is a facet term
– Label z: whether a pair p of two list items belongs to the same query facet

Research landscape - Interpretation
Commerce queries (existing result):
Classification: snippets/similar [Shen ’09]
Tagging: derived labels [Li ’09], lexicon [Wang ’09]
Segmentation: snippets/related-pseudo queries [Parikh ’13]
Facets: query dimensions [Dou ’11]

Web retrieval (future direction):
Classification: neighboring queries [Cao ’09]
Tagging: sequential labels [Cheung ’12]
Segmentation: entity-seeking queries [Joshi ’14]
Facets: from search results [Kong ’13]
References
1. Shen, D., Li, Y., Li, X. and Zhou, D. "Product query classification." CIKM 2009.
2. Li, Xiao, Ye-Yi Wang, and Alex Acero. "Extracting structured information from user queries with semi-supervised conditional random fields." SIGIR 2009.
3. Wang, Ye-Yi, Raphael Hoffmann, Xiao Li, and Jakub Szymanski. "Semi-supervised learning of semantic classes for query understanding: from the web and for the web." CIKM 2009.
4. Parikh, Nish, Prasad Sriram, and Mohammad Al Hasan. "On segmentation of ecommerce queries." CIKM 2013.
5. Dou, Zhicheng, et al. "Finding dimensions for queries." CIKM 2011.
6. Cao, Huanhuan, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, and Qiang Yang. "Context-aware query classification." SIGIR 2009.
7. Cheung, Jackie Chi Kit, and Xiao Li. "Sequence clustering and labeling for unsupervised query intent discovery." WSDM 2012.
8. Li, Yanen, Bo-Jun Paul Hsu, ChengXiang Zhai, and Kuansan Wang. "Unsupervised query segmentation using clickthrough for information retrieval." SIGIR 2011.
9. Kong, Weize, and James Allan. "Extracting query facets from search results." SIGIR 2013.

Competition/opportunity

Classification: store identification
Filter out irrelevant results

Reformulation: term deletion
Store should NOT be lost

Tagging: intent broadening
Relaxing to partial match of catalog attributes
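Intent broadening by partial attribute match can be sketched as a search over constraint subsets: keep as many tagged attributes as possible while still returning results, and never drop protected ones such as the identified store, so the store is not lost when terms are deleted. The record schema and the `protected` default are hypothetical; trying larger subsets first is one simple relaxation order among many.

```python
from itertools import combinations

def relax_query(constraints, items, protected=("store",)):
    """Find the largest subset of attribute constraints that still matches some item.

    constraints: attribute -> value tagged in the query
    items: hypothetical catalog records (attribute -> value dicts)
    protected: attributes never dropped, e.g. the identified store/category
    """
    optional = [a for a in constraints if a not in protected]
    # Try keeping all optional constraints first, then progressively fewer.
    for k in range(len(optional), -1, -1):
        for keep in combinations(optional, k):
            active = {a: v for a, v in constraints.items() if a in protected or a in keep}
            hits = [it for it in items if all(it.get(a) == v for a, v in active.items())]
            if hits:
                return active, hits
    return {}, []
```

For a null query such as a gold 16GB Samsung phone, the sketch drops the color constraint and returns the white 16GB model instead of a gold Samsung laptop from another store.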
