Classification: technical challenges
• Brevity
– Not enough clues
– Few features
• Ambiguity
– Belong to multiple categories
Classification: query enrichment
[Shen ’06a, ’06b]
• Search results
– Infer the query's true meaning from the Web
– Ensemble learning to remove engine bias
• Category word match
– Extended with WordNet
– (e.g., hardware -> device)
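The enrichment idea can be sketched as follows, under loud assumptions: result snippets per engine are given as plain strings, and a small hand-written word set (hypothetical) stands in for WordNet-extended category words. Averaging per-engine votes is a simple stand-in for the ensemble step, not necessarily the cited learner.

```python
from collections import Counter

# Hypothetical category vocabularies. The cited work extends category words
# with WordNet (e.g., hardware -> device); a hand-written set stands in here.
CATEGORY_WORDS = {
    "Computers/Hardware": {"hardware", "device"},
    "Shopping": {"buy", "price", "store"},
}


def category_scores(snippets):
    """Normalized category word-match scores for one engine's result snippets."""
    counts = Counter()
    for snippet in snippets:
        for token in snippet.lower().split():
            for category, words in CATEGORY_WORDS.items():
                if token in words:
                    counts[category] += 1
    total = sum(counts.values()) or 1  # avoid dividing by zero
    return {c: counts[c] / total for c in CATEGORY_WORDS}


def ensemble_classify(results_by_engine):
    """Average per-engine scores so that no single engine's bias dominates."""
    per_engine = [category_scores(s) for s in results_by_engine.values()]
    return {c: sum(p[c] for p in per_engine) / len(per_engine) for c in CATEGORY_WORDS}


results = {
    "engine_a": ["usb device drivers", "buy usb hub at low price"],
    "engine_b": ["new hardware device review"],
}
scores = ensemble_classify(results)
print(max(scores, key=scores.get))  # Computers/Hardware
```

Engine A alone leans toward Shopping; averaging in engine B tips the query toward hardware, which is the kind of single-engine bias the ensemble is meant to dampen.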
Classification: linguistic analysis
• [Beitzel ’07]
• Selectional preference rules
– Mined from unlabeled query logs
• Syntactic arguments of words
– Belonging to semantic classes
– E.g., the object of "eat" is edible
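A minimal sketch of mining such a rule, assuming two-word queries of the form "<word> <argument>" and a hypothetical lexicon mapping terms to semantic classes. Preferences P(class | word) are estimated only from unambiguous arguments and then used to resolve an ambiguous one.

```python
from collections import Counter, defaultdict

# Hypothetical lexicon: term -> candidate semantic classes.
LEXICON = {
    "pizza": {"food"},
    "sushi": {"food"},
    "apple": {"food", "company"},
    "iphone": {"product"},
}


def mine_preferences(query_log):
    """Estimate P(class | head word) from two-word queries whose argument
    has exactly one class (unambiguous evidence only)."""
    counts = defaultdict(Counter)
    for query in query_log:
        tokens = query.lower().split()
        if len(tokens) != 2:
            continue
        head, arg = tokens
        classes = LEXICON.get(arg, set())
        if len(classes) == 1:
            counts[head][next(iter(classes))] += 1
    return {head: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for head, cnt in counts.items()}


def disambiguate(query, prefs):
    """Pick the argument class that the head word prefers most."""
    head, arg = query.lower().split()
    candidates = LEXICON.get(arg, set())
    return max(candidates, key=lambda c: prefs.get(head, {}).get(c, 0.0), default=None)


log = ["eat pizza", "eat sushi", "eat pizza", "buy iphone", "eat apple"]
prefs = mine_preferences(log)
print(disambiguate("eat apple", prefs))  # food
```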
Classification: training data
[Acero ’08]
• Click-through log
– Infer class membership of unlabeled queries
– Through proximity to labeled ones
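One way to turn "proximity to labeled queries" into labels is plain label propagation over the query-URL click graph. The sketch below swaps in that standard technique for illustration (it is not necessarily the cited model); the input format and the clamping of labeled queries are assumptions.

```python
from collections import defaultdict


def propagate(clicks, seeds, iterations=10):
    """Label propagation for one class over a query-URL click graph.

    clicks: query -> {url: click count}   (hypothetical input format)
    seeds:  labeled query -> score (1.0 = in class, 0.0 = not in class)
    """
    url_queries = defaultdict(dict)
    for q, urls in clicks.items():
        for u, n in urls.items():
            url_queries[u][q] = n

    scores = {q: seeds.get(q, 0.0) for q in clicks}
    for _ in range(iterations):
        # Query -> URL: click-weighted average over the queries that led to each URL.
        url_scores = {u: sum(scores[q] * n for q, n in qs.items()) / sum(qs.values())
                      for u, qs in url_queries.items()}
        # URL -> query: click-weighted average over each query's clicked URLs;
        # labeled queries stay clamped to their seed score.
        scores = {q: seeds[q] if q in seeds else
                  sum(url_scores[u] * n for u, n in urls.items()) / sum(urls.values())
                  for q, urls in clicks.items()}
    return scores


clicks = {
    "cheap flights": {"expedia.com": 5},
    "airfare deals": {"expedia.com": 3, "kayak.com": 2},
    "pizza recipe": {"allrecipes.com": 4},
}
scores = propagate(clicks, {"cheap flights": 1.0, "pizza recipe": 0.0})
```

The unlabeled query "airfare deals" shares clicked URLs with the labeled travel query, so its score rises toward 1 with each iteration, while "pizza recipe" keeps its seed score.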
Classification: related query
• Bipartite graph
– Query/URL
• Links weighted by number of clicks
– Transition probability proportional to link weight
• Random walk for finding similarity [Mei ’08]
– Hitting time: from one node to another
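The random walk and hitting time can be sketched as below, assuming a query -> {URL: clicks} input. It computes the truncated hitting time with the recurrence h(target) = 0, h(i) = 1 + Σ_j P(i, j) h(j), iterated a fixed number of steps; the step cap of 20 is an arbitrary assumption. Queries with a smaller hitting time to the target are treated as more similar.

```python
from collections import defaultdict


def transition_probs(clicks):
    """Random-walk transitions on the bipartite query/URL graph; each step's
    probability is proportional to the click count on the edge."""
    nbrs = defaultdict(dict)
    for q, urls in clicks.items():
        for u, n in urls.items():
            nbrs[("q", q)][("u", u)] = n  # tag nodes so query and URL names never collide
            nbrs[("u", u)][("q", q)] = n
    return {node: {m: w / sum(edges.values()) for m, w in edges.items()}
            for node, edges in nbrs.items()}


def truncated_hitting_time(probs, target, steps=20):
    """Expected steps to reach `target`, with walks longer than `steps` counted as `steps`."""
    h = {node: 0.0 for node in probs}
    for _ in range(steps):
        h = {node: 0.0 if node == target else
             1.0 + sum(p * h[m] for m, p in probs[node].items())
             for node in probs}
    return h


clicks = {
    "jaguar": {"jaguar.com": 3, "wikipedia.org/Jaguar": 1},
    "jaguar car": {"jaguar.com": 5},
    "big cats": {"wikipedia.org/Jaguar": 2},
}
h = truncated_hitting_time(transition_probs(clicks), ("q", "jaguar car"))
```

Here "jaguar" reaches "jaguar car" through a shared, heavily clicked URL, so its hitting time is lower than that of "big cats", which must first pass through "jaguar".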
Classification: pseudo query
• Language difference between product names and queries
– Feature spaces of training/test data differ
• Tuple n-gram based translation model
– Joint probability of source/target sentence pairs
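A tuple n-gram model scores a sentence pair as an n-gram sequence over bilingual tuples. The bigram sketch below assumes the (product-name phrase, query phrase) tuples are already extracted; it uses add-k smoothing, and all data is hypothetical.

```python
import math
from collections import Counter

BOS = ("<s>", "<s>")
EOS = ("</s>", "</s>")


def train_bigram(tuple_corpus):
    """Count tuple unigrams/bigrams. Each example is a sequence of
    (source_phrase, target_phrase) tuples, e.g. a product name aligned to a query."""
    unigrams, bigrams = Counter(), Counter()
    for seq in tuple_corpus:
        padded = [BOS] + list(seq) + [EOS]
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    vocab = {t for seq in tuple_corpus for t in seq} | {EOS}
    return unigrams, bigrams, vocab


def log_joint_prob(seq, model, k=1.0):
    """log P(S, T) as a product of add-k smoothed tuple-bigram probabilities."""
    unigrams, bigrams, vocab = model
    padded = [BOS] + list(seq) + [EOS]
    return sum(math.log((bigrams[(a, b)] + k) / (unigrams[a] + k * len(vocab)))
               for a, b in zip(padded[:-1], padded[1:]))


corpus = [
    [("samsung", "samsung"), ("galaxy s21", "galaxy"), ("128gb", "")],
    [("samsung", "samsung"), ("galaxy s21", "galaxy")],
]
model = train_bigram(corpus)
```

An empty target phrase marks a product-name word that queries tend to drop (here "128gb"), one concrete form of the vocabulary gap between product names and queries.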
Classification: incorporating context
• [Cao ’09]
• CRF captures session intent
– Neighboring queries and clicked URLs
• More flexible than HMM
– Richer features: Web directory
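How a linear-chain CRF uses session context at decoding time can be sketched with Viterbi. The weights below are hand-set and hypothetical (training is not shown). Features are arbitrary strings, so clicked-URL or Web-directory features (e.g. a hypothetical "dir:Recreation/Travel") fit the same interface.

```python
def viterbi(session, labels, emit_w, trans_w):
    """Highest-scoring intent label sequence for a search session under a
    linear-chain CRF with given (hypothetical, hand-set) weights.

    session: list of feature sets, one per query in the session
    emit_w:  (feature, label) -> weight
    trans_w: (previous label, label) -> weight
    """
    def emit(feats, y):
        return sum(emit_w.get((f, y), 0.0) for f in feats)

    best = [{y: emit(session[0], y) for y in labels}]
    back = []
    for feats in session[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans_w.get((p, y), 0.0))
            scores[y] = best[-1][prev] + trans_w.get((prev, y), 0.0) + emit(feats, y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    y = max(labels, key=lambda l: best[-1][l])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    return path[::-1]


labels = ["travel", "autos"]
emit_w = {("w:safari", "travel"): 2.0, ("w:jaguar", "autos"): 0.5, ("w:jaguar", "travel"): 0.3}
trans_w = {("travel", "travel"): 1.0, ("autos", "autos"): 1.0}
```

On its own, "jaguar" decodes as autos; preceded by "kenya safari" in the same session, the transition weight pulls it to travel.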
Classification: local features
• Query terms
– Weights learned in the training phase
– Too sparse to associate with category labels
• Feedback from an external Web directory
– Pseudo: top M results
– Implicit: clicked URLs
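Both feedback types can be sketched as one function that turns result URLs into a category distribution. The directory lookup table, the value of M and the "prefer clicks when present" rule are all assumptions.

```python
from collections import Counter


def directory_feedback(results, directory, m=3, clicked=None):
    """Category distribution for a query from an external Web directory.

    results:   ranked result URLs
    directory: url -> directory category (a hypothetical lookup table)
    m:         number of top results used as pseudo feedback
    clicked:   clicked URLs used as implicit feedback, when available
    """
    urls = clicked if clicked else results[:m]
    cats = Counter(directory[u] for u in urls if u in directory)
    total = sum(cats.values())
    return {c: n / total for c, n in cats.items()} if total else {}


results = ["a.com", "b.com", "c.com", "d.com"]
directory = {"a.com": "Shopping", "b.com": "Shopping", "c.com": "Reference", "d.com": "Reference"}
pseudo = directory_feedback(results, directory, m=3)
implicit = directory_feedback(results, directory, clicked=["d.com"])
```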
Lexicon: set expansion
• HTML lists
– contain instances of a single concept
• Phrases of classes
– extracted from annotated queries
• Used as seeds
– in graph learning for more instances
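Seed-based expansion over HTML lists can be sketched as score propagation on the bipartite instance-list graph. The averaging/summing rules and the input lists are assumptions made for illustration, not necessarily the graph learning used in the cited work.

```python
from collections import defaultdict


def expand_set(seeds, html_lists, iterations=2, top_k=3):
    """Rank candidate instances of a concept by propagating scores over the
    bipartite graph linking instances to the (non-empty) HTML lists containing them.

    seeds:      known instances (e.g. class phrases from annotated queries)
    html_lists: item lists scraped from HTML pages (hypothetical input)
    """
    scores = {item: 1.0 for item in seeds}
    for _ in range(iterations):
        # A list is as trustworthy as the average score of its items.
        list_scores = [sum(scores.get(i, 0.0) for i in lst) / len(lst) for lst in html_lists]
        # An item collects the scores of every list it appears in.
        new_scores = defaultdict(float)
        for lst, s in zip(html_lists, list_scores):
            for item in lst:
                new_scores[item] += s
        top = max(new_scores.values(), default=0.0) or 1.0
        scores = {i: s / top for i, s in new_scores.items()}  # rescale to [0, 1]
        scores.update({item: 1.0 for item in seeds})  # seeds stay fixed
    candidates = [i for i, s in scores.items() if i not in seeds and s > 0]
    return sorted(candidates, key=lambda i: -scores[i])[:top_k]


lists = [["toyota", "honda", "ford"], ["ford", "bmw"], ["apple", "banana"]]
expanded = expand_set({"toyota", "honda"}, lists)
```

The second iteration is what reaches "bmw": it never co-occurs with a seed, but it shares a list with "ford", which the first iteration promoted.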
Tagging: derived labels
• [Li ’09]
• [query, product title] as click event from log
• Associate query with product metadata in RDB
– Through fuzzy match of title for higher coverage
• Map source to target schema
– E.g., color -> attribute
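The label-derivation steps can be sketched with Python's `difflib.SequenceMatcher` for the fuzzy title match. The product rows, the 0.6 threshold and the column-to-label mapping are hypothetical, and the exact token matching against the matched row is a simplification.

```python
from difflib import SequenceMatcher

# Hypothetical mapping from database columns (source schema) to tag labels
# (target schema), e.g. color -> attribute.
SCHEMA_MAP = {"brand": "brand", "model": "product", "color": "attribute"}


def best_match(clicked_title, products, threshold=0.6):
    """Fuzzy-match a clicked product title against database rows."""
    def sim(row):
        return SequenceMatcher(None, clicked_title.lower(), row["title"].lower()).ratio()
    row = max(products, key=sim)
    return row if sim(row) >= threshold else None


def derive_labels(query, clicked_title, products):
    """Tag each query token with a target-schema label, or 'O' if none applies."""
    row = best_match(clicked_title, products)
    tags = []
    for token in query.lower().split():
        label = "O"
        if row is not None:
            for column, target in SCHEMA_MAP.items():
                if token in row[column].lower().split():
                    label = target
                    break
        tags.append((token, label))
    return tags


products = [
    {"title": "Canon EOS R50 Mirrorless Camera Black", "brand": "Canon", "model": "EOS R50", "color": "Black"},
    {"title": "Nikon Z50 Camera Kit Silver", "brand": "Nikon", "model": "Z50", "color": "Silver"},
]
tags = derive_labels("black canon r50", "Canon EOS R50 Mirrorless Camera (Black)", products)
```

The clicked title differs from the database title (parentheses), which is where the fuzzy match buys coverage over exact lookup.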
Tagging: sequence label
• [Cheung ’12]
• Sequence clustering
– Queries with similar intent
• Recognize patterns
– With sequence of semantic concepts/lexical items
• Automatically annotate new queries
– With intent summary
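A much-simplified sketch of the pipeline: queries are abstracted into sequences of concepts and lexical items using a hypothetical lexicon, exact pattern match stands in for sequence clustering, and intent summaries per pattern are assumed to be given.

```python
from collections import defaultdict

# Hypothetical lexicon mapping lexical items to semantic concepts.
CONCEPTS = {"paris": "[city]", "rome": "[city]", "tokyo": "[city]",
            "hotels": "[lodging]", "hostels": "[lodging]"}


def to_pattern(query):
    """Replace known lexical items with their concepts; keep other words as-is."""
    return tuple(CONCEPTS.get(tok, tok) for tok in query.lower().split())


def cluster_by_pattern(queries):
    """Group queries sharing the same concept/lexical-item sequence
    (exact match, a stand-in for real sequence clustering)."""
    clusters = defaultdict(list)
    for q in queries:
        clusters[to_pattern(q)].append(q)
    return dict(clusters)


def annotate(query, summaries):
    """Label a new query with the intent summary of its pattern, if known."""
    return summaries.get(to_pattern(query))


clusters = cluster_by_pattern(["hotels in paris", "hostels in rome", "weather paris"])
summaries = {("[lodging]", "in", "[city]"): "find lodging in a city"}  # hypothetical
```

A query never seen before, such as "hotels in tokyo", still maps to the lodging pattern and inherits its summary.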
Query understanding: agenda
• Classifica?on
• Tagging
• Segmentation
• Facets
Segmentation: modeling ambiguity
• [Li ’11]
• Quantify uncertainty
– Probabilistically through query/clicked document
• Capture user preference
– Likelihood of a structure through collective behavior
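Quantifying segmentation uncertainty can be sketched as a normalized distribution over all segmentations. This is a generic product-of-segments model, not the cited one; the segment probabilities are hypothetical stand-ins for statistics gathered from clicked documents.

```python
from itertools import combinations


def segmentations(words):
    """All ways to split a word list into contiguous segments (2^(n-1) of them)."""
    n = len(words)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            yield [" ".join(words[i:j]) for i, j in zip(bounds, bounds[1:])]


def segmentation_distribution(query, segment_prob):
    """P(segmentation | query) from products of per-segment scores.
    segment_prob is hypothetical, e.g. how often a segment occurs as a
    contiguous phrase in documents clicked for the query."""
    scored = []
    for seg in segmentations(query.split()):
        score = 1.0
        for s in seg:
            score *= segment_prob.get(s, 1e-3)  # small default for unseen segments
        scored.append((seg, score))
    total = sum(score for _, score in scored)
    return [(seg, score / total) for seg, score in scored]


probs = {"new york times": 0.5, "new york": 0.4, "times": 0.3,
         "new": 0.2, "york": 0.1, "york times": 0.05}
dist = segmentation_distribution("new york times", probs)
```

Keeping the full distribution, rather than only the top segmentation, is what lets downstream components see that '"new york" times' remains a plausible reading.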
Segmentation: evaluation framework
• Decouple accuracy from how segmentation is used for IR [Roy ’12]
– Generate all possible quoted versions
• Requires only human relevance judgements for query-URL pairs
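Generating all quoted versions of a query is easy to make concrete. The sketch below renders every segmentation with double quotes around multi-word segments. How each version is then retrieved and scored against the query-URL judgements is left out.

```python
from itertools import combinations


def quoted_versions(query):
    """Every segmentation of the query rendered with double quotes around
    multi-word segments, e.g. '"new york" times' (2^(n-1) versions)."""
    words = query.split()
    n = len(words)
    versions = []
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            segs = [" ".join(words[i:j]) for i, j in zip(bounds, bounds[1:])]
            versions.append(" ".join(f'"{s}"' if " " in s else s for s in segs))
    return versions


print(quoted_versions("new york times"))
# ['"new york times"', 'new "york times"', '"new york" times', 'new york times']
```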
Facet: customer need
• Summarization
– from multiple perspectives
• Set of items
– which describe a query aspect
• Similar to entity attributes
Facet: customer need (contd.)
• Helps restrict results
– to relevant items
– by clarifying intent
• Displayed along with results
– for better experience
Facet: intent taxonomy
• [Yin ’10]
• Find phrases from queries
• Organize into tree
– Indicating equivalent meaning on same node
– Parent-child links represent intent relationship
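The taxonomy structure can be sketched as a tree whose nodes hold sets of equivalent phrases, with children refining the parent's intent. The class name, helper and example phrases are hypothetical; building the tree from query logs is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class IntentNode:
    """Equivalent intent phrases share one node; children refine the parent's intent."""
    phrases: set
    children: list = field(default_factory=list)

    def add_child(self, phrases):
        child = IntentNode(set(phrases))
        self.children.append(child)
        return child


def find_path(node, phrase, path=()):
    """Chain of nodes (as sorted phrase lists) from the root to the node holding `phrase`."""
    path = path + (sorted(node.phrases),)
    if phrase in node.phrases:
        return list(path)
    for child in node.children:
        found = find_path(child, phrase, path)
        if found:
            return found
    return None


root = IntentNode({"travel"})
lodging = root.add_child({"hotels", "accommodation"})
lodging.add_child({"cheap hotels", "budget hotels"})
root.add_child({"weather", "forecast"})
```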
Facet: intent phrase
• Represent generic search sense
– Involving entities of a certain class
• Relationship
– E.g., City -> tourism, government
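Intent phrases for an entity class can be sketched by removing the entity from each query and keeping phrases that recur with several distinct entities of the same class. The entity dictionary, the single-token entities and the threshold of two entities are assumptions.

```python
from collections import defaultdict

# Hypothetical entity dictionary: entity -> class (single-token entities only).
ENTITIES = {"paris": "city", "rome": "city", "berlin": "city", "toyota": "company"}


def intent_phrases(queries, min_entities=2):
    """Per entity class, the phrases that co-occur with at least
    `min_entities` distinct entities of that class (e.g. city -> tourism)."""
    seen = defaultdict(set)  # (class, phrase) -> entities it appeared with
    for q in queries:
        tokens = q.lower().split()
        for i, tok in enumerate(tokens):
            if tok in ENTITIES:
                phrase = " ".join(tokens[:i] + tokens[i + 1:])
                if phrase:
                    seen[(ENTITIES[tok], phrase)].add(tok)
    result = defaultdict(list)
    for (cls, phrase), ents in sorted(seen.items()):
        if len(ents) >= min_entities:
            result[cls].append(phrase)
    return dict(result)


queries = ["paris tourism", "rome tourism", "berlin government",
           "paris government", "toyota tourism", "rome pizza"]
phrases = intent_phrases(queries)
```

Requiring several distinct entities is what separates generic intents ("tourism") from one-off combinations such as "rome pizza".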
Facet: query dimension
• [Dou ’11]
• Aggregate frequent lists
– within top results
• Extract from free text
– HTML tags and repeated regions
• Group into clusters
– based on contained items
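The grouping step can be sketched as greedy clustering of extracted lists by item overlap (Jaccard similarity). The threshold and the single-pass greedy strategy are assumptions; list extraction from HTML is not shown. Each resulting cluster is a candidate query dimension.

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)


def group_lists(lists, threshold=0.3):
    """Greedily add each list to the first cluster holding a list whose Jaccard
    overlap with it reaches `threshold`; otherwise start a new cluster."""
    clusters = []
    for items in map(set, lists):
        for cluster in clusters:
            if any(jaccard(items, other) >= threshold for other in cluster):
                cluster.append(items)
                break
        else:
            clusters.append([items])
    return clusters


lists = [["nike", "adidas", "puma"], ["adidas", "puma", "reebok"],
         ["men", "women", "kids"], ["women", "men"], ["nike", "adidas", "reebok"]]
clusters = group_lists(lists)
```

The brand lists and the gender lists end up in separate clusters, i.e. two candidate dimensions. A single greedy pass is order-dependent, which a fuller implementation would likely address.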
Facet: QD Miner
1. By gender, brand, type, …
2. Descending: brand, type, price, …
3. Grouping for dimension
4. Item ranked by importance (e.g., black)
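Ranking items within a dimension by importance can be sketched as counting support across the lists grouped into it. Weighting each list by 1 / (search rank of its source page) is an assumption made for illustration.

```python
from collections import defaultdict


def rank_items(dimension_lists):
    """Rank a dimension's items by weighted support across its lists.

    dimension_lists: (page_rank, items) pairs; page_rank is the 1-based search
    rank of the page a list came from (the 1 / page_rank weight is an assumption).
    """
    score = defaultdict(float)
    for page_rank, items in dimension_lists:
        for item in set(items):
            score[item] += 1.0 / page_rank
    return sorted(score, key=lambda item: (-score[item], item))


colors = [(1, ["black", "white", "red"]), (2, ["black", "blue"]), (5, ["white", "black"])]
print(rank_items(colors))  # ['black', 'white', 'red', 'blue']
```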