The Query Flow Graph: Model and Applications

    The Query Flow Graph: Model and Applications - Presentation Transcript

    1. The query-flow graph: model and applications ● Paolo Boldi (M), Francesco Bonchi (Y), Carlos Castillo (Y), Debora Donato (Y), Aris Gionis (Y), Sebastiano Vigna (M) ● M: Università degli studi di Milano, Italy ● Y: Yahoo! Research Barcelona, Spain
    3. R. Baeza-Yates: “Graphs from search engine queries”. SOFSEM 2007.
    4. Basic concepts ● User session: ebay, autotrader, used fox vw, ryanair, barcelona, places barcelona, rent barcelona ● Radlinski & Joachims: “Query chains: learning to rank from implicit feedback”. KDD 2005.
    5. Basic concepts ● User session: ebay, autotrader, used fox vw, ryanair, barcelona, places barcelona, rent barcelona ● Chain/Mission #1, Chain/Mission #2 ● Radlinski & Joachims: “Query chains: learning to rank from implicit feedback”. KDD 2005.
    6. Whence come the chains? ● [Figure text: ebay autotrader used fox vw barcelona places barcelona rent barcelona soccer barcelona soccer barcelona fc ...] ● P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, S. Vigna: “The Query-Flow Graph”. CIKM 2008.
    7. Whence come the chains? ● Query-flow graph ● [Figure text: ebay autotrader used fox vw barcelona places barcelona rent barcelona soccer barcelona soccer barcelona fc ...]
    8. Query-flow graph ● Query chains ● [Figure text: ebay autotrader used fox vw barcelona places barcelona rent s t soccer barcelona soccer barcelona ... barcelona fc]
    9. The query-flow graph ● Directed graph ● Built from a query log ● Nodes are queries
    10. Frequencies of query transitions
    12. Transitions are non-symmetrical ● barclona, barcelona: 93% of the time ● barcelona, rent barcelona: 7% of the time ● And when it happens, the frequencies are not correlated
    13. The query-flow graph ● Directed graph ● Built from sessions/chains ● Nodes are queries ● Edges hold information, e.g.: – Aggregate features from query transitions – Weights derived from those features
    14. Statistics on query transitions ● Example queries: barcelona, places barcelona, rent barcelona ● Data for (q,q'), aggregated over multiple users: ● Frequency ● Relative to other transitions from q ● Bags of terms and character n-grams ● Cosine similarity, Jaccard coefficient, ... ● Average time ● Average position in session ● ... ● Radlinski & Joachims: “Query chains: learning to rank from implicit feedback”. KDD 2005. ● R. Jones & K. Klinkner: “Beyond session timeout”. CIKM 2008.
    15. Our query-flow graph ● Directed graph ● Built from sessions/chains ● Nodes are queries ● Edges hold information, e.g.: – Aggregate features from query transitions – Weights derived from those features
    16. Weighting schemes ● Interested in weights such as: – w(q,q') = chain(q,q') ● -or- – w(q,q') = freq(q,q') / freq(q)
    17. Weighting schemes ● Interested in weights such as: – w(q,q') = chain(q,q') ● -or- – w(q,q') = freq(q,q') / freq(q) ● chain(q,q') is the chaining probability – Pr(q, q' in same chain | q, q' in same session) ● freq(q,q') is the frequency of the transition
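
The frequency-based weighting above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sessions are given as lists of query strings, freq(q,q') counts how often q' immediately follows q, and freq(q) counts how often q is submitted at all. The function name `build_query_flow_graph` and the toy sessions are hypothetical and not taken from the slides.

```python
from collections import Counter, defaultdict

def build_query_flow_graph(sessions):
    """Return {q: {q_next: weight}} with weight = freq(q, q') / freq(q)."""
    pair_freq = Counter()   # freq(q, q'): times q' immediately follows q
    query_freq = Counter()  # freq(q): times q is submitted (assumed definition)
    for session in sessions:
        query_freq.update(session)
        pair_freq.update(zip(session, session[1:]))
    graph = defaultdict(dict)
    for (q, q_next), f in pair_freq.items():
        graph[q][q_next] = f / query_freq[q]
    return dict(graph)

# Toy sessions, invented for illustration only.
sessions = [
    ["barcelona", "places barcelona", "rent barcelona"],
    ["barcelona", "rent barcelona"],
    ["barcelona", "barcelona fc"],
]
graph = build_query_flow_graph(sessions)
```

With this definition the outgoing weights of a query sum to at most 1; the missing mass corresponds to sessions that end at that query.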
    19. Estimating chain(q,q') ● Example queries: barcelona, places barcelona, rent barcelona ● Data for (q,q'): – Frequency – N-grams and terms similarity – Average delta time – Average position in session – Clicks between q and q' – ... ● R. Jones & K. Klinkner: “Beyond session timeout”. CIKM 2008.
    20. Estimating chain(q,q') ● Data for (q,q'): – Frequency – N-grams and terms similarity – Average delta time – Average position in session – Clicks between q and q' – ... ● Training labels: – 5,000 consecutive query pairs – Manually labelled – 2/3 are same-chain
    21. Estimating chain(q,q') ● Data for (q,q') + training labels ● freq(q,q') = 1: different-chain if α time − β sim.ngrams.jaccard − γ sim.terms.jaccard + δ > θ, same-chain otherwise ● freq(q,q') > 1: C5.0 rules, 8 rules in total
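
The linear rule for pairs with freq(q,q') = 1 can be sketched as follows. The slides do not give the values of α, β, γ, δ, or θ, nor the n-gram length, so every default below is a placeholder chosen for illustration, and the helper names (`jaccard`, `char_ngrams`, `is_different_chain`) are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def char_ngrams(text, n=3):
    """Set of character n-grams; n=3 is an assumption, not from the slides."""
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}

def is_different_chain(q1, q2, gap_seconds,
                       alpha=0.001, beta=1.0, gamma=1.0, delta=0.5, theta=0.5):
    """Apply: alpha*time - beta*sim.ngrams - gamma*sim.terms + delta > theta.

    All coefficient defaults are placeholders, not the fitted values.
    """
    score = (alpha * gap_seconds
             - beta * jaccard(char_ngrams(q1), char_ngrams(q2))
             - gamma * jaccard(q1.split(), q2.split())
             + delta)
    return score > theta

same = not is_different_chain("barcelona", "rent barcelona", gap_seconds=30)
different = is_different_chain("ebay", "ryanair", gap_seconds=3600)
```

A long time gap pushes the score toward different-chain, while textual overlap between the two queries pulls it back toward same-chain.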
    22. Applications of the query-flow graph ● 1. Session → chains segmentation ● 2. Query recommendation
    24. There are intertwined chains ● Example session: pointui forum, audi ipswich, golfers elbow, cox ipswich ● (photos: commons.wikimedia.org)
    25. Efficient session breaking ● 1. Sort the session ● 2. Break into contiguous chains
    26. Sorting: ATSP (asymmetric traveling salesman problem) formulation ● Nodes: pointui forum, audi ipswich, golfers elbow, cox ipswich
    27. ATSP formulation ● Nodes: pointui forum, audi ipswich, golfers elbow, cox ipswich
    28. ATSP formulation ● Nodes: pointui forum, audi ipswich, golfers elbow, cox ipswich ● Weights given by chain(q,q')
    29. ATSP solution ● Maximize Π chain(q_i, q_{i+1}) s.t. TSP constraints ● Greedy algorithm with d-steps lookahead
    30. ATSP solution ● Sorted session: pointui forum, audi ipswich, cox ipswich, golfers elbow
    31. Breaking using a threshold ● Sorted session: pointui forum, audi ipswich, cox ipswich, golfers elbow
    32. Breaking using a threshold ● Sorted session: pointui forum, audi ipswich, cox ipswich, golfers elbow ● Chain #1, Chain #2
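
The sort-then-break procedure can be sketched as below. This is a simplified illustration, not the authors' implementation: it uses a plain greedy step with no d-step lookahead, it assumes the ordering starts from the first query of the session, and `toy_chain` (term-level Jaccard) is only a stand-in for the learned chain(q,q') model. The threshold value is likewise arbitrary.

```python
def greedy_reorder(session, chain):
    """Greedy ordering: repeatedly append the remaining query with the
    highest chain probability from the last query placed (no lookahead)."""
    if not session:
        return []
    ordered = [session[0]]
    remaining = list(session[1:])
    while remaining:
        last = ordered[-1]
        best = max(remaining, key=lambda q: chain(last, q))
        remaining.remove(best)
        ordered.append(best)
    return ordered

def break_into_chains(ordered, chain, threshold):
    """Cut the sorted session wherever chain(q_i, q_{i+1}) < threshold."""
    chains = []
    for q in ordered:
        if chains and chain(chains[-1][-1], q) >= threshold:
            chains[-1].append(q)
        else:
            chains.append([q])
    return chains

def toy_chain(q1, q2):
    """Stand-in for chain(q, q'): Jaccard overlap of query terms."""
    t1, t2 = set(q1.split()), set(q2.split())
    return len(t1 & t2) / len(t1 | t2)

session = ["pointui forum", "audi ipswich", "golfers elbow", "cox ipswich"]
ordered = greedy_reorder(session, toy_chain)
chains = break_into_chains(ordered, toy_chain, threshold=0.3)
```

On this toy input the two "ipswich" queries end up adjacent after sorting, so a single threshold pass can place them in the same chain.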
    33. Testing set ● Queries from 1,458 users from UK – Collected queries over 3 months – Selected users with >1 query – Partitioned into chains manually ● 415 (28%) were single-chain ● 3.6 chains per user on average
    34. Evaluation wrt ground truth ● Sorting strategy – Rand index of omniscient breaking strategy ● Breaking strategy – Rand index ● Rand index = pair-wise agreements / (n choose 2)
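
As a small worked example of the Rand index, the sketch below counts the query pairs on which two segmentations agree (both put the pair in the same chain, or both put it in different chains) and divides by the number of pairs. Representing a segmentation as one chain label per query is an assumption made here for illustration.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Pair-wise agreements between two partitions divided by (n choose 2)."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both labelings must cover the same queries")
    if n < 2:
        return 1.0  # no pairs to disagree on
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in combinations(range(n), 2)
    )
    return agreements / (n * (n - 1) // 2)

truth = [0, 1, 1, 2]      # manual chains for four queries
predicted = [0, 1, 1, 1]  # automatic chains for the same queries
score = rand_index(truth, predicted)  # 4 agreeing pairs out of 6
```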
    35. Evaluation of reordering strategy ● Strategy: optimal Rand index – Ours: 0.99 – Original: 0.97 – Shuffled: 0.92 ● Average over all users
    36. Evaluation of reordering + breaking ● Strategy: Rand index – Ours: 0.90 – Baseline: 0.85 ● Baseline: 30 min. timeout
    37. Evaluation (cont.) ● Better Rand index for the “difficult” chains ● Ours and baseline score 0.85 and 0.71 if we exclude the cases where the baseline has a score of 1 ● Baseline: 30 min. timeout
    38. Applications of the query-flow graph ● 1. Session → chains segmentation ● 2. Query recommendation
    39. Graph-based recommendation strategies ● We use w(q,q') = freq(q,q') / freq(q) ● 1. Maximum weight from current query (single-step) ● 2. Random walk with restart to current query ● 3. Random walk with restart, gain wrt uniform restart
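
The second strategy, a random walk with restart, can be sketched with a simple power iteration. Several choices here are assumptions rather than details from the slides: the damping value, the number of iterations, and the handling of probability mass that leaves the graph (end of session), which this sketch sends back to the restart distribution. The toy graph and the names `random_walk_with_restart` and `recommend` are hypothetical.

```python
def random_walk_with_restart(graph, restart, damping=0.85, iterations=50):
    """graph: {q: {q_next: w}} with outgoing weights summing to at most 1.
    restart: {q: probability}, summing to 1. Returns {q: score}."""
    nodes = set(graph) | set(restart)
    for out in graph.values():
        nodes |= set(out)
    scores = {q: restart.get(q, 0.0) for q in nodes}
    for _ in range(iterations):
        nxt = {q: (1 - damping) * restart.get(q, 0.0) for q in nodes}
        for q in nodes:
            out = graph.get(q, {})
            for q_next, w in out.items():
                nxt[q_next] += damping * scores[q] * w
            # Assumption: mass that ends the session returns to the restart set.
            leftover = damping * scores[q] * max(0.0, 1.0 - sum(out.values()))
            for r, p in restart.items():
                nxt[r] += leftover * p
        scores = nxt
    return scores

def recommend(graph, query, k=3):
    """Top-k queries by random-walk score, excluding the query itself."""
    scores = random_walk_with_restart(graph, {query: 1.0})
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [q for q, _ in ranked if q != query][:k]

# Toy weights, invented for illustration only.
graph = {
    "apple": {"apple ipod": 0.5, "apple store": 0.25},
    "apple ipod": {"itunes": 0.5},
}
scores = random_walk_with_restart(graph, {"apple": 1.0})
top = recommend(graph, "apple", k=2)
```

The single-step strategy corresponds to ranking the direct successors of the current query by w(q,q') alone; the random walk also credits queries that are reachable in several steps.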
    40. Sample recommendations, query “apple” ● Max. weight: t, apple ipod, apple store, apple trailers, amazon, apple mac, itunes, pc world, argos, currys ● Random walk: t, apple, apple ipod, apple store, apple trailers, google, amazon, argos, itunes, pc world ● Random walk gain: apple, apple fruit, apple ipod, apple belgium, eating apple, apple.nl, apple monitor, apple usa, apple jobs, apple movie download ● t: end of session
    41. Sample recommendations, query “jeep” ● Columns: Max. weight, Random walk, Random walk gain ● t t jeep jeep cherokee jeep jeep trails jeep grand cherokee jeep cherokee jeep kinderkleding jeep wrangler jeep grand rover jeep compass land rover bmw jeep cherokee landrover jeep wrangler swain and jones jeep ebay land rover jeep bag chrisler landrover country living spring show bmw chrysler buy range rover sport XS nissan google craviotto snare ● t: end of session
    42. History-based recommendation ● Random walk – Restart to all the queries in the user's history – Do not restart uniformly ● Restart to more recent queries more frequently
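
One way to restart more frequently to recent queries is to weight the user's history with a geometric decay and normalize the result into a restart distribution for the random walk. The decay scheme, the `decay` value, and the function name `history_restart` are assumptions made for this sketch; the slides do not specify how the non-uniform restart is defined.

```python
def history_restart(history, decay=0.5):
    """Restart distribution over a user's query history.

    history is ordered oldest to newest; the newest query gets weight 1,
    the one before it gets `decay`, then decay**2, and so on.
    """
    weights = {}
    for age, query in enumerate(reversed(history)):
        weights[query] = weights.get(query, 0.0) + decay ** age
    total = sum(weights.values())
    return {q: w / total for q, w in weights.items()}

restart = history_restart(["banana", "apple"])
# "apple" is the current query, so it receives twice the mass of "banana".
```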
    43. History-based recommendations (current+1 query of context) ● banana → apple → ?: banana, apple, usb no, banana cs, giant chocolate bar, where is the seed in a nut, banana shoe, fruit banana, banana cloths, eating bugs ● beatles → apple → ?: beatles, apple, apple ipod, scarring, srg peppers artwork, ill get you, bashles, dundee folk songs, the beatles love album, place lyrics beatles ● apple → ?: apple, apple ipod, apple trailers, apple store, apple mac, apple fruit, apple usa, apple ipod nano, apple.com/ipod, t
    44. Conclusions and future work ● Session segmentation – Simple time-outs OK ... but we can do much better – Particularly important when dealing with super-sessions ● Recommendation with history – Often one of the two queries “swallows” the other – Requires tuning e.g. a better way of combining scores ● Current work: model transition types
    45. Transition types ● [Figure: graph over the queries barcelona, brcelona, barcelona f.c., barcelona hotels, cheap barcelona hotels, luxury barcelona hotels, with edges labelled Correct, Generalize, Specialize, Parallel move]
    46. Types of transition inside a chain ● Error correction – startford cinema → stratford cinema ● Rephrasing – wikipedia english → english wikipedia – robbs celebrities → robbs celebs Rieh and Xie: “Analysis of multiple query reformulations”. IPM 2006.
    47. Types of transition inside a chain ● Generalization (“zoom out”) – barcelona hotels → barcelona ● Specialization (“zoom in”) – barcelona soccer → barcelona camp nou ● Parallel move (“pan”) – barcelona → rome ● Rieh and Xie: “Analysis of multiple query reformulations”. IPM 2006. ● The names zoom-in, zoom-out, and pan come from Y!SAMA
    48. Why model refinement types? ● Improved segmentation – Some refinement sequences are more frequent than others ● Improved recommendations
    49. Q&A

    Carlos Castillo, 2 years ago
