Query Flow Graphs

Loading...

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

0 comments

Post a comment

    Post a comment
    Embed Video
    Edit your comment Cancel

    Favorites, Groups & Events

    Query Flow Graphs - Presentation Transcript

    1. Francesco Bonchi, Carlos Castillo , Debora Donato, Aris Gionis Yahoo! Research · Barcelona, Spain Query-Flow Graphs Paolo Boldi, Sebastiano Vigna Universita degli Studi · Milan, Italy
    2. Before we begin ...
      • Users trust in search engines
      • We take their privacy very seriously
      • Research datasets
    3. R. Baeza-Yates: “Graphs from search engine queries”. SOFSEM 2007.
    4. 1. Building a query-flow graph 2. Application to session segmentation 3. Query-reformulation types 4. Application to query recommendation 5. Conclusions
    5. 1. Building a query-flow graph 2. Application to session segmentation 3. Query-reformulation types 4. Application to query recommendation 5. Conclusions
    6. User session Basic concepts Radlinksi & Joachims: “Query chains: learning to rank from implicit feedback”. KDD 2005. ebay autotrader used fox vw ryanair barcelona places barcelona rent barcelona
    7. Chain/Mission #1 Chain/Mission #2 User session Basic concepts Radlinksi & Joachims: “Query chains: learning to rank from implicit feedback”. KDD 2005. ebay autotrader used fox vw ryanair barcelona places barcelona rent barcelona
    8. Whence come the chains? ... ebay autotrader used fox vw barcelona places barcelona rent barcelona barcelona soccer barcelona fc soccer
    9. Whence come the chains? Query-flow graph ... ebay autotrader used fox vw barcelona places barcelona rent barcelona barcelona soccer barcelona fc soccer
    10. Query-Flow Graph P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, S. Vigna: “The Query-Flow Graph”. CIKM 2008. ebay autotrader used fox vw barcelona hotel barcelona rent barcelona barcelona soccer barcelona fc soccer s t ... ...
    11. The query-flow graph
      • Directed graph
      • Built from a query log
      • Nodes are queries
    12. Frequencies of query transitions
    13. Frequencies of query transitions
    14. Transitions are non-symmetrical 93% of the time 93% of the time 7% of the time And when it happens, the frequencies are not correlated barclona barcelona barcelona rent barcelona
    15. The query-flow graph
      • Directed graph
      • Built from sessions/chains
      • Nodes are queries
      • Edges hold information, e.g.:
        • Aggregate features from query transitions
        • Weights derived from those features
    16. Statistics on query transitions Data for (q,q')
        • Frequency
          • Relative to other transitions from q
        • Bags of terms and character n-grams
          • Cosine similarity, Jaccard coefficient, ...
        • Average time
        • Average position in session
        • ...
      Aggregate multiple users R. Jones & K. Klinkner “Beyond session timeout”. CIKM 2008. Radlinksi & Joachims: “Query chains: learning to rank from implicit feedback”. KDD 2005. barcelona places barcelona rent barcelona
    17. Our query-flow graph
      • Directed graph
      • Built from sessions/chains
      • Nodes are queries
      • Edges hold information, e.g.:
        • Aggregate features from query transitions
        • Weights derived from those features
    18. Weighting schemes
      • Interested in weights such as:
        • w(q,q') = freq(q,q') / freq(q)
              • -or-
        • w(q,q') = chain(q,q')
    19. Weighting schemes
      • Interested in weights such as:
        • w(q,q') = freq(q,q') / freq(q)
              • -or-
        • w(q,q') = chain(q,q')
      • freq(q,q') is the frequency of the transition
      • chain(q,q') is the chaining probability
        • Pr(q, q' in same chain | q, q' in same session)
    20. Example with observed transition probabilities
    21. 1. Building a query-flow graph 2. Application to session segmentation 3. Query-reformulation types 4. Application to query recommendation 5. Conclusions
    22. Weighting schemes
      • Interested in weights such as:
        • w(q,q') = freq(q,q') / freq(q)
              • -or-
        • w(q,q') = chain(q,q')
      • freq(q,q') is the frequency of the transition
      • chain(q,q') is the chaining probability
        • Pr(q, q' in same chain | q, q' in same session)
    23. Estimating chain(q,q') Data for (q,q')
        • Frequency
        • N-grams and terms similarity
        • Average delta time
        • Average position in session
        • Clicks between q and q'
        • ...
      R. Jones & K. Klinkner “Beyond session timeout”. CIKM 2008. barcelona places barcelona rent barcelona
    24. Estimating chain(q,q') Data for (q,q')
        • Frequency
        • N-grams and terms similarity
        • Average delta time
        • Average position in session
        • Clicks between q and q'
        • ...
      Training labels
        • 5,000 consecutive query pairs
        • Manually labelled
        • 2/3 are same-chain
      + barcelona places barcelona rent barcelona
    25. Estimating chain(q,q') Data for (q,q')
        • Frequency
        • N-grams and terms similarity
        • Average delta time
        • Average position in session
        • Clicks between q and q'
        • ...
      freq(q,q') = 1: different-chain if α time - β sim.ngrams.jaccard - γ sim.terms.jaccard + δ > θ same-chain otherwise freq(q,q') > 1: C5.0 rules, 8 rules in total + Training labels barcelona places barcelona rent barcelona
    26. There are intertwined chains (photos: commons.wikimedia.org) pointui forum audi ipswich golfers elbow cox ipswich
    27. Efficient session breaking
      • 1. sort the session
      • 2. break into contiguous chains
    28. Sorting: ATSP formulation pointui forum audi ipswich golfers elbow cox ipswich
    29. ATSP formulation pointui forum audi ipswich golfers elbow cox ipswich
    30. ATSP formulation Weights given by chain(q,q') pointui forum audi ipswich golfers elbow cox ipswich
    31. ATSP solution Maximize Π chain(q i ,q i+1 ) s.t. TSP constraints Greedy algorithm with d-steps lookahead pointui forum audi ipswich golfers elbow cox ipswich
    32. ATSP solution pointui forum audi ipswich golfers elbow cox ipswich
    33. Breaking using a threshold pointui forum audi ipswich golfers elbow cox ipswich
    34. Breaking using a threshold Chain #1 Chain #2 pointui forum audi ipswich golfers elbow cox ipswich
    35. Testing set
      • Queries from 1,458 users from UK
        • Collected queries over 3 months
        • Selected users with >1 query
        • Partitioned into chains manually
      • 415 (28%) were single-chain
      • 3.6 chains per user on average
    36. Evaluation wrt ground truth
      • Sorting strategy
        • Rand index of omniscient breaking strategy
      • Breaking strategy
        • Rand index
        • Rand index = pair-wise agreements / ( )
      n 2
    37. Evaluation of reordering strategy
      • Average over all users
    38. Evaluation of reordering + breaking
      • Baseline: 30 min. timeout
    39. Evaluation (cont.) Baseline: 30' timeout Ours Better rand index for the “difficult” chains. Ours and baseline score 0.85 and 0.71 if we exclude the cases where baseline has score of 1
    40. Session segmentation remarks
      • Simple time-outs OK ... but we can do much better
      • Global information in the query-flow graph can help
    41. 1. Building a query-flow graph 2. Application to session segmentation 3. Query-reformulation types 4. Application to query recommendation 5. Conclusions
    42. Rieh, S. Y. and Xie, H: “Analysis of multiple query reformulations on the web”. IPM 32 (3) 2006. Query-reformulation types barcelona cheap barcelona hotels barcelona hotels luxury barcelona hotels Specialize Generalize brcelona Correct barcelona f.c. Generalize Parallel move Specialize Specialize
    43. Reformulation types
      • Correction
        • startford cinema -> stratford cinema
      • Generalization (“zoom out”)
        • barcelona hotels -> barcelona
      • Specialization (“zoom in”)
        • barcelona soccer -> barcelona camp nou
      Zoom-in, zoom-out, pan, names comes from Y!SAMA Rieh and Xie: “Analysis of multiple query reformulations”. IPM 2006.
    44. Reformulation types
      • Rephrasing
        • wikipedia english -> english wikipedia
        • robbs celebrities -> robbs celebs
      • Parallel move
        • barcelona -> rome
      Rieh and Xie: “Analysis of multiple query reformulations”. IPM 2006.
    45. P. Boldi, F. Bonchi, C. Castillo, S. Vigna: “From 'dango' to 'japanese cakes'”. 2009. G S C P
    46. P. Boldi, F. Bonchi, C. Castillo, S. Vigna: “From 'dango' to 'japanese cakes'”. 2009. Query-reformulation classifier
    47. Classifier: 92% accuracy
    48. Example 1/4: “ cat ”
    49. Example 2/4: “ peanut ”
    50. Example 3/4: “ surf board ”
    51. Example 4/4: “ bruce springsteen ”
    52. Finding query-reformulation patterns
      • Sessions can be seen as strings of QRTs:
        • XSSX
        • XCGSSX
        • XCSGSGSGPX
      • Patterns can be identified in these strings
      • Data: UK and US query logs, early 2008
    53. Chain length distribution
    54. QRT distribution
    55. Interesting patterns
    56. General observations on QRTs
      • Parallel moves (50%-60%)
        • The most frequent class
      • Specializations (30%-40%)
      • Generalizations (5%-10%)
        • Frequently appear together in alternating order
      • Corrections (5%-10%)
        • More frequent at the beginning or end of a chain
    57. 1. Building a query-flow graph 2. Application to session segmentation 3. Query-reformulation types 4. Application to query recommendation 5. Conclusions
    58. Query recommendation setting
      • Dataset: “Spring 2006 Microsoft data”
        • Provided by WSCD'09 workshop organizers
        • 15M queries from a 1-month period
      • Automatic chains and reformulations labels
        • Obtained with models from previous works
      • Recommendations based on random walks
      • WebGraph (graph) SUX4J (hashing)
    59. Implemented systems
      • Slices
        • Queryflow-{G, S, P, C, SP, GSP, GSPC, ...}
      • Composed graphs
        • Queryflow-{(SS T ), (SG), ...}
        • Weight is weight of heaviest length-2 path
    60. Random walks
      • Number of steps: 1, 5, 10 (2, 6, 12)
      • Scoring: PPR ( Absolute ) or PPR/PR ( Relative )
      • Self-transition with prob. 0.9, no random jumps
    61. Click graphs Weights based on clicks Two weighting schemes: N. Craswell and M. Szummer: “Random walks on the click graph” SIGIR 2007. d 1 d 2 d 3 q 1 q 2 d 4 d 5 q 3 q 4
    62. Example: “ sudoku ”
      • sudoku
      • free sudoku
      • sudoku shack
      • daily sudoku
      • msn sudoku
      • rci
      • sudoku
      • sudoku #5 easy
      • brainy puzzle
      • sudoku game
      • SuDoKu
      • How to play Sudoku
      QueryFlow-S, 10, Abs. QueryDocument-Bwd, 6, Rel.
    63. Example: “ myspace layouts ”
      • myspace layouts
      • free myspace layouts
      • myspace contact tables
      • myspace codes
      • premade myspace layouts
      • myspace icons
      • myspace layouts
      • myspace codes + hide last login
      • myspace and layouts
      • dissect virtual frogs
      • Myspace Editors Cars
      • MYSPACE GENERATOR
      QueryFlow-S, 10, Abs. QueryDocument-Bwd, 6, Rel.
    64. Evaluation method
      • Sampled 114 queries (freq 700 - 15,000)
      • Run all systems for each query
      • Create a pool: union of top-5 recommendations
        • ~ 6,000 query recommendations in total
      • Evaluate each recommendation in the pool
        • Useful, somewhat useful, not useful
    65. Assessment
      • Example, query “ cnn news ”:
    66. Distribution of assessments n = 6 093
    67. Inter-assessor agreement n = 560
    68. Inter-assessor agreement n = 560 Good Bad
    69. Precision @ 5 P. Boldi, F. Bonchi, C. Castillo, D. Donato, S. Vigna: “From 'dango' to 'japanese cakes'”. WSCD 2009.
    70. Precision @ 5
    71. Diversity score
      • Take one query
      • Get top-5 recommendations that are good
      • Get top-5 search results for each of them
      • Count how many distinct URLs appear
        • Discard URLs from initial query, this is in [0,25]
    72. Diversity scores
    73. 1. Building a query-flow graph 2. Application to session segmentation 3. Query-reformulation types 4. Application to query recommendation 5. Conclusions
    74. Conclusions (1/3)
      • Query reformulation types
        • Classification can be improved using the graph itself
        • Learn topic simultaneously
        • Parallel moves are interesting, even if risky
        • Refine the categories depending on the application
    75. Conclusions (2/3)
      • Query recommendations
        • Recognizing chains is important
        • Fewer slices is better than more
        • Few iterations are enough
        • Why 5? Let the system decide how many...
    76. Conclusions (3/3) (history-based recommendation)
    77. Q&A barcelona cheap barcelona hotels barcelona hotels luxury barcelona hotels Specialize Generalize brcelona Correct barcelona f.c. Generalize Parallel move Specialize Specialize

    + Carlos CastilloCarlos Castillo, 6 months ago

    custom

    372 views, 0 favs, 1 embeds more stats

    Talk about query flow graphs and query recommendati more

    More info about this document

    CC Attribution License

    Go to text version

    • Total Views 372
      • 371 on SlideShare
      • 1 from embeds
    • Comments 0
    • Favorites 0
    • Downloads 10
    Most viewed embeds
    • 1 views on http://www.plaxo.com

    more

    All embeds
    • 1 views on http://www.plaxo.com

    less

    Flagged as inappropriate Flag as inappropriate
    Flag as inappropriate

    Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

    Cancel
    File a copyright complaint
    Having problems? Go to our helpdesk?

    Categories

    Tags