extract tips from yahoo! answers   tip: To tell if your eggs are fresh : place eggs in a bowl/glass         of water.....i...
system diagram                                            zest lime without zester           rule-based extraction        ...
mining tips from yahoo! answers       consider tips of a specific structure: “X : Y ”       X : goal of the tip       Y : a...
mining tips from yahoo! answers       english       only literal “how to” queries       answer should start with a verb   ...
quality filtering        generated 249 675 tips        manually label 20 000 using CrowdFlower        classes: very good (2...
quality filtering — machine learning results            Method        handcrafted    content      both                     ...
quality filtering — machine learning results           Category                    P,R      VG     size           Beauty & ...
detecting “how to” queries       how many? 2-3% of volume, 3-4% of distinct queries       start with “how to” “how do i” o...
matching queries to tips       precision–recall trade-off           index only the “goal” or also “action”           use AN...
matching queries to tips — evaluation      mode   min span   vol. dist.   P@1    median      AND      .50     8.7% 2.7% .4...
future work: mine tips from other resources (twitter, wikitravel); improve quality of existing ...
information dissemination in social networks
the information dissemination spectrum   news sites   content-provider sites                  web search   editorially cur...
social media
the information overload problem
social media and user-generated content       paradigm shift from a broadcast one-to-many mechanism       to a many-to-man...
benefits and opportunities       wealth of information of extreme volume and diversity       wisdom of crowd phenomena     ...
challenges       heterogeneous sources       high variability in quality       needle-in-the-haystack problems  we want to...
personalized news recommendations             by harnessing the real-time web                [De Francisci Morales et al.,...
overview       a news recommendation system based on real-time web,       e.g., twitter       suggest news articles to twi...
yahoo! news
sources characteristics   news stream    + high coverage    − sparse and noisy data for user profiling    − latency on coll...
motivation [figure]: take into account recency ...
challenges       scale to large volumes of news and tweets       high dynamicity of news and tweets       news have short ...
relate users, tweets, and news articles
T.rex architecture [figure]
recommendation model: $R_\tau(u, n) = \alpha \cdot \Sigma_\tau(u, n) + \beta \cdot \Gamma_\tau(u, n) + \gamma \cdot \Pi_\tau(n)$. social model: Σ(i, j) social relevance of ...
personalized news recommendation: popularity update rule ...
model learning and evaluation: $R_\tau(u, n) = \alpha \cdot \Sigma_\tau(u, n) + \beta \cdot \Gamma_\tau(u, n) + \gamma \cdot \Pi_\tau(n)$; Yahoo! toolbar data ...
systems evaluated. T.rex: basic model using only user profiles, $R_\tau(u, n) = \alpha \cdot \Sigma_\tau(u, n) + \beta \cdot \Gamma_\tau(u, n) + \gamma \cdot \Pi_\tau(n)$ ...
results [figure] ...
Aris Gionis, “Methods for analyzing user behavior and their application to web search and content recommendation”
Scientific and technical seminar “Smart web search: not only finds, but also recommends”, August 31, 2012.

Aris Gionis, senior research scientist at Yahoo! Research, Barcelona.

  1. usage mining techniques with applications to web search and content recommendation. Aristides Gionis, Yahoo! Research, Barcelona. yandex, aug 31, 2012
  2. yahoo! research, barcelona: web mining; social media and multimedia; large-scale distributed systems; user engagement; semantic web
  3. web mining in yahoo! research. themes: usage mining and query-log mining; social network analysis and graph mining; influence propagation; other data mining problems. data sources: query logs (search) and toolbar (browsing); social networks (flickr, messenger, email, ...); question-answering (answers); micro-blogging (twitter)
  5. overview of the talk: query-log mining (query graphs, query recommendations); yahoo! tips; news recommendations using the real-time web
  6. query-log mining
  7. query-log mining. search engines collect a large amount of query logs; lots of interesting information: analyzing users’ behavior; creating user profiles and personalization; creating knowledge bases and folksonomies; finding similar concepts; building systems for query recommendations; using statistics for improving systems’ performance; ...
  9. the click graph [Craswell and Szummer, 2007]
  10. applications of the click graph [Craswell and Szummer, 2007]: query-to-document search; query-to-query suggestion; document-to-query annotation; document-to-document relevance feedback
  11. the query-flow graph [Boldi et al., 2008]. takes into account temporal information; captures the “flow” of how users submit queries. definition: nodes V = Q ∪ {s, t}, the set of distinct queries Q plus a starting state s and a terminal state t; edges E ⊆ V × V; weights w(q, q') representing the probability that q and q' are part of the same chain
  12. building the query-flow graph. an edge (q, q') if q and q' are consecutive in at least one session; weights w(q, q') learned by machine learning. features used: textual features (cosine similarity, Jaccard coefficient, size of intersection, etc.); session features (the number of sessions, the average session length, the average number of clicks in the sessions, the average position of the queries in the sessions, etc.); time-related features (average time difference, etc.)
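The edge rule on slide 12 can be sketched in a few lines of Python. This is only an illustration: the slide says the weights w(q, q') are learned by a machine-learning model, whereas the sketch below substitutes plain normalized transition counts as a stand-in. The function name `build_query_flow_graph`, the markers `<s>` and `<t>`, and the toy sessions are assumptions made for the example.

```python
from collections import defaultdict

START, END = "<s>", "<t>"  # stand-ins for the start state s and terminal state t


def build_query_flow_graph(sessions):
    """Return {q: {q_next: weight}} for queries that are consecutive in a session.

    Assumption: weights are normalized transition counts, NOT the learned
    chain probabilities described on the slide.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        path = [START, *session, END]
        for q, q_next in zip(path, path[1:]):
            if q != q_next:  # ignore immediate repetitions of the same query
                counts[q][q_next] += 1
    graph = {}
    for q, successors in counts.items():
        total = sum(successors.values())
        graph[q] = {q2: c / total for q2, c in successors.items()}
    return graph


# Toy sessions (hypothetical data).
sessions = [
    ["barcelona", "barcelona hotels", "cheap barcelona hotels"],
    ["barcelona", "barcelona weather"],
    ["barcelona fc", "barcelona fc fixtures"],
    ["barcelona", "barcelona hotels"],
]
qfg = build_query_flow_graph(sessions)
```

Here `qfg["barcelona"]` holds the outgoing edges of the query “barcelona”, with weights that sum to 1.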
  13. query-flow graph [figure: example subgraph with weighted edges around the queries barcelona, barcelona fc, barcelona fc website, barcelona fc fixtures, real madrid, barcelona hotels, cheap barcelona hotels, luxury barcelona hotels, barcelona weather, barcelona weather online, and the terminal node <T>]
  14. query-flow graph [figure: example graph over cat and dog queries, e.g. “funny cat”, “funny dog”, “picture of a cat”, “picture of a dog”, “dog for sale”, “breed of dog”, with start node ^ and end node $]
  15. query recommendations. the general theme: given an input query q, identify similar queries q', rank them, and present them to the user. most query graphs can be used for both tasks: similarity and ranking
  17. recommendations using the query-flow graph [Boldi et al., 2008]. perform a random walk on the query-flow graph: teleportation to the submitted query; teleportation to previous queries to take into account the user history; normalize the PageRank score to remove the bias toward very popular queries
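A minimal sketch of the random walk with teleportation from slide 17, written as a power iteration. The damping factor, the iteration count, the toy graph, and the teleport weights (more mass on the current query than on the previous one) are all assumptions for illustration; the popularity normalization mentioned on the slide is left out.

```python
def random_walk_scores(graph, teleport, damping=0.85, iterations=50):
    """Personalized random walk on a weighted graph {u: {v: weight}}.

    `teleport` maps queries (the submitted query and, optionally, earlier
    queries of the session) to restart probabilities that sum to 1.
    """
    nodes = set(graph) | {v for succ in graph.values() for v in succ} | set(teleport)
    scores = {n: teleport.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1 - damping) * teleport.get(n, 0.0) for n in nodes}
        for u in nodes:
            successors = graph.get(u)
            if successors:
                total = sum(successors.values())
                for v, w in successors.items():
                    nxt[v] += damping * scores[u] * w / total
            else:
                # Node without outgoing edges: return its mass to the teleport set.
                for v, p in teleport.items():
                    nxt[v] += damping * scores[u] * p
        scores = nxt
    return scores


# Hypothetical toy graph and session history "banana -> apple".
graph = {
    "apple": {"apple ipod": 0.6, "apple fruit": 0.4},
    "banana": {"apple": 0.5, "banana recipe": 0.5},
    "apple fruit": {"banana": 1.0},
}
scores = random_walk_scores(graph, teleport={"apple": 0.7, "banana": 0.3})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Candidate recommendations would be read off `ranked` after dropping the queries the user has already issued.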
  18. example: apple [table: top recommendations for “apple” under maximum edge weight and under several random-walk score variants; entries include apple ipod, apple store, apple trailers, apple mac, apple fruit, itunes, apple usa, apple.com/ipod...]
  19. example: banana → apple [table: recommendations for the history “banana → apple” compared with those for “banana” alone; entries include banana holiday, opening a banana, banana shoe, fruit banana, banana cloths, eating bugs]
  20. example: beatles → apple [table: recommendations for the history “beatles → apple” compared with those for “beatles” alone; entries include apple ipod, paul mcartney, srg peppers artwork, beatles tribute band, beatles mp3, the beatles love album]
  21. recommendations as shortcuts to qfg [Anagnostopoulos et al., 2010]
  22. the query-recommendation problem
  26. the recommendation problem. model user behavior as a random walk on the qfg; a user starts at query q0 and follows a path p of reformulations on the qfg before terminating; consider a reward function w(q) on the nodes of the qfg. goal: “nudge” users in order to maximize their reward. objectives: 1. collect a large reward along the way; 2. end the session at a high-reward node. applications: a general problem formulation for suggesting shortcuts (web graph, social networks, etc.)
  27. probabilistic model. we can only make suggestions to the user, not give orders; we do not know how the user will act; the random walk on the qfg is modeled by a stochastic matrix P; recommendations R modify P to P' = P + R
  28. utility functions. reward function w(q) on queries: quality of search results, user satisfaction, dwell time, monetization, etc. utility function U(p) on paths p = ⟨q_0, ..., q_{k−1}, T⟩: either $U(p) = \sum_{q \in p} w(q)$ (Cavafy, “road to Ithaca”) or $U(p) = w(q_{k-1})$ (Machiavelli, “the end justifies the means”)
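The two utility functions on slide 28 differ only in which rewards they count, which a short worked example makes concrete. The reward values and the path below are made up for illustration, and the terminal state T is assumed to carry no reward.

```python
def utility_sum(path, reward):
    """'Road to Ithaca' utility: total reward collected along the path."""
    return sum(reward[q] for q in path)


def utility_last(path, reward):
    """'End justifies the means' utility: reward of the last query only."""
    return reward[path[-1]]


# Hypothetical rewards w(q) and a reformulation path q0 -> q1 -> q2.
reward = {"hotels": 0.2, "barcelona hotels": 0.5, "cheap barcelona hotels": 0.9}
path = ["hotels", "barcelona hotels", "cheap barcelona hotels"]

u_sum = utility_sum(path, reward)    # counts all three queries
u_last = utility_last(path, reward)  # counts only the final query
```

A recommender optimizing the first utility prefers shortcuts through good intermediate queries, while one optimizing the second only cares where the session ends.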
  29. utility [figure: sum of expected values, axis from 0.0 to 1.2; labels: w, ρ, ρw, 1-step heuristic]
  30. qfg projections for diverse recommendations [Bordino et al., 2010]
  31. diverse recommendations [Bordino et al., 2010]. we want not only relevant and high-quality recommendations, but also a diverse set; we want recommendations that lead in different “directions” in the qfg; this needs a notion of distance between queries in the qfg. use spectral embeddings: project the graph into a low-dimensional space so that the embedding minimizes total edge distortion; finding diverse recommendations then reduces to a geometric problem
  32. example: time. spectral projection on the 2-hop neighborhood; pairwise values between the queries:
      query                          TM      NYT       TZ       WT     WTII       TW      TWC
      time magazine (TM)              ·   0.9953   0.0162   0.1422   0.1049  -0.6071  -0.6056
      new york times (NYT)       0.9953        ·  -0.0051   0.1248   0.0893  -0.6478  -0.6462
      time zone (TZ)             0.0162  -0.0051        ·   0.9903   0.9891  -0.5234  -0.5254
      world time (WT)            0.1422   0.1248   0.9903        ·   0.9970  -0.6263  -0.6282
      what time is it (WTII)     0.1049   0.0893   0.9891   0.9970        ·  -0.6244  -0.6263
      time warner (TW)          -0.6071  -0.6478  -0.5234  -0.6263  -0.6244        ·   0.9999
      time warner cable (TWC)   -0.6056  -0.6462  -0.5254  -0.6282  -0.6263   0.9999        ·
  33. improving recommendation for long-tail queries via templates [Szpektor et al., 2011]
  34. motivation. goal: improve coverage of query-recommendation systems. observation: in a typical query log, 50% of the query volume consists of unique queries [Baeza-Yates et al., 2007]; most query-recommendation systems are based on finding queries that co-occur frequently; this is an inherent limitation of using co-occurrences; we need methods that can reason about rare, and even previously unseen, queries
  35. overview of the approach. 1. generate candidate query templates for each query: Paris hotels → <city> hotels; Paris hotels → <district> hotels; Moscow hotels → <city> hotels. 2. infer transitions between templates: <city> hotels → <city> restaurants. 3. infer recommendations for rare queries: Yancheng hotels → Yancheng restaurants
  41. query templates. defined over a hierarchy of entity types; define a global set of templates over the whole query log; not restricted to specific domains (such as travel, weather, or movies). examples: jaguar spare parts → <car> spare parts; name for salt → name for <compound>; a thousand miles notes → <song> notes
  43. candidate templates – example. [figure: hierarchy fragment relating the terms of the query to the entity types substance, food, drink, dessert, and instruction] query: chocolate cookie recipe. candidate templates: <food> cookie recipe; <drink> cookie recipe; <food> recipe; <substance> recipe; chocolate cookie <instruction>; ...
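The candidate-template example can be reproduced with a small sketch: every n-gram of the query that maps to an entity is replaced by its type and by each ancestor of that type. The two dictionaries below are a hypothetical toy hierarchy invented for this example, not the WordNet/Wikipedia hierarchy used in the work, and the function names are likewise illustrative.

```python
# Hypothetical toy hierarchy: n-gram -> direct types, type -> parent type.
ENTITY_TYPES = {
    "chocolate": ["food", "drink"],
    "chocolate cookie": ["dessert"],
    "recipe": ["instruction"],
}
TYPE_PARENT = {"food": "substance", "drink": "substance", "dessert": "food"}


def ancestors(entity_type):
    """Yield (type, distance) for a type and all of its ancestors."""
    distance = 1
    while entity_type is not None:
        yield entity_type, distance
        entity_type = TYPE_PARENT.get(entity_type)
        distance += 1


def candidate_templates(query):
    """Map each candidate template to its smallest generalization distance."""
    tokens = query.split()
    templates = {}
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            ngram = " ".join(tokens[i:j])
            for base in ENTITY_TYPES.get(ngram, []):
                for entity_type, d in ancestors(base):
                    template = " ".join(tokens[:i] + [f"<{entity_type}>"] + tokens[j:])
                    templates[template] = min(d, templates.get(template, d))
    return templates


templates = candidate_templates("chocolate cookie recipe")
```

With this toy hierarchy the result contains the five templates listed on the slide, each paired with the distance d at which it generalizes the query.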
  46. ranking candidate templates. ambiguity: Jaguar spare parts → <car> spare parts vs. Jaguar spare parts → <animal> spare parts. focus: name for salt → name for <compound> vs. name for salt → <description> for salt. right generalization level: Paris hotels → <capital> hotels; Paris hotels → <city> hotels; Paris hotels → <location> hotels
  49. construction of query templates – details. hierarchy used: the WordNet 3.0 hierarchy and the Wikipedia category hierarchy, connected via the YAGO mapping; queries are tokenized, and n-grams are looked up and mapped to entities in the hierarchy; enriched with heuristic generalizations for <email>, <url>, numbers, and noun phrases not in the taxonomy
  50. query-to-template edges. mapping from a query q to its set of templates T(q), viewed as query-to-template edges; associated edge scores $s_{qt}(q, t) = \alpha^d$ when t is obtained by generalizing q at distance d in the hierarchy H; parameter α set experimentally to 0.9; set $s_{qt}(q, q') = 1$ if (q, q') is an edge in the query-flow graph; normalize so that all $s_{qt}(q, \cdot)$ sum to 1
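The scoring rule on slide 50 amounts to a few lines of arithmetic. The sketch below assumes the templates and their distances are already known; the function name, the example distances, and the single query-flow successor are illustrative choices.

```python
def query_to_template_scores(distances, qfg_successors=(), alpha=0.9):
    """Normalized scores s_qt(q, .) for one query q.

    `distances` maps each template to its generalization distance d;
    `qfg_successors` lists queries q' with an edge (q, q') in the
    query-flow graph, each of which gets raw score 1 as on the slide.
    """
    raw = {t: alpha ** d for t, d in distances.items()}
    raw.update({q_next: 1.0 for q_next in qfg_successors})
    total = sum(raw.values())
    return {target: score / total for target, score in raw.items()}


# Hypothetical distances for the query "paris hotels".
s_qt = query_to_template_scores(
    {"<capital> hotels": 1, "<city> hotels": 2, "<location> hotels": 3},
    qfg_successors=["paris restaurants"],
)
```

Because α < 1, more specific templates (smaller d) receive more weight than more general ones, and observed query-flow edges outrank every template.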
  51. template-to-template edges. reasoning about transitions between templates: <food> recipe → healthy <food> recipe. for templates (t1, t2) define the support set Sup(t1, t2) of query pairs {(q1, q2)} such that t1 ∈ T(q1) and t2 ∈ T(q2), and t1 and t2 substitute the same token in q1 and q2 (e.g., dosa recipe and healthy dosa recipe). define the template-to-template edge score as $s_{tt}(t_1, t_2) = \sum_{(q_1, q_2) \in Sup(t_1, t_2)} s_{qq}(q_1, q_2)$; normalize so that all $s_{tt}(t, \cdot)$ sum to 1
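The support-set sum on slide 51 can be sketched as follows. The type lookup `TYPES`, the single-token `templates_of` helper, and the query-to-query scores are all hypothetical; a real system would use the full hierarchy and the learned query-flow weights.

```python
from collections import defaultdict

# Hypothetical single-token type lookup.
TYPES = {"bmw": ["car"], "audi": ["car"], "jaguar": ["car", "animal"]}


def templates_of(query):
    """Yield (template, substituted_token) pairs for a query."""
    tokens = query.split()
    for i, token in enumerate(tokens):
        for entity_type in TYPES.get(token, []):
            yield " ".join(tokens[:i] + [f"<{entity_type}>"] + tokens[i + 1:]), token


def template_transition_scores(s_qq):
    """Sum s_qq over the support set of each template pair, then normalize."""
    raw = defaultdict(lambda: defaultdict(float))
    for (q1, q2), score in s_qq.items():
        for t1, token1 in templates_of(q1):
            for t2, token2 in templates_of(q2):
                if token1 == token2:  # both templates substitute the same token
                    raw[t1][t2] += score
    return {
        t1: {t2: s / sum(targets.values()) for t2, s in targets.items()}
        for t1, targets in raw.items()
    }


# Hypothetical query-to-query scores s_qq.
s_qq = {
    ("bmw transmission", "bmw spare parts"): 0.4,
    ("audi transmission", "audi spare parts"): 0.3,
    ("jaguar transmission", "jaguar spare parts"): 0.2,
}
s_tt = template_transition_scores(s_qq)
```

In this toy data the transition from `<car> transmission` to `<car> spare parts` collects support from three query pairs, while the `<animal>` reading is supported only by the ambiguous jaguar pair.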
  52. example – ambiguity. consider the query transition jaguar transmission → jaguar spare parts. the template transition <car> transmission → <car> spare parts is supported by bmw transmission → bmw spare parts, audi transmission → audi spare parts, ... the template transition <animal> transmission → <animal> spare parts will not be supported by lion transmission → lion spare parts, tiger transmission → tiger spare parts, ...
  54. the query-template flow graph. extension of the query-flow graph; superposition of all the concepts we have seen so far: the set of nodes consists of queries and templates; the set of edges consists of query-to-query edges, query-to-template edges, and template-to-template edges, with associated weights
  55. generating recommendations. [figure: paths from q to q' through templates t1 to t4, with edge scores s1 to s7] r(q, q') = s1 s4 + s2 s5 + s3 s6 + s3 s7. interpretation: probability of a feasible path; dashed lines do not really exist, but are discovered on the fly; queries q and q' may not have been seen before; transitions in the query-flow graph are ranked first
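The score r(q, q') on slide 55 is a sum of products of edge scores along template paths. A minimal sketch, assuming each two-step path goes from the query to one of its templates and then across a template-to-template edge whose target is instantiated with the query's own token; the scores below are invented for the example.

```python
import re
from collections import defaultdict

SLOT = re.compile(r"<[^>]+>")  # matches a typed slot such as <city>


def recommend(token, s_qt, s_tt):
    """Rank candidate queries by the summed product of edge scores.

    `token` is the entity that the query's templates abstract away,
    `s_qt` holds the query-to-template scores of the input query, and
    `s_tt` holds template-to-template scores.
    """
    scores = defaultdict(float)
    for t1, w1 in s_qt.items():
        for t2, w2 in s_tt.get(t1, {}).items():
            candidate = SLOT.sub(token, t2, count=1)  # instantiate the target template
            scores[candidate] += w1 * w2
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores for the rare query "yancheng hotels".
s_qt = {"<city> hotels": 0.6, "<location> hotels": 0.4}
s_tt = {
    "<city> hotels": {"<city> restaurants": 0.7, "<city> map": 0.3},
    "<location> hotels": {"<location> restaurants": 1.0},
}
recommendations = recommend("yancheng", s_qt, s_tt)
```

The candidate “yancheng restaurants” is reached through two template paths, so its score is the sum of both products, even if neither query was ever seen in the log.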
  56. 56. methodology methods: query-template flow graph query-flow graph evaluation: inspection a sample of the results editorial evaluation automated evaluation yandex aug 31, 2012
  57. 57. training dataset queries templates # nodes 95 279 132 5 382 051 983 # edges 83 513 590 4 345 497 267 avg degree 0.88 0.81 max out-degree 14 145 34 249 (craigslist) (<album>) max in-degree 14 317 133 874 (youtube) (<institution>) yandex aug 31, 2012
  58. 58. anecdotal evidence {“guangzhou flights”, “guangzhou map”} <capital> flights → <capital> map {“a thousand miles notes”, “a thousand miles piano notes”} <single> notes → <single> piano notes {“8 week old weimaraner”, “8 week old weimaraner puppy”} 8 week old <breed> → 8 week old <breed> puppy {“aaa office twin falls idaho”, “aaa twin falls idaho”} aaa office <city> → aaa <city> {“air force titles”, “air force ranks”} <military service> titles → <military service> ranks {“name for salt”, “chemical name for salt”} name for <compound> → chemical name for <compound> yandex aug 31, 2012
  59. 59. editorial evaluation
      set-A: 300 pairs from each configuration, recommendation in the top-10
      set-B: 100 pairs, same queries in each configuration, same position
      set-C: 100 pairs for which query-flow graph has no recommendation
      editors labeled query-recommendation pairs as: relevant, not relevant, cannot tell
      two editors, 100 common queries, kappa-statistic 0.37
                 qfg        qtfg
      set-A      98.48%     97.84%
      set-B      97.65%     98.86%
      set-C      n/a        94.38%
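The slide reports inter-editor agreement as a kappa statistic without naming the variant. A minimal sketch, assuming it is Cohen's kappa for two annotators; the function name and the toy labels are illustrative only.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[l] * count_b[l] for l in set(count_a) | set(count_b)) / (n * n)
    # Undefined (division by zero) when both annotators always use one and the same label.
    return (observed - expected) / (1 - expected)

editor_1 = ["relevant", "relevant", "not relevant", "not relevant"]
editor_2 = ["relevant", "not relevant", "not relevant", "not relevant"]
kappa = cohens_kappa(editor_1, editor_2)
```

In the toy data the editors agree on 3 of 4 items, chance agreement is 0.5, and kappa comes out at 0.5.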
  60. 60. automated evaluation – guiding principle extract query pairs {qi , qi+1 } from a testing dataset, such that user submitted qi+1 after qi in the same session measure if qi+1 is predicted by our methods, and in which position assumption: qi+1 should be relevant and useful for qi yandex aug 31, 2012
  61. 61. results
      pair occurrences     qfg         qtfg        relative increase
      total pairs          3134388     3134388
      coverage             22.65 %     28.17 %     24.37 %
      # in top-100         16.97 %     25.49 %     50.23 %
      # in top-10          9.49 %      20.74 %     118.49 %
      # in top-1           2.86 %      10.01 %     249.5 %
      MAP                  0.050       0.137
      avg. position        18.35       8.3
      unique pairs         qfg         qtfg        relative increase
      total pairs          2755922     2755922
      coverage             13.28 %     19.38 %     45.87 %
      # in top-100         12.06 %     17.25 %     42.96 %
      # in top-10          8.41 %      13.52 %     60.68 %
      # in top-1           2.86 %      6.5 %       127.32 %
      MAP                  0.047       0.089
      avg. position        12.33       9.43
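The automated evaluation can be sketched as follows: for each test pair (qi, qi+1), look up the position of qi+1 in the recommendations for qi. The exact metric definitions used in the talk are not given, so the ones below are assumptions: coverage is the fraction of pairs whose target is recommended at all, and MAP is computed as the mean of 1/position with uncovered pairs contributing 0 (each pair has a single relevant item). `evaluate` and `recommend` are hypothetical names.

```python
def evaluate(test_pairs, recommend, ks=(1, 10, 100)):
    """Score a recommender on (query, next_query) pairs from held-out sessions."""
    n = len(test_pairs)
    positions = []  # 1-based rank of the next query, for covered pairs only
    for query, next_query in test_pairs:
        ranked = recommend(query)
        if next_query in ranked:
            positions.append(ranked.index(next_query) + 1)
    report = {"coverage": len(positions) / n}
    for k in ks:
        report[f"top-{k}"] = sum(p <= k for p in positions) / n
    # With one relevant item per pair, average precision reduces to 1/position.
    report["map"] = sum(1.0 / p for p in positions) / n
    report["avg_position"] = sum(positions) / len(positions) if positions else None
    return report

# Toy recommender backed by a fixed table (illustrative data only).
table = {"a": ["b", "c"], "x": ["y"]}
pairs = [("a", "c"), ("x", "y"), ("z", "w"), ("a", "b")]
report = evaluate(pairs, lambda q: table.get(q, []))
```

The relative increase column in the table is (qtfg − qfg) / qfg; for example, coverage on pair occurrences gives (28.17 − 22.65) / 22.65 ≈ 24.37 %.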
  62. 62. results [figure: percentage of test pairs ranked in the top-10 (y-axis, 0 to 20) against query length in words (x-axis, 2 to 16), for QFG and QTFG]
  63. 63. conclusions improve coverage of query recommendation systems recommendations for rare or previously unseen queries well suited for tail queries complements rather than replaces existing methods future work: improve quality of extracted templates yandex aug 31, 2012
  64. 64. yahoo! tips [Weber et al., 2011]
  65. 65. motivation provide answers, not links identify “how to” queries and provide tips tip: piece of advice that is 1 short 2 concrete 3 self-contained 4 non-obvious yandex aug 31, 2012
  66. 66. yahoo! tips yandex aug 31, 2012
  67. 67. yahoo! tips yandex aug 31, 2012
  68. 68. yahoo! tips yandex aug 31, 2012
  69. 69. yahoo! tips yandex aug 31, 2012
  70. 70. extract tips from yahoo! answers tip: To tell if your eggs are fresh : place eggs in a bowl/glass of water.....if it floats it’s bad. if it sinks it’s good. yandex aug 31, 2012
  71. 71. system diagram
      tip pipeline: rule-based extraction → 250k candidate tips → obtain quality labels for 20k candidate tips using CrowdFlower → machine learning → 22k high quality tips
      query handling, e.g., for the query “zest lime without zester”:
          does the query have how-to intent? no: show normal search results
          yes: are there relevant high quality tips? no: show normal search results
          yes: rank the matching tips and display the highest ranking one
      TIP: To zest a lime if you don’t have a zester : use a cheese grater
  72. 72. mining tips from yahoo! answers consider tips of a specific structure: “X : Y ” X : goal of the tip Y : action of the tip examples To get the mildew smell out of your towels : try soaking it in a salt water solution, then washing with soap and cold water, that tends to get rid of smells To style your hair without heat, gel or straighteners : try coconut oil mark k yandex aug 31, 2012
  73. 73. mining tips from yahoo! answers english only literal “how to” queries answer should start with a verb consider only best answers replace I, my, me, myself, etc. with you, your, you, yourself, etc. yandex aug 31, 2012
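The extraction rules above can be sketched as a small filter over (question, best answer) pairs. This is only an illustration: the verb check uses a tiny hard-coded list (`ACTION_VERBS`, hypothetical) where a real system would presumably need part-of-speech tagging, and the pronoun table covers only the four pronouns named on the slide.

```python
import re

PRONOUNS = {"i": "you", "my": "your", "me": "you", "myself": "yourself"}
ACTION_VERBS = {"use", "try", "place", "put", "soak", "mix"}  # hypothetical stand-in for a POS tagger

def swap_pronouns(text):
    """Replace first-person pronouns with second-person ones (whole words only)."""
    return re.sub(r"\b(i|my|me|myself)\b",
                  lambda m: PRONOUNS[m.group(0).lower()],
                  text, flags=re.IGNORECASE)

def make_tip(question, best_answer):
    """Return a tip of the form 'To X : Y', or None if the rules do not apply."""
    # Rule: literal "how to" question; X is what follows "how to".
    m = re.match(r"\s*how to\s+(.+?)\s*\??\s*$", question, flags=re.IGNORECASE)
    if not m:
        return None
    # Rule: the answer should start with a verb.
    words = best_answer.split()
    if not words or words[0].lower().strip(".,!:;") not in ACTION_VERBS:
        return None
    goal = swap_pronouns(m.group(1))
    action = swap_pronouns(best_answer.strip())
    return f"To {goal} : {action}"

tip = make_tip("How to zest a lime if I don't have a zester?", "use a cheese grater")
```

The simple word-boundary substitution mishandles contractions such as "I'm", which is one reason to treat this as a sketch rather than a faithful reimplementation.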
  74. 74. quality filtering generated 249 675 tips manually label 20 000 using CrowdFlower classes: very good (25%), ok (48%), bad (27%) algorithms svm (rbf) decision trees k-nn (Euclidean, k = 21 . . . 50) feature families: 18 handcrafted features: e.g., style (Flesch-Kincaid reading level), sentiment, # urls, emoticons, etc. content: SVD on the tip×term matrix yandex aug 31, 2012
  77. 77. quality filtering – machine learning results
              Method            handcrafted features    content features    both
      Hard    SVM               0.63/0.13               0.60/0.09           0.63/0.16
      Hard    Decision Tree     0.67/0.07               0.61/0.06           0.66/0.13
      Hard    k-NN              0.62/0.23               0.56/0.11           0.63/0.11
      Soft    SVM               0.95/0.11               0.93/0.05           0.95/0.08
      Soft    Decision Tree     0.95/0.03               0.92/0.03           0.94/0.06
      Soft    k-NN              0.94/0.11               0.91/0.05           0.94/0.05
  78. 78. quality filtering – machine learning results
      Category                   P,R           VG      size
      Beauty & Style             0.53,0.08     0.16    0.08
      Business & Finance         0.57,0.20     0.20    0.03
      Cars & Transportation      0.64,0.12     0.23    0.03
      Computers & Internet       0.69,0.33     0.45    0.15
      Consumer Electronics       0.70,0.23     0.38    0.06
      Entertainment & Music      0.60,0.39     0.15    0.05
      Family & Relationships     0.35,0.05     0.06    0.14
      Games & Recreation         0.61,0.31     0.24    0.04
      Health                     0.62,0.07     0.15    0.09
      Home & Garden              0.43,0.06     0.27    0.04
      Society & Culture          0.50,0.19     0.09    0.03
      Sports                     0.68,0.24     0.19    0.03
      Yahoo! Products            0.73,0.43     0.45    0.07
  79. 79. detecting “how to” queries
      how many? 2-3% of volume, 3-4% of distinct queries
      start with “how to”, “how do i” or “how can i”
          how do you fix keys on a laptop
          P: 96-99%, cover: 1.0%
      queries start with an action verb
          play my music on tool bar raido
          P: 7-14%, cover: 3.2%
      if exists “how to X” then “X”
          craft ideas for boys
          P: 87-94%, cover: 1.1%
      incoming queries to “how to” web sites
          fixing a wet cell phone
          P: 61-75%, cover: 0.08%
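Two of the detection rules lend themselves to a short sketch: the literal prefix rule, and the rule "if “how to X” exists in the log, then X is also a how-to query". The function names and the toy query log are assumptions made for illustration; the action-verb and incoming-traffic rules are not covered.

```python
import re

HOWTO_PREFIX = re.compile(r"^(how to|how do i|how can i)\b", re.IGNORECASE)

def howto_stems(query_log):
    """Collect every X for which the literal query 'how to X' occurs in the log."""
    stems = set()
    for query in query_log:
        m = re.match(r"^how to\s+(.+)$", query.strip(), re.IGNORECASE)
        if m:
            stems.add(m.group(1).lower())
    return stems

def has_howto_intent(query, stems):
    """Prefix rule, or the query equals some X seen as 'how to X'."""
    q = query.strip().lower()
    return bool(HOWTO_PREFIX.match(q)) or q in stems

log = ["how to tie a tie", "weather barcelona", "How to zest a lime"]
stems = howto_stems(log)
```

With this log, `has_howto_intent("tie a tie", stems)` is true through the second rule even though the query has no "how to" prefix.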
  84. 84. matching queries to tips precision–recall trade-off index only the “goal” or also “action” use AND or OR mode for query require minimum “span” for the goal ranking rank by number of query tokens in goal, then tf·idf yandex aug 31, 2012
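A minimal sketch of the matching options, indexing only the goal of each tip. Several details are assumptions because the slide does not define them: "span" is read here as the fraction of distinct goal tokens covered by the query, and the tf·idf tie-breaker is a plain sum of tf · log(N/df) over matched tokens. `match_tips` and the toy goals are hypothetical.

```python
import math

def tokenize(text):
    return text.lower().split()

def match_tips(query, goals, mode="AND", min_span=0.5):
    """Return the goals matching the query, best match first."""
    q_tokens = set(tokenize(query))
    # Document frequency of each token over the indexed goals.
    df = {}
    for goal in goals:
        for token in set(tokenize(goal)):
            df[token] = df.get(token, 0) + 1
    results = []
    for goal in goals:
        g_tokens = tokenize(goal)
        if not g_tokens:
            continue
        matched = q_tokens & set(g_tokens)
        if mode == "AND" and matched != q_tokens:
            continue  # AND: every query token must occur in the goal
        if mode == "OR" and not matched:
            continue  # OR: at least one query token must occur
        span = len(matched) / len(set(g_tokens))
        if span < min_span:
            continue
        tfidf = sum(g_tokens.count(t) * math.log(len(goals) / df[t]) for t in matched)
        results.append((len(matched), tfidf, goal))
    # Rank by number of query tokens in the goal, then by tf-idf.
    results.sort(key=lambda r: (r[0], r[1]), reverse=True)
    return [goal for _, _, goal in results]

goals = ["zest a lime without a zester", "juice a lime", "style your hair without heat"]
and_hits = match_tips("zest lime", goals, mode="AND", min_span=0.3)
or_hits = match_tips("zest lime", goals, mode="OR", min_span=0.3)
```

OR mode returns more candidates than AND mode, and a higher `min_span` prunes goals that the query covers only partially, which is the precision versus recall trade-off the slide refers to.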
  85. 85. matching queries to tips – evaluation
      mode    min span    vol.      dist.     P@1          median
      AND     .50         8.7%      2.7%      .428/.680    1
      AND     .66         6.8%      1.8%      .557/.770    1
      AND     1.0         4.4%      0.8%      .625/.835    1
      OR      .50         87.4%     88.4%     .048/.110    18
      OR      .66         36.8%     36.3%     .092/.200    2
      OR      1.0         13.5%     10.3%     .160/.300    1
  86. 86. future work
      mine tips from other resources: twitter, wikitravel
      improve quality of existing system: incorporating more features, improving rule extraction, classification
  87. 87. information dissemination in social networks
  88. 88. the information dissemination spectrum news sites content-provider sites web search editorially curated url, images, music, users browse ... no specific info need clear intent social media (twitter, facebook) recommendations (content- or context- or geo-aware) user-generated content (blogs, images, q/a) yandex aug 31, 2012
  91. 91. social media yandex aug 31, 2012
  92. 92. the information overload problem yandex aug 31, 2012
  93. 93. social media and user-generated content
      paradigm shift: from a broadcast one-to-many mechanism to a many-to-many model
      users in the role of information producers
  94. 94. benefits and opportunities wealth of information of extreme volume and diversity wisdom of crowd phenomena accurate profiling and personalization (toolbar, search, clicks) content- and context- information available social and geo information available yandex aug 31, 2012
  95. 95. challenges heterogeneous sources high variability in quality needle-in-the-haystack problems we want to: support users to seek, filter, and disseminate information build efficient platforms that support social-media functionalities yandex aug 31, 2012
  97. 97. personalized news recommendations by harnessing the real-time web [De Francisci Morales et al., 2012]
  98. 98. overview a news recommendation system based on real-time web, e.g., twitter suggest news articles to twitter users infer user preferences from twitter activity yandex aug 31, 2012
  99. 99. yahoo! news yandex aug 31, 2012
  100. 100. yahoo! news yandex aug 31, 2012
  101. 101. yahoo! news yandex aug 31, 2012
  102. 102. sources characteristics news stream + high coverage − sparse and noisy data for user profiling − latency on collecting user feedback twitter stream + much more accurate personalization + news spread very fast yandex aug 31, 2012
  103. 103. motivation [figure: normalized number of occurrences over time (hourly, early May) for three series: news, twitter, clicks]
  104. 104. motivation [figure: news-click delay distribution; x-axis: minutes, log scale from 1 to 10000]
  106. 106. challenges scale to large volumes of news and tweets high dynamicity of news and tweets news have short life-cycle twitter users use jargon language find the right degree of personalization cope with inactive twitter users yandex aug 31, 2012
  107. 107. relate users, tweets, and news articles yandex aug 31, 2012
  108. 108. T.rex architecture [figure: T.Rex builds user profiles from twitter; user tweets, followee tweets and news articles feed the T.Rex user model, which outputs a personalized ranked list of news articles]
  109. 109. recommendation model
      Rτ (u, n) = α · Στ (u, n) + β · Γτ (u, n) + γ · Πτ (n)
      social model Σ(u, n): social relevance of news article n to user u
      content model Γ(u, n): content relevance of news article n to user u
      popularity model Π(n): popularity of news article n
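The recommendation model is a linear combination of three component scores, which can be sketched as a ranking function. The three scoring callables and the toy scores are placeholders; how T.Rex actually computes Σ, Γ and Π is not shown here.

```python
def rank_news(user, articles, social, content, popularity, alpha, beta, gamma):
    """Rank articles by R(u, n) = alpha * Sigma(u, n) + beta * Gamma(u, n) + gamma * Pi(n)."""
    scored = [
        (alpha * social(user, n) + beta * content(user, n) + gamma * popularity(n), n)
        for n in articles
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [n for _, n in scored]

# Toy component scores for two articles (made-up numbers).
social_scores = {"a": 0.1, "b": 0.9}
content_scores = {"a": 0.8, "b": 0.2}
popularity_scores = {"a": 0.5, "b": 0.5}

def rank(alpha, beta, gamma):
    return rank_news(
        "u1", ["a", "b"],
        lambda u, n: social_scores[n],
        lambda u, n: content_scores[n],
        lambda n: popularity_scores[n],
        alpha, beta, gamma,
    )

social_only = rank(1.0, 0.0, 0.0)
content_only = rank(0.0, 1.0, 0.0)
```

Setting two of the weights to zero isolates a single component, so the weights directly control how much each signal influences the final ranking.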
  113. 113. popularity update rule
      news become stale after two days
      track mentions in news and tweets with exponential decay
      $Z_\tau = \lambda Z_{\tau-1} + w_T H_T + w_N H_N$
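The update rule can be sketched as one decay step per time frame. Reading H_T and H_N as per-entity mention counts from tweets and news in the current frame is an assumption, as are the default values of λ, w_T and w_N and the function name.

```python
def update_popularity(z_prev, tweet_hits, news_hits, lam=0.5, w_t=1.0, w_n=1.0):
    """One step of Z_tau = lam * Z_{tau-1} + w_t * H_T + w_n * H_N, per entity."""
    entities = set(z_prev) | set(tweet_hits) | set(news_hits)
    return {
        e: lam * z_prev.get(e, 0.0) + w_t * tweet_hits.get(e, 0) + w_n * news_hits.get(e, 0)
        for e in entities
    }

# An entity mentioned once, then silent: its popularity halves at every step.
z1 = update_popularity({}, {"x": 4}, {"x": 2})
z2 = update_popularity(z1, {}, {})
z3 = update_popularity(z2, {}, {})
```

With λ < 1 older mentions fade geometrically, so an entity that stops being mentioned loses popularity even though its past counts are never explicitly deleted.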
  114. 114. model learning and evaluation Rτ (u, n) = α · Στ (u, n) + β · Γτ (u, n) + γ · Πτ (n) Yahoo! toolbar data the recommendation model should rank high news articles that users click learn the model using SVM use clicks and twitter profiles of 3K users to train and test the system yandex aug 31, 2012
  115. 115. systems evaluated T.rex: basic model using only user profiles Rτ (u, n) = α · Στ (u, n) + β · Γτ (u, n) + γ · Πτ (n) T.rex+: additional features entity hotness news click count news article age yandex aug 31, 2012
  116. 116. results
      Table 5.2: MRR, precision and coverage.
      Algorithm      MRR      P@1      P@5      P@10     Coverage
      RECENCY        0.020    0.002    0.018    0.036    1.000
      CLICKCOUNT     0.059    0.024    0.086    0.135    1.000
      SOCIAL         0.017    0.002    0.018    0.036    0.606
      CONTENT        0.107    0.029    0.171    0.286    0.158
      POPULARITY     0.008    0.003    0.005    0.012    1.000
      T.REX          0.107    0.073    0.130    0.168    1.000
      T.REX+         0.109    0.062    0.146    0.189    1.000
      RECENCY: it ranks news articles by time of publication (most recent first);
      CLICKCOUNT: it ranks news articles by click count (highest count first);
      SOCIAL: it ranks news articles by using T.REX with β = γ = 0;