Intelligent Search

ApacheCon 2009 talk describing methods for doing intelligent (well, really clever at least) search on items with no or poor meta-data.

The video of the talk should be available shortly on the ApacheCon web-site.

Transcript of "Intelligent Search"

1. Intelligent Search
2. Intelligent Search (or at least really clever)
3. Some Preliminaries
   • Text retrieval = matrix multiplication
     A: our corpus
     documents are rows
     terms are columns
4. Some Preliminaries
   • Text retrieval = matrix multiplication
     A: our corpus
     documents are rows
     terms are columns
   for each document d:
     for each term t:
       s_d += a_dt q_t
5. Some Preliminaries
   • Text retrieval = matrix multiplication
     A: our corpus
     documents are rows
     terms are columns
   s_d = Σ_t a_dt q_t
6. Some Preliminaries
   • Text retrieval = matrix multiplication
     A: our corpus
     documents are rows
     terms are columns
   s = A q
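Slide 6's s = A q is easy to see in code. A minimal sketch with scipy; the toy corpus, vocabulary, and query below are invented for illustration:

    import numpy as np
    from scipy.sparse import csr_matrix

    # Toy corpus A: 3 documents x 4 terms, a_dt = count of term t in doc d.
    # Hypothetical vocabulary: ["flamenco", "guitar", "video", "paco"]
    A = csr_matrix(np.array([
        [2, 1, 0, 0],   # doc 0
        [0, 1, 1, 0],   # doc 1
        [1, 0, 0, 3],   # doc 2
    ]))

    # Query vector q, one weight per term; the query here is "flamenco guitar".
    q = np.array([1, 1, 0, 0])

    # Retrieval scores: s = A q, i.e. s_d = Σ_t a_dt q_t.
    s = A @ q
    print(s)   # [3 1 1] -> doc 0 matches best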
7. More Preliminaries
   • Recommendation = matrix multiply
     A: our users’ histories
     users are rows
     items are columns
8. More Preliminaries
   • Recommendation = matrix multiply
     A: our users’ histories
     users are rows
     items are columns
   Users who bought items in the list h
   also bought items in the list r
9. More Preliminaries
   • Recommendation = matrix multiply
     A: our users’ histories
     users are rows
     items are columns
   for each user u:
     for each item t1:
       for each item t2:
         r_t1 += a_u,t1 a_u,t2 h_t2
10. More Preliminaries
    • Recommendation = matrix multiply
      A: our users’ histories
      users are rows
      items are columns
    s_t1 = Σ_t2 Σ_u a_u,t1 a_u,t2 q_t2
11. More Preliminaries
    • Recommendation = matrix multiply
      A: our users’ histories
      users are rows
      items are columns
    s = A' (A q)
12. More Preliminaries
    • Recommendation = matrix multiply
      A: our users’ histories
      users are rows
      items are columns
    s = (A' A) q
13. More Preliminaries
    • Recommendation = matrix multiply
      A: our users’ histories
      users are rows
      items are columns
    s = (A' A) q
    ish!
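The parenthesization on slides 11-12 is the point: A' (A q) is two cheap matrix-vector products done per request, while (A' A) can be computed once offline. A small sketch of both orders on toy data (the history matrix and vector are invented):

    import numpy as np
    from scipy.sparse import csr_matrix

    # Toy history matrix: 4 users x 3 items; a_ui = 1 if user u touched item i.
    A = csr_matrix(np.array([
        [1, 1, 0],
        [1, 0, 1],
        [0, 1, 1],
        [1, 1, 1],
    ]))

    # q: the current user's history, one entry per item.
    q = np.array([1, 0, 0])

    # Two orders of the same product:
    s1 = A.T @ (A @ q)   # online: two sparse matrix-vector products
    K = A.T @ A          # offline: item-item cooccurrence, computed once
    s2 = K @ q           # online: a single matrix-vector product
    assert np.allclose(s1, s2)
    print(s2)            # [3 2 2] -> items that cooccur with item 0 score high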
14. Why so ish?
    • In real life, ish happens because:
      • Big data ... so we selectively sample
      • Sparse data ... so we smooth
      • Finite computers ... so we sparsify
      • Top-40 effect ... so we use some stats
15. The same in spite of ish
    • The shape of the computation is unchanged
    • The cost of the computation is unchanged
    • Broad algebraic conclusions still hold
16. Back to recommendations ...
17. Dyadic Structure
    ● Functional
      – Interaction: actor -> item*
    ● Relational
      – Interaction ⊆ Actors x Items
    ● Matrix
      – Rows indexed by actor, columns by item
      – Value is count of interactions
    ● Predict missing observations
18. Fundamental Algorithmics
    ● Cooccurrence
    ● A is actors x items, K is items x items
    ● Product has general shape of matrix
    ● K tells us "users who interacted with x also interacted with y"
19. Fundamental Algorithmic Structure
    ● Cooccurrence
    ● Matrix approximation by factoring
    ● LLR
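LLR is the log-likelihood ratio test from the paper cited on the resources slide; it scores how surprising a cooccurrence count is given the marginals, which is what lets us sparsify K honestly. A sketch of one common formulation (the entropy form used in Mahout's LogLikelihood class; the counts at the end are made up):

    import math

    def x_log_x(x):
        # x * log(x), with the 0 * log(0) = 0 convention.
        return 0.0 if x == 0 else x * math.log(x)

    def entropy(*counts):
        # Unnormalized Shannon entropy of a list of counts.
        return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

    def llr(k11, k12, k21, k22):
        """Log-likelihood ratio for a 2x2 cooccurrence table.
        k11: both events, k12: A without B, k21: B without A, k22: neither."""
        row = entropy(k11 + k12, k21 + k22)
        col = entropy(k11 + k21, k12 + k22)
        return 2.0 * (row + col - entropy(k11, k12, k21, k22))

    # Items seen together 10 times, alone 20 and 30 times, in 10000 events:
    print(llr(10, 20, 30, 9940))   # large value -> keep this pair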
20. But Wait ...
21. But Wait ...
    Does it have to be that way?
22. What we have:
    For a user who watched/bought/listened to this
23. What we have:
    For a user who watched/bought/listened to this
    Sum over all other users who watched/bought/...
24. What we have:
    For a user who watched/bought/listened to this
    Sum over all other users who watched/bought/...
    Add up what they watched/bought/listened to
25. What we have:
    For a user who watched/bought/listened to this
    Sum over all other users who watched/bought/...
    Add up what they watched/bought/listened to
    And recommend that
26. What we have:
    For a user who watched/bought/listened to this
    Sum over all other users who watched/bought/...
    Add up what they watched/bought/listened to
    And recommend that
    ish
27. What we have:
    Add up what they watched/bought/listened to
28. What we have:
    Add up what they watched/bought/listened to
    But wait, we can do that faster
29. What we have:
    Add up what they watched/bought/listened to
    But wait, we can do that faster
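The "faster" here is presumably slide 12's associativity trick plus the sparsification from slide 14: build the item-item table once offline, keep only the strongly related items per item, and serve recommendations as lookups. A hedged sketch; related_items and the top-k cutoff are invented stand-ins for the llr + sparsify step in the system diagram later:

    import numpy as np
    from scipy.sparse import csr_matrix

    def related_items(A, top_k=2):
        """Precompute a sparse related-items table from user-item history A,
        keeping only the top_k cooccurring items per item."""
        K = (A.T @ A).toarray()
        np.fill_diagonal(K, 0)   # every item trivially cooccurs with itself
        return {i: [int(j) for j in np.argsort(row)[::-1][:top_k] if row[j] > 0]
                for i, row in enumerate(K)}

    A = csr_matrix(np.array([
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 1, 1],
        [1, 1, 1, 0],
    ]))
    print(related_items(A))   # e.g. {0: [2, 1], 1: [2, 0], ...}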
30. But why not ...
31. But why not ...
32. But why not ...
    Why just dyadic learning?
33. But why not ...
    Why just dyadic learning?
    Why not triadic learning?
34. But why not ...
    Why just dyadic learning?
    Why not p-adic learning?
35. For example
    ● Users enter queries (A)
      – (actor = user, item = query)
    ● Users view videos (B)
      – (actor = user, item = video)
    ● A'A gives query recommendation
      – "did you mean to ask for"
    ● B'B gives video recommendation
      – "you might like these videos"
36. The punch-line
    ● B'A recommends videos in response to a query
      – (isn't that a search engine?)
      – (not quite, it doesn't look at content or meta-data)
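A minimal sketch of the B'A cross product on toy data. Both matrices share the same user rows, so entry (v, t) of B'A counts users who viewed video v and also searched for term t; all names and numbers here are invented:

    import numpy as np
    from scipy.sparse import csr_matrix

    # Same 3 users index the rows of both matrices.
    # A: users x query-terms (hypothetical terms: ["flamenco", "guitar"])
    A = csr_matrix(np.array([
        [1, 0],
        [1, 1],
        [0, 1],
    ]))
    # B: users x videos (hypothetical videos: v0, v1)
    B = csr_matrix(np.array([
        [1, 0],
        [1, 0],
        [0, 1],
    ]))

    # B'A: videos x query-terms.
    K = (B.T @ A).toarray()
    print(K)        # [[2 1], [0 1]]

    # "Search" for the query "flamenco":
    q = np.array([1, 0])
    print(K @ q)    # [2 0] -> v0 is the better answer for this query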
37. Real-life example
    ● Query: "Paco de Lucia"
    ● Conventional meta-data search results:
      – "hombres del paco" times 400
      – not much else
    ● Recommendation based search:
      – Flamenco guitar and dancers
      – Spanish and classical guitar
      – Van Halen doing a classical/flamenco riff
38. Real-life example
39. Real-life example
40. System Diagram
    (flattened block diagram; the recoverable structure:)
    ● Viewing Logs (t, user, video) -> selective sampler -> count
    ● Search Logs (t, user, query-term) -> selective sampler -> count
    ● join on user -> llr + sparsify ->
      – Related videos (v => v1 v2 ...)
      – Related terms (v => t1 t2 ...)
    ● all running on Hadoop
41. Indexing
    ● Related terms (v => t1 t2 ...)
    ● Related videos (v => v1 v2 ...)
    ● Video meta (v => url title ...)
    ● join on video -> Lucene Index
    ● Hadoop, Lucene (+Katta?)
42. Hypothetical Example
    ● Want a navigational ontology?
    ● Just put labels on a web page with traffic
      – This gives A = users x label clicks
    ● Remember viewing history
      – This gives B = users x items
    ● Cross recommend
      – B'A = click to item mapping
    ● After several users click, results are whatever users think they should be
43. Resources
    ● My blog
      – http://tdunning.blogspot.com/
    ● The original LLR in NLP paper
      – Accurate Methods for the Statistics of Surprise and Coincidence (check on citeseer)
    ● Source code
      – Mahout project
      – contact me (tdunning@apache.org)