Intelligent Search
ApacheCon 2009 talk describing methods for doing intelligent (well, really clever at least) search on items with no or poor meta-data.


Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Intelligent Search
    • Intelligent Search (or at least really clever)
    • Some Preliminaries • Text retrieval = matrix multiplication A: our corpus documents are rows terms are columns
    • Some Preliminaries • Text retrieval = matrix multiplication for each document d: for each term t: sd += adt qt A: our corpus documents are rows terms are columns
    • Some Preliminaries • Text retrieval = matrix multiplication A: our corpus documents are rows terms are columns sd = Σt adt qt
    • Some Preliminaries • Text retrieval = matrix multiplication A: our corpus documents are rows terms are columns s = A q
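The build-up above (score each document by multiplying the corpus matrix by the query vector) can be sketched in NumPy; the tiny corpus here is made up for illustration, not from the talk:

```python
import numpy as np

# A: our corpus. Documents are rows, terms are columns;
# entries are term counts (a hypothetical 3-doc, 3-term corpus).
A = np.array([
    [1, 0, 2],   # doc 0
    [0, 1, 1],   # doc 1
    [3, 0, 0],   # doc 2
])

# q: the query expressed as a term vector.
q = np.array([1, 0, 1])

# Scoring every document at once is a single matrix-vector product: s = A q.
s = A @ q
print(s)  # doc scores: [3 1 3]
```

The nested loop on the earlier slide (`sd += adt qt`) computes exactly this product one entry at a time.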
    • More Preliminaries • Recommendation = Matrix multiply A: our users’ histories users are rows items are columns
    • More Preliminaries • Recommendation = Matrix multiply A: our users’ histories users are rows items are columns Users who bought items in the list h also bought items in the list r
    • More Preliminaries • Recommendation = Matrix multiply for each user u: for each item t1: for each item t2: rt1 += au,t1 au,t2 ht2 A: our users’ histories users are rows items are columns
    • More Preliminaries • Recommendation = Matrix multiply A: our users’ histories users are rows items are columns st1 = Σt2 Σu au,t1 au,t2 qt2
    • More Preliminaries • Recommendation = Matrix multiply A: our users’ histories users are rows items are columns s = A’ (A q)
    • More Preliminaries • Recommendation = Matrix multiply A: our users’ histories users are rows items are columns s = (A’A) q
    • More Preliminaries • Recommendation = Matrix multiply A: our users’ histories users are rows items are columns s = (A’A) q ish!
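The two algebraically equivalent orderings of the slide's product, s = A'(A q) versus s = (A'A) q, can be checked directly on a made-up user-history matrix (names and data are illustrative only):

```python
import numpy as np

# A: our users' histories. Users are rows, items are columns.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])

# h: the current user's history vector (which items they touched).
h = np.array([1, 0, 0, 0])

# Ordering 1: find users similar to h, then sum up what they touched.
s1 = A.T @ (A @ h)
# Ordering 2: precompute the item-item cooccurrence matrix K = A'A, then score.
s2 = (A.T @ A) @ h

assert np.array_equal(s1, s2)
print(s2)  # item scores: [2 2 1 0]
```

The second ordering is why cooccurrence matters: K = A'A can be computed offline once and reused for every query.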
    • Why so ish? • In real life, ish happens because: • Big data ... so we selectively sample • Sparse data ... so we smooth • Finite computers ... so we sparsify • Top-40 effect ... so we use some stats
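One of those "ish" steps, sparsification for finite computers, can be sketched as keeping only the top-k entries per row of the dense cooccurrence matrix (the function and cutoff here are illustrative, not from the talk):

```python
import numpy as np

def sparsify(K, k=2):
    """Keep only the k largest entries in each row of K; zero the rest."""
    out = np.zeros_like(K)
    for i, row in enumerate(K):
        top = np.argsort(row)[-k:]   # indices of the k largest entries
        out[i, top] = row[top]
    return out

# A hypothetical dense 3x3 cooccurrence matrix.
K = np.array([[5, 1, 3],
              [1, 4, 2],
              [3, 2, 6]])
print(sparsify(K, k=2))
```

In production one would also sample and smooth, but the shape and cost of the computation are unchanged, as the next slide says.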
    • The same in spite of ish • The shape of the computation is unchanged • The cost of the computation is unchanged • Broad algebraic conclusions still hold
    • Back to recommendations ...
    • Dyadic Structure ● Functional – Interaction: actor -> item* ● Relational – Interaction ⊆ Actors x Items ● Matrix – Rows indexed by actor, columns by item – Value is count of interactions ● Predict missing observations
    • Fundamental Algorithmics ● Cooccurrence ● A is actors x items, K is items x items ● Product has general shape of matrix ● K tells us “users who interacted with x also interacted with y”
    • Fundamental Algorithmic Structure ● Cooccurrence ● Matrix approximation by factoring ● LLR
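The LLR mentioned here is the log-likelihood ratio test from the "Surprise and Coincidence" paper cited at the end. One common formulation (the entropy form used in Apache Mahout) scores a 2x2 table of cooccurrence counts:

```python
from math import log

def xlogx(x):
    return 0.0 if x == 0 else x * log(x)

def entropy(*counts):
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 count table:
    k11 = both events together, k12 = first only,
    k21 = second only, k22 = neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

# A strongly associated pair scores far above an independent one.
print(llr(100, 10, 10, 1000) > llr(10, 100, 100, 1000))  # True
```

High-LLR pairs are the entries kept when K is sparsified; the rest are treated as noise.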
    • But Wait ...
    • But Wait ... Does it have to be that way?
    • What we have: For a user who watched/bought/listened to this
    • What we have: For a user who watched/bought/listened to this Sum over all other users who watched/bought/...
    • What we have: For a user who watched/bought/listened to this Sum over all other users who watched/bought/... Add up what they watched/bought/listened to
    • What we have: For a user who watched/bought/listened to this Sum over all other users who watched/bought/... Add up what they watched/bought/listened to And recommend that
    • What we have: For a user who watched/bought/listened to this Sum over all other users who watched/bought/... Add up what they watched/bought/listened to And recommend that ish
    • What we have: Add up what they watched/bought/listened to
    • What we have: Add up what they watched/bought/listened to But wait, we can do that faster
    • But why not ...
    • But why not ... Why just dyadic learning?
    • But why not ... Why just dyadic learning? Why not triadic learning?
    • But why not ... Why just dyadic learning? Why not p-adic learning?
    • For example ● Users enter queries (A) – (actor = user, item=query) ● Users view videos (B) – (actor = user, item=video) ● A'A gives query recommendation – “did you mean to ask for” ● B'B gives video recommendation – “you might like these videos”
    • The punch-line ● B'A recommends videos in response to a query – (isn't that a search engine?) – (not quite, it doesn't look at content or meta-data)
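The punch-line is just another matrix product: with A = users x query-terms and B = users x videos, the entry (B'A)[v, t] counts users who issued term t and also watched video v. A toy sketch (all data hypothetical):

```python
import numpy as np

# A: users x query-terms (who searched for what).
A = np.array([[1, 0],     # user 0 issued term 0
              [1, 0],     # user 1 issued term 0
              [0, 1]])    # user 2 issued term 1

# B: users x videos (who watched what).
B = np.array([[1, 0, 0],  # user 0 watched video 0
              [1, 1, 0],  # user 1 watched videos 0 and 1
              [0, 0, 1]]) # user 2 watched video 2

# B'A: videos x terms. Row v, column t counts users linking term t to video v.
cross = B.T @ A
print(cross)
# e.g. cross[0, 0] == 2: two users who issued term 0 also watched video 0,
# so term 0 should retrieve video 0 even with no meta-data match.
```

That is the "search engine" of the slide: ranking comes from behavior, not from content or meta-data.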
    • Real-life example ● Query: “Paco de Lucia” ● Conventional meta-data search results: – “hombres del paco” times 400 – not much else ● Recommendation based search: – Flamenco guitar and dancers – Spanish and classical guitar – Van Halen doing a classical/flamenco riff
    • System Diagram: viewing logs (t, user, video) and search logs (t, user, query-term) each pass through a selective sampler and a count step, are joined on user and counted again, and LLR + sparsify then produces two tables: related videos (v => v1 v2 ...) and related terms (v => t1 t2 ...). All of this runs on Hadoop.
    • Indexing: the related-videos table (v => v1 v2 ...) and related-terms table (v => t1 t2 ...) are joined on video with video meta-data (v => url title ...) to build a Lucene index. The join runs on Hadoop; serving uses Lucene (+Katta?).
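The indexing join can be sketched with plain dictionaries standing in for the Hadoop tables and the Lucene index (all names and records here are hypothetical):

```python
# Per-video tables the earlier jobs would have produced (made-up data).
related_terms = {"v1": ["flamenco", "guitar"],
                 "v2": ["rock", "guitar"]}
video_meta = {"v1": {"title": "Paco de Lucia live"},
              "v2": {"title": "Van Halen riff"}}

# Join on video id: one index document per video, carrying both its
# meta-data and its behavior-derived indicator terms.
index = {v: {"title": video_meta[v]["title"], "terms": terms}
         for v, terms in related_terms.items()}

# A query term now retrieves videos through the indicator terms,
# even when the term never appears in the video's own meta-data.
hits = [v for v, doc in index.items() if "flamenco" in doc["terms"]]
print(hits)  # ['v1']
```

In the real system the dict would be a Lucene index and the lookup an ordinary Lucene query, so existing search infrastructure serves the recommendations.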
    • Hypothetical Example ● Want a navigational ontology? ● Just put labels on a web page with traffic – This gives A = users x label clicks ● Remember viewing history – This gives B = users x items ● Cross recommend – B'A = click to item mapping ● After several users click, results are whatever users think they should be
    • Resources ● My blog – http://tdunning.blogspot.com/ ● The original LLR in NLP paper – Accurate Methods for the Statistics of Surprise and Coincidence (check on citeseer) ● Source code – Mahout project – contact me (tdunning@apache.org)