Recommendation as Search: Reflections on Symmetry



When recommendation is described in mathematical terms as a matrix equation, a striking symmetry in the form of the equation becomes apparent.

Exploiting this symmetry allows us to build self-organizing web-sites and search engines that don't need meta-data.

Published in: Technology

  • MapR combines the best of open source technology with our own deep innovations to provide the most advanced distribution for Apache Hadoop. MapR’s team has a deep bench of enterprise software experience with proven success across storage, networking, virtualization, analytics, and open source technologies. Our CEO has driven multiple companies to successful outcomes in the analytic, storage, and virtualization spaces. Our CTO and co-founder M.C. Srivas was most recently at Google, working on BigTable. He understands the challenges of MapReduce at huge scale. Srivas was also the chief software architect at Spinnaker Networks, which came out of stealth with the fastest NAS storage on the market and was quickly acquired by NetApp. The team includes experience with enterprise storage at Cisco, VMware, IBM and EMC. Our VP of Engineering led emerging technologies and a 600-person NAS engineering team at EMC. We also have experience in business intelligence and analytics companies, and open source committers in Hadoop, ZooKeeper and Mahout, including PMC members. MapR is proven technology, installed at leading Hadoop sites across industries and OEMed by EMC and Cisco.

    1. Recommendation as Search: Reflections on Symmetry (©MapR Technologies - Confidential)
    2. Company Background
         • MapR provides the industry’s best Hadoop distribution
             – Combines the best of the Hadoop community contributions with significant internally financed infrastructure development
         • Background of team
             – Deep management bench with extensive analytic, storage, virtualization, and open source experience
             – Google, EMC, Cisco, VMware, Network Appliance, IBM, Microsoft, Apache Foundation, Aster Data, Brio, ParAccel
         • Proven
             – MapR used across industries (Financial Services, Media, Telecom, Health Care, Internet Services, Government)
             – Strategic OEM relationship with EMC and Cisco
             – Over 1,000 installs
    3. What is Hadoop?
         • A new style of computation
         • A new style of combining computation and storage
         • Allows very large computations
         • Used by all large internet companies, many other industries
         • Fundamentally changes the economics of large-scale computation
    4. Why Big Data?
         • Because we can
         • Because we can learn new things
         • Because new economics of computation favors large scale
         • Because big data can be simpler than small data
    5. Recommendations
         • Often known as collaborative filtering: “People who bought x also bought y”
         • Actors (people) interact (bought) with items (x and y)
             – Observe successful interaction
         • We want to suggest additional successful interactions
         • Observations are inherently very sparse
    6. Examples
         • Customers buying books (Linden et al)
         • Web visitors rating music (Shardanand and Maes) or movies (Riedl, et al), (Netflix)
         • Internet radio listeners not skipping songs (Musicmatch)
         • Internet video watchers watching >30 s (Veoh)
         • iTunes song purchases or plays (Apple)
    7. Fundamental Algorithm
         • History matrix A has the shape of actors x items
         • Cooccurrence matrix K has the shape of items x items
             – Entry (x, y) is the sum, over all actors, of whether an actor interacted with both x and y
         • A is also a linear operator
         • K tells us “users who interacted with x also interacted with y”
    8. … Warning …
    9. … Warning … Mathematics ahead
    10. Fundamental Algorithmic Structure
         • Cooccurrence
             $K = A^T A$
             $r = A^T (A h) = (A^T A) h$
         • For very large data-sets
             $r = \mathrm{sparsify}(A^T A)\, h$
    11. But Wait ... Does it have to be that way?
    12. But why not ... $(A^T A) h$
    13. But why not ... Why just dyadic learning? $(A^T A) h$
    14. But why not ... Why just dyadic learning? Why not triadic learning? $(B^T A) h$
    15. But why not ... Why just dyadic learning? Why not p-adic learning? $(B^T A) h$
    16. For example
         • Users enter queries (A)
             – (actor = user, item = query)
         • Users view videos (B)
             – (actor = user, item = video)
         • A’A gives query recommendation
             – “did you mean to ask for”
         • B’B gives video recommendation
             – “you might like these videos”
    17. The punch-line
         • B’A recommends videos in response to a query
             – (isn’t that a search engine?)
             – (not quite, it doesn’t look at content or meta-data)
    18. Real-life example
         • Query: “Paco de Lucia”
         • Conventional meta-data search results:
             – “hombres del paco” times 400
             – not much else
         • Recommendation based search:
             – Flamenco guitar and dancers
             – Spanish and classical guitar
             – Van Halen doing a classical/flamenco riff
    19. Real-life example
    20. Real-life example
    21. Hypothetical Example
         • Want a navigational ontology?
         • Just put labels on a web page with traffic
             – This gives A = users x label clicks
         • Remember viewing history
             – This gives B = users x items
         • Cross recommend
             – B’A = label to item mapping
         • After several users click, results are whatever users think they should be
    22. Resources
         • Me: @ted_dunning
         • Slides and such: –
         • The original paper
             – Accurate Methods for the Statistics of Surprise and Coincidence
             – (check on citeseer)
         • Source code
             – Mahout project
             – contact me
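
The cooccurrence matrix from slide 7 can be sketched in a few lines of plain Python. This is only an illustration: the toy history matrix and the helper names (`transpose`, `matmul`, `cooccurrence`) are made up here and do not come from the slides or from Mahout.

```python
# Toy history matrix A (actors x items); 1 means the actor interacted with the item.
# The data is invented purely for illustration.
A = [
    [1, 1, 0, 0],  # actor 0
    [1, 1, 1, 0],  # actor 1
    [0, 0, 1, 1],  # actor 2
]


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Dense matrix product of two list-of-lists matrices."""
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]


def cooccurrence(a):
    """K = A^T A: K[x][y] counts the actors who interacted with both x and y."""
    return matmul(transpose(a), a)


K = cooccurrence(A)
for row in K:
    print(row)
```

With 0/1 entries, each K[x][y] is exactly the "sum over all actors" count described on the slide, and K is symmetric because it is built as A^T A.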
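
The formulas on slide 10 can be checked on a toy example. The sketch below reads h as one actor's history vector over items, which the slides do not spell out. The `sparsify` rule used here (keep the largest off-diagonal counts per row) is a hypothetical stand-in chosen for brevity; the slides do not say how sparsification is done.

```python
# Toy history matrix A (actors x items), invented for illustration.
A = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(x, y):
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]


def sparsify(k, keep=1):
    """Hypothetical rule: per row, keep only the `keep` largest
    off-diagonal nonzero entries and zero everything else."""
    out = []
    for i, row in enumerate(k):
        candidates = [j for j in range(len(row)) if j != i and row[j] > 0]
        kept = sorted(candidates, key=lambda j: (-row[j], j))[:keep]
        out.append([row[j] if j in kept else 0 for j in range(len(row))])
    return out


h = [1, 0, 0, 0]  # assumed: a user whose history contains item 0 only

K = matmul(transpose(A), A)
r_two_step = matvec(transpose(A), matvec(A, h))  # A^T (A h)
r_one_step = matvec(K, h)                        # (A^T A) h
r_sparse = matvec(sparsify(K, keep=1), h)        # sparsify(A^T A) h

print(r_two_step, r_one_step, r_sparse)
```

The first two results agree because matrix multiplication is associative. The practical difference is that K can be computed and sparsified offline, leaving a single sparse matrix-vector product at query time.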
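
The punch-line on slides 16 and 17, B’A answering a query with videos, might look like this in miniature. The query strings, video names, user histories, and the `search` function are all hypothetical; the point is only that no video title or meta-data is consulted.

```python
# Hypothetical vocabularies; none of these names come from the slides.
queries = ["flamenco", "guitar solo", "cat videos"]
videos = ["paco_live", "van_halen_riff", "kitten_compilation"]

# A: users x queries (which user entered which query). Invented data.
A = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

# B: users x videos (which user viewed which video). Invented data.
B = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def search(query):
    """Score videos for a query as B^T (A h), where h is a one-hot query vector."""
    h = [1 if q == query else 0 for q in queries]
    users = matvec(A, h)                 # how strongly each user matches the query
    scores = matvec(transpose(B), users)  # what those users went on to view
    ranked = sorted(zip(videos, scores), key=lambda pair: -pair[1])
    return [(video, score) for video, score in ranked if score > 0]


print(search("flamenco"))
print(search("cat videos"))
```

Results are ranked purely by the behavior of users who issued the same query, which is why this is "not quite" a conventional search engine.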
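
For the hypothetical navigational ontology on slide 21, one row of the label-to-item mapping B’A can be accumulated incrementally as click and view events arrive. The `LabelIndex` class and its method names are assumptions made for this sketch, and the event data is invented.

```python
from collections import defaultdict


class LabelIndex:
    """Hypothetical event log that answers label -> items using B'A counts."""

    def __init__(self):
        self.clicked = defaultdict(set)  # user -> labels clicked (rows of A)
        self.viewed = defaultdict(set)   # user -> items viewed (rows of B)

    def click(self, user, label):
        self.clicked[user].add(label)

    def view(self, user, item):
        self.viewed[user].add(item)

    def items_for(self, label):
        """Entries of B'A for one label: for each item, the number of users
        who both clicked the label and viewed the item."""
        counts = defaultdict(int)
        for user, labels in self.clicked.items():
            if label in labels:
                for item in self.viewed.get(user, ()):
                    counts[item] += 1
        return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


index = LabelIndex()
index.click("u1", "guitar")
index.view("u1", "paco_live")
index.click("u2", "guitar")
index.view("u2", "paco_live")
index.view("u2", "van_halen_riff")
index.click("u3", "cats")
index.view("u3", "kitten_compilation")

print(index.items_for("guitar"))
```

Nothing in the sketch defines what "guitar" means; the mapping emerges from what users who clicked that label went on to view, matching the slide's claim that results become whatever users think they should be.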