Seattle Scalability Mahout

Talk given at the Seattle Scalability / NoSQL / Hadoop / etc MeetUp on March 31, 2010

  • And the usual references for LSI and Spectral Decomposition
  • Transcript

    • 1. Numerical Recipes
      Jake Mannix
      Principal SDE, LinkedIn
      Committer, Apache Mahout, Zoie, Bobo-Browse, Decomposer
      Author, Lucene in Depth (Manning MM/DD/2010)
    • 2. A Mathematician’s Apology
      What mathematical structure describes all of these?
      Full-text search:
      Score documents matching “query string”
      Collaborative filtering recommendation:
      Users who liked {those} also liked {these}
      (Social/web)-graph proximity:
      People/pages “close” to {this} are {these}
    • 3. Matrix Multiplication!
    • 4. Full-text Search
      Vector Space Model of IR
      Corpus as term-document matrix
      Query as bag-of-words vector
      Full-text search is just:
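The slide's formula was an image and did not survive extraction. Here is a minimal sketch. It assumes the corpus is stored docs-by-terms (one row per document, the orientation slide 11 uses) and the query is a bag-of-words vector over the same term dictionary, so scoring every document is one matrix-vector product. With a terms-by-docs layout, the same scores come from the transpose times the query. The class name and toy data are hypothetical. Dense arrays stand in for the sparse, tf-IDF-weighted vectors a real index would hold.

```java
import java.util.Arrays;

/** Hypothetical sketch: vector-space scoring as a matrix-vector product. */
final class VectorSpaceSearch {
    /** scores[d] = sum over terms t of A[d][t] * q[t], i.e. scores = A q. */
    static double[] score(double[][] docsByTerms, double[] query) {
        double[] scores = new double[docsByTerms.length];
        for (int d = 0; d < docsByTerms.length; d++) {
            double s = 0.0;
            for (int t = 0; t < query.length; t++) {
                s += docsByTerms[d][t] * query[t];
            }
            scores[d] = s;
        }
        return scores;
    }

    public static void main(String[] args) {
        // Assumed term dictionary: 0 = "disney", 1 = "ice", 2 = "hadoop"
        double[][] corpus = {
            {1.0, 1.0, 0.0}, // doc 0 mentions "disney" and "ice"
            {0.0, 0.0, 2.0}, // doc 1 mentions "hadoop" twice
        };
        double[] query = {1.0, 1.0, 0.0}; // query "disney ice"
        System.out.println(Arrays.toString(score(corpus, query))); // [2.0, 0.0]
    }
}
```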
    • 5. Collaborative Filtering
      User preference matrix
      (and item-item similarity matrix )
      Input user as vector of preferences
      (simple) Item-based CF recommendations are:
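The next sketch covers simple item-based CF. It assumes S is a precomputed item-item similarity matrix and u is one user's preference vector (0 = no preference), so the recommendation scores are S·u. Zeroing out items the user already rated is our own design choice, not something the slide states. Names and numbers are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: simple item-based CF as a matrix-vector product. */
final class ItemBasedCF {
    /** recs = S u, then already-rated items are zeroed (an assumed design choice). */
    static double[] recommend(double[][] itemSimilarity, double[] userPrefs) {
        int n = userPrefs.length;
        double[] recs = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                recs[i] += itemSimilarity[i][j] * userPrefs[j];
            }
            if (userPrefs[i] != 0.0) {
                recs[i] = 0.0; // don't re-recommend what the user already rated
            }
        }
        return recs;
    }

    public static void main(String[] args) {
        double[][] s = {
            {1.0, 0.5, 0.0},
            {0.5, 1.0, 0.8},
            {0.0, 0.8, 1.0},
        };
        double[] user = {5.0, 0.0, 0.0}; // rated item 0 only
        System.out.println(Arrays.toString(recommend(s, user))); // [0.0, 2.5, 0.0]
    }
}
```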
    • 6. Graph Proximity
      Adjacency matrix:
      2nd degree adjacency matrix:
      Input all of a user’s “friends” or page links:
      (weighted) distance measure of 1st – 3rd degree connections is then:
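The graph-proximity product can be sketched the same way. This assumes A is a symmetric 0/1 adjacency matrix and f marks the user's friends (or a page's links). The per-degree decay weights (1, 1/2, 1/4) are hypothetical. Computing A(A(Af)) with repeated matrix-vector products avoids ever materializing the much denser A² or A³.

```java
import java.util.Arrays;

/** Hypothetical sketch: weighted 1st- to 3rd-degree proximity via repeated mat-vec products. */
final class GraphProximity {
    /** Dense matrix-vector product m v. */
    static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < v.length; j++) {
                out[i] += m[i][j] * v[j];
            }
        }
        return out;
    }

    /** Returns weights[0]*A f + weights[1]*A^2 f + ..., never forming A^k explicitly. */
    static double[] proximity(double[][] adjacency, double[] friends, double[] weights) {
        double[] result = new double[friends.length];
        double[] walk = friends;
        for (double w : weights) {
            walk = multiply(adjacency, walk); // after k steps, walk = A^k f
            for (int i = 0; i < result.length; i++) {
                result[i] += w * walk[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] path = { // path graph 0-1-2-3
            {0, 1, 0, 0},
            {1, 0, 1, 0},
            {0, 1, 0, 1},
            {0, 0, 1, 0},
        };
        double[] friends = {1, 0, 0, 0};   // start from node 0
        double[] decay = {1.0, 0.5, 0.25}; // hypothetical per-degree weights
        System.out.println(Arrays.toString(proximity(path, friends, decay))); // [0.5, 1.5, 0.5, 0.25]
    }
}
```

Walks that return to the start (0 to 1 and back to 0) also count, which is why node 0 scores 0.5. A production version might subtract those out.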
    • 7. Dictionary
      Applications Linear Algebra
    • 8. How does this help?
      In Search:
      Latent Semantic Indexing (LSI)
      probabilistic LSI
      Latent Dirichlet Allocation
      In Recommenders:
      Singular Value Decomposition
      Layered Restricted Boltzmann Machines
      (Deep Belief Networks)
      In Graphs:
      Spectral Decomposition / Spectral Clustering
    • 9. Often use “Dimensional Reduction”
      To alleviate the sparse Big Data problem of “the curse of dimensionality”
      Used to improve recall and relevance
      in general: smooth the metric on your data set
    • 10. New applications with Matrices
      If Search is finding doc-vector by:
      and users query with data represented: Q =
      Giving implicit feedback based on click-through per session: C =
    • 11. … continued
      Then has the form (docs-by-terms) for search!
      Approach has been used by Ted Dunning at Veoh
      (and probably others)
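The matrices on these two slides were lost in extraction. One plausible reading, which is our assumption rather than the slide's text, is as follows. Q is sessions-by-terms (the query terms typed in each session), and C is sessions-by-docs (the click-throughs in that session). Then Cᵀ Q is docs-by-terms and can be indexed as if it were a corpus, with each document described by the queries that led users to click it. The sketch computes Cᵀ Q in one pass over sessions. Names and data are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: build a docs-by-terms matrix C^T Q from session logs. */
final class ClickThroughCorpus {
    /** out[d][t] = sum over sessions s of clicks[s][d] * queries[s][t]. */
    static double[][] transposeTimes(double[][] clicks, double[][] queries) {
        int docs = clicks[0].length;
        int terms = queries[0].length;
        double[][] out = new double[docs][terms];
        for (int s = 0; s < clicks.length; s++) {       // one pass over sessions
            for (int d = 0; d < docs; d++) {
                if (clicks[s][d] == 0.0) continue;      // skip docs not clicked in this session
                for (int t = 0; t < terms; t++) {
                    out[d][t] += clicks[s][d] * queries[s][t];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] clicks = {{1, 1}, {0, 1}};  // session 0 clicked docs 0 and 1; session 1 clicked doc 1
        double[][] queries = {{1, 1}, {0, 1}}; // assumed terms: 0 = "disney", 1 = "ice"
        System.out.println(Arrays.deepToString(transposeTimes(clicks, queries)));
    }
}
```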
    • 12. Linear Algebra performance tricks
      Naïve item-based recommendations:
      Calculate item similarity matrix:
      Calculate item recs:
      Express in one step:
      In matrix notation:
      Re-writing as:
      is the vector of preferences for user “v”,
      is the vector of preferences of item “i”
      The result is the matrix sum of the outer (tensor) products of these vectors, scaled by the entry they intersect at.
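The equations on this slide were also lost. Suppose the item similarity is the co-occurrence matrix S = Aᵀ A, where A is users-by-items with rows a_v. Then the recommendations for a user u are r = Aᵀ A u = Σ_v (a_v · u) a_v. They can therefore be accumulated in a single pass over user rows without ever building S. The sketch computes r both ways to show they agree. As far as we understand, this Aᵀ A v product is what Mahout's DistributedRowMatrix.timesSquared(v) returns. The code below is a plain in-memory illustration with hypothetical names, not Mahout's implementation.

```java
import java.util.Arrays;

/** Hypothetical sketch: item recommendations with and without materializing S = A^T A. */
final class ItemRecsOnePass {
    /** Naive: build S = A^T A (items x items), then compute S u. */
    static double[] naive(double[][] a, double[] u) {
        int n = u.length;
        double[][] s = new double[n][n];
        for (double[] row : a)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    s[i][j] += row[i] * row[j];
        double[] r = new double[n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                r[i] += s[i][j] * u[j];
        return r;
    }

    /** One pass over users: r = sum over rows a_v of (a_v . u) a_v; S is never built. */
    static double[] onePass(double[][] a, double[] u) {
        double[] r = new double[u.length];
        for (double[] row : a) {
            double dot = 0.0;
            for (int j = 0; j < u.length; j++) dot += row[j] * u[j];
            for (int i = 0; i < u.length; i++) r[i] += dot * row[i];
        }
        return r;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 1, 0}, {0, 1, 1}}; // 2 users x 3 items
        double[] u = {1, 0, 0};
        System.out.println(Arrays.toString(naive(a, u)) + " " + Arrays.toString(onePass(a, u)));
    }
}
```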
    • 13. Item Recommender via Hadoop
    • 14. Apache Mahout
      Apache Mahout currently on release 0.3
      Will be a “Top Level Project” soon (before 0.4)
      ( )
      “Scalable Machine Learning with commercially friendly licensing”
    • 15. Mahout Features
      absorbed the Taste project
      Classification (Naïve Bayes, C-Bayes, more)
      Clustering (Canopy, fuzzy-K-means, Dirichlet, etc…)
      Fast non-distributed linear mathematics
      absorbed the classic CERN Colt project
      Distributed Matrices and decomposition
      absorbed the Decomposer project
      mahout shell-script analogous to $HADOOP_HOME/bin/hadoop
      $MAHOUT_HOME/bin/mahout kmeans -i "in" -o "out" -k 100
      $MAHOUT_HOME/bin/mahout svd -i "in" -o "out" -k 300
      Taste web-app for real-time recommendations
    • 16. DistributedRowMatrix
      Wrapper around a SequenceFile<IntWritable,VectorWritable>
      Distributed methods like:
      Matrix transpose();
      Matrix times(Matrix other);
      Vector times(Vector v);
      Vector timesSquared(Vector v);
      To get SVD: pass into DistributedLanczosSolver:
      LanczosSolver.solve(Matrix input, Matrix eigenVectors, List<Double> eigenValues, int rank);
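Lanczos itself is too long for a slide-sized example. As a simpler stand-in, this sketch uses power iteration, which is explicitly not the Lanczos algorithm the slide names. It finds the top singular value of A while touching A only through Aᵀ A v products, the same access pattern timesSquared gives a distributed solver. Lanczos recovers k singular vectors from the same kind of products, more efficiently. All names here are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: top singular value of A via power iteration on A^T A. */
final class PowerIterationSvd {
    /** A^T A v in one pass over the rows of A: sum over rows a of (a . v) a. */
    static double[] timesSquared(double[][] a, double[] v) {
        double[] out = new double[v.length];
        for (double[] row : a) {
            double dot = 0.0;
            for (int j = 0; j < v.length; j++) dot += row[j] * v[j];
            for (int j = 0; j < v.length; j++) out[j] += dot * row[j];
        }
        return out;
    }

    /** Returns sigma_1 and writes the unit top right singular vector into topVector. */
    static double topSingularValue(double[][] a, double[] topVector, int iterations) {
        int n = topVector.length;
        double[] v = new double[n];
        Arrays.fill(v, 1.0 / Math.sqrt(n)); // fixed start vector (assumption; random is more usual)
        double lambda = 0.0;
        for (int it = 0; it < iterations; it++) {
            double[] w = timesSquared(a, v);
            double norm = 0.0;
            for (double x : w) norm += x * x;
            norm = Math.sqrt(norm);
            if (norm == 0.0) break;          // v lies in the null space of A
            lambda = norm;                   // converges to the largest eigenvalue of A^T A
            for (int i = 0; i < n; i++) v[i] = w[i] / norm;
        }
        System.arraycopy(v, 0, topVector, 0, n);
        return Math.sqrt(lambda);            // singular value = sqrt(eigenvalue of A^T A)
    }

    public static void main(String[] args) {
        double[][] a = {{3.0, 0.0}, {0.0, 1.0}};
        double[] v = new double[2];
        System.out.println(topSingularValue(a, v, 100) + " " + Arrays.toString(v));
    }
}
```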
    • 17. Questions?
    • 18. Appendix
      There are lots of ways to deal with sparse Big Data. Many (not all) of them must cope with a feature space whose dimensionality grows beyond reasonable limits, and the right technique depends heavily on your data…
      That having been said, there are some general techniques
    • 19. Dealing with Curse of Dimensionality
      Sparseness makes computation fast, but vectors overlap too little
      Can we reduce the dimensionality (from “all possible text tokens” or “all userIds”) while keeping the nice aspects of the search problem?
      If possible, collapse “similar” vectors (synonymous terms, userIds with high overlap, etc…) towards each other while keeping “dissimilar” vectors far apart…
    • 20. Solution A: Matrix decomposition
      Singular Value Decomposition (truncated)
      “best” approximation to your matrix
      Used in Latent Semantic Indexing (LSI)
      For graphs: spectral decomposition
      Collaborative filtering (Netflix leaderboard)
      Issues: very computation intensive
      no parallelized open-source packages (but see Apache Mahout)
      Makes things too dense
    • 21. SVD: continued
      Hadoop impl. in Mahout (Lanczos)
      O(N*d*k) for rank-k SVD on N docs, d elements each
      Density can be dealt with by doing Canopy Clustering offline
      But only extracting linear feature mixes
      Also still very computation- and I/O-intensive (k passes over the data set): are there better dimensional reduction methods?
    • 22. Solution B: Stochastic Decomposition
      co-occurrence-based kernel + online Random Projection + SVD
    • 23. Co-occurrence-based kernel
      Extract bigram phrases / pairs of items rated by the same person (using Log-Likelihood Ratio test to pick the best)
      “Disney on Ice was Amazing!” -> {“disney”, “disney on ice”, “ice”, “was”, “amazing”}
      {item1:4, item2:5, item5:3, item9:1} -> {item1:4, (items1+2):4.5, item2:5, item5:3,…}
      Dim(features) goes from 10^5 to 10^8+ (yikes!)
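This sketch covers the two ingredients on this slide. The log-likelihood ratio (Dunning's G² test) scores how strongly two tokens or items co-occur, using a 2x2 contingency table of counts. withPairs adds a pair feature rated at the mean of its two ratings, reproducing the slide's (items1+2):4.5 example. Both are hypothetical helpers written for illustration, not Mahout's API. In practice you would keep only the pairs with the highest LLR scores rather than all of them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helpers: LLR scoring of co-occurrences and pair-feature expansion. */
final class CooccurrenceKernel {
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy: N*H for counts summing to N. */
    private static double entropy(long... counts) {
        long sum = 0;
        double sumXLogX = 0.0;
        for (long c : counts) {
            sum += c;
            sumXLogX += xLogX(c);
        }
        return xLogX(sum) - sumXLogX;
    }

    /**
     * Dunning's log-likelihood ratio (G^2) for a 2x2 table:
     * k11 = A with B, k12 = A without B, k21 = B without A, k22 = neither.
     * Near 0 for independent events; large when A and B co-occur more than chance.
     */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rows = entropy(k11 + k12, k21 + k22);
        double cols = entropy(k11 + k21, k12 + k22);
        double cells = entropy(k11, k12, k21, k22);
        return Math.max(0.0, 2.0 * (rows + cols - cells)); // clamp tiny negative round-off
    }

    /** Adds an "a+b" feature for each pair of rated items, rated at the mean of the two. */
    static Map<String, Double> withPairs(Map<String, Double> ratings) {
        Map<String, Double> out = new LinkedHashMap<>(ratings);
        String[] items = ratings.keySet().toArray(new String[0]);
        for (int a = 0; a < items.length; a++) {
            for (int b = a + 1; b < items.length; b++) {
                double mean = (ratings.get(items[a]) + ratings.get(items[b])) / 2.0;
                out.put(items[a] + "+" + items[b], mean);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(logLikelihoodRatio(10, 10, 10, 10)); // independent: ~0
        System.out.println(logLikelihoodRatio(100, 0, 0, 100)); // perfectly associated: large
        Map<String, Double> prefs = new LinkedHashMap<>();
        prefs.put("item1", 4.0);
        prefs.put("item2", 5.0);
        System.out.println(withPairs(prefs)); // includes item1+item2=4.5
    }
}
```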
    • 24. Online Random Projection
      Randomly project kernelized text vectors down to “merely” 10^3 dimensions with a Gaussian matrix
      Or project each n-gram down to a random (but sparse) 10^3-dim vector:
      V = {123876244 => 1.3} (tf-IDF of “disney”)
      V’ = c * {h(i) => 1, h(h(i)) => 1, h(h(h(i))) => 1}
      (c = 1.3 / sqrt(3))
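Here is a sketch of the sparse hashed projection, under stated assumptions. h is a hypothetical 32-bit mixing hash (a MurmurHash3-style finalizer). The chain h(i), h(h(i)), h(h(h(i))) is applied to the full 32-bit values and only then folded into 10^3 slots, and colliding slots simply add. Each feature of weight w lands on at most three slots with value c = w / sqrt(3), so its squared norm is preserved whenever the three slots are distinct. Following the slide, all entries are positive; other hashing schemes also hash a random sign. Because the map is linear, a whole vector projects to the sum of its features' projections, which is what lets this run online, one token at a time.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of sparse hashed random projection to DIM dimensions. */
final class HashedProjection {
    static final int DIM = 1000; // "merely" 10^3 dimensions

    /** Hypothetical 32-bit mixing hash (MurmurHash3-style finalizer). */
    static int h(int x) {
        x ^= x >>> 16;
        x *= 0x85ebca6b;
        x ^= x >>> 13;
        x *= 0xc2b2ae35;
        x ^= x >>> 16;
        return x;
    }

    /** Feature (i, w) -> c * {h(i) => 1, h(h(i)) => 1, h(h(h(i))) => 1}, c = w / sqrt(3). */
    static Map<Integer, Double> project(int featureIndex, double weight) {
        Map<Integer, Double> out = new TreeMap<>();
        double c = weight / Math.sqrt(3.0);
        int x = featureIndex;
        for (int k = 0; k < 3; k++) {
            x = h(x); // chain: h(i), h(h(i)), h(h(h(i)))
            out.merge(Math.floorMod(x, DIM), c, Double::sum); // fold to [0, DIM); collisions add
        }
        return out;
    }

    /** The map is linear, so a sparse vector projects to the sum of its features' projections. */
    static Map<Integer, Double> projectVector(Map<Integer, Double> sparse) {
        Map<Integer, Double> out = new TreeMap<>();
        for (Map.Entry<Integer, Double> e : sparse.entrySet()) {
            project(e.getKey(), e.getValue()).forEach((slot, v) -> out.merge(slot, v, Double::sum));
        }
        return out;
    }

    public static void main(String[] args) {
        // V = {123876244 => 1.3}, the tf-IDF of "disney" from the slide
        System.out.println(project(123876244, 1.3));
    }
}
```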
    • 25. Outer-product and Sum
      Take the 10^3-dim projected vectors and outer-product each with itself;
      the result is a 10^3 x 10^3-dim matrix
      • sum these in a Combiner
      All results go to single Reducer, where you compute…
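This sketch shows the accumulation step. Each projected vector y contributes y yᵀ, and partial sums from different mappers are merged by element-wise addition. That addition is associative and commutative, so a Combiner can pre-sum safely before everything reaches the single Reducer. The total is Yᵀ Y for the stacked vectors Y, a symmetric 10^3 x 10^3 matrix of about 8 MB of doubles. Names are hypothetical. A real job would likely emit sparse or upper-triangular partials.

```java
import java.util.Arrays;

/** Hypothetical sketch of the map/combine step: summing outer products y y^T. */
final class OuterProductSum {
    /** acc += y y^T (one vector's contribution). */
    static void addOuterProduct(double[][] acc, double[] y) {
        for (int i = 0; i < y.length; i++) {
            for (int j = 0; j < y.length; j++) {
                acc[i][j] += y[i] * y[j];
            }
        }
    }

    /** Element-wise sum of two partial results, as a Combiner or the single Reducer would do. */
    static double[][] merge(double[][] left, double[][] right) {
        double[][] out = new double[left.length][left.length];
        for (int i = 0; i < left.length; i++) {
            for (int j = 0; j < left.length; j++) {
                out[i][j] = left[i][j] + right[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] mapperA = new double[2][2];
        double[][] mapperB = new double[2][2];
        addOuterProduct(mapperA, new double[] {1, 0});
        addOuterProduct(mapperB, new double[] {1, 1});
        System.out.println(Arrays.deepToString(merge(mapperA, mapperB))); // [[2.0, 1.0], [1.0, 1.0]]
    }
}
```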
    • 26. SVD
      SVD them quickly (they fit in memory)
      Over and over again (as new data comes in)
      Use the most recent SVD to project your (already randomly projected) text still further (now encoding “semantic” similarity).
      SVD-projected vectors can be assigned immediately to nearest clusters if desired
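The accumulated matrix G = Yᵀ Y is symmetric positive semi-definite. Its SVD therefore coincides with its eigendecomposition, and its eigenvectors are the right singular vectors of Y. The sketch uses the cyclic Jacobi eigenvalue method, our substitution for illustration; at 10^3 x 10^3, an optimized LAPACK-style routine would be much faster. It then projects an already randomly projected vector onto the top-k eigenvectors. Eigenvector signs are arbitrary, so projected coordinates are defined only up to sign. Names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: in-memory SVD of the small Gram matrix, then top-k projection. */
final class SmallSvdProjection {
    /** Cyclic Jacobi eigendecomposition of symmetric g (not modified); eigenvectors are result columns. */
    static double[][] jacobi(double[][] g, double[] eig) {
        int n = g.length;
        double[][] a = new double[n][];
        double scale = 0.0;
        for (int i = 0; i < n; i++) {
            a[i] = g[i].clone();
            for (double x : a[i]) scale += x * x;
        }
        double[][] v = new double[n][n];
        for (int i = 0; i < n; i++) v[i][i] = 1.0;
        for (int sweep = 0; sweep < 100; sweep++) {
            double off = 0.0;
            for (int p = 0; p < n; p++)
                for (int q = p + 1; q < n; q++) off += a[p][q] * a[p][q];
            if (off <= 1e-30 * scale) break; // off-diagonal part is negligible
            for (int p = 0; p < n; p++) {
                for (int q = p + 1; q < n; q++) {
                    if (a[p][q] == 0.0) continue;
                    // Rotation angle chosen so that the rotated a[p][q] becomes zero.
                    double theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
                    double t = (theta >= 0 ? 1.0 : -1.0) / (Math.abs(theta) + Math.sqrt(theta * theta + 1.0));
                    double c = 1.0 / Math.sqrt(t * t + 1.0), s = t * c;
                    for (int k = 0; k < n; k++) { // A <- A J (columns p, q)
                        double akp = a[k][p], akq = a[k][q];
                        a[k][p] = c * akp - s * akq;
                        a[k][q] = s * akp + c * akq;
                    }
                    for (int k = 0; k < n; k++) { // A <- J^T A (rows p, q)
                        double apk = a[p][k], aqk = a[q][k];
                        a[p][k] = c * apk - s * aqk;
                        a[q][k] = s * apk + c * aqk;
                    }
                    for (int k = 0; k < n; k++) { // V <- V J accumulates the eigenvectors
                        double vkp = v[k][p], vkq = v[k][q];
                        v[k][p] = c * vkp - s * vkq;
                        v[k][q] = s * vkp + c * vkq;
                    }
                }
            }
        }
        for (int i = 0; i < n; i++) eig[i] = a[i][i];
        return v;
    }

    /** Projects x onto the k eigenvectors of g with the largest eigenvalues. */
    static double[] projectTopK(double[][] g, double[] x, int k) {
        int n = g.length;
        double[] eig = new double[n];
        double[][] v = jacobi(g, eig);
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (i, j) -> Double.compare(eig[j], eig[i])); // descending eigenvalue
        double[] y = new double[k];
        for (int j = 0; j < k; j++) {
            int col = order[j];
            for (int i = 0; i < n; i++) y[j] += v[i][col] * x[i];
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] gram = {{2, 1}, {1, 2}};
        System.out.println(Arrays.toString(projectTopK(gram, new double[] {1, 1}, 1)));
    }
}
```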
    • 27. References
      Randomized matrix decomposition review:
      Sparse hashing/projection:
      John Langford et al., “Vowpal Wabbit”