4.
Full-text Search<br />Vector Space Model of IR<br />Corpus as term-document matrix<br />Query as bag-of-words vector<br />Full-text search is just: <br />
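The slide's formula did not survive in this transcript. A common reading of the vector space model is that scoring every document reduces to one matrix-vector product between the corpus matrix and the query vector. The sketch below assumes that reading; the toy vocabulary, documents, and the `bag_of_words` helper are illustrative and not from the slides.

```python
import numpy as np

# Toy vocabulary and corpus (illustrative data, not from the slides).
vocab = ["disney", "ice", "hockey", "amazing"]
docs = [
    "disney ice amazing",
    "ice hockey",
    "hockey hockey amazing",
]

def bag_of_words(text, vocab):
    """Return a term-count vector over `vocab` for whitespace-tokenized text."""
    tokens = text.split()
    return np.array([tokens.count(term) for term in vocab], dtype=float)

# Corpus as a docs-by-terms matrix: one row per document.
A = np.vstack([bag_of_words(d, vocab) for d in docs])

# Query as a bag-of-words vector in the same term space.
q = bag_of_words("ice hockey", vocab)

# Scoring every document is a single matrix-vector product.
scores = A @ q

# Document indices ordered from best to worst score (stable for ties).
ranking = np.argsort(-scores, kind="stable")
```

In practice the counts would usually be replaced by TF-IDF weights and the matrix stored sparsely, but the product has the same shape.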
5.
Collaborative Filtering<br />User preference matrix <br />(and item-item similarity matrix )<br />Input user as vector of preferences <br />(simple) Item-based CF recommendations are:<br />T<br />
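The formula on this slide is also missing from the transcript. A minimal sketch of simple item-based CF, assuming the item-item similarity is the raw co-occurrence product of the preference matrix with its transpose, and that recommendations are that similarity applied to the input user's preference vector. The matrix values are toy data.

```python
import numpy as np

# Users-by-items preference matrix (toy data).
A = np.array([
    [5, 3, 0, 0],
    [4, 0, 0, 1],
    [0, 0, 4, 5],
], dtype=float)

# Item-item similarity as raw co-occurrence: S = A^T A.
S = A.T @ A

# Input user expressed as a vector of item preferences.
u = np.array([0, 0, 0, 4], dtype=float)

# Simple item-based recommendations: r = S u.
recs = S @ u

# Same result without materializing S: r = A^T (A u).
recs_no_similarity = A.T @ (A @ u)
```

The second form matters at scale because S is items-by-items and often much denser than A.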
6.
Graph Proximity<br />Adjacency matrix:<br />2nd degree adjacency matrix: <br /> Input all of a user’s “friends” or page links:<br />(weighted) distance measure of 1st – 3rd degree connections is then:<br />
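The distance formula itself is missing here. One plausible form, assumed in this sketch, is a weighted sum of the first three powers of the adjacency matrix applied to the input vector. The weights `w1`, `w2`, `w3` and the path graph are hypothetical.

```python
import numpy as np

# Adjacency matrix of a small undirected path graph 0-1-2-3 (toy data).
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Input: indicator vector of the starting node(s), here node 0.
v = np.array([1, 0, 0, 0], dtype=float)

# Hypothetical weights for 1st, 2nd, and 3rd degree connections.
w1, w2, w3 = 1.0, 0.5, 0.25

# Apply A repeatedly instead of forming A^2 and A^3 explicitly.
first = A @ v
second = A @ first
third = A @ second
proximity = w1 * first + w2 * second + w3 * third
```

Applying A to a vector three times keeps every step sparse, whereas the explicit matrix powers fill in quickly on real graphs.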
7.
Dictionary<br />Applications Linear Algebra<br />
8.
How does this help?<br />In Search:<br />Latent Semantic Indexing (LSI)<br />Probabilistic LSI<br />Latent Dirichlet Allocation<br />In Recommenders:<br />Singular Value Decomposition<br />Layered Restricted Boltzmann Machines <br />(Deep Belief Networks)<br />In Graphs:<br />PageRank<br />Spectral Decomposition / Spectral Clustering<br />
9.
Often use “Dimensional Reduction”<br />To alleviate the sparse Big Data problem of “the curse of dimensionality”<br />Used to improve recall and relevance <br />in general: smooth the metric on your data set<br />
10.
New applications with Matrices<br />If Search is finding doc-vector by: <br />and users query with data represented: Q = <br />Giving implicit feedback based on click-through per session: C =<br />
11.
… continued<br />Then has the form (docs-by-terms) for search!<br />Approach has been used by Ted Dunning at Veoh<br />(and probably others)<br />
12.
Linear Algebra performance tricks<br />Naïve item-based recommendations:<br />Calculate item similarity matrix:<br />Calculate item recs:<br />Express in one step:<br />In matrix notation:<br />Re-writing as:<br /> is the vector of preferences for user “v”, <br /> is the vector of preferences of item “i”<br />The result is the matrix sum of the outer (tensor) products of these vectors, scaled by the entry they intersect at.<br />
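The equations on this slide were lost, so the following is only one reading that is consistent with the closing sentence: recommendations for all users at once, A (A^T A), can be written as a sum over the nonzero entries A[v, i] of the outer product of item i's preference vector with user v's preference vector, scaled by A[v, i]. The sketch checks that identity numerically on toy data.

```python
import numpy as np

# Users-by-items preference matrix (toy data).
A = np.array([
    [5, 3, 0],
    [4, 0, 1],
    [0, 2, 5],
    [1, 0, 4],
], dtype=float)

# Naive two-step version: item similarity first, then recs for every user.
S = A.T @ A
R_naive = A @ S

# One-step version: accumulate scaled outer products over nonzero entries only.
R_outer = np.zeros_like(A)
for v, i in zip(*np.nonzero(A)):
    user_vec = A[v, :]   # preferences of user v (a row of A)
    item_vec = A[:, i]   # preferences for item i (a column of A)
    R_outer += A[v, i] * np.outer(item_vec, user_vec)
```

The attraction of the outer-product form is that each term touches only two sparse vectors, so the work can be split across machines and summed, and the items-by-items similarity matrix is never built.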
18.
Appendix<br />There are lots of ways to deal with sparse Big Data. Many (not all) have to cope with the dimensionality of the feature space growing beyond reasonable limits, and the right technique depends heavily on your data…<br />That having been said, there are some general techniques<br />
19.
Dealing with Curse of Dimensionality<br />Sparseness means fast, but overlap is too small<br />Can we reduce the dimensionality (from “all possible text tokens” or “all userIds”) while keeping the nice aspects of the search problem?<br />If possible, collapse “similar” vectors (synonymous terms, userIds with high overlap, etc…) towards each other while keeping “dissimilar” vectors far apart…<br />
20.
Solution A: Matrix decomposition<br />Singular Value Decomposition (truncated)<br />“best” approximation to your matrix<br />Used in Latent Semantic Indexing (LSI)<br />For graphs: spectral decomposition<br />Collaborative filtering (Netflix leaderboard)<br />Issues: very computation-intensive<br />Parallelized open-source implementation: see Apache Mahout<br />Makes things too dense<br />
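A small in-memory sketch of truncated SVD with NumPy, on toy data. It illustrates the two properties the slide names: the rank-k truncation is the best rank-k approximation in the Frobenius norm (its error equals the norm of the discarded singular values), and the approximation is dense even when the input is sparse.

```python
import numpy as np

# Small sparse-ish matrix (toy data).
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
    [1, 0, 0, 1],
], dtype=float)

k = 2  # target rank

# Full (thin) SVD, then keep only the top-k singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius error of the truncation vs. the discarded singular values.
err = np.linalg.norm(A - A_k)
discarded = np.sqrt(np.sum(s[k:] ** 2))

# Compare how many entries are exactly zero before and after.
nonzeros_before = np.count_nonzero(A)
nonzeros_after = np.count_nonzero(A_k)
```

For matrices that do not fit in memory, an iterative method such as the Lanczos solver mentioned on the next slide takes the place of the dense `np.linalg.svd` call.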
21.
SVD: continued<br />Hadoop impl. in Mahout (Lanczos)<br />O(N*d*k) for rank-k SVD on N docs, d elt’s each<br />Density can be dealt with by doing Canopy Clustering offline<br />But only extracting linear feature mixes<br />Also, still very computation- and I/O-intensive (k passes over the data set). Are there better dimensionality reduction methods?<br />
23.
Co-occurrence-based kernel<br />Extract bigram phrases / pairs of items rated by the same person (using Log-Likelihood Ratio test to pick the best)<br />“Disney on Ice was Amazing!” -> {“disney”, “disney on ice”, “ice”, “was”, “amazing”}<br />{item1:4, item2:5, item5:3, item9:1} -> {item1:4, (items1+2):4.5, item2:5, item5:3,…}<br />Dim(features) goes from 10^5 to 10^8+ (yikes!)<br />
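The log-likelihood ratio test used to pick the best pairs can be sketched as below. This uses the entropy formulation of the statistic over a 2x2 contingency table of co-occurrence counts; the function names and the example counts are illustrative, and the slides do not say which exact variant or threshold was used.

```python
import math

def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR score for a 2x2 contingency table.

    k11: events where both A and B occur
    k12: events where A occurs without B
    k21: events where B occurs without A
    k22: events where neither occurs
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))

# Independent pair: co-occurrence is exactly what chance predicts.
independent = log_likelihood_ratio(10, 10, 10, 10)

# Strongly associated pair: the two almost always appear together.
associated = log_likelihood_ratio(100, 5, 5, 1000)
```

Candidate bigrams or item pairs would be scored this way and only the highest-scoring ones kept as new features.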
24.
Online Random Projection<br />Randomly project kernelized text vectors down to “merely” 10^3 dimensions with a Gaussian matrix <br />Or project each nGram down to a random (but sparse) 10^3-dim vector:<br />V = {123876244 => 1.3} (tf-IDF of “disney”)<br />V’ = c*{h(i) => 1, h(h(i)) => 1, h(h(h(i))) => 1}<br /> (c = 1.3 / sqrt(3)) <br />
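A minimal sketch of the sparse hashed projection described on this slide, assuming a target of 1000 dimensions and three hashed slots per feature. The slides do not specify the hash function; MD5 here is only an illustrative deterministic choice, and keeping a 64-bit hash state before reducing modulo the dimension is likewise an assumption.

```python
import hashlib
import math

DIM = 1000       # target dimension (assumed, roughly a thousand)
NUM_HASHES = 3   # nonzero slots per projected feature

def h(i):
    """Deterministic 64-bit hash of an integer (MD5 is an illustrative choice)."""
    digest = hashlib.md5(str(i).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def project_feature(index, weight, out):
    """Add one (index, weight) feature to the sparse projected vector `out`."""
    c = weight / math.sqrt(NUM_HASHES)
    state = index
    for _ in range(NUM_HASHES):
        state = h(state)          # h(i), then h(h(i)), then h(h(h(i)))
        slot = state % DIM
        out[slot] = out.get(slot, 0.0) + c
    return out

def project(sparse_vector):
    """Project a sparse {feature_index: weight} vector down to DIM dimensions."""
    out = {}
    for index, weight in sparse_vector.items():
        project_feature(index, weight, out)
    return out

# The slide's example: a single feature with tf-IDF weight 1.3.
v = {123876244: 1.3}
v_projected = project(v)
```

Dividing by sqrt(3) keeps the length of a single projected feature equal to its original weight as long as the three slots are distinct, and no projection matrix ever has to be stored.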
25.
Outer-product and Sum<br />Take the 10^3-dim projected vectors and outer-product with themselves,<br />result is 10^3 x 10^3-dim matrix<br /><ul><li>sum these in a Combiner</li></ul>All results go to single Reducer, where you compute…<br />
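The combiner and reducer steps can be imitated in memory as follows. This is not MapReduce code, just a sketch of the arithmetic: each split sums the outer products of its own vectors, and the single reducer adds the partial sums. The dimension is shrunk to 8 and the data is random so the example stays small.

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 8                              # stands in for the projected dimension
X = rng.standard_normal((20, dim))   # stand-in for the projected vectors

def combine(rows):
    """Combiner: partial sum of the outer product of each vector with itself."""
    partial = np.zeros((dim, dim))
    for x in rows:
        partial += np.outer(x, x)
    return partial

# Pretend the data is spread over four map-side splits.
splits = np.array_split(X, 4)
partials = [combine(split) for split in splits]

# Reducer: add the partial sums into one dim-by-dim matrix.
G = np.sum(partials, axis=0)
```

The reduced matrix equals X^T X, so its size depends only on the projected dimension and not on the number of input vectors.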
26.
SVD <br />SVD-them quickly (they fit in memory) <br />Over and over again (as new data comes in)<br />Use the most recent SVD to project your (already randomly projected) text still further (now encoding “semantic” similarity).<br />SVD-projected vectors can be assigned immediately to nearest clusters if desired<br />
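A sketch of this last step, assuming the matrix reaching the reducer is the summed outer products of the projected vectors. Its SVD yields a basis for projecting the vectors further, and the reduced vectors can then be assigned to the nearest centroid. The value of `k`, the random data, and the choice of centroids are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
dim, k = 8, 3
X = rng.standard_normal((50, dim))   # stand-in for randomly projected vectors
G = X.T @ X                          # summed outer products, fits in memory

# SVD of the small symmetric matrix; its singular values are the squared
# singular values of X, and the columns of U span the same principal directions.
U, s, _ = np.linalg.svd(G)
basis = U[:, :k]                     # dim-by-k projection basis

# Project the (already randomly projected) vectors still further.
reduced = X @ basis

# Assign each reduced vector to its nearest centroid (hypothetical centroids:
# the first two reduced vectors).
centroids = reduced[:2]
dists = np.linalg.norm(reduced[:, None, :] - centroids[None, :, :], axis=2)
assignments = np.argmin(dists, axis=1)
```

Because G is small, this decomposition can be recomputed whenever new data has been added to the sum.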