Singular Value Decomposition (SVD) is a matrix decomposition technique with many applications in areas like genetics, natural language processing (NLP), and social network analysis. All of these application areas produce very large matrices with millions of rows and features.
In genetics, matrix entries represent the gene response for an individual, while in NLP they represent the term frequency per document. Datasets in these areas are growing rapidly, and a single processor cannot compute the SVD in a feasible amount of time. Randomized algorithms for SVD with sketching have gained traction as they perform significantly better than classical deterministic algorithms in speed, accuracy, and robustness. In addition, these algorithms can be implemented to exploit multi-processor architectures and run on large-scale clusters with thousands of cores on large datasets.
This all sounds good. But there is a challenge! In data streaming applications, data arrives in random order and is not directly suitable for randomized algorithms, as they expect the whole dataset to be available. In this talk, we will present a hybrid approach that combines frequent directions algorithms, which are well suited to data streaming applications, with a randomized algorithm for fast SVD computation. We will present results in various applications, including video and NLP, for datasets with billions of nonzero entries on clusters with thousands of cores. We will also discuss how Spark performs on these large-scale machine learning challenges.
6. Others in the same family
Principal component analysis (PCA)
Eigenvalue decomposition
Latent Semantic Indexing (LSI)
Latent Semantic Analysis (LSA)
#AI2SAIS
27. Steps
Create a data matrix A using normal images and compute its SVD.
[Figure: Image 1, Image 2, ..., Image N combined into A]
28. Anomaly Score
For a new image y, compute anomaly score = $\lVert (I - VV^{T})\,y \rVert$
Intuition: find the closest point to y that can be formed as a linear combination of the vectors in V. The score is the distance from y to that point.
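The two steps (build A from normal images, then score a new image) can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions that are not on the slides: each image is flattened into one row of A, V holds the top-k right singular vectors, and the data, the rank `k`, and the helper names `fit_basis` and `anomaly_score` are hypothetical. An exact SVD is used here for brevity rather than a randomized one.

```python
import numpy as np

def fit_basis(A, k):
    """Return V (N x k): top-k right singular vectors of A (assumed: one flattened image per row)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T

def anomaly_score(V, y):
    """Norm of the part of y that lies outside span(V), i.e. ||(I - V V^T) y||."""
    residual = y - V @ (V.T @ y)  # avoids forming the N x N matrix I - V V^T
    return np.linalg.norm(residual)

# Synthetic stand-in data (hypothetical): "normal" images lie in a rank-k subspace.
rng = np.random.default_rng(0)
N, k = 64, 3
basis = rng.standard_normal((k, N))
A = rng.standard_normal((200, k)) @ basis
V = fit_basis(A, k)

normal_image = rng.standard_normal(k) @ basis            # same subspace as the training rows
odd_image = normal_image + 5.0 * rng.standard_normal(N)  # pushed off the subspace

score_normal = anomaly_score(V, normal_image)
score_odd = anomaly_score(V, odd_image)
print(score_normal, score_odd)
```

Because the columns of V are orthonormal, V Vᵀ y is the orthogonal projection of y onto their span, so an image that resembles the normal set scores near zero and an unusual one scores high.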
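29. [Diagram: streaming sketch feeding anomaly detection]
A is M x N; the sketch B is 2n x N.
Diagram labels: n rows from A are loaded into B; a randomized SVD of B gives U, Σ, V′; B = αV′, with the remaining rows set to zeros; the step repeats for the next n rows from A; the result is used for anomaly detection.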
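The labels on this slide (a 2n x N sketch B, n rows from A, zeros, B = αV′) match the shape of the frequent directions scheme mentioned in the abstract. Below is a minimal sketch of that scheme, not the speakers' implementation. Assumptions: α is read as the shrunken singular values, N >= 2n, an exact `np.linalg.svd` stands in for the randomized SVD named on the slide, and the function names and test data are hypothetical.

```python
import numpy as np

def _absorb(B, batch, n):
    """Load up to n new rows into the zero half of B, then shrink so that half is zero again."""
    B = B.copy()
    B[n:n + len(batch)] = batch
    # Exact SVD used here; the slide names a randomized SVD at this step.
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    delta = s[n] ** 2                                  # squared (n+1)-th singular value
    alpha = np.sqrt(np.maximum(s ** 2 - delta, 0.0))   # shrunken singular values
    return alpha[:, None] * Vt                         # B = alpha V'; rows n..2n-1 are zero

def frequent_directions(rows, n, N):
    """Sketch a stream of length-N rows into a (2n x N) matrix B (assumes N >= 2n)."""
    B = np.zeros((2 * n, N))
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == n:
            B = _absorb(B, np.asarray(batch), n)
            batch = []
    if batch:  # leftover rows when the stream length is not a multiple of n
        B = _absorb(B, np.asarray(batch), n)
    return B

# Hypothetical data: a noisy low-rank matrix streamed one row at a time.
rng = np.random.default_rng(0)
M, N, n = 500, 40, 8
A = rng.standard_normal((M, 5)) @ rng.standard_normal((5, N))
A += 0.01 * rng.standard_normal((M, N))

B = frequent_directions(iter(A), n, N)
err = np.linalg.norm(A.T @ A - B.T @ B, 2)     # spectral-norm error of the sketch
bound = np.linalg.norm(A, "fro") ** 2 / n      # frequent directions guarantee
print(B.shape, err <= bound)
```

The sketch never holds more than 2n rows, so memory does not grow with the stream, and the right singular vectors of B can then serve as the V used in the anomaly score.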
32. Problem formulation
Forward problem: given a, solve for u in
$\frac{\partial u}{\partial t} = \Delta u + u(u - a)(u - 1)$
Inverse problem: given u (ECG), solve the same equation for a.
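To make the forward problem concrete, here is a small explicit finite-difference sketch. It assumes the equation on this slide reads ∂u/∂t = Δu + u(u − a)(u − 1), and it further assumes a 1-D periodic grid, a constant a, and forward Euler time stepping. None of these choices come from the talk, and `solve_forward` is a hypothetical name.

```python
import numpy as np

def solve_forward(a, u0, dx, dt, steps):
    """Forward Euler for du/dt = Laplacian(u) + u(u - a)(u - 1) on a periodic 1-D grid."""
    u = u0.copy()
    for _ in range(steps):
        # Second-order central difference for the Laplacian, periodic via np.roll.
        lap = (np.roll(u, 1) - 2.0 * u + np.roll(u, -1)) / dx ** 2
        u = u + dt * (lap + u * (u - a) * (u - 1.0))
    return u

x = np.linspace(0.0, 1.0, 50, endpoint=False)
dx = x[1] - x[0]
dt = 0.2 * dx ** 2                        # below the dx^2 / 2 limit for explicit diffusion
u0 = 0.5 + 0.4 * np.sin(2 * np.pi * x)    # hypothetical initial state in [0.1, 0.9]
u = solve_forward(a=0.3, u0=u0, dx=dx, dt=dt, steps=2000)
print(u.min(), u.max())
```

The inverse problem runs this map the other way: given an observed u, search for the a that reproduces it, which typically means calling a forward solver many times.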
35. SVD-based solver 100x faster!
Use SVD to exploit the structure of the problem and design solvers.
$\frac{\partial u}{\partial t} = \Delta u + u(u - a)(u - 1)$
Output: ECG. Solving for a.