Santi Adavani/Vinay Rao
Rocketml.net
@rocketml
From Genomics to NLP: One algorithm to rule them all
#AI2SAIS
Outline
• Introduction
• Making sense of SVD
• One algorithm to rule them all
• Scale out SVD
Singular Value Decomposition (SVD)
A = UΣV′
U and V have orthonormal columns
Σ is a diagonal matrix of singular values
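A minimal sketch of these properties, assuming NumPy and a small random matrix (the 6 × 4 shape and the variable names are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))          # hypothetical 6 x 4 data matrix

# Thin SVD: U is 6 x 4, s holds the 4 singular values, Vt is V' (4 x 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

orthonormal_U = np.allclose(U.T @ U, np.eye(4))      # columns of U are orthonormal
orthonormal_V = np.allclose(Vt @ Vt.T, np.eye(4))    # columns of V are orthonormal
reconstructs = np.allclose(U @ np.diag(s) @ Vt, A)   # A = U Sigma V'
sorted_desc = bool(np.all(np.diff(s) <= 0))          # NumPy returns the largest value first
```

NumPy returns the singular values as a vector `s` rather than as the diagonal matrix Σ, so `np.diag(s)` rebuilds Σ when it is needed explicitly.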
Others in the same family
Principal component analysis (PCA)
Eigenvalue decomposition
Latent Semantic Indexing (LSI)
Latent Semantic Analysis (LSA)
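One way to see the family resemblance, sketched with NumPy on made-up data: the squared singular values of a mean-centered matrix are the eigenvalues of A′A, and dividing them by M − 1 gives the PCA variances. The 50 × 5 shape is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))       # hypothetical data: 50 samples, 5 features
A_centered = A - A.mean(axis=0)        # PCA works on mean-centered columns

_, s, Vt = np.linalg.svd(A_centered, full_matrices=False)

# Eigenvalue decomposition of the symmetric matrix A'A
eigvals = np.linalg.eigvalsh(A_centered.T @ A_centered)[::-1]   # flip to descending order
squares_match = np.allclose(s**2, eigvals)

# PCA variances are the eigenvalues of the sample covariance matrix
explained_variance = s**2 / (A_centered.shape[0] - 1)
cov_eigvals = np.linalg.eigvalsh(np.cov(A_centered, rowvar=False))[::-1]
variance_match = np.allclose(explained_variance, cov_eigvals)
```

LSI and LSA apply the same truncated decomposition to a term-document matrix, so they reuse this machinery unchanged.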
Outline
• Introduction
• Making sense of SVD
• One algorithm to rule them all
• Scale out SVD
Making sense of SVD
Dimensionality reduction
A = U Σ V′, with U: 450 × 400, Σ: 400 × 400, V′: 400 × 400
Reconstruct the matrix
A ≈ U Σ V′ truncated to a small rank k, with U: 450 × k, Σ: k × k, V′: k × 400
Singular values and cumulative sum
[Figure: A reconstructed at two small ranks k, each using U: 450 × k, Σ: k × k, V′: k × 400]
Reconstruction of A from truncated U Σ V′:
Rank   Values stored   Full matrix   Share
20     17020           180000        9.4%
100    85100           180000        47%
Rank 200 (U: 450 × 200, Σ: 200 × 200, V′: 200 × 400): 170200 of 180000 values stored, 94%
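The counts on these slides can be reproduced with a short sketch. It assumes a 450 × 400 matrix (450 × 400 = 180000) and ranks 20, 100 and 200, which give 17020, 85100 and 170200 stored values. The function names are hypothetical.

```python
import numpy as np

def truncated_storage(M, N, k):
    """Values kept by a rank-k SVD: k columns of U, k singular values, k rows of V'."""
    return k * (M + N + 1)

def rank_k_approx(A, k):
    """Rebuild A from its k largest singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

M, N = 450, 400                      # assumed shape; 450 * 400 = 180000
for k in (20, 100, 200):             # assumed ranks
    stored = truncated_storage(M, N, k)
    print(f"rank {k}: {stored} of {M * N} values ({100 * stored / (M * N):.1f}%)")
```

Each extra component costs M + N + 1 = 851 values, so the saving disappears as k approaches the full rank.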
Key points
• Reduce dimension without losing much information
• Drop components that are noisy or do not contribute
• Identify components that contribute the most
• If the singular values decay fast, there is scope for dimensionality reduction
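A hedged sketch of how the cumulative sum can turn "decaying fast" into a rank choice. It assumes the share is measured on the singular values themselves (not their squares); the function name and the 90% threshold are illustrative.

```python
import numpy as np

def rank_for_share(singular_values, share=0.95):
    """Smallest k whose leading singular values reach `share` of their total sum."""
    s = np.asarray(singular_values, dtype=float)
    cumulative = np.cumsum(s) / np.sum(s)
    k = int(np.searchsorted(cumulative, share)) + 1   # first index reaching the share
    return min(k, len(s))

fast_decay = [10.0, 5.0, 1.0, 0.1, 0.01]   # made-up spectrum that decays quickly
slow_decay = [1.0, 0.95, 0.9, 0.85, 0.8]   # made-up spectrum that decays slowly

k_fast = rank_for_share(fast_decay, 0.90)
k_slow = rank_for_share(slow_decay, 0.90)
```

With the fast spectrum two components already reach 90%, while the slow spectrum needs all five, so only the first case leaves room for reduction.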
Outline
• Introduction
• Making sense of SVD
• One algorithm to rule them all
• Scale out SVD
• Supervised learning
• Unsupervised learning
• Inverse problems
Supervised learning
A: M samples × N features
y: M labels
A = UΣV′
A′ = AV, keeping only the first n columns of V
A′: M samples × n singular features, with n ≪ N
y: M labels
Use the new matrix A’ to solve supervised learning problems using
SVM, Logistic Regression, Decision Trees, Neural Networks etc.
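A minimal sketch of the reduction step on synthetic data. The data, the sizes, and the least-squares classifier standing in for SVM or logistic regression are all assumptions for illustration; `fit_projection` is a hypothetical helper.

```python
import numpy as np

def fit_projection(A_train, n):
    """Return the N x n matrix holding the top-n right singular vectors of A_train."""
    _, _, Vt = np.linalg.svd(A_train, full_matrices=False)
    return Vt[:n].T

rng = np.random.default_rng(0)
M, N, n = 200, 50, 5

# Hypothetical data: n latent factors spread over N noisy features
latent = rng.standard_normal((M, n))
A = latent @ rng.standard_normal((n, N)) + 0.01 * rng.standard_normal((M, N))
y = (latent[:, 0] > 0).astype(float)        # labels depend on one latent factor

V_n = fit_projection(A, n)
A_reduced = A @ V_n                         # A' = A V, shape M x n

# Any classifier can be trained on A_reduced; least squares stands in here
X = np.column_stack([A_reduced, np.ones(M)])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
accuracy = float(np.mean((X @ w > 0.5) == (y == 1)))
```

New samples must be multiplied by the same `V_n` before prediction, so the projection is fitted once on the training matrix and then reused.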
Unsupervised learning
A: M samples × N features
A = UΣV′
A′ = AV, keeping only the first n columns of V
A′: M samples × n singular features, with n ≪ N
Use the new matrix A’ for clustering, nearest-neighbors, anomaly
detection
SVD on streaming data
A → M × N
B → 2n × N
[Figure: n rows from A are stacked into B, whose other rows are zeros; a randomized SVD of B gives U, Σ, V′, and B is reset to B = αV′ before the next n rows arrive.]
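A rough sketch of one way to read the streaming update, not the speakers' implementation. It assumes α is the diagonal matrix of the n retained singular values, and it swaps in NumPy's exact SVD where the slide uses a randomized SVD. `streaming_svd` and `row_blocks` are hypothetical names.

```python
import numpy as np

def streaming_svd(row_blocks, n):
    """Keep a sketch B = diag(s) V' of at most n rows while blocks of rows arrive."""
    B = None
    s = Vt = None
    for block in row_blocks:
        block = np.atleast_2d(np.asarray(block, dtype=float))
        # Stack the current sketch on top of the new rows (at most 2n x N)
        stacked = block if B is None else np.vstack([B, block])
        # Exact SVD stands in for the randomized SVD named on the slide
        _, s, Vt = np.linalg.svd(stacked, full_matrices=False)
        s, Vt = s[:n], Vt[:n]               # keep the n leading components
        B = s[:, None] * Vt                 # B = alpha V', with alpha = diag(s)
    return s, Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 20))   # rank-3 test matrix
blocks = [A[i:i + 4] for i in range(0, 60, 4)]                    # n = 4 rows at a time
s_stream, Vt_stream = streaming_svd(blocks, n=4)
```

Only the 2n × N stack is ever held in memory, so A itself never has to fit. When A has rank at most n the result matches the batch SVD; otherwise each truncation discards some information and the result is an approximation.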
Anomaly detection
Anomaly
Steps
Create a data matrix A using normal images (Image 1, Image 2, ..., Image N) and compute its SVD
Anomaly Score
For a new image y, compute the anomaly score = ‖(I − VV′) y‖
Intuition: find the closest point to y that can be formed as a linear combination of the vectors in V; the score is the distance from y to that point
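A small sketch of the score, assuming each row of A is one flattened normal image and V holds the top-n right singular vectors as columns. The function names and the synthetic "images" are illustrative.

```python
import numpy as np

def fit_normal_subspace(A, n):
    """Top-n right singular vectors of A, returned as an N x n matrix V."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:n].T

def anomaly_score(V, y):
    """Norm of (I - V V') y, computed without forming the N x N projector."""
    residual = y - V @ (V.T @ y)
    return float(np.linalg.norm(residual))

rng = np.random.default_rng(0)
basis = rng.standard_normal((3, 30))              # made-up 3-dimensional "normal" subspace
A = rng.standard_normal((100, 3)) @ basis         # 100 normal samples, 30 pixels each
V = fit_normal_subspace(A, n=3)

y_normal = rng.standard_normal(3) @ basis         # lies in the normal subspace
y_anomalous = y_normal + 5.0 * rng.standard_normal(30)

score_normal = anomaly_score(V, y_normal)
score_anomalous = anomaly_score(V, y_anomalous)
```

Computing `V @ (V.T @ y)` costs two thin matrix-vector products, which matters when N is the pixel count of an image.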
Inverse problems
AI to assist cardiologists
Problem formulation
Forward problem: given a, solve for u
∂u/∂t = Δu + u(u − a)(u − 1)
Inverse problem: given u (ECG), solve for a
∂u/∂t = Δu + u(u − a)(u − 1)
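To make the forward problem concrete, here is a toy sketch that steps the equation exactly as written on the slide. The 1D grid, explicit Euler time stepping, zero-flux boundaries, and all step sizes are assumptions, as is the name `solve_forward`. The inverse problem and the SVD-based solver are not reproduced here.

```python
import numpy as np

def solve_forward(u0, a, dx=1.0, dt=0.1, steps=100):
    """Explicit finite-difference solve of du/dt = Laplacian(u) + u (u - a)(u - 1) in 1D."""
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(steps):
        padded = np.pad(u, 1, mode="edge")                        # zero-flux boundaries
        laplacian = (padded[:-2] - 2.0 * u + padded[2:]) / dx**2  # second difference
        u = u + dt * (laplacian + u * (u - a) * (u - 1.0))        # one Euler step
    return u

u0 = np.zeros(50)
u0[:5] = 1.0                       # made-up initial stimulus at one end
u_final = solve_forward(u0, a=0.1)
```

The parameter `a` may be a scalar or an array with one value per grid point, which is how a spatially varying tissue property would enter. An inverse solver would call a forward model like this repeatedly while searching for the `a` that reproduces the measured signal.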
Healthy Tissue
Ischemic Tissue
SVD-based solver: 100x faster!
Use SVD to exploit the structure of the problem
and design solvers
∂u/∂t = Δu + u(u − a)(u − 1)
Output: ECG
Solving for a
Outline
• Introduction
• Making sense of SVD
• One algorithm to rule them all
• Scale out SVD
SVD on 14M x 1M matrix
Distributed SVD on 1008 cores
Thank you!