Efficient Anomaly Detection
via Matrix Sketching
32nd Conference on Neural Information Processing Systems
(NeurIPS 2018), Montréal, Canada.
Slides by: 謝幸娟 (Hsing-Chuan Hsieh)
connie0915549431@hotmail.com
Vatsal Sharan
Stanford University
vsharan@stanford.edu
Parikshit Gopalan
VMware Research
pgopalan@vmware.com
Udi Wieder
VMware Research
uwieder@vmware.com
1. Introduction
• Context
Anomaly detection (AD) in high-dimensional numeric (streaming)
data is a ubiquitous problem in machine learning
• E.g., parameters regarding the health of machines in a data-center
PCA/subspace-based AD methods are popular
• Examples of anomaly scores: rank-k leverage scores, rank-k projection distance, …
• Computational challenge
The dimension of the data matrix A ∈ R^{n×d} may be very large
• E.g., in the data-center setting: d ≈ 10⁶ and n ≫ d
 The quadratic dependence on d renders the standard approach to AD inefficient in high dimensions
• Note: the standard approach to computing anomaly scores from the top k PCs in a streaming setting is:
1. Compute the covariance matrix A^T A ∈ R^{d×d}  space ~ O(d²); time ~ O(nd²)
2. Then compute the top k PCs and the anomaly scores from them
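The baseline above can be sketched in a few lines of numpy: the covariance is accumulated one row at a time (O(d²) space), and the rank-k leverage score and projection distance are then read off its top-k eigenvectors. The toy data and parameter choices are illustrative assumptions, not from the paper.

```python
import numpy as np

def topk_anomaly_scores(A, k):
    """Naive streaming baseline: accumulate the d x d covariance A^T A
    one row at a time (O(d^2) space), then score each row against the
    top-k eigenvectors of the covariance."""
    n, d = A.shape
    cov = np.zeros((d, d))
    for a in A:                      # one pass over the "stream" of rows
        cov += np.outer(a, a)        # O(d^2) work per row
    eigvals, eigvecs = np.linalg.eigh(cov)
    idx = np.argsort(eigvals)[::-1][:k]
    sig2 = eigvals[idx]              # top-k squared singular values of A
    V = eigvecs[:, idx]              # top-k right singular vectors
    proj = A @ V                     # proj[i, j] = a_i^T v^(j)
    leverage = (proj ** 2 / sig2).sum(axis=1)                   # rank-k leverage
    proj_dist = (A ** 2).sum(axis=1) - (proj ** 2).sum(axis=1)  # rank-k projection distance
    return leverage, proj_dist

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 30))       # toy data: n=200 rows, d=30 dimensions
L, T = topk_anomaly_scores(A, k=5)
```

The two returned arrays match the score definitions given in the Notation section; the point of the later sections is to avoid the O(d²) covariance entirely.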
• Requirements for an algorithm to be efficient
1. When n is very large:
the algorithm must work in a streaming fashion, making only a constant number of passes over the dataset
2. When d is very large:
the algorithm should ideally use memory linear in d (i.e., O(d)) or even sublinear in d (e.g., O(log d))
• Purpose
1. Provide simple and practical algorithms at a significantly lower cost in terms of time and memory, i.e., O(d²)  O(d) or O(log d),
using popular matrix sketching techniques: A ∈ R^{n×d}  Ã ∈ R^{l×d} with l ≪ n, or Ã ∈ R^{n×l} with l ≪ d
 The sketch Ã preserves some desirable properties of the large matrix A
2. Prove that the estimated subspace-based anomaly scores computed from Ã approximate the true anomaly scores
2. Notation and Setup
• Data matrix A ∈ R^{n×d}, written in rows or columns:
A = [a_1^T; … ; a_n^T], where row a_i ∈ R^d
  = [a^{(1)}, … , a^{(d)}], where column a^{(j)} ∈ R^n
• SVD of A: A = UΣV^T, where
Σ = diag(σ_1, … , σ_d), with σ_1 ≥ ⋯ ≥ σ_d > 0
V = [v^{(1)}, … , v^{(d)}]
• Condition number of the top-k subspace of A: κ_k = σ_1²/σ_k²
• Frobenius norm: ‖A‖_F = (Σ_{i=1}^{n} Σ_{j=1}^{d} |a_ij|²)^{1/2}
• Operator norm: ‖A‖ = σ_1
• Subspace-based measures of anomalies for each instance (row) a_i ∈ R^d
Mahalanobis distance / leverage score:
L(i) = Σ_{j=1}^{d} (a_i^T v^{(j)} / σ_j)²
• L(i) is highly sensitive to the smaller singular values
Rank-k leverage score (k ≪ d):
L^k(i) = Σ_{j=1}^{k} (a_i^T v^{(j)})² / σ_j²
• Real-world data sets often have most of their signal in the top singular values
Rank-k projection distance:
T^k(i) = Σ_{j=k+1}^{d} (a_i^T v^{(j)})²
• Catches anomalies that are far from the principal subspace
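The three scores above can be computed directly from the SVD. A minimal numpy sketch on toy data (the data and the choice of k are illustrative assumptions):

```python
import numpy as np

# Toy data matrix: n=100 instances in d=20 dimensions.
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 20))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 4

proj = A @ Vt.T                                  # proj[i, j] = a_i^T v^(j)
L_full = ((proj / s) ** 2).sum(axis=1)           # Mahalanobis / leverage score L(i)
L_k = ((proj[:, :k] / s[:k]) ** 2).sum(axis=1)   # rank-k leverage score L^k(i)
T_k = (proj[:, k:] ** 2).sum(axis=1)             # rank-k projection distance T^k(i)
```

A useful sanity check on the definitions: the full leverage scores sum to the rank of A (here d), and the rank-k leverage scores sum to k.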
• (Figure) Illustration of the rank-k leverage score and the rank-k projection distance for an instance a_i ∈ R^d relative to the principal subspace
• Assumptions
1. Separation assumption: the matrix is (k, Δ)-separated, i.e., there is a gap between σ_k² and σ_{k+1}²
• In the degenerate case σ_k² = σ_{k+1}², the scores L^k(i) and T^k(i) are ill-defined
2. Approximate low-rank assumption:
the top-k principal subspace captures a constant fraction (at least 0.1) of the total variance in the data
• Assumption 2 captures the setting where the scores L^k and T^k are most meaningful.
• Our algorithms work best on data sets where relatively few principal components explain most of the variance.
3. Guarantees for anomaly detection via sketching
• Our main results say that given μ > 0 and a (k, Δ)-separated matrix A ∈ R^{n×d} with top singular value σ_1, any sketch Ã ∈ R^{l×d} satisfying
‖Ã^T Ã − A^T A‖ ≤ μσ_1²   (2)
or any sketch Ã ∈ R^{n×l} satisfying
‖Ã Ã^T − A A^T‖ ≤ μσ_1²   (3)
can be used to approximate the rank-k leverage scores and the projection distances from the principal k-dimensional subspace.
• Explanation:
1. Equation (2): an approximation to the (d×d) covariance matrix of the row vectors
2. Equation (3): an approximation to the (n×n) covariance matrix of the column vectors
• Next, we show how to design efficient algorithms for finding anomalies in a streaming fashion.
• Pointwise guarantees from row-space approximations
Algorithm 1: random column projection / row-space approximation (n → l)
• Sketching techniques: Frequent Directions sketch [18], row sampling [19], random column projection, …
• Pointwise guarantees from row-space approximations
• Theorem 1 improves on the trivial bounds of O(d²) memory and O(nd²) time:
 O(d) memory and O(nd) time, since l is independent of d
• Next, we show that substantial savings are unlikely for any algorithm with strong pointwise guarantees:
there is an Ω(d) memory lower bound for any such approximation
• Average-case guarantees from column-space approximations
Even though the sketch gives a column-space approximation (i.e., an Ã Ã^T approximation satisfying Eq. (3)),
the goal is still to compute the row anomaly scores from the sketch
• E.g., take a random matrix R ∈ R^{d×l} with r_ij i.i.d. uniform on {±1} and l = O(k/μ²) [27];
the sketch Ã = AR is then used to return the anomaly scores
Space consumption:
• for Ã^T Ã  O(l²)
• if R is pseudorandom [28–30]  O(log d)
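A minimal sketch of this idea, under the assumption of planted low-rank toy data: project the columns with a Rademacher matrix R and compute the rank-k scores from the n×l sketch Ã = AR. All data and parameter choices below are illustrative, not from the paper.

```python
import numpy as np

def scores_from_sketch(A_sk, k):
    """Rank-k leverage scores and projection distances computed from a
    sketch whose rows correspond to the original data points."""
    U, s, Vt = np.linalg.svd(A_sk, full_matrices=False)
    proj = A_sk @ Vt.T
    L_k = ((proj[:, :k] / s[:k]) ** 2).sum(axis=1)
    T_k = (A_sk ** 2).sum(axis=1) - (proj[:, :k] ** 2).sum(axis=1)
    return L_k, T_k

rng = np.random.default_rng(3)
n, d, k, l = 500, 1000, 5, 200
# Planted rank-k structure plus noise, so the top-k subspace is meaningful.
A = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) + 0.1 * rng.normal(size=(n, d))

R = rng.choice([-1.0, 1.0], size=(d, l)) / np.sqrt(l)   # Rademacher projection
L_approx, T_approx = scores_from_sketch(A @ R, k)
```

On this toy data the sketch-based scores track the exact ones closely on average, while the per-row work drops from dimension d = 1000 to l = 200.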
• Average-case guarantees from column-space approximations
• Algorithm 2: random row projection / column-space approximation
• I.e., on average, random projections preserve the leverage scores and the distances from the principal subspace
• l = poly(k, ε⁻¹, Δ), independent of both n and d.
4. Experimental evaluation
• Goals
1. To test whether our algorithms give results comparable to exact anomaly-score computation based on a full SVD
2. To determine how large the parameter l (which determines the size of the sketch) needs to be to get close to the exact scores
Note: the experiments are run offline (i.e., not in online/streaming mode)
• Data: (1) p53 mutants [32], (2) Dorothea [33], (3) RCV1 [34]
All available from the UCI Machine Learning Repository
• Ground truth for deciding anomalies:
• We compute the rank-k anomaly scores using a full SVD, then label the η fraction of points with the highest anomaly scores as outliers.
1. k chosen by examining the explained variance of the dataset: typically between 10 and 125
2. η chosen by examining the histogram of the anomaly scores: typically between 0.01 and 0.1
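The ground-truth labeling step can be written as a small helper (the function name and example scores are illustrative):

```python
import numpy as np

def label_outliers(scores, eta):
    """Flag the eta fraction of points with the largest anomaly scores."""
    n = len(scores)
    m = max(1, int(round(eta * n)))      # number of points to flag
    labels = np.zeros(n, dtype=bool)
    labels[np.argsort(scores)[-m:]] = True
    return labels

# With eta = 0.4 on 5 points, the two highest-scoring points are flagged.
y = label_outliers(np.array([0.1, 5.0, 0.2, 4.0, 0.3]), 0.4)
```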
• Experimental design
• 3 datasets × 2 algorithms × 2 anomaly scores
• Parameters (depending on the dataset):
1. k: 3 settings; 2. l: 10 settings ranging over (2k, 20k)
 ∴ 3 × 2 × 2 × 3 × 10 = 360 combinations
 5 runs for each combination
• Measuring accuracy
1. After computing the anomaly scores for each dataset, declare the points with the top η′ fraction of scores to be anomalies (without knowing η)
2. Then compute the F1 score
Note: we choose the value of η′ that maximizes the F1 score
3. Report the average F1 score over the 5 runs
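The F1-maximizing choice of η′ can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def best_f1(true_outliers, scores, candidate_etas):
    """Sweep candidate fractions eta' and return the best F1 score,
    mirroring the evaluation protocol described above."""
    n = len(scores)
    order = np.argsort(scores)
    best = 0.0
    for eta in candidate_etas:
        m = max(1, int(round(eta * n)))
        pred = np.zeros(n, dtype=bool)
        pred[order[-m:]] = True           # top eta' fraction flagged
        tp = np.sum(pred & true_outliers)
        if tp == 0:
            continue
        precision = tp / pred.sum()
        recall = tp / true_outliers.sum()
        best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

If the approximate scores rank the true outliers at the top, some η′ in the sweep recovers them exactly and the best F1 reaches 1.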
• Performance on dataset: p53 mutants (figure)
• Performance on dataset: Dorothea (figure)
• Performance on dataset: RCV1 (figure)
• Takeaways:
1. Taking 𝑙 = Ck with a fairly small C ≈ 10 suffices to get F1 scores >
0.75 in most settings
2. Algorithm 1 generally outperforms Algorithm 2 for a given value of 𝑙
5. Conclusion
1. Any sketch Ã ∈ R^{l×d} of A with the property that ‖Ã^T Ã − A^T A‖ is small can be used to additively approximate L^k and T^k for each row. This yields a streaming algorithm that uses O(d) memory and O(nd) time.
2. Q: Can such an additive approximation be obtained using memory sublinear in d (i.e., o(d))?
1. The answer is no.
2. Lower bound: any algorithm approximating the outlier scores for every data point must use Ω(d) working space
3. Using a random row projection, we give a streaming algorithm that preserves the outlier scores of the rows up to a small additive error on average
• and hence preserves most outliers. The space required by this algorithm is only poly(k)·log(d).
4. In our experiments, choosing l to be a small multiple of k was sufficient to get good results:
1. sketch-based results comparable to a full-blown SVD
2. significantly smaller memory footprint
3. faster to compute and easy to implement
• Instructor feedback
1. Is the code on GitHub?
2. Which SVD tool package was used?
3. How can this be applied to online AD?

More Related Content

What's hot

2021 06-02-tabnet
2021 06-02-tabnet2021 06-02-tabnet
2021 06-02-tabnet
JAEMINJEONG5
 
CNN and its applications by ketaki
CNN and its applications by ketakiCNN and its applications by ketaki
CNN and its applications by ketaki
Ketaki Patwari
 
Image segmentation using advanced fuzzy c-mean algorithm [FYP @ IITR, obtaine...
Image segmentation using advanced fuzzy c-mean algorithm [FYP @ IITR, obtaine...Image segmentation using advanced fuzzy c-mean algorithm [FYP @ IITR, obtaine...
Image segmentation using advanced fuzzy c-mean algorithm [FYP @ IITR, obtaine...
Koteswar Rao Jerripothula
 
Image parts and segmentation
Image parts and segmentation Image parts and segmentation
Image parts and segmentation
Rappy Saha
 
Swin transformer
Swin transformerSwin transformer
Swin transformer
JAEMINJEONG5
 
2021 03-02-transformer interpretability
2021 03-02-transformer interpretability2021 03-02-transformer interpretability
2021 03-02-transformer interpretability
JAEMINJEONG5
 
PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION
PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION
PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION
cscpconf
 
Segmentation of Color Image using Adaptive Thresholding and Masking with Wate...
Segmentation of Color Image using Adaptive Thresholding and Masking with Wate...Segmentation of Color Image using Adaptive Thresholding and Masking with Wate...
Segmentation of Color Image using Adaptive Thresholding and Masking with Wate...Habibur Rahman
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
MojammilHusain
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
Mohamed Loey
 
Cnn
CnnCnn
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
Sangmin Woo
 
Efficiency and capability of fractal image compression with adaptive quardtre...
Efficiency and capability of fractal image compression with adaptive quardtre...Efficiency and capability of fractal image compression with adaptive quardtre...
Efficiency and capability of fractal image compression with adaptive quardtre...ijma
 
Data-Driven Motion Estimation With Spatial Adaptation
Data-Driven Motion Estimation With Spatial AdaptationData-Driven Motion Estimation With Spatial Adaptation
Data-Driven Motion Estimation With Spatial Adaptation
CSCJournals
 
Image segmentation
Image segmentationImage segmentation
Image segmentation
Rania H
 
Images Analysis  in matlab
Images Analysis  in matlabImages Analysis  in matlab
Images Analysis  in matlab
mustafa_92
 
Machine learning in image processing
Machine learning in image processingMachine learning in image processing
Machine learning in image processing
Data Science Thailand
 
Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.
Vishal Mishra
 
WAVELET BASED AUTHENTICATION/SECRET TRANSMISSION THROUGH IMAGE RESIZING (WA...
WAVELET BASED AUTHENTICATION/SECRET  TRANSMISSION THROUGH IMAGE RESIZING  (WA...WAVELET BASED AUTHENTICATION/SECRET  TRANSMISSION THROUGH IMAGE RESIZING  (WA...
WAVELET BASED AUTHENTICATION/SECRET TRANSMISSION THROUGH IMAGE RESIZING (WA...
sipij
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation Learning
Sungchul Kim
 

What's hot (20)

2021 06-02-tabnet
2021 06-02-tabnet2021 06-02-tabnet
2021 06-02-tabnet
 
CNN and its applications by ketaki
CNN and its applications by ketakiCNN and its applications by ketaki
CNN and its applications by ketaki
 
Image segmentation using advanced fuzzy c-mean algorithm [FYP @ IITR, obtaine...
Image segmentation using advanced fuzzy c-mean algorithm [FYP @ IITR, obtaine...Image segmentation using advanced fuzzy c-mean algorithm [FYP @ IITR, obtaine...
Image segmentation using advanced fuzzy c-mean algorithm [FYP @ IITR, obtaine...
 
Image parts and segmentation
Image parts and segmentation Image parts and segmentation
Image parts and segmentation
 
Swin transformer
Swin transformerSwin transformer
Swin transformer
 
2021 03-02-transformer interpretability
2021 03-02-transformer interpretability2021 03-02-transformer interpretability
2021 03-02-transformer interpretability
 
PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION
PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION
PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION
 
Segmentation of Color Image using Adaptive Thresholding and Masking with Wate...
Segmentation of Color Image using Adaptive Thresholding and Masking with Wate...Segmentation of Color Image using Adaptive Thresholding and Masking with Wate...
Segmentation of Color Image using Adaptive Thresholding and Masking with Wate...
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
Cnn
CnnCnn
Cnn
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
 
Efficiency and capability of fractal image compression with adaptive quardtre...
Efficiency and capability of fractal image compression with adaptive quardtre...Efficiency and capability of fractal image compression with adaptive quardtre...
Efficiency and capability of fractal image compression with adaptive quardtre...
 
Data-Driven Motion Estimation With Spatial Adaptation
Data-Driven Motion Estimation With Spatial AdaptationData-Driven Motion Estimation With Spatial Adaptation
Data-Driven Motion Estimation With Spatial Adaptation
 
Image segmentation
Image segmentationImage segmentation
Image segmentation
 
Images Analysis  in matlab
Images Analysis  in matlabImages Analysis  in matlab
Images Analysis  in matlab
 
Machine learning in image processing
Machine learning in image processingMachine learning in image processing
Machine learning in image processing
 
Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.
 
WAVELET BASED AUTHENTICATION/SECRET TRANSMISSION THROUGH IMAGE RESIZING (WA...
WAVELET BASED AUTHENTICATION/SECRET  TRANSMISSION THROUGH IMAGE RESIZING  (WA...WAVELET BASED AUTHENTICATION/SECRET  TRANSMISSION THROUGH IMAGE RESIZING  (WA...
WAVELET BASED AUTHENTICATION/SECRET TRANSMISSION THROUGH IMAGE RESIZING (WA...
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation Learning
 

Similar to Efficient anomaly detection via matrix sketching

SPICE-MATEX @ DAC15
SPICE-MATEX @ DAC15SPICE-MATEX @ DAC15
SPICE-MATEX @ DAC15
Hao Zhuang
 
05 contours seg_matching
05 contours seg_matching05 contours seg_matching
05 contours seg_matching
ankit_ppt
 
Paper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satPaper study: Learning to solve circuit sat
Paper study: Learning to solve circuit sat
ChenYiHuang5
 
lecture_01.ppt
lecture_01.pptlecture_01.ppt
lecture_01.ppt
ssuserd3cf02
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
홍배 김
 
Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!
ChenYiHuang5
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
홍배 김
 
Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...
AmirParnianifard1
 
Evaluation of programs codes using machine learning
Evaluation of programs codes using machine learningEvaluation of programs codes using machine learning
Evaluation of programs codes using machine learning
Vivek Maskara
 
NS-CUK Seminar: H.E.Lee, Review on "Gated Graph Sequence Neural Networks", I...
NS-CUK Seminar: H.E.Lee,  Review on "Gated Graph Sequence Neural Networks", I...NS-CUK Seminar: H.E.Lee,  Review on "Gated Graph Sequence Neural Networks", I...
NS-CUK Seminar: H.E.Lee, Review on "Gated Graph Sequence Neural Networks", I...
ssuser4b1f48
 
svm-proyekt.pptx
svm-proyekt.pptxsvm-proyekt.pptx
svm-proyekt.pptx
ElinEliyev
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
Mohammad Junaid Khan
 
Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep Learning
Vahid Mirjalili
 
CS345-Algorithms-II-Lecture-1-CS345-2016.pdf
CS345-Algorithms-II-Lecture-1-CS345-2016.pdfCS345-Algorithms-II-Lecture-1-CS345-2016.pdf
CS345-Algorithms-II-Lecture-1-CS345-2016.pdf
OpenWorld6
 
Compressed learning for time series classification
Compressed learning for time series classificationCompressed learning for time series classification
Compressed learning for time series classification
學翰 施
 
Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章
Tsuyoshi Sakama
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
ChenYiHuang5
 
Randomized Algorithm- Advanced Algorithm
Randomized Algorithm- Advanced AlgorithmRandomized Algorithm- Advanced Algorithm
Randomized Algorithm- Advanced Algorithm
Mahbubur Rahman
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
CastLabKAIST
 

Similar to Efficient anomaly detection via matrix sketching (20)

SPICE-MATEX @ DAC15
SPICE-MATEX @ DAC15SPICE-MATEX @ DAC15
SPICE-MATEX @ DAC15
 
05 contours seg_matching
05 contours seg_matching05 contours seg_matching
05 contours seg_matching
 
Paper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satPaper study: Learning to solve circuit sat
Paper study: Learning to solve circuit sat
 
lecture_01.ppt
lecture_01.pptlecture_01.ppt
lecture_01.ppt
 
xldb-2015
xldb-2015xldb-2015
xldb-2015
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...
 
Evaluation of programs codes using machine learning
Evaluation of programs codes using machine learningEvaluation of programs codes using machine learning
Evaluation of programs codes using machine learning
 
NS-CUK Seminar: H.E.Lee, Review on "Gated Graph Sequence Neural Networks", I...
NS-CUK Seminar: H.E.Lee,  Review on "Gated Graph Sequence Neural Networks", I...NS-CUK Seminar: H.E.Lee,  Review on "Gated Graph Sequence Neural Networks", I...
NS-CUK Seminar: H.E.Lee, Review on "Gated Graph Sequence Neural Networks", I...
 
svm-proyekt.pptx
svm-proyekt.pptxsvm-proyekt.pptx
svm-proyekt.pptx
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep Learning
 
CS345-Algorithms-II-Lecture-1-CS345-2016.pdf
CS345-Algorithms-II-Lecture-1-CS345-2016.pdfCS345-Algorithms-II-Lecture-1-CS345-2016.pdf
CS345-Algorithms-II-Lecture-1-CS345-2016.pdf
 
Compressed learning for time series classification
Compressed learning for time series classificationCompressed learning for time series classification
Compressed learning for time series classification
 
Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
Randomized Algorithm- Advanced Algorithm
Randomized Algorithm- Advanced AlgorithmRandomized Algorithm- Advanced Algorithm
Randomized Algorithm- Advanced Algorithm
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 

Recently uploaded

Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 

Recently uploaded (20)

Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 

Efficient anomaly detection via matrix sketching

• 8. 2. Notation and Setup
  • Data matrix A ∈ R^(n×d), written row-wise as A = [a_1^T; …; a_n^T] with rows a_i ∈ R^d, or column-wise as A = [a^(1), …, a^(d)] with columns a^(j) ∈ R^n
  • SVD of A: A = UΣV^T, where Σ = diag(σ_1, …, σ_d), σ_1 ≥ … ≥ σ_d > 0, and V = [v^(1), …, v^(d)]
  • Condition number of the top-k subspace of A: κ_k = σ_1² / σ_k²
  • Frobenius norm: ‖A‖_F = (Σ_{i=1}^n Σ_{j=1}^d a_{ij}²)^{1/2}
  • Operator norm: ‖A‖ = σ_1
• 9. 2. Notation and Setup
  • Subspace-based measures of anomalies, for each row a_i ∈ R^d:
  • Mahalanobis distance / leverage score: L(i) = Σ_{j=1}^d (a_i^T v^(j) / σ_j)²
    • L(i) is highly sensitive to the smaller singular values
  • Rank-k leverage score (k ≪ d): L_k(i) = Σ_{j=1}^k (a_i^T v^(j))² / σ_j²
    • real-world data sets often have most of their signal in the top singular values
  • Rank-k projection distance: T_k(i) = Σ_{j=k+1}^d (a_i^T v^(j))²
    • catches anomalies that are far from the principal subspace
• 10. 2. Notation and Setup
  • Illustration of the rank-k leverage score and projection distance (figure)
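The two scores above can be computed directly from a full SVD. A minimal numpy sketch of the definitions (variable names are ours, not the paper's):

```python
import numpy as np

def anomaly_scores(A, k):
    """Rank-k leverage score L_k(i) and projection distance T_k(i)
    for every row a_i of A, computed from a full SVD."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    Vk, sk = Vt[:k].T, s[:k]           # top-k right singular vectors / values
    proj = A @ Vk                      # coordinates a_i^T v^(j) for j <= k
    L = np.sum((proj / sk) ** 2, axis=1)                      # sum_j (a_i^T v_j)^2 / sigma_j^2
    T = np.sum(A * A, axis=1) - np.sum(proj * proj, axis=1)   # mass outside the top-k subspace
    return L, T

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 40))
L, T = anomaly_scores(A, k=5)
```

A handy sanity check: the rank-k leverage scores of all rows always sum to exactly k.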
• 11. 2. Notation and Setup
  • Assumptions
    1. Separation assumption: in the degenerate case σ_k² = σ_{k+1}², the scores L_k(i) and T_k(i) are ill-defined, so a gap between σ_k² and σ_{k+1}² is assumed
    2. Approximate low-rank assumption: the top-k principal subspace captures a constant fraction (at least 0.1) of the total variance in the data
  • Assumption 2 captures the setting where the scores L_k and T_k are most meaningful
  • Our algorithms work best on data sets where relatively few principal components explain most of the variance
• 12. 3. Guarantees for anomaly detection via sketching
  • Main results: given μ > 0 and a (k, ∆)-separated matrix A ∈ R^(n×d) with top singular value σ_1, any sketch Ã ∈ R^(l×d) satisfying
    ‖Ã^T Ã − A^T A‖ ≤ μσ_1²   (2)
    or any sketch Ã ∈ R^(n×l) satisfying
    ‖Ã Ã^T − A A^T‖ ≤ μσ_1²   (3)
    can be used to approximate the rank-k leverage scores and the projection distance from the principal k-dimensional subspace
  • Explanation:
    1. Equation (2): Ã^T Ã ∈ R^(d×d) approximates the covariance matrix of the row vectors
    2. Equation (3): Ã Ã^T ∈ R^(n×n) approximates the covariance matrix of the column vectors
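Condition (2) is easy to check numerically. Below, a Gaussian random projection plays the role of the sketch (a stand-in for any sketch meeting the bound; the achieved μ is illustrative, not a value from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, l = 2000, 50, 400
A = rng.standard_normal((n, d))
A[:, 0] *= 10.0                                 # give the data a dominant direction

S = rng.standard_normal((l, n)) / np.sqrt(l)    # random projection, l << n
A_sk = S @ A                                    # sketch in R^{l x d}

# mu achieved in  ||A_sk^T A_sk - A^T A|| <= mu * sigma_1^2
err = np.linalg.norm(A_sk.T @ A_sk - A.T @ A, 2)
sigma1_sq = np.linalg.norm(A, 2) ** 2
mu = err / sigma1_sq
```

The operator-norm error is measured relative to σ_1², exactly as in Equation (2); for l well above d the achieved μ is small.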
• 13. 3. Guarantees for anomaly detection via sketching
  • Next, we show how to design efficient algorithms for finding anomalies in a streaming fashion
  • Pointwise guarantees from row space approximations
    • Algorithm 1: random column projection / row space approximation (n → l)
    • Suitable sketching techniques: Frequent Directions sketch [18], row sampling [19], random column projection, …
• 14. 3. Guarantees for anomaly detection via sketching
  • Pointwise guarantees from row space approximations
  • Theorem 1 improves on the trivial bounds, since l is independent of d: memory O(d²) → O(d), time O(nd²) → O(nd)
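One way to realize these bounds is a two-pass scheme: pass 1 builds a row-space sketch, pass 2 scores each row against the sketch's top-k subspace. The sketch below uses a Gaussian projection as a stand-in for the Frequent Directions sketch the paper analyzes; the planted-anomaly example is ours:

```python
import numpy as np

def sketched_scores(A, k, l, seed=0):
    """Approximate L_k and T_k from a row-space sketch A_sk in R^{l x d}:
    pass 1 builds the sketch, pass 2 scores each row against the
    top-k subspace of the sketch."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    A_sk = rng.standard_normal((l, n)) @ A / np.sqrt(l)    # pass 1
    _, s, Vt = np.linalg.svd(A_sk, full_matrices=False)
    Vk, sk = Vt[:k].T, s[:k]
    proj = A @ Vk                                          # pass 2, row by row
    L = np.sum((proj / sk) ** 2, axis=1)
    T = np.sum(A * A, axis=1) - np.sum(proj * proj, axis=1)
    return L, T

# planted example: low-rank data plus one row far from the principal subspace
rng = np.random.default_rng(2)
B = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 60))
A = B + 0.1 * rng.standard_normal((1000, 60))
A[0] += 5.0 * rng.standard_normal(60)
L, T = sketched_scores(A, k=5, l=50)
```

Only the l×d sketch and one row at a time are needed in each pass, so the working memory is O(ld), independent of n.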
• 15. 3. Guarantees for anomaly detection via sketching
  • Next, we show that substantial savings are unlikely for any algorithm with strong pointwise guarantees: there is an Ω(d) memory lower bound for any such approximation
• 16. 3. Guarantees for anomaly detection via sketching
  • Average-case guarantees from column space approximations
    • Even though the sketch gives a column space approximation (i.e., an approximation to AA^T satisfying Eq. (3)), the goal is still to compute the row anomaly scores from Ã^T Ã
    • E.g., take a random matrix R ∈ R^(d×l) with r_ij ~ i.i.d. U{±1} and l = O(k/μ²) [27]; the sketch Ã = AR then yields the anomaly scores
  • Space consumption:
    • storing Ã^T Ã: O(l²)
    • R can be made pseudorandom [28–30]: O(log d)
• 17. 3. Guarantees for anomaly detection via sketching
  • Average-case guarantees from column space approximations
  • Algorithm 2: random row projection / column space approximation
• 18. 3. Guarantees for anomaly detection via sketching
  • Average-case guarantees from column space approximations
  • I.e., on average, random projections preserve leverage scores and distances from the principal subspace
  • l = poly(k, ε⁻¹, ∆), independent of both n and d
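The column-space route can be mimicked by projecting each row from R^d down to R^l with a Rademacher matrix and computing the scores entirely inside the sketch. This is a sketch of the idea under our own test setup, not the paper's exact pseudocode:

```python
import numpy as np

def scores_from_projected_rows(A, k, l, seed=0):
    """Project rows with a Rademacher matrix R in {+-1}^{d x l},
    then compute rank-k scores of the sketched rows."""
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    R = rng.choice([-1.0, 1.0], size=(d, l))
    A_sk = A @ R / np.sqrt(l)                    # n x l, l << d
    _, s, Vt = np.linalg.svd(A_sk, full_matrices=False)
    proj = A_sk @ Vt[:k].T
    L = np.sum((proj / s[:k]) ** 2, axis=1)
    T = np.sum(A_sk * A_sk, axis=1) - np.sum(proj * proj, axis=1)
    return L, T

# planted example with large d: the anomalous row survives the projection
rng = np.random.default_rng(3)
B = rng.standard_normal((400, 5)) @ rng.standard_normal((5, 500))
A = B + 0.1 * rng.standard_normal((400, 500))
A[0] += 3.0 * rng.standard_normal(500)
L, T = scores_from_projected_rows(A, k=5, l=100)
```

No d×d matrix is ever formed; per row, only the l-dimensional image a_i^T R is kept, which is what makes the sublinear-in-d space of Algorithm 2 possible.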
• 19. 4. Experimental evaluation
  • Goals:
    1. to test whether our algorithms give results comparable to exact anomaly score computation based on a full SVD
    2. to determine how large the parameter l (which sets the size of the sketch) needs to be to get close to the exact scores
  • Note: the experiments run in batch mode, not online mode
• 20. 4. Experimental evaluation
  • Data: (1) p53 mutants [32], (2) Dorothea [33], (3) RCV1 [34], all available from the UCI Machine Learning Repository
  • Ground truth for anomalies: compute the rank-k anomaly scores using a full SVD, then label the η fraction of points with the highest anomaly scores as outliers
    1. k chosen by examining the explained variance of the dataset: typically between 10 and 125
    2. η chosen by examining the histogram of the anomaly scores: typically between 0.01 and 0.1
• 21. 4. Experimental evaluation
  • Experimental design: 3 datasets × 2 algorithms × 2 anomaly scores
  • Parameters (depending on the dataset):
    1. k: 3 settings
    2. l: 10 settings, ranging over (2k, 20k)
  • ∴ 3 × 2 × 2 × 3 × 10 = 360 combinations, with 5 runs for each combination
  • Measuring accuracy:
    1. After computing anomaly scores for each dataset, declare the points with the top η′ fraction of scores to be anomalies (without knowing η)
    2. Compute the F1 score, choosing the value of η′ that maximizes it
    3. Report the average F1 score over the 5 runs
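The accuracy measurement can be stated compactly; the helper below is a hedged reconstruction of the protocol (our own function name), with η and η′ the fractions described above:

```python
import numpy as np

def f1_against_ground_truth(exact_scores, approx_scores, eta, eta_prime):
    """Top eta fraction of exact scores = ground-truth outliers;
    top eta_prime fraction of approximate scores = predicted outliers;
    returns the F1 score of the prediction."""
    n = len(exact_scores)
    truth = np.zeros(n, dtype=bool)
    truth[np.argsort(exact_scores)[-max(int(eta * n), 1):]] = True
    pred = np.zeros(n, dtype=bool)
    pred[np.argsort(approx_scores)[-max(int(eta_prime * n), 1):]] = True
    tp = np.count_nonzero(truth & pred)
    if tp == 0:
        return 0.0
    precision = tp / pred.sum()
    recall = tp / truth.sum()
    return 2 * precision * recall / (precision + recall)

scores = np.arange(100, dtype=float)
f1 = f1_against_ground_truth(scores, scores, eta=0.05, eta_prime=0.05)
```

In the experiments η′ is swept and the maximizing F1 is reported, so this function would be called over a grid of η′ values.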
• 22. 4. Experimental evaluation
  • Performance on dataset: p53 mutants
• 23. 4. Experimental evaluation
  • Performance on dataset: Dorothea
• 24. 4. Experimental evaluation
  • Performance on dataset: RCV1
• 25. 4. Experimental evaluation
  • Takeaways:
    1. Taking l = Ck with a fairly small C ≈ 10 suffices to get F1 scores > 0.75 in most settings
    2. Algorithm 1 generally outperforms Algorithm 2 for a given value of l
• 26. 5. Conclusion
  1. Any sketch Ã ∈ R^(l×d) of A with the property that ‖Ã^T Ã − A^T A‖ is small can be used to additively approximate L_k and T_k for each row; this yields a streaming algorithm using O(d) memory and O(nd) time
  2. Q: Can we get such an additive approximation using memory sublinear in d?
    • The answer is no: a lower bound shows that approximating the outlier scores for every data point requires Ω(d) working space
• 27. 5. Conclusion
  3. Using random row projection, we give a streaming algorithm that preserves the outlier scores of the rows up to a small additive error on average, and hence preserves most outliers; the space required is only poly(k)·log(d)
  4. In our experiments, choosing l to be a small multiple of k was sufficient to get good results:
    1. results comparable to a full-blown SVD, using sketches
    2. significantly smaller memory footprint
    3. faster to compute and easy to implement
• 28. Instructor feedback
  1. Code on GitHub?
  2. Which SVD tool package was used?
  3. How could this be applied to online anomaly detection?