
Anomaly Detection by ADGM / LVAE


Naoto Mizuno, PFN Summer Internship 2016

Published in: Technology

  1. Anomaly Detection by ADGM / LVAE • Naoto Mizuno • Mentors: Tanaka-san, Okanohara-san
  2. Introduction • Anomaly detection • Data • NAB Dataset (Artificial) • (Other data cannot be shown in this presentation) • Model • Auxiliary VAE (ADGM) • Ladder VAE • VAE (previous work)
  3. Variational Auto-Encoder (VAE) • We assume that the data $x$ are generated from the latent variables $z$. • We use neural networks as the encoder and the decoder. [Figure: data $x$ → encoder $q_\phi(z|x)$ → latent variable $z$ → decoder $p_\theta(x|z)$ → $x$]
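  4. VAE • We use the lower bound of $\log p_\theta(x)$ as the loss function (negated for minimization): $\log p_\theta(x) \ge E_{q_\phi(z|x)}\left[\log \frac{p_\theta(x, z)}{q_\phi(z|x)}\right]$, where $p_\theta(x, z) = p_\theta(x|z)\, p_\theta(z)$. • $p_\theta(z)$: standard normal distribution • In training, $z$ is sampled from $q_\phi(z|x)$.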
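The bound above can be estimated with a single reparameterized sample. The following is a minimal NumPy sketch, not the presenter's implementation: it assumes a diagonal Gaussian $q_\phi(z|x)$ and a Bernoulli decoder, and `decode` is a hypothetical stand-in for the decoder network.

```python
import numpy as np

def negative_elbo(x, mu, log_var, decode, rng):
    """Single-sample estimate of the negative lower bound for one data point.

    x       : binary data vector (assumption: Bernoulli decoder)
    mu      : mean of the diagonal Gaussian q(z|x)
    log_var : log-variance of q(z|x)
    decode  : hypothetical function mapping z to Bernoulli means of p(x|z)
    """
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I)
    eps = rng.standard_normal(np.shape(mu))
    z = mu + np.exp(0.5 * log_var) * eps

    # Reconstruction term log p(x|z), clipped for numerical stability
    x_hat = np.clip(decode(z), 1e-7, 1.0 - 1e-7)
    log_px_z = np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))

    # Analytic KL(q(z|x) || N(0, I)) for a diagonal Gaussian
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

    # -ELBO = KL - E[log p(x|z)]
    return kl - log_px_z
```

Using the analytic KL term instead of sampling $\log p_\theta(z) - \log q_\phi(z|x)$ is a common variance-reduction choice when both distributions are Gaussian.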
  5. ADGM • Semi-supervised learning • Predict the label $y$ and reconstruct the data $x$. • The auxiliary variable $a$ increases the flexibility of the model. [Figure: graphical models of ADGM and SDGM over data $x$, auxiliary variable $a$, latent variable $z$, and label $y$]
  6. Objective function of ADGM • For labeled data • Lower bound + classification loss: $L(x, y) = -E_{q_\phi(a, z|x, y)}\left[\log \frac{p_\theta(x, y, a, z)}{q_\phi(a, z|x, y)}\right] - \alpha E_{q_\phi(a|x)}\left[\log q_\phi(y|a, x)\right]$ • For unlabeled data: $U(x) = -E_{q_\phi(a, y, z|x)}\left[\log \frac{p_\theta(x, y, a, z)}{q_\phi(a, y, z|x)}\right]$ • Total: $J = \sum_{(x_l, y_l)} L(x_l, y_l) + \sum_{x_u} U(x_u)$
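How the terms of $J$ combine can be sketched as follows. This is an illustrative sketch only: it assumes the per-example negative lower bounds and the classifier log-probabilities $\log q_\phi(y|a, x)$ have already been estimated elsewhere, and the function name `adgm_objective` is hypothetical.

```python
import numpy as np

def adgm_objective(neg_bound_labeled, log_q_y, neg_bound_unlabeled, alpha):
    """Combine per-example estimates into the total objective J.

    neg_bound_labeled   : estimates of -E[log p(x,y,a,z)/q(a,z|x,y)], one per labeled example
    log_q_y             : estimates of E[log q(y|a,x)] for the true labels
    neg_bound_unlabeled : estimates of U(x), one per unlabeled example
    alpha               : weight of the classification loss
    """
    neg_bound_labeled = np.asarray(neg_bound_labeled, dtype=float)
    log_q_y = np.asarray(log_q_y, dtype=float)
    neg_bound_unlabeled = np.asarray(neg_bound_unlabeled, dtype=float)

    # L(x, y) = negative lower bound - alpha * log q(y|a, x)
    labeled = neg_bound_labeled - alpha * log_q_y

    # J = sum over labeled examples of L + sum over unlabeled examples of U
    return labeled.sum() + neg_bound_unlabeled.sum()
```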
  7. ADGM for MNIST • Semi-supervised learning • 100 labeled, 60000 unlabeled • Test error: ADGM 0.96%, SDGM 1.32% • Generate images • Choose $z$ from a Gaussian • Generate with each $y$ [Figure: generated images; panels: SDGM, without auxiliary variable]
  8. Auxiliary VAE • Unsupervised learning • Several sampling layers (1 or 2) [Figure: graphical models over data $x$, latent variable $z$, and auxiliary variable $a$]
  9. Ladder VAE • Several sampling layers (~5) • A VAE with several sampling layers is difficult to train. • Information is shared between the decoder and the encoder. [Figure: Ladder VAE with data $x$, deterministic layers $d$, and latent variables $z$]
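  10. Ladder VAE • The encoder uses the decoder output as a prior: $\sigma_q^2 = \frac{1}{\hat{\sigma}_q^{-2} + \sigma_p^{-2}}$, $\mu_q = \frac{\hat{\mu}_q \hat{\sigma}_q^{-2} + \mu_p \sigma_p^{-2}}{\hat{\sigma}_q^{-2} + \sigma_p^{-2}}$ [Figure: prior ($\mu_p, \sigma_p$) from the decoder and likelihood ($\hat{\mu}_q, \hat{\sigma}_q$) from the encoder layer $d$ are combined into the posterior ($\mu_q, \sigma_q$), from which $z$ is sampled]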
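The two formulas above are a precision-weighted combination of two Gaussians. A minimal NumPy sketch follows; the function name `merge_gaussians` is hypothetical and the inputs stand for the per-dimension means and standard deviations named on the slide.

```python
import numpy as np

def merge_gaussians(mu_hat_q, sigma_hat_q, mu_p, sigma_p):
    """Combine the encoder estimate with the decoder prior, per dimension.

    Returns the posterior mean mu_q and standard deviation sigma_q.
    """
    mu_hat_q = np.asarray(mu_hat_q, dtype=float)
    mu_p = np.asarray(mu_p, dtype=float)

    # Precisions are inverse variances
    prec_q = np.asarray(sigma_hat_q, dtype=float) ** -2.0
    prec_p = np.asarray(sigma_p, dtype=float) ** -2.0

    # sigma_q^2 = 1 / (sigma_hat_q^-2 + sigma_p^-2)
    var_q = 1.0 / (prec_q + prec_p)

    # mu_q = (mu_hat_q * sigma_hat_q^-2 + mu_p * sigma_p^-2) * sigma_q^2
    mu_q = (mu_hat_q * prec_q + mu_p * prec_p) * var_q
    return mu_q, np.sqrt(var_q)
```

With equal standard deviations the posterior mean is the midpoint of the two means, and the posterior is always narrower than either input.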
  11. Anomaly detection • The model is trained without anomalous data. • The model cannot reconstruct anomalous data. $\text{Anomaly Score} = \log E \log p_\theta(x|z)$ • Example: MNIST with noise
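A reconstruction-based score of this kind can be sketched as below. This is an assumption-laden illustration, not the presenter's exact formula: it takes the score to be the negative Monte Carlo average of $\log p_\theta(x|z)$ under $q_\phi(z|x)$ (so that poorly reconstructed inputs score higher), assumes a unit-variance Gaussian decoder, and uses hypothetical `encode` and `decode` callables in place of trained networks.

```python
import numpy as np

def anomaly_score(x, encode, decode, n_samples=10, rng=None):
    """Negative Monte Carlo estimate of E_q(z|x)[log p(x|z)].

    encode : hypothetical function x -> (mu, sigma) of q(z|x)
    decode : hypothetical function z -> mean of p(x|z)
    Assumes p(x|z) is Gaussian with unit variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    mu, sigma = encode(x)

    log_liks = []
    for _ in range(n_samples):
        # Sample z from q(z|x)
        z = mu + sigma * rng.standard_normal(np.shape(mu))
        x_hat = decode(z)
        # Unit-variance Gaussian log-likelihood of x given z
        log_liks.append(-0.5 * np.sum((x - x_hat) ** 2 + np.log(2.0 * np.pi)))

    # Higher score means lower likelihood, i.e. more anomalous
    return -np.mean(log_liks)
```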
  12. NAB Dataset (Artificial) • We convert the raw data to a spectrogram. • Spectrogram: the amplitude at each frequency and time. • Input: the amplitudes at a single time. [Figure: raw data with an anomaly; spectrogram (axes: time, frequency); a single input]
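The conversion from a raw series to per-time-step input vectors can be sketched with a short-time Fourier transform. The window length, hop size, and Hann window below are assumptions for illustration; the slides do not state the settings used.

```python
import numpy as np

def spectrogram(signal, window_size=64, hop=16):
    """Amplitude spectrogram with shape (n_frames, window_size // 2 + 1).

    Each row holds the amplitudes at one time step and can serve as a
    single model input. window_size and hop are illustrative values.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(window_size)

    # Slice the series into overlapping, windowed frames
    starts = range(0, len(signal) - window_size + 1, hop)
    frames = np.stack([signal[s:s + window_size] * window for s in starts])

    # Magnitude of the real FFT along the frequency axis
    return np.abs(np.fft.rfft(frames, axis=1))
```

For example, `spectrogram(series)[t]` gives the amplitude vector for frame `t`, where `series` is a hypothetical 1-D array of raw values.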
  13. NAB • Scores increase at the anomaly. [Figure: anomaly scores on train and test data; panels: ADGM, LVAE]
  14. NAB • In this case the models cannot detect the anomaly. • Small input values tend to result in small scores. [Figure: anomaly scores on train and test data; panels: ADGM, LVAE]
  15. Conclusion • Anomaly detection using ADGM / LVAE. • Anomalies are detected as low-probability data. • Performance is almost the same as that of the VAE. • Using many sampling layers may be better (?)