Autoencoding RNN for inference on unevenly sampled time-series data

For the past decade, feature-engineering-based approaches to the discovery of transients and the characterization of tens of thousands of variable stars have led the way to novel astronomical inference. Here I show that new autoencoder recurrent neural network architectures, without hand-crafted features, rival those traditional methods. Autonomous discovery and inference are part of a larger worldwide effort to federate precious (and heterogeneous) follow-up resources to maximize our collective scientific returns.

  1. Josh Bloom (UC Berkeley Astronomy, @profjsb). Autoencoding RNN for inference on unevenly sampled time-series data. Data Driven Discovery Investigator. Workshop on Applying Advanced AI Workflows in Astronomy and Microscopy, 11 Sept 2018 (UCSC, Santa Clara).
  2. (Ever) increasing need for ML methods in Time-Domain Astronomy • Discovery in images: real or spurious sources? (Bloom+12, Goldstein+16, …) • Inference: what is this event and is it worth following up? (Levitan+14) • Surrogate modelling & parameter estimation: supernovae (Thomas/Nugent), exoplanets (Ford+11)
  3. Supernova discovery in the Pinwheel Galaxy: found 11 hr after explosion; the nearest SN Ia in >3 decades; an ML-assisted discovery. Image © Peter Nugent. Nugent+11; Li, Bloom+12; Bloom+12; …
  4. Probabilistic classification of 50k+ variable stars (Shivvers, JSB, Richards, MNRAS 2014) • 106 "DEB" candidates, 12 new mass-radii • 15 "RCB/DYP" candidates, 8 new discoveries, tripling the number of Galactic DYPer stars (Miller, Richards, JSB, … ApJ 2012) • 5400 spectroscopic targets (Miller, JSB, Richards, … ApJ 2015) • Turn synoptic imagers into ~spectrographs
  5. Challenges with traditional ("hand-crafted featurization") approaches • Feature engineering is expensive (people/compute) and requires a lot of domain knowledge • "Small data" domain with only 1000s of labelled training examples • Traditional ML techniques don't account for feature uncertainty • Ideally we would learn on one survey and apply that knowledge to another (e.g., ASAS→ZTF→LSST) https://github.com/cesium-ml/cesium (see the featurization sketch below)
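For contrast with the learned-feature approach that follows, hand-crafted featurization with cesium looks roughly like this sketch. It assumes cesium's featurize_time_series entry point; the synthetic light curve and the feature names are illustrative picks from cesium's built-in feature set.

    # Sketch: hand-crafted features on an irregularly sampled light curve
    import numpy as np
    from cesium import featurize

    rng = np.random.default_rng(0)
    times = np.sort(rng.uniform(0, 100, 200))        # irregular cadence
    values = np.sin(2 * np.pi * times / 7.3) + 0.1 * rng.standard_normal(200)
    errors = np.full_like(values, 0.1)               # per-epoch uncertainties

    fset = featurize.featurize_time_series(
        times=times, values=values, errors=errors,
        features_to_use=["amplitude", "std", "skew",
                         "median_absolute_deviation"])
    print(fset)                                      # one row of features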
  6. 1. Build an autoencoder network that learns to reproduce irregularly sampled light curves through an information bottleneck (B). [Diagram: input light curve → encoder E → bottleneck B → decoder D → output ≈ input] 2. Use B as features and learn a traditional classifier (random forest). (A minimal sketch follows.)
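A minimal sketch of this two-step recipe, assuming Keras/TensorFlow and scikit-learn, with light curves padded to a fixed length and packed as (Δt, magnitude) channels. The GRU widths and bottleneck handling are illustrative choices, not the paper's exact configuration.

    # Step 1: sequence autoencoder with an information bottleneck B
    from tensorflow import keras
    from tensorflow.keras import layers
    from sklearn.ensemble import RandomForestClassifier

    n_steps, n_channels, n_bottleneck = 200, 2, 64

    inputs = keras.Input(shape=(n_steps, n_channels))
    h = layers.Bidirectional(layers.GRU(96))(inputs)
    bottleneck = layers.Dense(n_bottleneck, name="B")(h)
    h = layers.RepeatVector(n_steps)(bottleneck)
    h = layers.Bidirectional(layers.GRU(96, return_sequences=True))(h)
    outputs = layers.TimeDistributed(layers.Dense(1))(h)  # reconstructed mags

    autoencoder = keras.Model(inputs, outputs)
    autoencoder.compile(optimizer="adam", loss="mse")
    # autoencoder.fit(X, X[..., 1:2], epochs=50)  # X: preprocessed light curves

    # Step 2: bottleneck features into a traditional classifier
    encoder = keras.Model(inputs, bottleneck)
    # B_train = encoder.predict(X_train)
    # clf = RandomForestClassifier(n_estimators=500).fit(B_train, y_train)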
  7. Example reconstructions of the autoencoder (len(B) = 64)
  8. The bottleneck clearly learns important features underlying the "physics" that generates the data
  9. Results rival best-in-class approaches. Code/Data: https://github.com/bnaul/IrregularTimeSeriesAutoencoderPaper
  10. Novelties & improvements • Natively handles irregular sampling [Figure 1: diagram of an RNN encoder/decoder architecture for irregularly sampled time-series data; the network uses two RNN layers, specifically bidirectional gated recurrent units (GRUs)]
  11. Novelties & improvements • Natively handles irregular sampling • Learning loss accounts for uncertainty [Figure 1, as above]
  12. Novelties & improvements • Natively handles irregular sampling • Learning loss accounts for uncertainty • Natural data augmentation with bootstrap resampling [Figure 1, as above] (see the sketch below)
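Two of these ingredients lend themselves to short sketches, assuming TensorFlow/NumPy: an uncertainty-aware reconstruction loss that down-weights noisy epochs by their measurement variance, and bootstrap resampling over observed epochs as augmentation. The exact weighting here is my assumption in the spirit of the bullet, not necessarily the published form.

    import numpy as np
    import tensorflow as tf

    def weighted_mse(y_true_with_err, y_pred):
        # y_true_with_err packs (magnitude, sigma) in the last axis;
        # noisier epochs contribute less to the reconstruction loss.
        y, sigma = y_true_with_err[..., 0:1], y_true_with_err[..., 1:2]
        return tf.reduce_mean(tf.square(y - y_pred) / tf.square(sigma))

    def bootstrap_lightcurve(t, m, e, rng):
        # Augmentation: resample observed epochs with replacement,
        # keeping the result in time order.
        idx = np.sort(rng.integers(0, len(t), len(t)))
        return t[idx], m[idx], e[idx]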
  13. Novelties & improvements • Unsupervised feature learning → leverage a large corpus of unlabelled light curves [Figure 1, as above]
  14. Novelties & improvements • Unsupervised feature learning → leverage a large corpus of unlabelled light curves • Transfer learning appears to work [Figure 1, as above]
  15. Novelties & improvements • Unsupervised feature learning → leverage a large corpus of unlabelled light curves • Transfer learning appears to work • Learning scales linearly in the number of training examples [Figure 1, as above] (see the sketch below)
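Continuing the sketch after slide 6, a hedged illustration of unsupervised pretraining plus cross-survey transfer. X_unlab (a large unlabelled corpus from one survey) and X_new/y_new (a small labelled set from another) are hypothetical arrays; `autoencoder` and `encoder` are the models built in that earlier sketch.

    from sklearn.ensemble import RandomForestClassifier

    # Pretrain on unlabelled light curves (no labels needed):
    autoencoder.fit(X_unlab, X_unlab[..., 1:2], epochs=50, batch_size=256)

    # Featurize a different survey with the frozen encoder, then fit a
    # classifier on its few labelled examples:
    B_new = encoder.predict(X_new)
    clf = RandomForestClassifier(n_estimators=500).fit(B_new, y_new)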
  16. Extensions/active research • Anomaly detection (on the bottleneck features) • Hyperspectral topology [Figure: UMAP applied to an L2-normed autoencoder for MNIST] With Ellie Schwab Abrahams; also with Sara Jamal (sketch below)
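One plausible concrete pairing for these two threads, as a sketch: IsolationForest for anomaly scoring on the bottleneck features, and UMAP on the L2-normed features for a 2-D map (the slide shows the MNIST analogue). Neither tool choice is confirmed by the slides, and B here is a random stand-in for real bottleneck features.

    import numpy as np
    import umap
    from sklearn.ensemble import IsolationForest
    from sklearn.preprocessing import normalize

    B = np.random.default_rng(0).normal(size=(5000, 64))  # stand-in features

    # Anomaly detection: lower scores = more anomalous sources
    scores = IsolationForest(random_state=0).fit(B).score_samples(B)
    candidates = np.argsort(scores)[:50]

    # Topology: L2-normalize, then embed to 2-D with UMAP
    embedding = umap.UMAP(random_state=0).fit_transform(normalize(B))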
  17. Extensions/active research • New layer types: explore temporal convnets (TCNs) • Co-training across surveys • Semi-supervised topology + metadata, with Loss ~ L_ts + λ L_class [Diagram: source metadata (FC) and source time series (LSTM) feed a shared bottleneck; an unsupervised head (LSTM) reconstructs the time series while a supervised head performs classification] With Ellie Schwab Abrahams; also with Sara Jamal (sketch below)
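A sketch of that semi-supervised topology, assuming Keras: a shared bottleneck feeds both a time-series reconstruction head (unsupervised) and a classification head (supervised), trained jointly with Loss ~ L_ts + λ L_class via loss_weights. The metadata width, layer sizes, and class count are illustrative.

    from tensorflow import keras
    from tensorflow.keras import layers

    n_steps, n_classes, lam = 200, 8, 0.5

    ts_in = keras.Input(shape=(n_steps, 2), name="time_series")
    meta_in = keras.Input(shape=(4,), name="metadata")    # e.g., colors

    h = layers.Bidirectional(layers.LSTM(64))(ts_in)
    bottleneck = layers.Dense(64, name="B")(layers.concatenate([h, meta_in]))

    # Unsupervised head: reconstruct the time series from the bottleneck
    d = layers.RepeatVector(n_steps)(bottleneck)
    d = layers.LSTM(64, return_sequences=True)(d)
    recon = layers.TimeDistributed(layers.Dense(1), name="recon")(d)

    # Supervised head: classify from the same bottleneck
    cls = layers.Dense(n_classes, activation="softmax", name="cls")(bottleneck)

    model = keras.Model([ts_in, meta_in], [recon, cls])
    model.compile(optimizer="adam",
                  loss={"recon": "mse",
                        "cls": "sparse_categorical_crossentropy"},
                  loss_weights={"recon": 1.0, "cls": lam})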
  18. Thanks! Josh Bloom (UC Berkeley Astronomy, @profjsb). Autoencoding RNN for inference on unevenly sampled time-series data. Data Driven Discovery Investigator. Workshop on Applying Advanced AI Workflows in Astronomy and Microscopy, 11 Sept 2018 (UCSC, Santa Clara).
  19. (Backup) Challenge: classification on large sets. 50k variables, 810 with known labels (time series, colors). Richards+11, 12
