Josh Bloom
UC Berkeley Astronomy
@profjsb
Autoencoding RNN for inference on
unevenly sampled time-series data
Data Driven Discovery Investigator
Workshop on Applying Advanced AI Workflows
In Astronomy and Microscopy
11 Sept 2018 (UCSC, Santa Clara)
Discovery in images:
Real or spurious sources?
(Ever) Increasing need for ML methods
in Time-Domain Astronomy
Bloom+12, Goldstein+16, …
Inference: What is
this event and is it
worth following up?
Levitan+14
Surrogate modelling &
parameter estimation
Supernova (Thomas/Nugent);
Exoplanets (Ford+11)
Supernova Discovery in the Pinwheel Galaxy
11 hr after explosion
nearest SN Ia in >3 decades
ML-assisted discovery
©Peter Nugent
Nugent+11, Li, Bloom+12, Bloom+12…
Probabilistic Classification of
50k+ Variable Stars
Shivvers, JSB, Richards, MNRAS 2014
106 "DEB" candidates → 12 new mass-radius measurements
15 "RCB/DYP" candidates → 8 new discoveries: triple the number of known Galactic DYPer stars (Miller, Richards, JSB, et al., ApJ 2012)
5400 spectroscopic targets: turn synoptic imagers into ~spectrographs (Miller, JSB, Richards, et al., ApJ 2015)
Challenges with Traditional ("Hand-Crafted Featurization")
Approaches
• Feature engineering is expensive (people/compute), needs
a lot of domain knowledge
• "Small data" domain with only 1000s of labelled training
examples
• Traditional ML techniques don't account for feature
uncertainty
• Ideally would like to learn on one survey and apply that
knowledge to another (e.g., ASAS→ZTF→LSST)
https://github.com/cesium-ml/cesium
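For contrast with the learned-feature approach on the next slides, hand-crafted featurization with cesium looks roughly like the sketch below. This is a hedged example: the featurize_time_series call and the feature names reflect the cesium API as I understand it and may differ between versions.

import numpy as np
from cesium import featurize

# A fake, irregularly sampled "light curve": times, magnitudes, uncertainties
t = np.sort(np.random.uniform(0, 100, 150))
m = 12.0 + 0.3 * np.sin(2 * np.pi * t / 7.3) + 0.05 * np.random.randn(t.size)
e = np.full_like(m, 0.05)

# Each requested feature encodes domain knowledge (variability amplitude,
# distribution moments, Lomb-Scargle period, ...) and must be designed,
# implemented, and validated by hand.
fset = featurize.featurize_time_series(
    times=t, values=m, errors=e,
    features_to_use=["amplitude", "std", "skew", "freq1_freq"],
)
print(fset)  # one row of engineered features for this light curve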
1. Build an autoencoder network to
learn to reproduce irregularly sampled
light curves using an information
bottleneck (B)
[Schematic: light curve → Encoder E → Bottleneck B → Decoder D → reconstructed light curve]
2. Use B as features and learn a
traditional classifier (random forest)
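A minimal sketch of this two-stage pipeline, assuming Keras and light curves padded to a fixed length, with each time step given as a (delta_t, magnitude) pair. The 64-dimensional bottleneck and bidirectional GRU layers follow the slides; the other layer sizes and the padding length are illustrative assumptions, not the paper's exact configuration.

from tensorflow.keras import layers, Model
from sklearn.ensemble import RandomForestClassifier

MAX_LEN, N_CHANNELS, BOTTLENECK = 200, 2, 64   # len(B) = 64

# 1. Autoencoder: encode the padded (delta_t, mag) sequence into B, then decode.
inp = layers.Input(shape=(MAX_LEN, N_CHANNELS))
h = layers.Bidirectional(layers.GRU(96))(inp)                    # encoder E
B = layers.Dense(BOTTLENECK, name="bottleneck")(h)               # bottleneck B
h = layers.Bidirectional(layers.GRU(96, return_sequences=True))(
    layers.RepeatVector(MAX_LEN)(B))                             # decoder D
out = layers.TimeDistributed(layers.Dense(1))(h)                 # reconstructed mags

autoencoder = Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X[..., 1:2], ...)   # target is the magnitude channel

# 2. Use B as features: extract the bottleneck and fit a random forest
#    on the (much smaller) labelled subset.
encoder = Model(inp, B)
# features = encoder.predict(X_labelled)
# clf = RandomForestClassifier(n_estimators=500).fit(features, y_labelled)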
len(B) = 64
Example Reconstructions
of the Autoencoder
Bottleneck clearly learns
important features
underlying the "physics"
that generates the data
Results rival best-in-class approaches
Code/Data: https://github.com/bnaul/IrregularTimeSeriesAutoencoderPaper
Novelties & Improvements
Figure 1: Diagram of an RNN encoder/decoder architecture for irregularly sampled time-series data. The network uses two RNN layers (specifically, bidirectional gated recurrent units, GRUs).
• Natively handles irregular sampling
• Learning loss accounts for measurement uncertainty (sketched below)
• Natural data augmentation with bootstrap resampling
• Unsupervised feature learning → leverage a large corpus of unlabelled light curves
• Transfer learning appears to work
• Learning scales linearly in the number of training examples
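The uncertainty-aware loss can be viewed as a chi-squared-style reconstruction error in which each photometric point is down-weighted by its measurement variance. A minimal illustration (my own sketch, not the paper's exact code):

import numpy as np

def weighted_reconstruction_loss(y_true, y_pred, sigma):
    """Chi^2-style reconstruction error: points with larger photometric
    uncertainty sigma contribute less to the autoencoder training signal."""
    return np.mean(((y_true - y_pred) / sigma) ** 2)

# Example: a point with sigma = 0.5 mag counts 100x less than one with sigma = 0.05 mag.

In a Keras implementation, roughly the same effect can be obtained by passing per-time-step sample weights proportional to 1/sigma^2 at fit time.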
Extensions/Active Research
• Anomaly detection (on the bottleneck features)
• Hyperspectral topology: UMAP applied to L2-normed autoencoder for MNIST (a rough sketch of both ideas follows below)
Ellie Schwab Abrahams
Also, with Sara Jamal
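One plausible realization of both ideas, assuming the bottleneck features have already been extracted. The random stand-in data, the choice of IsolationForest, and the UMAP settings are assumptions, not specifics from the slide.

import numpy as np
import umap
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import normalize

B = np.random.randn(5000, 64)    # stand-in for real bottleneck features
B_unit = normalize(B)            # L2-norm each embedding

# Anomaly detection on the bottleneck features: low scores flag candidate anomalies
scores = IsolationForest(random_state=0).fit(B_unit).score_samples(B_unit)

# 2-D view of the embedding topology (the slide shows this for MNIST)
embedding_2d = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(B_unit)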
Extensions/Active Research
• New layer types: explore Temporal Convolutional Networks (TCNs)
• Co-training across surveys
• Semi-supervised topology + metadata: Loss ≈ L_ts + λ · L_class (sketched below)
[Diagram: source time series → LSTM encoder → bottleneck → LSTM decoder → time-series reconstruction (unsupervised); bottleneck + source metadata → FC layer → classification (supervised)]
Ellie Schwab Abrahams
Also, with Sara Jamal
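A rough sketch of that semi-supervised topology in Keras: one LSTM encoder feeds both a time-series reconstruction head (unsupervised) and a classification head that also sees source metadata (supervised), trained with Loss ≈ L_ts + λ · L_class. Layer sizes, the metadata dimension, the number of classes, and λ are illustrative assumptions.

from tensorflow.keras import layers, Model

MAX_LEN, N_CHANNELS, N_META, N_CLASSES, LAM = 200, 2, 8, 10, 0.5

ts_in = layers.Input(shape=(MAX_LEN, N_CHANNELS), name="time_series")
meta_in = layers.Input(shape=(N_META,), name="metadata")

# Shared LSTM encoder -> bottleneck
bottleneck = layers.Dense(64, name="bottleneck")(layers.LSTM(96)(ts_in))

# Unsupervised head: LSTM decoder reconstructs the time series from the bottleneck
recon = layers.TimeDistributed(layers.Dense(1), name="reconstruction")(
    layers.LSTM(96, return_sequences=True)(layers.RepeatVector(MAX_LEN)(bottleneck)))

# Supervised head: FC classifier on bottleneck + source metadata
cls = layers.Dense(N_CLASSES, activation="softmax", name="classification")(
    layers.Dense(64, activation="relu")(layers.Concatenate()([bottleneck, meta_in])))

model = Model([ts_in, meta_in], [recon, cls])
model.compile(optimizer="adam",
              loss={"reconstruction": "mse",
                    "classification": "sparse_categorical_crossentropy"},
              loss_weights={"reconstruction": 1.0, "classification": LAM})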
Josh Bloom
UC Berkeley Astronomy
@profjsb
Autoencoding RNN for inference on
unevenly sampled time-series data
Data Driven Discovery Investigator
Thanks!
Workshop on Applying Advanced AI Workflows
In Astronomy and Microscopy
11 Sept 2018 (UCSC, Santa Clara)
50k variables, 810 with known labels (time series, colors)
Challenge: classification on large sets
Richards+11, 12
