Discovering exoplanets with Deep Learning

© Cloudera, Inc. All rights reserved.
DISCOVERING EXOPLANETS
WITH DEEP LEARNING & CDSW
Rafael Arana – Senior Solutions Architect

“WE ARE JUST AN ADVANCED BREED OF
MONKEYS ON A MINOR PLANET OF A
VERY AVERAGE STAR.
BUT WE CAN UNDERSTAND THE
UNIVERSE. THAT MAKES US SOMETHING
VERY SPECIAL”
Stephen Hawking

DISCLAIMER #1
https://github.com/google-research/exoplanet-ml

DISCLAIMER #2
I’m not a Data Scientist

KEPLER
NASA’s First Mission Capable of Finding Earth-Size Planets

KEPLER DATA SET
150K Stars
35K Possible planetary signals
3735 Confirmed planets
614 Multi-planet systems

THE KEPLER DATA SET
Threshold Crossing Events (TCEs)

DATA PREPARATION & FEATURE ENGINEERING

NORMALIZE YOUR INPUT DATA


After dividing the flux by the median per segment
• Divide the flux by the
median per segment
• Normalize to 1
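
A minimal sketch of the normalization step, assuming the light curve is already split into segments held as plain Python lists (the name normalize_segments and the sample values are made up for illustration):

```python
from statistics import median

def normalize_segments(segments):
    """Divide each segment's flux by that segment's median.

    After this step every segment has a median of 1, so segments with
    different raw brightness levels become comparable.
    """
    normalized = []
    for flux in segments:
        m = median(flux)
        normalized.append([f / m for f in flux])
    return normalized

# Two illustrative segments with different raw flux levels.
segments = [[100.0, 102.0, 98.0], [200.0, 190.0, 210.0]]
normalized = normalize_segments(segments)
```

Both segments end up centred on 1 even though their raw flux levels differ by a factor of two.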

SCRUBBING
“Fix” bad examples by removing them from the data set
• We've assumed that all the data used for training and testing was trustworthy.
• In real life, many examples in data sets are unreliable due to one or more of
the following:
• Omitted values.
• Duplicate examples. For example, a server mistakenly uploaded the same logs twice.
• Bad labels. For instance, an astronomer mislabeled an event as a planet.
• Bad feature values. For example, someone typed in an extra digit.

OUTLIER DATA POINTS
Remove all points that deviate by more than 3 standard deviations
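
A minimal sketch of this rule, assuming "deviation" means the standard deviation measured around the mean of the flux values. The slide does not say what the deviation is measured against, so that choice, the function name and the sample data are assumptions:

```python
from statistics import mean, pstdev

def remove_outliers(flux, n_sigma=3.0):
    """Keep only points within n_sigma standard deviations of the mean."""
    mu = mean(flux)
    sigma = pstdev(flux)
    if sigma == 0:
        # All points are identical, so there is nothing to remove.
        return list(flux)
    return [f for f in flux if abs(f - mu) <= n_sigma * sigma]

# Twenty well-behaved points plus one obvious spike.
flux = [1.0] * 20 + [5.0]
cleaned = remove_outliers(flux)
```

In practice the rule is often applied more than once, because a strong outlier inflates the standard deviation used to detect it.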

BINNING, FOLD & SPLINE
Removing the noise
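
A rough sketch of the fold and bin steps, assuming the period and the time of one event (t0) are already known. The spline fit is left out, and phase_fold, median_bin and the sample values are illustrative names and data, not the code used in the talk:

```python
from statistics import median

def phase_fold(times, period, t0):
    """Map each time to a phase in [-period/2, period/2), with the event at t0 on phase 0."""
    half = period / 2.0
    return [((t - t0 + half) % period) - half for t in times]

def median_bin(phases, flux, num_bins, period):
    """Split the phase range into equal-width bins and take the median flux of each bin."""
    half = period / 2.0
    width = period / num_bins
    buckets = [[] for _ in range(num_bins)]
    for p, f in zip(phases, flux):
        idx = min(int((p + half) / width), num_bins - 1)
        buckets[idx].append(f)
    # Empty bins are reported as None rather than filled with a guess.
    return [median(b) if b else None for b in buckets]

# Two cycles of a toy signal with period 4 and a dip at t = 2 and t = 6.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
flux = [1.0, 1.0, 0.9, 1.0, 1.0, 1.0, 0.8, 1.0]
phases = phase_fold(times, period=4.0, t0=2.0)
binned = median_bin(phases, flux, num_bins=4, period=4.0)
```

Folding stacks both dips on phase 0, and the median in each bin suppresses noise from individual points.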

SPLITTING THE DATA SET
Training, test and validation
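
A minimal sketch of a random three-way split. The 80/10/10 fractions and the fixed seed are assumptions chosen for illustration; the slide does not state the proportions used:

```python
import random

def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle the examples and cut them into training, validation and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever is left over
    return train, val, test

train, val, test = split_dataset(range(100))
```

Every example lands in exactly one set, so the test set stays unseen during training and tuning.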

STANDARD TENSORFLOW FILE FORMAT
TFRecords
• The recommended format for TensorFlow is a TFRecords file containing
tf.train.Example protocol buffers (which contain Features as a field).
• Optimized for large datasets
• It reads in memory only the data required for each batch

MODELING

MODELING
1st approach - Fully connected neural network (FC)

MODELING
2nd approach - Convolutional neural network (CNN)

MODELING
Combining the two sets of input features

ARCHITECTURE
• Naming conventions:
• Convolutional layers
• conv[kernel size]-[number of feature maps]
• Max pooling layers
• maxpool[window length]-[stride length]
• Fully connected layers
• FC-[number of units]

EVALUATION – NETWORK PERFORMANCE

MODEL ANALYSIS
Metrics to assess our model’s performance.
• Precision: the fraction of signals classified as planets that are true planets
(also known as reliability; see, e.g., S. E. Thompson et al. 2017, in preparation).
• Recall: the fraction of true planets that are classified as planets (also known
as completeness).
• Accuracy: the fraction of correct classifications.
• AUC: the area under the receiver operating characteristic curve, which is
equivalent to the probability that a randomly selected planet is scored higher
than a randomly selected false positive.
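
The four definitions above can be sketched in plain Python. The labels (1 = planet, 0 = false positive), the scores and the 0.5 threshold below are made-up values for illustration:

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count true/false positives and negatives for a given score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_planet = s >= threshold
        if predicted_planet and y == 1:
            tp += 1
        elif predicted_planet:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def auc(labels, scores):
    """Probability that a random planet outscores a random false positive (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

tp, fp, tn, fn = confusion_counts(labels, scores)
precision = tp / (tp + fp)               # classified planets that are true planets
recall = tp / (tp + fn)                  # true planets that were classified as planets
accuracy = (tp + tn) / len(labels)       # correct classifications overall
area_under_curve = auc(labels, scores)
```

Precision, recall and accuracy depend on the chosen threshold, while AUC only depends on how the scores rank the examples.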

METRICS
Assessing model performance
• Precision:
• the fraction of signals classified as planets that are
true planets.
• Recall:
• the fraction of true planets that are classified as
planets (also known as completeness).
• Accuracy:
• the fraction of correct classifications.
• AUC:
• the area under the receiver operating characteristic
curve, which is equivalent to the probability that a
randomly selected planet is scored higher than a
randomly selected false positive.

TENSORBOARD
Assessing model performance
• From a Terminal run:
• tensorboard --port 8080 --logdir /home/cdsw/model_checkpoint

TENSORFLOW CHECKPOINTS
• Critical when you start training larger models
• They allow you to continue training, resume on failure, and predict from a trained
model.
• Save: Specify a folder when you instantiate the model, and checkpoints will be
saved there periodically.
• Restore: Specify a folder when you instantiate the model; if a checkpoint is found
there, it is loaded and the estimator is ready for predictions.
• If you want to restart from scratch, just delete this folder.

NETWORK OPTIMIZATION

OPTIMIZATION TECHNIQUES
• Adam optimization algorithm
• minimize the cross-entropy error function over the training set
• Data augmentation
• We augmented our training data by applying random horizontal reflections to the light
curves during training
• Dropout regularization on the fully connected layers
• which helps prevent overfitting by randomly “dropping” some of the output neurons from
each layer during training, so the model does not become overly reliant on any of its
features
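
Two of the ideas above in a minimal sketch: the cross-entropy error for a binary planet / not-planet label, and the random horizontal reflection of a light curve. The function names and the 0.5 flip probability are assumptions made for illustration:

```python
import math
import random

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean cross-entropy between 0/1 labels and predicted planet probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def random_reflection(light_curve, rng, prob=0.5):
    """Reverse the light curve in time with probability prob."""
    if rng.random() < prob:
        return light_curve[::-1]
    return list(light_curve)

loss = binary_cross_entropy([1, 0], [0.5, 0.5])
rng = random.Random(0)
flipped = random_reflection([1.0, 0.9, 1.0, 1.0], rng, prob=1.0)
```

A reflected light curve is treated as just another plausible example of the same class, so the reflection adds training variety without new labels.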

TENSORFLOW OPTIMIZERS
• Implementations
• tf.train.MomentumOptimizer(learning_rate, momentum, use_nesterov)
• tf.train.GradientDescentOptimizer(learning_rate)
• tf.train.AdagradOptimizer(learning_rate)
• tf.train.AdamOptimizer(learning_rate)
• tf.train.RMSPropOptimizer(learning_rate)
• TPU:
• tf.contrib.tpu.CrossShardOptimizer(optimizer)
• Clip Gradient Norms:
• tf.contrib.training.clip_gradient_norms_fn(max_norm)
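
To show what clipping a gradient norm means, here is a plain-Python sketch for a single gradient vector. It illustrates the idea only and is not TensorFlow's implementation; clip_by_norm here is a hypothetical helper:

```python
import math

def clip_by_norm(gradient, max_norm):
    """Rescale the gradient so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm or norm == 0.0:
        return list(gradient)  # already small enough
    scale = max_norm / norm
    return [g * scale for g in gradient]

clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)    # norm 5 is scaled down to 1
untouched = clip_by_norm([0.3, 0.4], max_norm=1.0)  # norm 0.5 is left alone
```

Clipping keeps the direction of the gradient and only limits its length, which helps avoid very large update steps.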

DROPOUT
Techniques to prevent overfitting

DROPOUT
• Adjust dropout per layer
• Initial layers normally have more hidden units
• The more hidden units you have, the more overfitting
• Apply more dropout
• Can be applied on the conv layers or the FC layers
• Don’t use it during evaluation (test set)
• We want predictability
• Downside:
• The noise from dropout prevents the cost function (J) from decreasing at every step.
• Health check: disable dropout, check that J decreases steadily, then enable dropout again
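
A minimal sketch of dropout with an explicit training flag, assuming the common "inverted dropout" variant in which the kept activations are scaled up during training. The function name and the sample values are illustrative:

```python
import random

def dropout(activations, rate, training, rng):
    """Zero each activation with probability rate during training only."""
    if not training or rate == 0.0:
        # Evaluation: no randomness, so predictions are repeatable.
        return list(activations)
    keep = 1.0 - rate
    # Scale the survivors by 1/keep so the expected activation stays the same.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
layer_output = [1.0, 2.0, 3.0, 4.0]
train_output = dropout(layer_output, rate=0.5, training=True, rng=rng)
eval_output = dropout(layer_output, rate=0.5, training=False, rng=rng)
```

The same layer output passes through unchanged at evaluation time, which is the predictability the slide asks for.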

GOOGLE-VIZIER
A Google-internal service for black-box optimization
• Automatically tune the hyperparameters
• input representations (e.g., number of bins, bin width)
• model architecture (e.g., number of fully connected layers, number
of convolutional layers, kernel size)
• and training (e.g., dropout probability).
• Each Vizier “study” trained several thousand models to find the best
hyperparameter configuration
• Each model was trained on a single central processing unit (CPU)
• Used 100 CPUs per study to train individual models in parallel

KEPLER-90
The star known to host the most planets
https://www.nasa.gov/image-feature/ames/kepler-90-system-planet-sizes

DEMO TIME
