06 ashish mahabal bse2

BigSkyEarth II:
Feature extraction from time-series data
Ashish Mahabal
Center for Data-Driven Discovery, Caltech
5 April 2016

Ashish Mahabal - BSE II
Transformations in astronomy
Shallow to deep; Small to large; Sporadic to repeated; more wavelengths
Move towards
digital movies!
Finding
transients
Doing new
science2014-06-16 Ashish Mahabal
3

Something that has a large brightness
change (delta-magnitude) within a short
timespan (small delta-time)
What is a transient?
4

Phenomenological variety
Flare Star Dwarf Nova Blazar
Catalina Real-time Transient Survey (CRTS)
5

Supernova from SN Hunt
6

Binary Black-holes
PG 1302-102
Graham et al.
7

Variability Tree
8

irregular light curves●●●●
●
● ●
●
●
●
16
17
18
19
20
55250 55500 55750 56000
Mean Julian Date
Magnitude
9

GPR fitted to light curves
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
19
20
21
0 1000 2000
Day
Magnitude
AGN
●
●
●
●
●
●
●
●
●
19.0
19.5
20.0
20.5
21.0
0 1000 2000
Day
Magnitude
SN
●
●
●
●
●●●●
●●●●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●●
●●
● ●●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●●
●●
●●
14
16
18
20
0 1000 2000
Day
Magnitude Flare
●●●
● ●●●●●●●● ●
●
●● ●●
●
● ●●●●
●
●●●●●
●●●●●●●●●● ●●
●
● ● ●●
●●●●●●●●
16
17
18
19
20
21
0 1000 2000
Day
Magnitude
Non Transient
10

A
separate
Kepler
AGN
CRTS
AGN
Juxtaposition of surveys to get training sets for newer
ones
11

Regular versus irregular
• Financial time series and predictions
• Variety of tools
• Statistical features
12

Challenge: A Variety of Parameters
•  Discovery: magnitudes, delta-magnitudes
•  Contextual:
–  Distance to nearest star
–  Magnitude of the star
–  Color of that star
–  Normalized distance to nearest galaxy
–  Distance to nearest radio source
–  Flux of nearest radio source
–  Galactic latitude
•  Follow-up
–  Colors (g-r, r-I, i-z etc.)
•  Prior classifications (event type)
•  Characteristics from light-curve
–  Amplitude
–  Median buffer range percentage
–  Standard deviation
–  Stetson k
–  Flux percentile ratio mid80
–  Prior outburst statistic
Not all parameters are
always present leading to
swiss-cheese like data
sets
http://ki-media.blogspot.com/
Measures from Feigelson and Babu
(Graham)
New lightcurve-based parameters:
(Faraway)
• Whole curve measures
• Fitted curve measures
• Residual from fit measures
• Cluster measures
• Other
13

beyond1std
skew
Amplitude
freq_signif
freq_varrat
freq_y_offset
freq_model_max_delta_mag
freq_model_min_delta_mag
freq_model_phi1_phi2
freq_rrd
freq_n_alias
flux_%_mid20
flux_%_mid35
flux_%_mid50
flux_%_mid65
flux_%_mid80
linear_trend
max_slope
MAD
median_buffer_range_percentage
pair_slope_trend
percent_amplitude
percent_difference_flux_percentile
QSO
non_QSO
std
small_kurtosis
stetson_j
stetson_k
scatter_res_raw
p2p_scatter_2praw
p2p_scatter_over_mad
p2p_scatter_pfold_over_mad
medperc90_p2_p
fold_2p_slope_10%
fold_2p_slope_90%
p2p_ssqr_diff_over_var
Many features
- not all are independent
Adam Miller
15 Jan 2015 Ashish Mahabal 20
14

Importance of cadence
• Cadence wars with the LSST
• Asteroids versus fast transients versus periodic
objects versus cosmology
• Constraints due to promises made
• Mechanical constraints
15

continuing cadence meetings
Sensitivity (visit? coadd?) by filter (especially u and g), needed for several
(many? all?) variable types
Phased uniformity (periodic variables): for a given period how uniformly
would the lightcurve be sampled?*
Window function (per filter/all filters) FWHM, ...
statistics of revisit time histogram (per filter/all filters) e.g. min/max/
median/5th & 95th percentiles
Hour angle distribution (to check aliasing), at a given sky position,
maximum difference, rms ...
17

Follow-up a huge issue
• Depth is a big difference: faint sources
• Characterization/ML needed: part of it only from LSST
• A lot of LSST science may get done before LSST
• Synergy with GMT/TMT
• LSST as a follow-up device in the days of aLIGO
18

Optimization more than in Tzolk’in
Rohit Gawande
Victory points == science
Temples
Technologies
Currency
Buildings
Resources
Large number of variables
and each player wants to
win.
19

Optimizing is (generally) a zero-sum game
Easy to make the survey “greatest” in one science
Optimization means compromise
BUT, the sum of parts is GREATER than the whole i.e.
compromise does NOT mean sacriﬁce
In other words, the players are NOT playing AGAINST each other
It’s the best middle ground we are seeking
LSST is its own follow-up machine in a proactive way.
By coming up with a good cadence we can minimize the follow-up needed.
And you can help. And get the science you love done in the process.
20

CRTS light-curves
• 500M, most with hundreds of points over 10 years
• Most are non-variable (as a rule)
• Processing “done” for all SSS (100M+)
• Great training set for many procedures
21

Caltech service
• Not for mass-computation: http://nirgun.caltech.edu:
8000/scripts/description.html
• Mass processing: xsede, IUCAA, other …
22

Faraway, Mahabal et al. methods
Recursive Partitioning
Param n
Type n
Numbers/
names not
for
reading J Faraway
10/17/2014 Ashish Mahabal, IACS 58
23

domain knowledge • Peakiness - SNe separator
• Period ﬁnding variations
• Detailed features for a subset
Non-SNe (1) SNe
(2)
1
2
2
2
1
1
Using 900 non-SNe and 600
SNe
80-90% completeness
using just these
parameters
Mahabal, Ball
24

dimensionality reduction
Feature selection strategies
Donalek et al. arXiv:1310.1976
•  Fast Relief Algorithm (wt and
threshold)
•  Fisher Discriminant Ratio
•  Correlation based Feature
Selection
•  Fast Correlation Based Filter
•  Multi Class Feature Selection
25

Using features for clustering
• Using similarity of features to seed larger unknown
sets
• Exercise based on that …
• Deep learning?
Open
questions
26

Augmenting light-curves
with newer points
• Millions of sources observed each night
• Appending those points to all those light-curves
• Recomputing stats and follow-up characterizations
Open
questions
27

Summary
• Astronomical time-series tend to be more gappy,
heteroskedastic, diverse
• A great variety of statistical and domain-based
features extractable
• Many challenges and interesting problems remain
28

06 ashish mahabal bse2

Recommended

Recommended

More Related Content

Similar to 06 ashish mahabal bse2

Similar to 06 ashish mahabal bse2 (16)

Recently uploaded

Recently uploaded (20)

06 ashish mahabal bse2