The document discusses the use of machine learning to classify astronomical objects in large surveys. Modern surveys produce far more data than conventional methods can process, so machine learning is used to classify objects and perform a first-level sorting of candidates. Specifically, the document applies machine learning to photometric data from the Sloan Digital Sky Survey (SDSS) to identify low-redshift quasars. It notes challenges including the large size and dimensionality of the data, and proposes a boosted ensemble method that learns weights for different regions of feature space rather than trying to estimate full joint probabilities. This helps classify SDSS objects into categories such as quasars, stars, and galaxies.
6.
Machine Learning
● Objective: mimic the human ability to learn and make decisions.
● Need: surveys are producing huge volumes of data, and conventional methods are insufficient to process them.
E.g.: the SDSS survey could spectroscopically confirm the nature of less than 1% of its photometric detections.
● Limitation: the goodness of the outputs depends on the discriminative power of the inputs.
● Advantage: first-level sorting of candidates.
11.
Large Data Issues
● Multidimensional data – requires more memory
● Diversity in the data – rare to rich populations
● Overlapping features – observational limitations
● Missing values – observational limitations
● Uncertainties – inherent and observational
● Processing Power – silicon limitations
● Storage and retrieval – bandwidth limitations
19.
A Real Example
Composed of about a million points showing the clustering of quasars (blue and red), main-sequence stars (green), late-type stars (yellow) and unresolved galaxies (pink) in a colour-colour plot of SDSS colours.
[Figure: SDSS colour-colour plot]
20.
A Real Example
Blue points are low-redshift quasars; our goal is to identify them and verify whether the actual number counts match the estimated values.
27.
Feature Space
● SDSS provides 5 magnitudes for each object in the bands u, g, r, i and z, which can be used to construct a ten-dimensional colour space.
● A subset of the 150,000 objects with confirmed spectroscopic classification can be used to estimate the likelihood.
The colour space needs to be binned to approximate the distribution. Computing the conditional likelihood of the binned high-dimensional feature space is nearly impossible.
● The classifier can be tested on the remaining data to verify the accuracy of the model.
The distribution is not smooth.
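The ten-dimensional colour space follows directly from the five bands: every pair of magnitudes gives one colour, and there are C(5, 2) = 10 pairs. A minimal sketch (the function name and dict layout are illustrative, not from any SDSS pipeline):

```python
from itertools import combinations

# The five SDSS photometric bands, in wavelength order.
BANDS = ["u", "g", "r", "i", "z"]

def colours(mags):
    """Map a dict of SDSS magnitudes to the 10 pairwise colours."""
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in combinations(BANDS, 2)}
```

For example, `colours({"u": 19.0, "g": 18.5, ...})` yields entries such as `u-g`, the colour most sensitive to low-redshift quasars in SDSS.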
28.
Two Issues with the Bayesian Formalism
● How does one guess the true value of the prior for each bin?
● Conditional dependency of the input feature space – the likelihood is conditionally dependent on the input values. Naive Bayes models fail on even simple XOR problems.
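The XOR failure is easy to demonstrate: under the naive independence assumption, each feature is uninformative on its own, so the posterior collapses to the prior. A self-contained sketch (toy data and function names are ours):

```python
# XOR truth table: naive Bayes assumes feature independence given the
# class, which discards exactly the pairwise information XOR depends on.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def naive_bayes_posterior(x, data):
    """Empirical naive Bayes posterior P(class | x) from counted data."""
    classes = {c for _, c in data}
    post = {}
    for c in classes:
        members = [f for f, cc in data if cc == c]
        prior = len(members) / len(data)
        like = 1.0
        for i, xi in enumerate(x):
            # Per-feature likelihood, ignoring dependence between features.
            like *= sum(1 for f in members if f[i] == xi) / len(members)
        post[c] = prior * like
    z = sum(post.values())
    return {c: p / z for c, p in post.items()}

# Every input gets posterior 0.5 / 0.5: the classifier is blind to XOR.
print(naive_bayes_posterior((0, 1), data))
```

Each feature value occurs equally often in both classes, so every per-feature likelihood is 0.5 and no input can be discriminated.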
30.
Bayesian Methods
● Ensemble methods: multiple models, same data
● Bagging: each model in the ensemble votes for the probable candidate
● Boosting: emphasise the failing models with weights
● Bayesian Model Averaging (BMA): sampling hypotheses from the hypothesis space
● Bayesian Model Combination (BMC): seek the combination of models closest to a distribution.
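The bagging idea in the list above can be sketched in a few lines: each model trains on a bootstrap resample and the ensemble takes a majority vote. The "stump" learner here is a deliberately trivial stand-in, and all names are illustrative:

```python
import random
from collections import Counter

def train_stump(sample):
    # A trivial "model": predict the majority class of its bootstrap sample.
    majority = Counter(label for _, label in sample).most_common(1)[0][0]
    return lambda x: majority

def bagging_predict(data, x, n_models=11, seed=0):
    """Bagging: bootstrap-resample, train one model per resample, vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        boot = [rng.choice(data) for _ in data]  # resample with replacement
        votes.append(train_stump(boot)(x))
    return Counter(votes).most_common(1)[0][0]   # majority vote
```

Boosting differs only in that later models are reweighted toward the examples the earlier ones got wrong, rather than drawn from uniform bootstrap samples.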
40.
Likelihood Estimation of the Binned Space
● Likelihood estimation becomes an issue because we want the conditional likelihood of the binned feature space. There may not be sufficient samples in each bin to estimate the likelihood once conditional-dependence constraints are imposed.
● We adopted an imposed conditional-independence formula that approximates the likelihood of a conditionally dependent event as the product of the likelihoods for pairs of input features.
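The pairwise approximation can be sketched as follows: estimate empirical likelihoods for each pair of binned features from the training counts, then multiply them. This is an illustrative reading of the formula above, not the original DBNN code; all names are ours:

```python
from collections import Counter
from itertools import combinations

def pair_counts(samples):
    """Count occurrences of each (feature pair, value pair) in binned data."""
    counts = Counter()
    for x in samples:
        for i, j in combinations(range(len(x)), 2):
            counts[(i, j, x[i], x[j])] += 1
    return counts, len(samples)

def pairwise_likelihood(x, counts, n, eps=1e-6):
    """Approximate P(x | class) as a product over feature-pair likelihoods."""
    like = 1.0
    for i, j in combinations(range(len(x)), 2):
        # eps smooths bins that received no training samples.
        like *= (counts.get((i, j, x[i], x[j]), 0) + eps) / (n + eps)
    return like
```

Because each factor needs only two features, every pair's bins fill up far faster than the full ten-dimensional histogram would, which is the point of the approximation.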
53.
A Challenging Problem
● Generate alerts on optical transient detections
● Minimize false alarms
● Customize alerts to user demands
● Send the alerts immediately – given minimal, or sometimes very little, information about the transient.
Example: nearest distance to a galaxy or star
: distance to the nearest known radio object
: distance to the nearest known X-ray detection
: magnitudes in archives and in earlier detections
75.
Dreaming Computers
● We now have many input features but not many examples to learn from. This can lead to overfitting the data – memorising rather than generalising.
● Dreams are synthetic inputs our brain uses to teach us how to react to plausible situations.
Can we create dreams for computers?
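One concrete way to give a classifier "dreams" is ordinary noise-based data augmentation: perturb real labelled samples within plausible measurement uncertainties to manufacture extra training examples. A minimal sketch, assuming Gaussian photometric noise (the noise model, function name, and parameters are our illustration):

```python
import random

def augment(samples, sigma=0.02, n_copies=5, seed=0):
    """Return the original samples plus noisy copies drawn around each one."""
    rng = random.Random(seed)
    dreamed = list(samples)
    for features, label in samples:
        for _ in range(n_copies):
            # Jitter each feature; the label is inherited unchanged.
            jittered = tuple(f + rng.gauss(0.0, sigma) for f in features)
            dreamed.append((jittered, label))
    return dreamed
```

The synthetic points fill in the sparsely sampled regions of feature space, reducing the memorisation risk the bullet above describes.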
80.
DBNN Annotator
A collaborative project with Ashish Mahabal (Caltech), IUCAA, Pune and the CRTS Team, with funding from IUSSTF and ISRO.
84.
Parallel DBNN
● Since DBNN splits the likelihood into a product over individual feature pairs, the likelihood for each pair may be computed independently by a different node in an HPC system.
● Broadcast likelihoods to nodes
● Compute
● Gather the Bayesian belief for each outcome
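The scatter/compute/gather pattern above can be sketched with thread workers standing in for HPC nodes: each worker evaluates one feature pair's likelihood, and the partial results are multiplied (summed in log space) at the end. The likelihood table and names are illustrative stand-ins:

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def pair_log_likelihood(pair, x, table):
    """Log-likelihood of one feature pair; unseen bins get a small floor."""
    i, j = pair
    return math.log(table.get((i, j, x[i], x[j]), 1e-6))

def parallel_belief(x, table, n_workers=4):
    pairs = list(combinations(range(len(x)), 2))   # scatter: one unit per pair
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        logs = pool.map(lambda p: pair_log_likelihood(p, x, table), pairs)
    return math.exp(sum(logs))                      # gather and combine
```

Because the pairs are independent under the factorisation, no communication is needed between workers until the final gather, which is what makes the scheme scale well on an HPC cluster.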