Modern astronomical surveys are producing datasets of unprecedented size and richness, increasing the potential for high-impact
scientific discovery. This possibility, coupled with the challenge of exploring a large number of sources, has led to the development
of novel machine-learning-based anomaly detection approaches, such as astronomaly. For the first time, we test the scalability
of astronomaly by applying it to almost 4 million images of galaxies from the Dark Energy Camera Legacy Survey. We use a
trained deep learning algorithm to learn useful representations of the images and pass these to the anomaly detection algorithm
isolation forest, coupled with astronomaly’s active learning method, to discover interesting sources. We find that data selection
criteria have a significant impact on the trade-off between finding rare sources such as strong lenses and introducing artefacts into
the dataset. We demonstrate that active learning is required to identify the most interesting sources and reduce artefacts, while
anomaly detection methods alone are insufficient. Using astronomaly, we find 1635 anomalies among the top 2000 sources in
the dataset after applying active learning, including 8 strong gravitational lens candidates, 1609 galaxy merger candidates, and 18
previously unidentified sources exhibiting highly unusual morphology. Our results show that by leveraging the human-machine
interface, astronomaly is able to rapidly identify sources of scientific interest even in large datasets.
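As a rough sketch of the anomaly-ranking stage described above, the following uses scikit-learn's isolation forest on synthetic vectors standing in for the learned image representations; astronomaly's own pipeline and active-learning loop are not reproduced here, and the injected outliers are purely illustrative:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical stand-in for learned image representations:
# 500 "ordinary" galaxies clustered in feature space, plus 5 outliers.
features = np.vstack([
    rng.normal(0.0, 1.0, size=(500, 8)),
    rng.normal(6.0, 1.0, size=(5, 8)),
])

forest = IsolationForest(random_state=0).fit(features)
# score_samples: lower (more negative) means more anomalous.
scores = forest.score_samples(features)
ranking = np.argsort(scores)  # most anomalous first

# The five injected outliers (indices 500-504) dominate the top of the list.
print(sorted(ranking[:5]))
```

In the full approach, this raw anomaly ranking would then be re-weighted by a human-in-the-loop relevance score, which is what separates scientifically interesting sources from mere artefacts.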
Hubble Asteroid Hunter III. Physical properties of newly found asteroids (Sérgio Sacani)
Context. Determining the size distribution of asteroids is key to understanding the collisional history and evolution of the inner Solar System. Aims. We aim to improve our knowledge of the size distribution of small asteroids in the main belt by determining the parallaxes of newly detected asteroids in the Hubble Space Telescope (HST) archive and subsequently their absolute magnitudes and sizes. Methods. Asteroids appear as curved trails in HST images because of the parallax induced by the fast orbital motion of the spacecraft. Taking into account the trajectory of this latter, the parallax effect can be computed to obtain the distance to the asteroids by fitting simulated trajectories to the observed trails. Using distance, we can obtain the absolute magnitude of an object and an estimation of its size assuming an albedo value, along with some boundaries for its orbital parameters. Results. In this work, we analyse a set of 632 serendipitously imaged asteroids found in the ESA HST archive. Images were captured with the ACS/WFC and WFC3/UVIS instruments. A machine learning algorithm (trained with the results of a citizen science project) was used to detect objects in these images as part of a previous study. Our raw data consist of 1031 asteroid trails from unknown objects, not matching any entries in the Minor Planet Center (MPC) database using their coordinates and imaging time. We also found 670 trails from known objects (objects featuring matching entries in the MPC). After an accuracy assessment and filtering process, our analysed HST asteroid set consists of 454 unknown objects and 178 known objects. We obtain a sample dominated by potential main belt objects featuring absolute magnitudes (H) mostly between 15 and 22 mag. 
The absolute magnitude cumulative distribution logN(H > H0) ∝ αlog(H0) confirms the previously reported slope change for 15 < H < 18, from α ≈ 0.56 to α ≈ 0.26, maintained in our case down to absolute magnitudes of around H ≈ 20, and therefore expanding the previous result by approximately two magnitudes. Conclusions. HST archival observations can be used as an asteroid survey because the telescope pointings are statistically randomly oriented in the sky and cover long periods of time. They allow us to expand the current best samples of astronomical objects at no extra cost in regard to telescope time.
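The distance-to-size chain described in the Methods can be sketched with the standard photometric relations; this is a minimal illustration with invented example values, the albedo is an assumed parameter, and the phase-angle correction is deliberately omitted:

```python
import math

def absolute_magnitude(apparent_mag, r_au, delta_au):
    """H = m - 5 log10(r * Delta), neglecting the phase-angle correction.
    r: heliocentric distance (au); Delta: observer-centric distance (au)."""
    return apparent_mag - 5.0 * math.log10(r_au * delta_au)

def diameter_km(H, albedo=0.15):
    """Standard conversion D = (1329 km / sqrt(p_V)) * 10**(-H/5)
    for an assumed geometric albedo p_V."""
    return 1329.0 / math.sqrt(albedo) * 10.0 ** (-H / 5.0)

# Invented example: a main-belt object seen at m = 21 with r = 2.5 au,
# Delta = 1.8 au, assuming p_V = 0.15.
H = absolute_magnitude(21.0, 2.5, 1.8)
print(round(H, 2), round(diameter_km(H), 2))  # H ≈ 17.73, D ≈ 0.97 km
```

This is why the parallax-derived distance matters: without r and Delta, the apparent magnitude alone cannot be converted into an absolute magnitude or a size.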
ASTRONOMICAL OBJECTS DETECTION IN CELESTIAL BODIES USING COMPUTER VISION ALGO... (csandit)
Computer vision, astronomy, and astrophysics work together productively, each field reinforcing the others. Without computer vision algorithms, the
progress of astronomy and astrophysics would have slowed nearly to a standstill. New research and computational methods yield more information as well as higher-quality data, and in the long run a systematic view of planetary surfaces can change substantially. A new
discovery may bring puzzling complexity or a possible branching of paths, yet the quest to learn more about celestial bodies by means of computer vision algorithms will continue. Detecting astronomical objects in images of celestial bodies is a challenging task. This paper presents
an implementation of astronomical object detection using computer vision algorithms with satisfactory performance. It also puts forward some observations linking
computer vision, astronomy, and astrophysics.
We present the 2020 version of the Siena Galaxy Atlas (SGA-2020), a multiwavelength optical and infrared
imaging atlas of 383,620 nearby galaxies. The SGA-2020 uses optical grz imaging over ≈20,000 deg2 from the
Dark Energy Spectroscopic Instrument (DESI) Legacy Imaging Surveys Data Release 9 and infrared imaging in
four bands (spanning 3.4–22 μm) from the 6 year unWISE coadds; it is more than 95% complete for galaxies larger
than R(26) ≈ 25″ and r < 18 measured at the 26 mag arcsec−2 isophote in the r band. The atlas delivers precise
coordinates, multiwavelength mosaics, azimuthally averaged optical surface-brightness profiles, model images and
photometry, and additional ancillary metadata for the full sample. Coupled with existing and forthcoming optical
spectroscopy from the DESI, the SGA-2020 will facilitate new detailed studies of the star formation and mass
assembly histories of nearby galaxies; enable precise measurements of the local velocity field via the Tully–Fisher
and fundamental plane relations; serve as a reference sample of lasting legacy value for time-domain and
multimessenger astronomical events; and more.
Identifying Exoplanets with Machine Learning Methods: A Preliminary Study (IJCI Journal)
The discovery of habitable exoplanets has long been a heated topic in astronomy. Traditional methods for exoplanet identification include the wobble method, direct imaging, gravitational microlensing, etc., which not only require a considerable investment of manpower, time, and money, but also are limited by the performance of astronomical telescopes. In this study, we propose the idea of using machine learning methods to identify exoplanets. We used the Kepler dataset collected by NASA from the Kepler Space Observatory to conduct supervised learning, which predicts the existence of exoplanet candidates as a three-categorical classification task, using decision tree, random forest, naïve Bayes, and neural network; we used another NASA dataset consisting of the confirmed exoplanets data to conduct unsupervised learning, which divides the confirmed exoplanets into different clusters, using k-means clustering. As a result, our models achieved accuracies of 99.06%, 92.11%, 88.50%, and 99.79%, respectively, in the supervised learning task and successfully obtained reasonable clusters in the unsupervised learning task.
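The supervised task described above can be sketched as a three-class classification with a random forest; the features below are synthetic stand-ins, not the actual Kepler catalog columns, and the class labels (false positive, candidate, confirmed) are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic stand-in for Kepler Object of Interest features
# (e.g. transit depth, period, duration); three disposition classes:
# 0 = false positive, 1 = candidate, 2 = confirmed.
X = rng.normal(size=(900, 5))
y = rng.integers(0, 3, size=900)
X = X + y[:, None] * 2.0  # shift class means so the classes are separable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)  # accuracy on the held-out split
```

With real Kepler data the features would come from the NASA Exoplanet Archive tables and the labels from the catalog disposition column; the fit/score pattern is the same.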
The ASTRODEEP Frontier Fields catalogues II. Photometric redshifts and rest f... (Sérgio Sacani)
Aims. We present the first public release of photometric redshifts, galaxy rest frame properties and associated magnification values
in the cluster and parallel pointings of the first two Frontier Fields, Abell-2744 and MACS-J0416. The released catalogues aim to
provide a reference for future investigations of extragalactic populations in these legacy fields: from lensed high-redshift galaxies to
cluster members themselves.
Methods. We exploit a multiwavelength catalogue, ranging from Hubble Space Telescope (HST) to ground-based K and Spitzer IRAC,
which is specifically designed to enable detection and measurement of accurate fluxes in crowded cluster regions. The multiband
information is used to derive photometric redshifts and physical properties of sources detected either in the H-band image alone, or
from a stack of four WFC3 bands. To minimize systematics, median photometric redshifts are assembled from six different approaches
to photo-z estimates. Their reliability is assessed through a comparison with available spectroscopic samples. State-of-the-art lensing
models are used to derive magnification values on an object-by-object basis by taking into account source positions and redshifts.
Results. We show that photometric redshifts reach a remarkable 3–5% accuracy. After accounting for magnification, the H-band
number counts are found to be in agreement at bright magnitudes with number counts from the CANDELS fields, while extending
the presently available samples to galaxies that, intrinsically, are as faint as H ≈ 32–33, thanks to strong gravitational lensing. The
Frontier Fields allow the galaxy stellar mass distribution to be probed, depending on magnification, at 0.5–1.5 dex lower masses with
respect to extragalactic wide fields, including sources at Mstar ≈ 10⁷–10⁸ M⊙ at z > 5. Similarly, they allow the detection of objects
with intrinsic star formation rates (SFRs) >1 dex lower than in the CANDELS fields, reaching 0.1–1 M⊙/yr at z ≈ 6–10.
Hubble Space Telescope Observations of NGC 253 Dwarf Satellites: Three Ultra-... (Sérgio Sacani)
We present deep Hubble Space Telescope (HST) imaging of five faint dwarf galaxies associated with the nearby
spiral NGC 253 (D ≈ 3.5 Mpc). Three of these are newly discovered dwarf galaxies, while all five were found in
the Panoramic Imaging Survey of Centaurus and Sculptor, a Magellan+Megacam survey to identify faint dwarfs
and other substructures in resolved stellar light around massive galaxies outside of the Local Group. Our HST data
reach 3 magnitudes below the tip of the red giant branch for each dwarf, allowing us to derive their distances,
structural parameters, and luminosities. All five systems contain mostly old, metal-poor stellar populations
(age ∼ 12 Gyr, [M/H] ∼ −1.5) and have sizes (rh ∼ 110–3000 pc) and luminosities (MV ∼ −7 to −12 mag) largely
consistent with Local Group dwarfs. The three new NGC 253 satellites are among the faintest systems discovered
beyond the Local Group. We also use archival H I data to place limits on the gas content of our discoveries. Deep
imaging surveys such as our program around NGC 253 promise to elucidate the faint end of the satellite luminosity
function and its scatter across a range of galaxy masses, morphologies, and environments in the decade to come.
Imaging the Milky Way with Millihertz Gravitational Waves (Sérgio Sacani)
Modern astronomers enjoy access to all-sky images across a wide range of the electromagnetic spectrum from
long-wavelength radio to high-energy gamma rays. The most prominent feature in many of these images is our
own Galaxy, with different features revealed in each wave band. Gravitational waves (GWs) have recently been
added to the astronomers’ toolkit as a nonelectromagnetic messenger. To date, all identified GW sources have been
extra-Galactic and transient. However, the Milky Way hosts a population of ultracompact binaries (UCBs), which
radiate persistent GWs in the millihertz band, which is not observable with today’s terrestrial gravitational-wave
detectors. Space-based detectors such as the Laser Interferometer Space Antenna will measure this population and
provide a census of their locations, masses, and orbital properties. In this work, we show how these data can be
used to form a false-color image of the Galaxy that represents the intensity and frequency of the gravitational
waves produced by the UCB population. Such images can be used to study the morphology of the Galaxy, identify
interesting multimessenger sources through cross-matching, and for educational and outreach purposes.
First light of VLT/HiRISE: High-resolution spectroscopy of young giant exopla... (Sérgio Sacani)
A major endeavor of this decade is the direct characterization of young giant exoplanets at high spectral resolution to determine the composition of
their atmosphere and infer their formation processes and evolution. Such a goal represents a major challenge owing to their small angular separation
and luminosity contrast with respect to their parent stars. Instead of designing and implementing completely new facilities, it has been proposed
to leverage the capabilities of existing instruments that offer either high contrast imaging or high dispersion spectroscopy, by coupling them using
optical fibers. In this work we present the implementation and first on-sky results of the HiRISE instrument at the Very Large Telescope (VLT),
which combines the exoplanet imager SPHERE with the recently upgraded high-resolution spectrograph CRIRES using single-mode fibers. The
goal of HiRISE is to enable the characterization of known companions in the H band, at a spectral resolution of the order of R = λ/∆λ = 100 000,
in a few hours of observing time. We present the main design choices and the technical implementation of the system, which consists of three
major parts: the fiber injection module inside SPHERE, the fiber bundle around the telescope, and the fiber extraction module at the entrance
of CRIRES. We also detail the specific calibrations required for HiRISE and the operations of the instrument for science observations. We then
detail the performance of the system in terms of astrometry, temporal stability, optical aberrations, and transmission, for which we report a peak
value of ∼3.9% based on sky measurements in median observing conditions. Finally, we report on the first astrophysical detection of HiRISE to
illustrate its potential.
AUTOMATIC SPECTRAL CLASSIFICATION OF STARS USING MACHINE LEARNING: AN APPROAC... (mlaij)
With the increase in astronomical surveys, astronomers are faced with the challenging task of analyzing a
large amount of data in order to classify observed objects into hard-to-distinguish classes. This article
presents a machine learning-based method for the automatic spectral classification of stars from the latest
release of the SDSS database. We propose the combinatorial use of spectral data, derived stellar data, and
calculated data to create patterns. Using these patterns as inputs, we develop a Random Forest model that
outputs the spectral class of the observed star. Our model is able to classify data into six complex classes:
A, F, G, K, M, and Carbon stars. Due to the unbalanced nature of the data, we train our model under
three data use cases: using the original data, using under-sampling, and using over-sampling techniques.
We further test our model by using a fixed dataset and a stratified dataset. From this, we analyze the
performance of our model through statistical metrics. The experimental results show that the
combinatorial use of data as an input pattern helps improve the prediction scores in all data use
cases, while the model trained with augmented data outperforms the other cases. Our results suggest
that machine learning-based spectral classification of stars may be useful for astronomers.
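The over-sampling case mentioned above can be illustrated with a naive random over-sampler; this is a minimal sketch of the general technique, not the specific resampling method used in the paper:

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Naive random over-sampling: resample each minority class with
    replacement until every class matches the majority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c in classes:
        members = np.flatnonzero(y == c)
        idx.append(members)
        extra = target - members.size
        if extra > 0:
            idx.append(rng.choice(members, size=extra, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

# Toy unbalanced dataset: 7 samples of class 0, 3 of class 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 7 + [1] * 3)
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))  # → [7 7]
```

Over-sampling duplicates minority-class rows rather than discarding majority-class rows, which is why the paper contrasts it with under-sampling: no information is thrown away, at the cost of repeated training samples.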
EXOPLANETS IDENTIFICATION AND CLUSTERING WITH MACHINE LEARNING METHODS (mlaij)
The discovery of habitable exoplanets has long been a heated topic in astronomy. Traditional methods for
exoplanet identification include the wobble method, direct imaging, gravitational microlensing, etc., which
not only require a considerable investment of manpower, time, and money, but also are limited by the
performance of astronomical telescopes. In this study, we proposed the idea of using machine learning
methods to identify exoplanets. We used the Kepler dataset collected by NASA from the Kepler Space
Observatory to conduct supervised learning, which predicts the existence of exoplanet candidates as a
three-categorical classification task, using decision tree, random forest, naïve Bayes, and neural network;
we used another NASA dataset consisting of the confirmed exoplanets data to conduct unsupervised
learning, which divides the confirmed exoplanets into different clusters, using k-means clustering. As a
result, our models achieved accuracies of 99.06%, 92.11%, 88.50%, and 99.79%, respectively, in the
supervised learning task and successfully obtained reasonable clusters in the unsupervised learning task.
A giant thin stellar stream in the Coma Galaxy Cluster (Sérgio Sacani)
The study of dynamically cold stellar streams reveals information about the gravitational potential where they reside and provides
important constraints on the properties of dark matter. However, the intrinsic faintness of these streams makes their detection beyond
Local environments highly challenging. Here, we report the detection of an extremely faint stellar stream (µg,max = 29.5 mag arcsec−2)
with an extraordinarily coherent and thin morphology in the Coma Galaxy Cluster. This Giant Coma Stream spans ∼510 kpc in length
and appears as a free-floating structure located at a projected distance of 0.8 Mpc from the center of Coma. We do not identify any
potential galaxy remnant or core, and the stream structure appears featureless in our data. We interpret the Giant Coma Stream as
being a recently accreted, tidally disrupting passive dwarf. Using the Illustris-TNG50 simulation, we identify a case with similar
characteristics, showing that, although rare, these types of streams are predicted to exist in Λ-CDM. Our work unveils the presence
of free-floating, extremely faint and thin stellar streams in galaxy clusters, widening the environmental context in which these objects
are found ahead of their promising future application in the study of the properties of dark matter.
Prospects for Detecting Gaps in Globular Cluster Stellar Streams in External ... (Sérgio Sacani)
Stellar streams form through the tidal disruption of satellite galaxies or globular clusters orbiting a
host galaxy. Globular cluster streams are exciting since they are thin (dynamically cold) and therefore
sensitive to perturbations from low-mass subhalos. Since the subhalo mass function differs depending
on the dark matter composition, these gaps can provide unique constraints on dark matter models.
However, current samples are limited to the Milky Way. With its large field of view, deep imaging
sensitivity, and high angular resolution, the upcoming Nancy Grace Roman Space Telescope (Roman)
presents a unique opportunity to increase the number of observed streams and gaps significantly. This
paper presents a first exploration of the prospects for detecting gaps in streams in M31 and other
nearby galaxies with resolved stars. We simulate the formation of gaps in a Palomar-5-like stream
and generate mock observations of these gaps with background stars in M31 and the foreground Milky
Way stellar fields. We assess Roman’s ability to detect gaps out to 10 Mpc through visual inspection
and with the gap-finding tool FindTheGap. We conclude that gaps of ≈ 1.5 kpc in streams that are
created from subhalos of masses ≥ 5 × 10⁶ M⊙ are detectable within a 2–3 Mpc volume in exposures of
1000 s to 1 hour. This volume contains ≈ 150 galaxies, including ≈ 8 galaxies with luminosities > 10⁹ L⊙.
Large samples of stream gaps in external galaxies will open up a new era of statistical analyses of gap
characteristics in stellar streams and help constrain dark matter models.
A Search for Technosignatures Around 11,680 Stars with the Green Bank Telesco... (Sérgio Sacani)
We conducted a search for narrowband radio signals over four observing sessions in 2020–2023 with
the L-band receiver (1.15–1.73 GHz) of the 100 m diameter Green Bank Telescope. We pointed the
telescope in the directions of 62 TESS Objects of Interest, capturing radio emissions from a total of
∼11,860 stars and planetary systems in the ∼9 arcminute beam of the telescope. All detections were
either automatically rejected or visually inspected and confirmed to be of anthropogenic nature. In
this work, we also quantified the end-to-end efficiency of radio SETI pipelines with a signal injection
and recovery analysis. The UCLA SETI pipeline recovers 94.0% of the injected signals over the usable
frequency range of the receiver and 98.7% of the injections when regions of dense RFI are excluded. In
another pipeline that uses incoherent sums of 51 consecutive spectra, the recovery rate is ∼15 times
smaller at ∼6%. The pipeline efficiency affects SETI search volume calculations as well as calculations
of upper bounds on the number of transmitting civilizations. We developed an improved Drake Figure
of Merit for SETI search volume calculations that includes the pipeline efficiency and frequency drift
rate coverage. Based on our observations, we found that there is a high probability (94.0–98.7%) that
fewer than ∼0.014% of stars earlier than M8 within 100 pc host a transmitter that is detectable in
our search (EIRP > 1012 W). Finally, we showed that the UCLA SETI pipeline natively detects the
signals detected with AI techniques by Ma et al. (2023).
DYNAMICAL ANALYSIS OF THE DARK MATTER AND CENTRAL BLACK HOLE MASS IN THE DWAR... (Sérgio Sacani)
We measure the central kinematics for the dwarf spheroidal galaxy Leo I using integrated-light measurements and
previously published data. We find a steady rise in the velocity dispersion from 300″ into the center. The integrated-light kinematics provide a velocity dispersion of 11.76 ± 0.66 km s−1 inside 75″. After applying appropriate corrections
to crowding in the central regions, we achieve consistent velocity dispersion values using velocities from individual stars.
Crowding corrections need to be applied when targeting individual stars in high-density stellar environments. From
integrated light, we measure the surface brightness profile and find a shallow cusp towards the center. Axisymmetric,
orbit-based models measure the stellar mass-to-light ratio, black hole mass and parameters for a dark matter halo. At
large radii it is important to consider possible tidal effects from the Milky Way so we include a variety of assumptions
regarding the tidal radius. For every set of assumptions, models require a central black hole consistent with a mass of
3.3 ± 2 × 10⁶ M⊙. The no-black-hole case for any of our assumptions is excluded at over 95% significance, with
6.4 < ∆χ² < 14. A black hole of this mass would have a significant effect on dwarf galaxy formation and evolution.
The dark halo parameters are heavily affected by the assumptions for the tidal radii, with the circular velocity only
constrained to be above 30 km s−1. Reasonable assumptions for the tidal radius result in stellar orbits consistent with
an isotropic distribution in the velocities. These more realistic models only show strong constraints for the mass of
the central black hole.
Computational Training and Data Literacy for Domain ScientistsJoshua Bloom
Presented at the National Academy of Sciences (11 April 2014, Washington, D.C.) at the workshop "Training Students to Extract Value from Big Data.” Discussion of computational and programming education at UC Berkeley. Emphasis on Python as a glue/gateway language. An advocation for the notion of first teaching "Data Literacy" to domain scientists before teaching Big Data proficiency.
Signal Synchronization Strategies and Time Domain SETI with Gaia DR3Sérgio Sacani
Spatiotemporal techniques for signal coordination with actively transmitting extraterrestrial civilizations, without the need for prior communication, can constrain technosignature searches to a significantly smaller coordinate space. With the variable star
catalog from Gaia Data Release 3, we explore two related signaling strategies: the SETI
Ellipsoid, and that proposed by Seto, which are both based on the synchronization of
transmissions with a conspicuous astrophysical event. This dataset contains more than
10 million variable star candidates with light curves from the first three years of Gaia’s
operational phase, between 2014 and 2017. Using four different historical supernovae as
source events, we find that less than 0.01% of stars in the sample have crossing times,
the times at which we would expect to receive synchronized signals on Earth, within
the date range of available Gaia observations. For these stars, we present a framework
for technosignature analysis that searches for modulations in the variability parameters
by splitting the stellar light curve at the crossing time.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest
imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters
spanning 0.4−0.9µm) and novel JWST images with 14 filters spanning 0.8−5µm, including 7 mediumband filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data
at > 2.3µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and
30.3-31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric
redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts
z = 11.5 − 15. These objects show compact half-light radii of R1/2 ∼ 50 − 200pc, stellar masses of
M⋆ ∼ 107−108M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr−1
. Our search finds no candidates
at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to
infer the properties of the evolving luminosity function without binning in redshift or luminosity that
marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the
impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results,
and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5
from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical
models for evolution of the dark matter halo mass function.
The JWST Discovery of the Triply-imaged Type Ia “Supernova H0pe” and Observat...Sérgio Sacani
A Type Ia supernova (SN) at z = 1.78 was discovered in James Webb Space Telescope Near Infrared
Camera imaging of the galaxy cluster PLCK G165.7+67.0 (G165; z = 0.35). The SN is situated 1.5–
2 kpc from its host galaxy Arc 2 and appears in three different locations as a result of gravitational
lensing by G165. These data can yield a value for Hubble’s constant using time delays from this
multiply-imaged SN Ia that we call “SN H0pe.” Over the entire field we identified 21 image multiplicities,
confirmed five of them using Near-Infrared Spectrograph (NIRspec), and constructed a new
lens model that gives a total mass within 600 kpc of (2.6 ± 0.3) × 1014M⊙. The photometry uncovered
a galaxy overdensity at Arc 2’s redshift. NIRSpec confirmed six member galaxies, four of which
surround Arc 2 with relative velocity ≲900 km s−1 and projected physical extent ≲33 kpc. Arc 2
dominates the stellar mass ((5.0±0.1)×1011M⊙), which is a factor of ten higher than other members
of this compact galaxy group. These other group members have specific star formation rates (sSFR)
arXiv:2309.07326v1 [astro-ph.GA] 13 Sep 2023
2 Frye, Pascale, Pierel et al.
of 2–260 Gyr−1 derived from the Hα-line flux corrected for stellar absorption, dust extinction, and slit
losses. Another group centered on the dusty star forming galaxy Arc 1 is at z = 2.24. The total SFR
for the Arc 1 group (≳400M⊙ yr−1) translates to a supernova rate of ∼1 SNe yr−1, suggesting that
regular monitoring of this cluster may yield additional SNe.
Locating Hidden Exoplanets in ALMA Data Using Machine LearningSérgio Sacani
Exoplanets in protoplanetary disks cause localized deviations from Keplerian velocity in channel maps of
molecular line emission. Current methods of characterizing these deviations are time consuming,and there is no
unified standard approach. We demonstrate that machine learning can quickly and accurately detect the presence of
planets. We train our model on synthetic images generated from simulations and apply it to real observations to
identify forming planets in real systems. Machine-learning methods, based on computer vision, are not only
capable of correctly identifying the presence of one or more planets, but they can also correctly constrain the
location of those planets.
The massive relic galaxy NGC 1277 is dark matter deficient From dynamical mod...Sérgio Sacani
According to the Λ cold dark matter (ΛCDM) cosmology, present-day galaxies with stellar masses M? > 1011 M should contain
a sizable fraction of dark matter within their stellar body. Models indicate that in massive early-type galaxies (ETGs) with M? ≈
1.5 × 1011 M, dark matter should account for ∼15% of the dynamical mass within one effective radius (1 Re) and for ∼60% within
5 Re
. Most massive ETGs have been shaped through a two-phase process: the rapid growth of a compact core was followed by the
accretion of an extended envelope through mergers. The exceedingly rare galaxies that have avoided the second phase, the so-called
relic galaxies, are thought to be the frozen remains of the massive ETG population at z & 2. The best relic galaxy candidate discovered
to date is NGC 1277, in the Perseus cluster. We used deep integral field George and Cynthia Mitchel Spectrograph (GCMS) data to
revisit NGC 1277 out to an unprecedented radius of 6 kpc (corresponding to 5 Re). By using Jeans anisotropic modelling, we find
a negligible dark matter fraction within 5 Re (fDM(5 Re) < 0.05; two-sigma confidence level), which is in tension with the ΛCDM
expectation. Since the lack of an extended envelope would reduce dynamical friction and prevent the accretion of an envelope, we
propose that NGC 1277 lost its dark matter very early or that it was dark matter deficient ab initio. We discuss our discovery in the
framework of recent proposals, suggesting that some relic galaxies may result from dark matter stripping as they fell in and interacted
within galaxy clusters. Alternatively, NGC 1277 might have been born in a high-velocity collision of gas-rich proto-galactic fragments,
where dark matter left behind a disc of dissipative baryons. We speculate that the relative velocities of ≈2000 km s−1
required for the
latter process to happen were possible in the progenitors of the present-day rich galaxy clusters.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes
on Io’s surface have been monitored from both spacecraft and ground-based telescopes.
Here, we present the highest spatial resolution images of Io ever obtained from a groundbased telescope. These images, acquired by the SHARK-VIS instrument on the Large
Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images
show that a plume deposit from a powerful eruption at Pillan Patera has covered part
of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive
optics at visible wavelengths.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Similar to Astronomaly at Scale: Searching for Anomalies Amongst 4 Million Galaxies
Imaging the Milky Way with Millihertz Gravitational WavesSérgio Sacani
Modern astronomers enjoy access to all-sky images across a wide range of the electromagnetic spectrum from
long-wavelength radio to high-energy gamma rays. The most prominent feature in many of these images is our
own Galaxy, with different features revealed in each wave band. Gravitational waves (GWs) have recently been
added to the astronomers’ toolkit as a nonelectromagnetic messenger. To date, all identified GW sources have been
extra-Galactic and transient. However, the Milky Way hosts a population of ultracompact binaries (UCBs), which
radiate persistent GWs in the millihertz band, which is not observable with today’s terrestrial gravitational-wave
detectors. Space-based detectors such as the Laser Interferometer Space Antenna will measure this population and
provide a census of their location, masses, and orbital properties. In this work, we show how these data can be
used to form a false-color image of the Galaxy that represents the intensity and frequency of the gravitational
waves produced by the UCB population. Such images can be used to study the morphology of the Galaxy, identify
interesting multimessenger sources through cross-matching, and for educational and outreach purposes.
First light of VLT/HiRISE: High-resolution spectroscopy of young giant exopla...Sérgio Sacani
A major endeavor of this decade is the direct characterization of young giant exoplanets at high spectral resolution to determine the composition of
their atmosphere and infer their formation processes and evolution. Such a goal represents a major challenge owing to their small angular separation
and luminosity contrast with respect to their parent stars. Instead of designing and implementing completely new facilities, it has been proposed
to leverage the capabilities of existing instruments that offer either high contrast imaging or high dispersion spectroscopy, by coupling them using
optical fibers. In this work we present the implementation and first on-sky results of the HiRISE instrument at the Very Large Telescope (VLT),
which combines the exoplanet imager SPHERE with the recently upgraded high resolution spectrograph CRIRES using single-mode fibers. The
goal of HiRISE is to enable the characterization of known companions in the H band, at a spectral resolution of the order of R = λ/∆λ = 100 000,
in a few hours of observing time. We present the main design choices and the technical implementation of the system, which is constituted of three
major parts: the fiber injection module inside of SPHERE, the fiber bundle around the telescope, and the fiber extraction module at the entrance
of CRIRES. We also detail the specific calibrations required for HiRISE and the operations of the instrument for science observations. Finally, we
detail the performance of the system in terms of astrometry, temporal stability, optical aberrations, and transmission, for which we report a peak
value of ∼3.9% based on sky measurements in median observing conditions. We close by reporting the first astrophysical detection of HiRISE to
illustrate its potential.
AUTOMATIC SPECTRAL CLASSIFICATION OF STARS USING MACHINE LEARNING: AN APPROAC...mlaij
With the increase in astronomical surveys, astronomers are faced with the challenging task of analyzing a
large amount of data in order to classify observed objects into hard-to-distinguish classes. This article
presents a machine learning-based method for the automatic spectral classification of stars from the latest
release of the SDSS database. We propose the combinatorial use of spectral data, derived stellar data, and
calculated data to create patterns. Using these patterns as inputs, we develop a Random Forest model that
outputs the spectral class of the observed star. Our model is able to classify data into six complex classes:
A, F, G, K, M, and Carbon stars. Due to the unbalanced nature of the data, we train our model considering
three data use cases: using the original data, using under-sampling, and over-sampling data techniques.
We further test our model by using a fixed dataset and a stratified dataset. From this, we analyze the
performance of our model through statistical metrics. The experimental results showed that the
combinatorial use of data as an input pattern helps improve the prediction scores in all data use
cases, while the model trained with augmented data outperforms the other cases. Our results suggest
that machine learning-based spectral classification of stars may be useful for astronomers.
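The over-sampling strategy the abstract mentions can be sketched in a few lines. This is an illustration, not the authors' code: the six classes match the abstract, but the feature values, class counts, and the naive random over-sampler are all invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy, imbalanced stand-in for the SDSS-derived patterns described above:
# each row mixes "spectral" and "derived" features; labels are the six classes.
classes = ["A", "F", "G", "K", "M", "Carbon"]
counts = [500, 400, 300, 200, 80, 20]          # deliberately unbalanced
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n, 8))
               for i, n in enumerate(counts)])
y = np.concatenate([[c] * n for c, n in zip(classes, counts)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Naive random over-sampling: resample each class (with replacement) up to
# the majority-class count, a simple stand-in for the paper's augmentation.
max_n = max(np.sum(y_tr == c) for c in classes)
idx = np.concatenate([
    rng.choice(np.flatnonzero(y_tr == c), size=max_n, replace=True)
    for c in classes
])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr[idx], y_tr[idx])
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

Note the over-sampling is applied only to the training split; resampling before the split would leak duplicated rows into the test set and inflate the score.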
EXOPLANETS IDENTIFICATION AND CLUSTERING WITH MACHINE LEARNING METHODSmlaij
The discovery of habitable exoplanets has long been a heated topic in astronomy. Traditional methods for
exoplanet identification include the wobble method, direct imaging, gravitational microlensing, etc., which
not only require a considerable investment of manpower, time, and money, but also are limited by the
performance of astronomical telescopes. In this study, we proposed the idea of using machine learning
methods to identify exoplanets. We used the Kepler dataset collected by NASA from the Kepler Space
Observatory to conduct supervised learning, which predicts the existence of exoplanet candidates as a
three-categorical classification task, using decision tree, random forest, naïve Bayes, and neural network;
we used another NASA dataset consisting of the confirmed exoplanets data to conduct unsupervised
learning, which divides the confirmed exoplanets into different clusters, using k-means clustering. As a
result, our models achieved accuracies of 99.06%, 92.11%, 88.50%, and 99.79%, respectively, in the
supervised learning task and successfully obtained reasonable clusters in the unsupervised learning task.
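The unsupervised half of that workflow reduces to k-means on scaled planet parameters. A minimal sketch under invented data (the real study used a NASA confirmed-planets catalogue, which is not reproduced here; the two synthetic populations are only illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic stand-in for confirmed-planet parameters: columns are radius
# [R_Earth], orbital period [days], and equilibrium temperature [K].
hot_jupiters = np.column_stack([rng.normal(12, 2, 100),
                                rng.normal(4, 1, 100),
                                rng.normal(1500, 200, 100)])
temperate_rocky = np.column_stack([rng.normal(1.2, 0.3, 100),
                                   rng.normal(200, 50, 100),
                                   rng.normal(280, 40, 100)])
X = np.vstack([hot_jupiters, temperate_rocky])

# Scale first: k-means uses Euclidean distance, so the kelvin axis would
# otherwise dominate the Earth-radius axis by three orders of magnitude.
Xs = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xs)
print(np.bincount(labels))
```

With well-separated populations like these, the two clusters recover the two input groups almost exactly; on real catalogues the choice of k and of which parameters to include drives the result.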
A giant thin stellar stream in the Coma Galaxy ClusterSérgio Sacani
The study of dynamically cold stellar streams reveals information about the gravitational potential where they reside and provides
important constraints on the properties of dark matter. However, the intrinsic faintness of these streams makes their detection beyond
Local environments highly challenging. Here, we report the detection of an extremely faint stellar stream (µg,max = 29.5 mag arcsec⁻²)
with an extraordinarily coherent and thin morphology in the Coma Galaxy Cluster. This Giant Coma Stream spans ∼510 kpc in length
and appears as a free-floating structure located at a projected distance of 0.8 Mpc from the center of Coma. We do not identify any
potential galaxy remnant or core, and the stream structure appears featureless in our data. We interpret the Giant Coma Stream as
being a recently accreted, tidally disrupting passive dwarf. Using the Illustris-TNG50 simulation, we identify a case with similar
characteristics, showing that, although rare, these types of streams are predicted to exist in Λ-CDM. Our work unveils the presence
of free-floating, extremely faint and thin stellar streams in galaxy clusters, widening the environmental context in which these objects
are found ahead of their promising future application in the study of the properties of dark matter.
Prospects for Detecting Gaps in Globular Cluster Stellar Streams in External ...Sérgio Sacani
Stellar streams form through the tidal disruption of satellite galaxies or globular clusters orbiting a
host galaxy. Globular cluster streams are exciting since they are thin (dynamically cold) and therefore
sensitive to perturbations from low-mass subhalos. Since the subhalo mass function differs depending
on the dark matter composition, these gaps can provide unique constraints on dark matter models.
However, current samples are limited to the Milky Way. With its large field of view, deep imaging
sensitivity, and high angular resolution, the upcoming Nancy Grace Roman Space Telescope (Roman)
presents a unique opportunity to increase the number of observed streams and gaps significantly. This
paper presents a first exploration of the prospects for detecting gaps in streams in M31 and other
nearby galaxies with resolved stars. We simulate the formation of gaps in a Palomar-5-like stream
and generate mock observations of these gaps with background stars in M31 and the foreground Milky
Way stellar fields. We assess Roman’s ability to detect gaps out to 10 Mpc through visual inspection
and with the gap-finding tool FindTheGap. We conclude that gaps of ≈ 1.5 kpc in streams that are
created from subhalos of masses ≥ 5 × 10⁶ M⊙ are detectable within a 2–3 Mpc volume in exposures of
1000 s–1 hour. This volume contains ≈ 150 galaxies, including ≈ 8 galaxies with luminosities > 10⁹ L⊙.
Large samples of stream gaps in external galaxies will open up a new era of statistical analyses of gap
characteristics in stellar streams and help constrain dark matter models.
A Search for Technosignatures Around 11,680 Stars with the Green Bank Telesco...Sérgio Sacani
We conducted a search for narrowband radio signals over four observing sessions in 2020–2023 with
the L-band receiver (1.15–1.73 GHz) of the 100 m diameter Green Bank Telescope. We pointed the
telescope in the directions of 62 TESS Objects of Interest, capturing radio emissions from a total of
∼11,860 stars and planetary systems in the ∼9 arcminute beam of the telescope. All detections were
either automatically rejected or visually inspected and confirmed to be of anthropogenic nature. In
this work, we also quantified the end-to-end efficiency of radio SETI pipelines with a signal injection
and recovery analysis. The UCLA SETI pipeline recovers 94.0% of the injected signals over the usable
frequency range of the receiver and 98.7% of the injections when regions of dense RFI are excluded. In
another pipeline that uses incoherent sums of 51 consecutive spectra, the recovery rate is ∼15 times
smaller at ∼6%. The pipeline efficiency affects SETI search volume calculations as well as calculations
of upper bounds on the number of transmitting civilizations. We developed an improved Drake Figure
of Merit for SETI search volume calculations that includes the pipeline efficiency and frequency drift
rate coverage. Based on our observations, we found that there is a high probability (94.0–98.7%) that
fewer than ∼0.014% of stars earlier than M8 within 100 pc host a transmitter that is detectable in
our search (EIRP > 10¹² W). Finally, we showed that the UCLA SETI pipeline natively detects the
signals detected with AI techniques by Ma et al. (2023).
DYNAMICAL ANALYSIS OF THE DARK MATTER AND CENTRAL BLACK HOLE MASS IN THE DWAR...Sérgio Sacani
We measure the central kinematics for the dwarf spheroidal galaxy Leo I using integrated-light measurements and
previously published data. We find a steady rise in the velocity dispersion from 300″ into the center. The
integrated-light kinematics provide a velocity dispersion of 11.76 ± 0.66 km s⁻¹ inside 75″. After applying
appropriate corrections to crowding in the central regions, we achieve consistent velocity dispersion values using
velocities from individual stars. Crowding corrections need to be applied when targeting individual stars in
high-density stellar environments. From integrated light, we measure the surface brightness profile and find a
shallow cusp towards the center. Axisymmetric, orbit-based models measure the stellar mass-to-light ratio, black
hole mass, and parameters for a dark matter halo. At large radii it is important to consider possible tidal effects
from the Milky Way, so we include a variety of assumptions regarding the tidal radius. For every set of
assumptions, models require a central black hole consistent with a mass of 3.3 ± 2 × 10⁶ M⊙. The no-black-hole
case is excluded at over 95% significance for any of our assumptions, with 6.4 < Δχ² < 14. A black hole of this
mass would have a significant effect on dwarf galaxy formation and evolution. The dark halo parameters are
heavily affected by the assumptions for the tidal radii, with the circular velocity only constrained to be above
30 km s⁻¹. Reasonable assumptions for the tidal radius result in stellar orbits consistent with an isotropic
distribution in the velocities. These more realistic models only show strong constraints for the mass of the
central black hole.
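The quoted exclusion can be checked against standard χ² statistics. Assuming the black-hole mass adds a single free parameter (one degree of freedom is our assumption here, not stated in the abstract), Δχ² ≈ 3.84 marks the 95% confidence boundary, so a range of 6.4–14 indeed excludes the no-black-hole case at well over 95%:

```python
from scipy.stats import chi2

# 95% exclusion threshold for one degree of freedom
threshold = chi2.ppf(0.95, df=1)
print(f"95% threshold: {threshold:.2f}")        # ~3.84

# Survival function gives the p-value for the quoted delta-chi-squared range
for dchi2 in (6.4, 14.0):
    p = chi2.sf(dchi2, df=1)
    print(f"delta-chi2 = {dchi2:>4}: p = {p:.4f}, excluded at {100 * (1 - p):.1f}%")
```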
Computational Training and Data Literacy for Domain ScientistsJoshua Bloom
Presented at the National Academy of Sciences (11 April 2014, Washington, D.C.) at the workshop “Training Students to Extract Value from Big Data.” Discussion of computational and programming education at UC Berkeley. Emphasis on Python as a glue/gateway language. An advocacy of first teaching “Data Literacy” to domain scientists before teaching Big Data proficiency.
Signal Synchronization Strategies and Time Domain SETI with Gaia DR3Sérgio Sacani
Spatiotemporal techniques for signal coordination with actively transmitting extraterrestrial civilizations, without the need for prior communication, can constrain technosignature searches to a significantly smaller coordinate space. With the variable star
catalog from Gaia Data Release 3, we explore two related signaling strategies: the SETI
Ellipsoid, and that proposed by Seto, which are both based on the synchronization of
transmissions with a conspicuous astrophysical event. This dataset contains more than
10 million variable star candidates with light curves from the first three years of Gaia’s
operational phase, between 2014 and 2017. Using four different historical supernovae as
source events, we find that less than 0.01% of stars in the sample have crossing times,
the times at which we would expect to receive synchronized signals on Earth, within
the date range of available Gaia observations. For these stars, we present a framework
for technosignature analysis that searches for modulations in the variability parameters
by splitting the stellar light curve at the crossing time.
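The crossing-time geometry behind the SETI Ellipsoid can be sketched in a few lines: a signal sent the instant a star sees the supernova arrives at Earth later than the supernova's direct light by the extra path length of the SN → star → Earth route. The function name and coordinates below are illustrative, not taken from the paper or the Gaia catalogue:

```python
import numpy as np

def crossing_offset_yr(star_xyz_ly, sn_xyz_ly):
    """Years after the supernova's light reached Earth (the origin) at which a
    synchronized signal from the star would arrive, in the SETI Ellipsoid
    geometry: the signal is sent the moment the star sees the SN."""
    star = np.asarray(star_xyz_ly, dtype=float)
    sn = np.asarray(sn_xyz_ly, dtype=float)
    d_star = np.linalg.norm(star)          # star–Earth distance [ly]
    d_sn = np.linalg.norm(sn)              # SN–Earth distance [ly]
    d_star_sn = np.linalg.norm(star - sn)  # star–SN distance [ly]
    # Extra path length of SN -> star -> Earth over the direct SN -> Earth
    # path, in light-years, equals the delay in years.
    return d_star + d_star_sn - d_sn

# A star on the segment between Earth and the SN lies on the degenerate
# ellipsoid: its synchronized signal arrives with the SN light itself.
print(crossing_offset_yr([0, 0, 100], [0, 0, 1000]))   # 0.0
```

The paper's selection then keeps only stars whose offset, added to the date the supernova was seen on Earth, falls inside the Gaia observing window, which is why so few of the 10 million candidates qualify.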
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest
imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters
spanning 0.4−0.9µm) and novel JWST images with 14 filters spanning 0.8−5µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data
at > 2.3µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and
30.3–31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric
redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts
z = 11.5−15. These objects show compact half-light radii of R1/2 ∼ 50−200 pc, stellar masses of
M⋆ ∼ 10⁷−10⁸ M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr⁻¹. Our search finds no candidates
at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to
infer the properties of the evolving luminosity function without binning in redshift or luminosity that
marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the
impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results,
and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5
from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical
models for evolution of the dark matter halo mass function.
The JWST Discovery of the Triply-imaged Type Ia “Supernova H0pe” and Observat...Sérgio Sacani
A Type Ia supernova (SN) at z = 1.78 was discovered in James Webb Space Telescope Near Infrared
Camera imaging of the galaxy cluster PLCK G165.7+67.0 (G165; z = 0.35). The SN is situated 1.5–
2 kpc from its host galaxy Arc 2 and appears in three different locations as a result of gravitational
lensing by G165. These data can yield a value for Hubble’s constant using time delays from this
multiply-imaged SN Ia that we call “SN H0pe.” Over the entire field we identified 21 image multiplicities,
confirmed five of them using the Near-Infrared Spectrograph (NIRSpec), and constructed a new
lens model that gives a total mass within 600 kpc of (2.6 ± 0.3) × 10¹⁴ M⊙. The photometry uncovered
a galaxy overdensity at Arc 2’s redshift. NIRSpec confirmed six member galaxies, four of which
surround Arc 2 with relative velocity ≲900 km s⁻¹ and projected physical extent ≲33 kpc. Arc 2
dominates the stellar mass ((5.0±0.1)×10¹¹ M⊙), which is a factor of ten higher than other members
of this compact galaxy group. These other group members have specific star formation rates (sSFR)
of 2–260 Gyr⁻¹ derived from the Hα-line flux corrected for stellar absorption, dust extinction, and slit
losses. Another group centered on the dusty star-forming galaxy Arc 1 is at z = 2.24. The total SFR
for the Arc 1 group (≳400 M⊙ yr⁻¹) translates to a supernova rate of ∼1 SN yr⁻¹, suggesting that
regular monitoring of this cluster may yield additional SNe.
Locating Hidden Exoplanets in ALMA Data Using Machine LearningSérgio Sacani
Exoplanets in protoplanetary disks cause localized deviations from Keplerian velocity in channel maps of
molecular line emission. Current methods of characterizing these deviations are time-consuming, and there is no
unified standard approach. We demonstrate that machine learning can quickly and accurately detect the presence of
planets. We train our model on synthetic images generated from simulations and apply it to real observations to
identify forming planets in real systems. Machine-learning methods, based on computer vision, are not only
capable of correctly identifying the presence of one or more planets, but they can also correctly constrain the
location of those planets.
The massive relic galaxy NGC 1277 is dark matter deficient From dynamical mod...Sérgio Sacani
According to the Λ cold dark matter (ΛCDM) cosmology, present-day galaxies with stellar masses M⋆ > 10¹¹ M⊙ should contain
a sizable fraction of dark matter within their stellar body. Models indicate that in massive early-type galaxies (ETGs) with
M⋆ ≈ 1.5 × 10¹¹ M⊙, dark matter should account for ∼15% of the dynamical mass within one effective radius (1 Re) and for
∼60% within 5 Re. Most massive ETGs have been shaped through a two-phase process: the rapid growth of a compact core was
followed by the accretion of an extended envelope through mergers. The exceedingly rare galaxies that have avoided the second
phase, the so-called relic galaxies, are thought to be the frozen remains of the massive ETG population at z ≳ 2. The best relic
galaxy candidate discovered to date is NGC 1277, in the Perseus cluster. We used deep integral-field George and Cynthia Mitchell
Spectrograph (GCMS) data to revisit NGC 1277 out to an unprecedented radius of 6 kpc (corresponding to 5 Re). By using Jeans
anisotropic modelling, we find a negligible dark matter fraction within 5 Re (fDM(5 Re) < 0.05; two-sigma confidence level),
which is in tension with the ΛCDM expectation. Since the lack of an extended envelope would reduce dynamical friction and
prevent the accretion of an envelope, we propose that NGC 1277 lost its dark matter very early or that it was dark matter
deficient ab initio. We discuss our discovery in the framework of recent proposals suggesting that some relic galaxies may result
from dark matter stripping as they fell in and interacted within galaxy clusters. Alternatively, NGC 1277 might have been born
in a high-velocity collision of gas-rich proto-galactic fragments, where dark matter left behind a disc of dissipative baryons.
We speculate that the relative velocities of ≈2000 km s⁻¹ required for the latter process to happen were possible in the
progenitors of the present-day rich galaxy clusters.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes
on Io’s surface have been monitored from both spacecraft and ground-based telescopes.
Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large
Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images
show that a plume deposit from a powerful eruption at Pillan Patera has covered part
of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive
optics at visible wavelengths.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Multi-source connectivity as the driver of solar wind variability in the heli...Sérgio Sacani
The ambient solar wind that fills the heliosphere originates from multiple
sources in the solar corona and is highly structured. It is often described
as high-speed, relatively homogeneous, plasma streams from coronal
holes and slow-speed, highly variable, streams whose source regions are
under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify
solar wind sources and understand what drives the complexity seen in the
heliosphere. By combining magnetic field modelling and spectroscopic
techniques with high-resolution observations and measurements, we show
that the solar wind variability detected in situ by Solar Orbiter in March
2022 is driven by spatio-temporal changes in the magnetic connectivity to
multiple sources in the solar atmosphere. The magnetic field footpoints
connected to the spacecraft moved from the boundaries of a coronal hole
to one active region (12961) and then across to another region (12957). This
is reflected in the in situ measurements, which show the transition from fast
to highly Alfvénic then to slow solar wind that is disrupted by the arrival of
a coronal mass ejection. Our results describe solar wind variability at 0.5 au
but are applicable to near-Earth observatories.
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Sérgio Sacani
Recent discoveries of Earth-sized planets transiting nearby M dwarfs have made it possible to characterize the
atmospheres of terrestrial planets via follow-up spectroscopic observations. However, the number of such planets
receiving low insolation is still small, limiting our ability to understand the diversity of the atmospheric
composition and climates of temperate terrestrial planets. We report the discovery of an Earth-sized planet
transiting the nearby (12 pc) inactive M3.0 dwarf Gliese 12 (TOI-6251) with an orbital period (Porb) of 12.76 days.
The planet, Gliese 12 b, was initially identified as a candidate with an ambiguous Porb from TESS data. We
confirmed the transit signal and Porb using ground-based photometry with MuSCAT2 and MuSCAT3, and
validated the planetary nature of the signal using high-resolution images from Gemini/NIRI and Keck/NIRC2 as
well as radial velocity (RV) measurements from the InfraRed Doppler instrument on the Subaru 8.2 m telescope
and from CARMENES on the CAHA 3.5 m telescope. X-ray observations with XMM-Newton showed the host
star is inactive, with an X-ray-to-bolometric luminosity ratio of log(L_X/L_bol) ≈ −5.7. Joint analysis of the light
curves and RV measurements revealed that Gliese 12 b has a radius of 0.96 ± 0.05 R⊕, a 3σ mass upper limit of
3.9 M⊕, and an equilibrium temperature of 315 ± 6 K assuming zero albedo. The transmission spectroscopy metric
(TSM) value of Gliese 12 b is close to the TSM values of the TRAPPIST-1 planets, adding Gliese 12 b to the small
list of potentially terrestrial, temperate planets amenable to atmospheric characterization with JWST.
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...Sérgio Sacani
We report on the discovery of Gliese 12 b, the nearest transiting temperate, Earth-sized planet found to date. Gliese 12 is a
bright (V = 12.6 mag, K = 7.8 mag) metal-poor M4V star only 12.162 ± 0.005 pc away from the Solar system with one of the
lowest stellar activity levels known for M-dwarfs. A planet candidate was detected by TESS based on only 3 transits in sectors
42, 43, and 57, with an ambiguity in the orbital period due to observational gaps. We performed follow-up transit observations
with CHEOPS and ground-based photometry with MINERVA-Australis, SPECULOOS, and Purple Mountain Observatory,
as well as further TESS observations in sector 70. We statistically validate Gliese 12 b as a planet with an orbital period of
12.76144 ± 0.00006 d and a radius of 1.0 ± 0.1 R⊕, resulting in an equilibrium temperature of ∼315 K. Gliese 12 b has excellent
future prospects for precise mass measurement, which may inform how planetary internal structure is affected by the stellar
compositional environment. Gliese 12 b also represents one of the best targets to study whether Earth-like planets orbiting cool
stars can retain their atmospheres, a crucial step to advance our understanding of habitability on Earth and across the galaxy.
The importance of continents, oceans and plate tectonics for the evolution of...Sérgio Sacani
Within the uncertainties of involved astronomical and biological parameters, the Drake Equation
typically predicts that there should be many exoplanets in our galaxy hosting active, communicative
civilizations (ACCs). These optimistic calculations are however not supported by evidence, which is
often referred to as the Fermi Paradox. Here, we elaborate on this long-standing enigma by showing
the importance of planetary tectonic style for biological evolution. We summarize growing evidence
that a prolonged transition from Mesoproterozoic active single lid tectonics (1.6 to 1.0 Ga) to modern
plate tectonics occurred in the Neoproterozoic Era (1.0 to 0.541 Ga), which dramatically accelerated
emergence and evolution of complex species. We further suggest that both continents and oceans
are required for ACCs because early evolution of simple life must happen in water but late evolution
of advanced life capable of creating technology must happen on land. We resolve the Fermi Paradox
(1) by adding two additional terms to the Drake Equation: f_oc (the fraction of habitable exoplanets with significant continents and oceans) and f_pt (the fraction of habitable exoplanets with significant continents and oceans that have had plate tectonics operating for at least 0.5 Ga); and (2) by demonstrating that the product of f_oc and f_pt is very small (< 0.00003–0.002). We propose that the lack
of evidence for ACCs reflects the scarcity of long-lived plate tectonics and/or continents and oceans on
exoplanets with primitive life.
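The proposed resolution multiplies two extra factors into the standard Drake Equation. A minimal sketch of the arithmetic; all input values below are purely illustrative assumptions — only the two added factors (f_oc, f_pt) and their product bound come from the abstract:

```python
# Toy evaluation of the modified Drake equation described above.
# All numerical inputs are illustrative, not the authors' fitted values.

def modified_drake(R_star, f_p, n_e, f_l, f_i, f_c, L, f_oc, f_pt):
    """Number of communicative civilizations, with the two proposed extra
    factors: f_oc (continents and oceans) and f_pt (long-lived plate tectonics)."""
    return R_star * f_p * n_e * f_l * f_i * f_c * L * f_oc * f_pt

# Classic optimistic inputs (assumed for illustration), with both new factors = 1:
N_classic = modified_drake(R_star=1.0, f_p=1.0, n_e=0.2, f_l=1.0,
                           f_i=0.5, f_c=0.2, L=1e4, f_oc=1.0, f_pt=1.0)

# Same inputs, but with f_oc * f_pt at the abstract's upper bound of 0.002:
N_revised = modified_drake(R_star=1.0, f_p=1.0, n_e=0.2, f_l=1.0,
                           f_i=0.5, f_c=0.2, L=1e4, f_oc=0.1, f_pt=0.02)

print(N_classic, N_revised)  # the product f_oc * f_pt suppresses N by a factor of 500
```

With these (assumed) optimistic inputs, N drops from hundreds of civilizations to well below one, which is the sense in which the added factors resolve the paradox.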
A Giant Impact Origin for the First Subduction on EarthSérgio Sacani
Hadean zircons provide a potential record of Earth's earliest subduction 4.3 billion years ago. It remains enigmatic how subduction could be initiated so soon after the presumably Moon-forming giant impact (MGI). Earlier studies found an increase in Earth's core-mantle boundary (CMB) temperature due to the accumulation of the impactor's core, and our recent work shows Earth's lower mantle remains largely solid, with some of the impactor's mantle potentially surviving as the large low-shear-velocity provinces (LLSVPs). Here, we show that a hot post-impact CMB drives the initiation of strong mantle plumes that can induce subduction initiation ∼200 Myr after the MGI. 2D and 3D thermomechanical computations show that a high CMB temperature is the primary factor triggering early subduction, with enrichment of heat-producing elements in LLSVPs as another potential factor. The models link the earliest subduction to the MGI, with implications for understanding the diverse tectonic regimes of rocky planets.
Climate extremes likely to drive land mammal extinction during next supercont...Sérgio Sacani
Mammals have dominated Earth for approximately 55 Myr thanks to their
adaptations and resilience to warming and cooling during the Cenozoic. All
life will eventually perish in a runaway greenhouse once absorbed solar
radiation exceeds the emission of thermal radiation in several billions of
years. However, conditions rendering the Earth naturally inhospitable to
mammals may develop sooner because of long-term processes linked to
plate tectonics (short-term perturbations are not considered here). In
~250 Myr, all continents will converge to form Earth’s next supercontinent,
Pangea Ultima. A natural consequence of the creation and decay of Pangea
Ultima will be extremes in pCO2 due to changes in volcanic rifting and
outgassing. Here we show that increased pCO2, solar energy (F⨀;
approximately +2.5% W m−2 greater than today) and continentality (larger
range in temperatures away from the ocean) lead to increasing warming
hostile to mammalian life. We assess their impact on mammalian
physiological limits (dry bulb, wet bulb and Humidex heat stress indicators)
as well as a planetary habitability index. Given mammals’ continued survival,
predicted background pCO2 levels of 410–816 ppm combined with increased
F⨀ will probably lead to a climate tipping point and their mass extinction.
The results also highlight how global landmass configuration, pCO2 and F⨀
play a critical role in planetary habitability.
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243Sérgio Sacani
The recently reported observation of VFTS 243 is the first example of a massive black-hole binary
system with negligible binary interaction following black-hole formation. The black-hole mass (≈10M⊙)
and near-circular orbit (e ≈ 0.02) of VFTS 243 suggest that the progenitor star experienced complete
collapse, with energy-momentum being lost predominantly through neutrinos. VFTS 243 enables us to
constrain the natal kick and neutrino-emission asymmetry during black-hole formation. At 68% confidence
level, the natal kick velocity (mass decrement) is ≲10 km/s (≲1.0 M⊙), with a full probability distribution
that peaks when ≈0.3 M⊙ were ejected, presumably in neutrinos, and the black hole experienced a natal
kick of 4 km/s. The neutrino-emission asymmetry is ≲4%, with best-fit values of ∼0–0.2%. Such a small
neutrino natal kick accompanying black-hole formation is in agreement with theoretical predictions.
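The quoted numbers can be sanity-checked with a standard momentum-conservation estimate, v_kick ≈ α ΔM c / M_BH, where α is the neutrino-emission asymmetry and ΔM the neutrino-radiated mass. This back-of-the-envelope sketch is our own illustration, not the paper's posterior analysis:

```python
# Order-of-magnitude check of the natal kick from asymmetric neutrino emission,
# using simple momentum conservation: v_kick ~ alpha * dM * c / M_BH.
# The formula and the inputs are a back-of-the-envelope sketch only.

C_KM_S = 2.998e5  # speed of light [km/s]

def neutrino_kick(alpha, dM_msun, M_bh_msun):
    """Kick velocity [km/s] for asymmetry fraction alpha, neutrino-radiated
    mass dM, and remnant black-hole mass M_bh (both in solar masses)."""
    return alpha * dM_msun * C_KM_S / M_bh_msun

# Values suggested by the abstract: ~0.3 Msun radiated, ~10 Msun black hole,
# asymmetry ~0.1% (within the quoted 0-0.2% best-fit range):
v = neutrino_kick(alpha=1e-3, dM_msun=0.3, M_bh_msun=10.0)
print(f"{v:.1f} km/s")  # ~9 km/s, consistent with the <~10 km/s constraint
```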
Detectability of Solar Panels as a TechnosignatureSérgio Sacani
In this work, we assess the potential detectability of solar panels made of silicon on an Earth-like
exoplanet as a potential technosignature. Silicon-based photovoltaic cells have high reflectance in the
UV-VIS and in the near-IR, within the wavelength range of a space-based flagship mission concept
like the Habitable Worlds Observatory (HWO). Assuming that only solar energy is used to provide
the 2022 human energy needs with a land cover of ∼ 2.4%, and projecting the future energy demand
assuming various growth-rate scenarios, we assess the detectability with an 8 m HWO-like telescope.
Assuming the most favorable viewing orientation, and focusing on the strong absorption edge in the
ultraviolet-to-visible (0.34−0.52 µm), we find that several hundred hours of observation time are needed
to reach a SNR of 5 for an Earth-like planet around a Sun-like star at 10 pc, even with a solar panel
coverage of ∼23% of the land of a future Earth. We discuss the necessity of concepts like Kardashev
Type I/II civilizations and Dyson spheres, which would aim to harness vast amounts of energy. Even
with much larger populations than today, the total energy use of human civilization would be orders of
magnitude below the threshold for causing direct thermal heating or reaching the scale of a Kardashev
Type I civilization. Any extraterrestrial civilization that likewise achieves sustainable population
levels may also find a limit on its need to expand, which suggests that a galaxy-spanning civilization
as imagined in the Fermi paradox may not exist.
Jet reorientation in central galaxies of clusters and groups: insights from V...Sérgio Sacani
Recent observations of galaxy clusters and groups with misalignments between their central AGN jets
and X-ray cavities, or with multiple misaligned cavities, have raised concerns about the jet – bubble
connection in cooling cores, and the processes responsible for jet realignment. To investigate the
frequency and causes of such misalignments, we construct a sample of 16 cool core galaxy clusters and
groups. Using VLBA radio data we measure the parsec-scale position angle of the jets, and compare
it with the position angle of the X-ray cavities detected in Chandra data. Using the overall sample
and selected subsets, we consistently find that there is a 30% – 38% chance to find a misalignment
larger than ∆Ψ = 45◦ when observing a cluster/group with a detected jet and at least one cavity. We
determine that projection may account for an apparently large ∆Ψ only in a fraction of objects (∼35%),
and given that gas dynamical disturbances (as sloshing) are found in both aligned and misaligned
systems, we exclude environmental perturbation as the main driver of cavity – jet misalignment.
Moreover, we find that large misalignments (up to ∼90◦) are favored over smaller ones (45◦ ≤ ∆Ψ ≤ 70◦), and that the change in jet direction can occur on timescales between one and a few tens of Myr.
We conclude that misalignments are more likely related to actual reorientation of the jet axis, and we
discuss several engine-based mechanisms that may cause these dramatic changes.
The solar dynamo begins near the surfaceSérgio Sacani
The magnetic dynamo cycle of the Sun features a distinct pattern: a propagating
region of sunspot emergence appears around 30° latitude and vanishes near the
equator every 11 years (ref. 1). Moreover, longitudinal flows called torsional oscillations
closely shadow sunspot migration, undoubtedly sharing a common cause2. Contrary
to theories suggesting deep origins of these phenomena, helioseismology pinpoints
low-latitude torsional oscillations to the outer 5–10% of the Sun, the near-surface
shear layer3,4. Within this zone, inwardly increasing differential rotation coupled with
a poloidal magnetic field strongly implicates the magneto-rotational instability5,6,
prominent in accretion-disk theory and observed in laboratory experiments7.
Together, these two facts prompt the question of whether the solar dynamo is
a near-surface instability. Here we report strong affirmative evidence, in stark
contrast to traditional models8 focusing on the deeper tachocline. Simple analytic
estimates show that the near-surface magneto-rotational instability better explains
the spatiotemporal scales of the torsional oscillations and inferred subsurface
magnetic field amplitudes9. State-of-the-art numerical simulations corroborate these
estimates and reproduce hemispherical magnetic current helicity laws10. The dynamo
resulting from a well-understood near-surface phenomenon improves prospects
for accurate predictions of full magnetic cycles and space weather, affecting the
electromagnetic infrastructure of Earth.
Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...Sérgio Sacani
In the Nice model of solar system formation, Uranus and Neptune undergo an orbital upheaval,
sweeping through a planetesimal disk. The region of the disk from which material is accreted by
the ice giants during this phase of their evolution has not previously been identified. We perform
direct N-body orbital simulations of the four giant planets to determine the amount and origin of solid
accretion during this orbital upheaval. We find that the ice giants undergo an extreme bombardment
event, with collision rates as much as ∼3 per hour assuming km-sized planetesimals, increasing the
total planet mass by up to ∼0.35%. In all cases, the initially outermost ice giant experiences the
largest total enhancement. We determine that for some plausible planetesimal properties, the resulting
atmospheric enrichment could potentially produce sufficient latent heat to alter the planetary cooling
timescale according to existing models. Our findings suggest that substantial accretion during this
phase of planetary evolution may have been sufficient to impact the atmospheric composition and
thermal evolution of the ice giants, motivating future work on the fate of deposited solid material.
Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...Sérgio Sacani
The highest priority recommendation of the Astro2020 Decadal Survey for space-based astronomy
was the construction of an observatory capable of characterizing habitable worlds. In this paper series
we explore the detectability of and interference from exomoons and exorings serendipitously observed
with the proposed Habitable Worlds Observatory (HWO) as it seeks to characterize exoplanets, starting
in this manuscript with Earth-Moon analog mutual events. Unlike transits, which only occur in systems
viewed near edge-on, shadow (i.e., solar eclipse) and lunar eclipse mutual events occur in almost every
star-planet-moon system. The cadence of these events can vary widely from ∼yearly to multiple events
per day, as was the case in our younger Earth-Moon system. Leveraging previous space-based (EPOXI)
lightcurves of a Moon transit and performance predictions from the LUVOIR-B concept, we derive
the detectability of Moon analogs with HWO. We determine that Earth-Moon analogs are detectable
with observation of ∼2-20 mutual events for systems within 10 pc, and larger moons should remain
detectable out to 20 pc. We explore the extent to which exomoon mutual events can mimic planet
features and weather. We find that HWO wavelength coverage in the near-IR, specifically in the 1.4 µm
water band where large moons can outshine their host planet, will aid in differentiating exomoon signals
from exoplanet variability. Finally, we predict that exomoons formed through collision processes akin
to our Moon are more likely to be detected in younger systems, where shorter orbital periods and
favorable geometry enhance the probability and frequency of mutual events.
Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...Sérgio Sacani
Mars is a particularly attractive candidate among known astronomical objects
to potentially host life. Results from space exploration missions have provided
insights into Martian geochemistry that indicate oxychlorine species, particularly perchlorate, are ubiquitous features of the Martian geochemical landscape. Perchlorate presents potential obstacles for known forms of life due to
its toxicity. However, it can also provide potential benefits, such as producing
brines by deliquescence, like those thought to exist on present-day Mars. Here
we show perchlorate brines support folding and catalysis of functional RNAs,
while inactivating representative protein enzymes. Additionally, we show
perchlorate and other oxychlorine species enable ribozyme functions,
including homeostasis-like regulatory behavior and ribozyme-catalyzed
chlorination of organic molecules. We suggest nucleic acids are uniquely well-suited to hypersaline Martian environments. Furthermore, Martian near- or
subsurface oxychlorine brines, and brines found in potential lifeforms, could
provide a unique niche for biomolecular evolution.
Continuum emission from within the plunging region of black hole discsSérgio Sacani
The thermal continuum emission observed from accreting black holes across X-ray bands has the potential to be leveraged as a
powerful probe of the mass and spin of the central black hole. The vast majority of existing ‘continuum fitting’ models neglect
emission sourced at and within the innermost stable circular orbit (ISCO) of the black hole. Numerical simulations, however,
find non-zero emission sourced from these regions. In this work, we extend existing techniques by including the emission
sourced from within the plunging region, utilizing new analytical models that reproduce the properties of numerical accretion
simulations. We show that in general the neglected intra-ISCO emission produces a hot-and-small quasi-blackbody component,
but can also produce a weak power-law tail for more extreme parameter regions. A similar hot-and-small blackbody component
has been added in by hand in an ad hoc manner to previous analyses of X-ray binary spectra. We show that the X-ray spectrum
of MAXI J1820+070 in a soft-state outburst is extremely well described by a full Kerr black hole disc, while conventional
models that neglect intra-ISCO emission are unable to reproduce the data. We believe this represents the first robust detection of
intra-ISCO emission in the literature, and allows additional constraints to be placed on the MAXI J1820 + 070 black hole spin
which must be low (a• < 0.5) to allow a detectable intra-ISCO region. Emission from within the ISCO is the dominant emission
component in the MAXI J1820 + 070 spectrum between 6 and 10 keV, highlighting the necessity of including this region. Our
continuum fitting model is made publicly available.
WASP-69b’s Escaping Envelope Is Confined to a Tail Extending at Least 7 RpSérgio Sacani
Studying the escaping atmospheres of highly irradiated exoplanets is critical for understanding the physical
mechanisms that shape the demographics of close-in planets. A number of planetary outflows have been observed
as excess H/He absorption during/after transit. Such an outflow has been observed for WASP-69b by multiple
groups that disagree on the geometry and velocity structure of the outflow. Here, we report the detection of this
planet’s outflow using Keck/NIRSPEC for the first time. We observed the outflow 1.28 hr after egress until the
target set, demonstrating the outflow extends at least 5.8 × 10^5 km, or 7.5 Rp. This detection is significantly longer
than previous observations, which report an outflow extending ∼2.2 planet radii just 1 yr prior. The outflow is
blueshifted by −23 km s−1 in the planetary rest frame. We estimate a current mass-loss rate of 1 M⊕ Gyr−1. Our
observations are most consistent with an outflow that is strongly sculpted by ram pressure from the stellar wind.
However, potential variability in the outflow could be due to time-varying interactions with the stellar wind or
differences in instrumental precision.
X-rays from a Central “Exhaust Vent” of the Galactic Center ChimneySérgio Sacani
Using deep archival observations from the Chandra X-ray Observatory, we present an analysis of
linear X-ray-emitting features located within the southern portion of the Galactic center chimney,
and oriented orthogonal to the Galactic plane, centered at coordinates l = 0.08◦, b = −1.42◦. The
surface brightness and hardness ratio patterns are suggestive of a cylindrical morphology which may
have been produced by a plasma outflow channel extending from the Galactic center. Our fits of the
feature’s spectra favor a complex two-component model consisting of thermal and recombining plasma
components, possibly a sign of shock compression or heating of the interstellar medium by outflowing
material. Assuming a recombining plasma scenario, we further estimate the cooling timescale of this
plasma to be on the order of a few hundred to thousands of years, leading us to speculate that a
sequence of accretion events onto the Galactic Black Hole may be a plausible quasi-continuous energy
source to sustain the observed morphology.
Efficient spin-up of Earth System Models using sequence accelerationSérgio Sacani
Marine and terrestrial biogeochemical models are key components of the Earth System Models (ESMs) used to project future environmental changes. However, their slow adjustment time also hinders effective use of ESMs because of the enormous computational resources required to integrate them to a pre-industrial equilibrium. Here, a solution to this "spin-up" problem based on "sequence acceleration" is shown to accelerate equilibration of state-of-the-art marine biogeochemical models by over an order of magnitude. The technique can be applied in a "black box" fashion to existing models. Even under the challenging spin-up protocols used for Intergovernmental Panel on Climate Change (IPCC) simulations, this algorithm is 5 times faster. Preliminary results suggest that terrestrial models can be similarly accelerated, enabling a quantification of major parametric uncertainties in ESMs, improved estimates of metrics such as climate sensitivity, and higher model resolution than currently feasible.
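"Sequence acceleration" means extrapolating a slowly converging fixed-point iteration toward its limit instead of iterating it to exhaustion. A minimal scalar sketch using Aitken's Δ² (Steffensen) scheme — the paper applies more sophisticated Anderson-type methods to high-dimensional model state, so this is only the idea in miniature:

```python
# Toy illustration of sequence acceleration on a slow fixed-point
# iteration x_{n+1} = g(x_n), the scalar analogue of model spin-up.

def g(x):
    # A contraction with rate 0.9: plain iteration needs ~100+ steps
    # to equilibrate. Fixed point: x* = 5.0.
    return 0.9 * x + 0.5

def steffensen(g, x, tol=1e-10, max_iter=50):
    """Aitken delta-squared (Steffensen) acceleration of x = g(x)."""
    for n in range(max_iter):
        x1, x2 = g(x), g(g(x))
        denom = x2 - 2.0 * x1 + x
        if abs(denom) < 1e-15:      # sequence already (numerically) converged
            return x2, n
        x_new = x - (x1 - x) ** 2 / denom
        if abs(x_new - x) < tol:
            return x_new, n + 1
        x = x_new
    return x, max_iter

x_acc, n_acc = steffensen(g, 0.0)
print(x_acc, n_acc)  # reaches the equilibrium 5.0 in a handful of steps
```

For a linear map the extrapolation lands on the fixed point almost immediately, which is the scalar version of the order-of-magnitude speed-up reported for the biogeochemical models.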
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals, and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the
development of the worm C. elegans.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
Types of RNAi (non-coding RNA)
miRNA:
Length: 23-25 nt
Trans-acting
Binds the target mRNA with mismatches
Inhibits translation
siRNA:
Length: 21 nt
Cis-acting
Binds the target mRNA with a perfectly complementary sequence
piRNA:
Length: 25-36 nt
Expressed in germ cells
Regulates transposon activity
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
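The pairing step above amounts to finding a site on the mRNA complementary to the RISC-loaded guide strand. A toy sketch of that base-pairing search; the sequences are invented, and the perfect-complementarity rule models siRNA-like recognition (miRNA pairing tolerates mismatches):

```python
# Toy model of guide-strand/target pairing: the guide binds its target
# antiparallel, so the target site is the reverse complement of the guide.
# Sequences are hypothetical examples, not real siRNAs.

COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string (A-U, G-C pairing)."""
    return "".join(COMP[b] for b in reversed(rna))

def find_target_site(guide, mrna):
    """Index of a perfectly complementary site on the mRNA, or -1 if none."""
    return mrna.find(reverse_complement(guide))

guide = "UGAGGUAG"          # hypothetical 8-nt guide fragment
mrna = "AAGCCUACCUCAUUCG"   # hypothetical mRNA containing the site
print(find_target_site(guide, mrna))  # prints 4: site found at position 4
```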
THE RISC COMPLEX:
RISC is a large (>500 kDa) RNA-multiprotein complex that triggers degradation of the target mRNA.
Unwinding of the double-stranded siRNA is carried out by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease that cleaves the target mRNA.
DICER: an endonuclease (RNase III family)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN:
1. PAZ (PIWI/Argonaute/Zwille) domain: recognition of the target mRNA.
2. PIWI (P-element induced wimpy testis) domain: breaks the phosphodiester bond of the mRNA (RNase H-like activity).
MiRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
Richard's adventures in two entangled wonderlandsRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Seminar of U.V. Spectroscopy by SAMIR PANDASAMIR PANDA
Spectroscopy is a branch of science dealing the study of interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that measures the amount of light absorbed by the analyte.
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
This PDF is about schizophrenia.
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for ultra-fast, high-resolution imaging of cellular processes over time and space as they occur in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, response to treatments, and developmental processes.
In this webinar, we give an overview of advanced applications of the IVM system in preclinical research. IVIM Technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enable researchers to probe fast, dynamic biological processes such as immune cell tracking and cell-cell interaction, as well as vascularization and tumor metastasis, with exceptional detail. This webinar also gives an overview of IVM being utilized in drug development, offering a view into the intricate interactions between drugs/nanoparticles and tissues in vivo and allowing for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
V. Etsebeth
ing has been applied to various data types. For instance, it has been used to identify outliers in SDSS spectroscopic data through adaptations of techniques like random forests (Baron & Poznanski 2017). Similarly, anomalies in Kepler light curves have been successfully detected (Giles & Walkowicz 2019), and generative adversarial networks have been employed for anomaly detection in optical images (Storey-Fisher et al. 2021). Solarz et al. (2017) showed that anomalies vary in relevance depending on the specific study. Anomalies may range from artefacts, which are of interest to system operators but considered contaminants by astronomers, to rare but known sources such as strong gravitational lenses, which have an estimated occurrence rate of just 1 in 10^4 (Huang et al. 2021).
Unsupervised machine learning anomaly detection methods have
been implemented successfully in astronomy when combined with
additional techniques. Lochner & Bassett (2021) developed a gen-
eral framework for anomaly detection that incorporates a novel active
learning approach. astronomaly combines active learning with per-
sonalised user feedback, enabling users to interactively label objects
and refine the anomaly detection process. Moreover, astronomaly
has the capability to handle various types of data, including images,
spectra, or time series data, and can leverage domain knowledge
and user preferences to improve detection accuracy and efficiency.
astronomaly was applied to the Galaxy Zoo3 dataset, as well as
on simulated data in order to evaluate the performance of the ac-
tive learning technique. In both cases, the active learning approach
of astronomaly almost doubled the number of interesting anoma-
lies detected in the first 100 objects viewed by the user when com-
pared to the popular anomaly detection algorithm, iForest (Liu et al.
2008). Walmsley et al. (2022) adapted astronomaly to include a
deep learning approach combined with a novel active learning algo-
rithm. A convolutional neural network (CNN) was used to learn a
low-dimensional representation that captures the salient features of
galaxy images. Additionally, the regressor that models user interest,
a random forest (Breiman 2001) in astronomaly, was replaced by
a Gaussian process (GP, Rasmussen & Williams 2006) that allowed
the use of an acquisition function to more optimally select targets for
user labelling.
Anomaly detection is a challenging task with very few studies
done on a large scale in astronomy. While astronomaly has been
shown to be effective in detecting anomalies, it has yet to be
used on a large dataset. The main objective of this work is to test the
capabilities and limitations of astronomaly by applying it to a large
subset of the Dark Energy Camera Legacy Survey (DECaLS, Dey
et al. 2019). DECaLS has yet to be extensively studied, making it
excellent for searching for undiscovered anomalies. This will evaluate
the performance and scalability of astronomaly as well as provide
the opportunity to make new discoveries.
The paper is structured as follows: Section 2 covers DECaLS sub-
set selection criteria and data pre-processing. An evaluation set is
also created, which is used to test the performance of the different
active learning methods mentioned previously. Section 3 discusses
the astronomaly framework, including the algorithms and param-
eters used for this work. Section 4.1 presents the findings from the
evaluation set, followed by the results for the main DECaLS subset
in Section 4.2. Lastly, Section 4.3 highlights some of the interesting
anomalies detected.
3 https://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge
2 DATA
The Dark Energy Spectroscopic Instrument (DESI) Legacy Surveys
encompass three distinct imaging surveys: the Beijing-Arizona Sky
Survey (BASS)4, the Mayall 𝑧-band Legacy Survey (MzLS)5, and
the Dark Energy Camera Legacy Survey (DECaLS). These surveys
are enriched with data from the Dark Energy Survey (DES, The
Dark Energy Survey Collaboration 2005) and select observations
from the Wide-Field Infrared Survey Explorer (WISE, Wright et al.
2010), notably the W1 and W2 bands, to provide additional colour
information. This work makes use of data from the 8th Public Data
Release (DR8)6 of DECaLS, using the three optical bands 𝑔, 𝑟 and
𝑧. DECaLS reaches depths of magnitude 24, 23.4 and 22.5 in bands 𝑔,
𝑟 and 𝑧 respectively, which is significantly deeper than the previous
large optical survey SDSS.
2.1 Selection Cuts
DR8 of the Legacy Surveys contains more than 1.6 billion unique
sources. The quantity of data alone requires a subset to be selected
due to limitations associated with downloading and storing such
significant amounts of data. More information on the sources, storage
and computational requirements can be found in Appendix A. This is
a common challenge and as a result colour or magnitude constraints
are often implemented on target data (Sridhar et al. 2020; Walmsley
et al. 2022). DECaLS also contains a number of sources that clearly
do not have anomalous morphologies, such as stars, which should
ideally be removed before searching for anomalies. While applying
strict selection cuts can result in higher data quality and reduce the
impact of artefacts and biases, it can also limit the scope of the
analysis and remove potentially interesting anomalies. In this paper,
we minimise the cuts used in order to produce a subset that is as
inclusive as possible, but still of a manageable size for the resources
available.
In order to implement appropriate selection cuts, we make use
of the processing that has already been applied to DECaLS data to
identify artefacts and classify sources, as outlined in Dey et al. (2019).
It is important to acknowledge that the calibrations and techniques
employed were not perfect. One area where challenges persist is
in the detection and masking of artefacts and other bright sources.
While the flagging from Dey et al. (2019) allowed the removal of the
bulk of these sources, a number of artefacts remained, which will be
discussed in later sections. Here we briefly outline the final selection
cuts used, which are described in more detail in Appendix B.
First, we required that none of the bands has any masked sources,
which are generally masked due to artefacts or bright foreground
stars.
Second, we ensure all sources are well-fit by a standard galaxy
profile model. The DECaLS pipeline fits several models, including
an exponential and de Vaucouleurs (de Vaucouleurs 1948) model,
reporting the reduced chi-square for the best-fitting model. If this
is larger than one in any band, it indicates a poor fit and the source is
rejected.
A positive signal-to-noise ratio (SNR) in all three bands was also
imposed. This cut also eliminated observations that do not have
passes in all three bands. We also applied a minimum flux threshold
in any of the bands to remove sources that are too faint to be dis-
tinguishable. The threshold value was based on visual inspections
4 https://batc.bao.ac.cn/BASS/doku.php
5 https://www.legacysurvey.org/mzls/
6 https://www.legacysurvey.org/dr8/
MNRAS 000, 1–15 (2023)
Astronomaly at Scale 3
and was chosen to ensure that only relatively bright sources were
included. Sources were also selected based on their size, specifically
whether they have a large enough radius in either of the two DECaLS
models available (de Vaucouleurs or exponential). This cut reduced
the number of compact objects that are unlikely to be resolved, fo-
cusing on more extended sources. These cuts were carefully chosen
after testing various combinations of them and other criteria that are
not listed in the final selection. For a more detailed description of the
selection criteria, including the exact query used, see Appendix B.
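The cuts above can be summarised as a boolean mask over catalogue columns. The following sketch assumes DR8-style column names (type, allmask_*, rchisq_*, snr_*, flux_*, shapedev_r, shapeexp_r) and purely illustrative threshold values; the exact query used in this work is given in Appendix B.

```python
import pandas as pd

# illustrative thresholds only; the real values are given in Appendix B
MIN_FLUX = 10.0    # hypothetical minimum flux
MIN_RADIUS = 1.0   # hypothetical minimum fitted model radius (arcsec)

def apply_selection_cuts(cat: pd.DataFrame) -> pd.DataFrame:
    """Apply selection cuts in the spirit of Section 2.1 to a catalogue table."""
    bands = ["g", "r", "z"]
    mask = cat["type"] != "PSF"                      # resolved sources only
    for b in bands:
        mask &= cat[f"allmask_{b}"] == 0             # no masked pixels in any band
        mask &= cat[f"rchisq_{b}"] <= 1.0            # well fit by a galaxy model
        mask &= cat[f"snr_{b}"] > 0                  # positive SNR in all bands
        mask &= cat[f"flux_{b}"] > MIN_FLUX          # bright enough to inspect
    # require a large enough radius in either of the two fitted models
    mask &= (cat["shapedev_r"] > MIN_RADIUS) | (cat["shapeexp_r"] > MIN_RADIUS)
    return cat[mask]
```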
Mao et al. (2021) imposed additional cuts by restricting the model-fitting
parameters rchi_g, rchi_r and rchi_z, but applying these cuts to a small
subset of our data removed a significant number of interesting sources,
so they are not implemented here.
The DECaLS catalogue consists of 871 359 667 resolved (non-PSF)
sources, which was reduced to 3 884 404 after applying the above
selection cuts. This subset will be referred to as the main subset for
the remainder of the paper.
2.2 Image Cutout Sizes
Our methodology includes using a convolutional neural network
(CNN) to extract good representations of the source images as was
done in Walmsley et al. (2022). CNNs generally require input im-
ages of identical dimensions, which makes choosing a cutout size
important. Moreover, since machine learning algorithms are affected
by the angular source size in the image (Slijepcevic et al. 2023), we
also needed to individually adjust the scale of the input images such
that the sources are similarly sized.
Walmsley et al. (2022) employed a fixed image size that was pos-
sible because of the availability of the Petrosian radius (Petrosian
1976) for the sources, which allowed the pixel count that covered
the target source to be estimated. Images were obtained from the
DECaLS cutout service, using the native telescope resolution and
adjusting the visible sky area to match the radius determined by the
Petrosian radii. Finally, the images in Walmsley et al. (2022) were
interpolated to 424x424 pixels and colourised for viewing purposes.
However, the Petrosian radius is not available for the entire DECaLS
dataset. Furthermore, the radii included in the DECaLS catalogue,
which are derived by model fitting, were generally found to be much
larger than the apparent visible extent of the target source. As a re-
sult, the above approach could not be directly replicated for the range
of target sources used in the subset. Thus, we needed an alterna-
tive method to roughly estimate the appropriate cutout size for each
source.
DESI has objective requirements for optical imaging, one of which
is to achieve the magnitude depths of at least 24, 23.4, and 22.5 in the
𝑔-, 𝑟-, and 𝑧-bands, respectively. These are defined as the “optimal
extraction depth” of galaxies near the DESI depth limit. The definition
of such a galaxy is an exponential profile with a half-light radius of
0.45 arcseconds. An important part of such a profile is the ability to
get a good estimate of the number of photometric pixels that make
up an image of the galaxy using the equation:
N_eff = [ (4πσ²)^(1/p) + (8.91 r_half²)^(1/p) ]^p    (1)

where σ is the standard deviation for a Gaussian fit, p = 1.15 and
r_half is the half-light radius for an exponential profile fit to the galaxy
(Dey et al. 2019). The equation approximates an exponential fit to a
galaxy even though some sources follow a de Vaucouleurs profile,
while others have a composite profile. This equation estimates the
size of each source, and a cutout could be extracted as a result. We
performed visual tests on a sample of sources to confirm that the
estimated cutout size was sufficient.

Figure 1. Distribution of the number of sources with varying pixel values as
determined by Equation 1 for a random sample of sources. The number of
pixels is given by the square root of N_eff in Equation 1.
Figure 1 illustrates the distribution of source numbers based on
their respective pixel sizes, as determined by Equation 1, for a ran-
dom sample of sources. The majority of the sources within this
random sample had values of between 100 and 200 pixels. To en-
sure consistency, all source cutouts used throughout were therefore
interpolated to a standardised, yet arbitrary, resolution of 150 by
150 pixels. This resolution uses significantly fewer pixels than the
424 of Walmsley et al. (2022), a discrepancy attributable to the markedly
smaller and fainter sources that dominate the population
of the subsets investigated. It is worth noting that the secondary peak
in Figure 1, situated at approximately 350 pixels, corresponds to
artefacts present within the random sample used.
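The cutout-size estimate described above can be sketched as follows. Here σ and r_half are assumed to be expressed in pixels, and the cutout side length is taken as the square root of N_eff (Equation 1); this is an illustrative sketch, not the exact pipeline code.

```python
import math

def n_eff(sigma: float, r_half: float, p: float = 1.15) -> float:
    """Equation 1 (Dey et al. 2019): effective number of pixels covered by
    an exponential galaxy of half-light radius r_half observed with a
    Gaussian PSF of standard deviation sigma (both assumed in pixels)."""
    return ((4 * math.pi * sigma**2) ** (1 / p)
            + (8.91 * r_half**2) ** (1 / p)) ** p

def cutout_side(sigma: float, r_half: float) -> int:
    """Cutout side length in pixels, taken as the square root of N_eff."""
    return math.ceil(math.sqrt(n_eff(sigma, r_half)))
```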
2.3 Finding An Evaluation Set
A labelled subset of data plays a crucial role in the process of al-
gorithm selection and hyperparameter optimisation. Additionally,
it serves as a valuable resource for evaluating the performance of
anomaly detection techniques. To construct this evaluation subset,
we randomly selected 15 000 sources from the complete DECaLS
dataset and supplemented them with DECaLS cutouts for 342 lens
candidates sourced from Huang et al. (2020). These lenses were in-
tentionally included to ensure the presence of interesting anomalies
within the subset.
To ensure consistency and adherence to the criteria described in
Section 2.1, the same selection cuts were applied to this random set.
Table 1 summarises the number of sources excluded by each selection
cut. We found that a surprisingly high number of the lens candidates
did not meet the selection criteria, with only 87 candidates passing
all of the applied cuts, implying that the chosen cuts are not optimal
for lens inclusion. Some lens candidates failed multiple cuts, and it
was observed that three of them were incorrectly labelled as Point
Spread Functions (PSFs) in the catalogue.
After an extensive search, no generalisable criteria could be found
that contain a large sample of lenses while also reducing the size of
the entire dataset to a manageable level. We thus elected to continue
Table 1. The results of applying different selection criteria to the evaluation
set. Each row represents a criterion that was applied. The first column indicates
the name of the criterion, the second column shows how many lenses (out
of the 342) were excluded by that criterion, and the third column shows how
many of the 15 342 sources were excluded by that criterion.
Selection Cut               Lenses   All Sources
Type (non-PSF)                   3             3
Allmask                          0             0
Reduced Chi-Squared             16          1840
Signal-to-Noise Ratio            1             1
Flux Values                     19          3281
Shapedev_r and Shapeexp_r      244          9513
with a sample of 5000 random sources, and the 87 lenses, that passed
all selection cuts. This subset of sources was then fully labelled using
the labelling scheme described in Section 3 and illustrated in Figure
2. It is worth noting that even such a small sample is still expected to
contain other interesting anomalies, including moderately interesting
sources like mergers, in addition to the lenses that have been added
to it. This subset formed the evaluation set that was used to assess
the performance of the methods used.
3 METHODOLOGY
3.1 Feature Extraction
Images are typically high-dimensional and often require dimen-
sionality reduction for computational efficiency. Feature extraction
achieves this by transforming images into lower-dimensional vectors
that retain their essential information. This is done using an image
representation function that maps images to vectors while maintain-
ing similarity.
To create the representations of the images that will serve as fea-
tures in this work, a pre-trained convolutional neural network (CNN)
was used, following the approach of Walmsley et al. (2022). The
CNN model was initially trained on a complex classification task,
but its remarkable capability extends to learning relevant features for
different tasks beyond its original training purpose.
CNNs have multiple layers, each of which performs a different
transformation on the input image. Only the last layer performs clas-
sification by generating a probability distribution over the image’s
possible classes. The rest of the network extracts image information
for the classification layer, producing a vector which can be used
as features. The CNN in Walmsley et al. (2022) uses the
EfficientNetB0 architecture (Tan & Le 2020) and is implemented in Zoobot
(Walmsley et al. 2023) for various galaxy classification purposes.
To extract features from the cutout images, we employed the same
model as described by Walmsley et al. (2019). However, we used a
different pre-processing method to better suit our larger and noisier
dataset. Since we use the estimated size of the source to extract the
cutout, cropping is not required for our dataset. No augmentations of
the images were done when passed to the CNN as the network was
used for feature extraction only. A standard sigma clipping algorithm
from the Astropy python package (The Astropy Collaboration et al.
(2013, 2018, 2022)) was applied to each image. The algorithm uses
an iterative approach to estimate the noise in the image and masks all
pixels below the 3𝜎 threshold. Finally, the images were greyscaled
by averaging the three bands, 𝑔, 𝑟 and 𝑧, into a single band.
The CNN produces a vector that contains 1280 features. This
feature vector is still a high-dimensional representation that poses
computational challenges. To reduce the dimensionality but still re-
tain most of the information, we made use of Principal Component
Analysis (PCA, Pearson (1901)). PCA is a statistical method that
transforms a set of correlated variables into a set of uncorrelated
variables called principal components, which account for most of
the variance in the original data (Shlens 2014). By setting a variance
limit of 95%, the principal components retain 95% of the information
in the original features, while reducing the dimensionality from 1280
to 26. After the features were extracted and PCA applied to reduce
the dimensionality, an anomaly detection procedure similar to that
of Lochner & Bassett (2021) was applied, using the image representations
as features instead of the simple morphological features that
were used in that work.
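In scikit-learn, retaining 95% of the variance amounts to passing a fractional n_components to PCA; the feature matrix below is a random stand-in for the real CNN representations, so the resulting dimensionality will differ from the 26 components obtained on the real data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 1280))   # stand-in for the CNN features

# a fractional n_components keeps the smallest number of principal
# components whose cumulative explained variance reaches 95%;
# on the real dataset this reduced 1280 dimensions to 26
pca = PCA(n_components=0.95, svd_solver="full")
reduced = pca.fit_transform(features)
```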
3.2 Anomaly Detection
astronomaly incorporates two widely used anomaly detection al-
gorithms: isolation forest (iForest) and the local outlier factor (LOF)
algorithm (Liu et al. 2008; Breunig et al. 2000), both implemented
using the scikit-learn package (Pedregosa et al. 2011).
We conducted tests and determined that LOF could not effectively
scale to handle the volume of data in the main subset, while iForest
proved scalable to such volumes. iForest is a fast algorithm that de-
tects anomalies by employing decision trees. It isolates a data point
by randomly selecting a feature and a value within its range, then
recursively splits the data into subsections based on this value. This
process continues until all data points are isolated, forming a forest
of trees. The number of splits required to isolate a point is referred
to as the path length, and it serves as a measure of the anomaly score
for that point. The underlying idea is that anomalies are more likely
to be isolated from the rest of the data, with fewer splits compared
to normal points. Therefore, the shorter the path length, the more
anomalous the point is considered.
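A minimal illustration of iForest's ranking behaviour, using scikit-learn and synthetic features (in scikit-learn, score_samples returns higher values for normal points, so we negate it to rank from most to least anomalous):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# 500 "normal" points plus five obvious outliers in a 26-dimensional space
normal = rng.normal(0.0, 1.0, size=(500, 26))
outliers = rng.normal(8.0, 1.0, size=(5, 26))
X = np.vstack([normal, outliers])

forest = IsolationForest(random_state=42).fit(X)
# score_samples is higher for normal points; negate so that larger
# values mean more anomalous, then rank from most to least anomalous
scores = -forest.score_samples(X)
ranked = np.argsort(scores)[::-1]
```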
iForest assigns an anomaly score to each source in the dataset,
determining how it differs from the majority of sources. We normalise
this score such that a higher value represents how anomalous the
object is and thus determine a ranked order of the entire subset from
most to least anomalous. Active learning (AL) can then be applied,
where the graphical interface of astronomaly allows the users to
manually rate the objects on a scale of 0 to 5 according to how
interesting they are.
AL is an approach that allows algorithms to select the most in-
formative examples to label in order to improve model performance.
The classification of anomalies is often subjective and depends on
the user’s judgement to identify objects of particular interest. By
focusing on the relevant samples that provide the most information,
AL aims to reduce the amount of labelled data needed. This is useful
in situations where labelling data is expensive and time-consuming
and requires human expertise, or where no labels exist and only a
subset of the data is sufficient to achieve good results. astronomaly
uses these user-provided labels to update the anomaly scores for the
entire dataset, leading to a revised ranking order.
Two different AL approaches were evaluated in this work. The first
approach is the novel AL algorithm introduced by Lochner & Bassett
(2021), which is integrated into astronomaly and referred to in this
study as the neighbour score (NS) algorithm. The second is that used
in Walmsley et al. (2022); a Gaussian process (GP, Rasmussen &
Williams 2006) that can model smooth distributions, hereon referred
to as the direct regression (DR) algorithm.
The primary objective of the NS algorithm is to adjust anomaly
scores based on user-provided labels. It uses training data consisting
of a small number of human-provided labels and employs a random
forest regression algorithm (Liaw & Wiener 2002) to predict user
scores for all data instances based on these labels. The method calcu-
lates the distance of each instance to its nearest labelled neighbour,
effectively determining regions within the feature space where the
algorithm exhibits uncertainty. In these uncertain regions, it returns
scores that are close to the original anomaly scores. Conversely, in
regions with ample training data, the predicted user scores modu-
late the anomaly scores. The algorithm employs a scoring function
that combines nearest neighbour distances, predicted user scores, and
original anomaly scores to compute the final anomaly scores for each
instance.
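A simplified sketch of this idea (not astronomaly's exact scoring function): a random forest predicts user interest everywhere, and that prediction modulates the raw anomaly score only near labelled points, using an assumed exponential distance weighting as the uncertainty proxy.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import NearestNeighbors

def neighbour_score(features, anomaly_scores, labelled_idx, user_scores,
                    d_scale=1.0):
    """Simplified neighbour-score sketch: near labelled points the
    predicted user interest modulates the raw anomaly score, while far
    from any label the original anomaly score is returned largely
    unchanged."""
    reg = RandomForestRegressor(random_state=0)
    reg.fit(features[labelled_idx], user_scores)
    predicted = reg.predict(features) / 5.0          # rescale 0-5 labels to 0-1

    # distance from each point to its nearest labelled neighbour
    nn = NearestNeighbors(n_neighbors=1).fit(features[labelled_idx])
    dist, _ = nn.kneighbors(features)
    weight = np.exp(-dist.ravel() / d_scale)         # ~1 near labels, ~0 far away

    return weight * predicted * anomaly_scores + (1 - weight) * anomaly_scores
```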
In contrast, the direct regression approach skips entirely the truly
unsupervised anomaly detection step. Instead, it attempts to use ac-
tive learning to directly model the user’s “interestingness” score. The
approach of Walmsley et al. (2021) is to select a small set of random
examples for labelling and then iteratively query the user (sometimes
called the oracle in machine learning literature) with a new set of ex-
amples explicitly selected, using an “acquisition” function, in order
to improve the regression algorithm. While any regression algorithm
can be used, it must be able to produce uncertainty information in
order to compute the acquisition function. Gaussian processes are
ideally suited to this task as they model uncertain or noisy functions.
They assign probabilities to potential data-fitting functions and up-
date these probabilities as data points are labelled, transitioning from
a prior to a posterior distribution. GPs rely on a kernel to shape
function behaviour and smoothness, with hyperparameters adjusted
based on data likelihood to maximise the fit to observed data.
The GP incorporated in astronomaly uses a combination of the
Matérn kernel and the WhiteKernel to model the relationships be-
tween data points and account for noise in the predictions. Both
kernels are implemented in Python through the scikit-learn
package (Pedregosa et al. 2011).
3.3 Evaluation
To evaluate the performance of the two active learning approaches,
the fully labelled evaluation set described in Section 2.3 was used as
a benchmark, which allowed the recall of the algorithms to be deter-
mined. iForest was applied to compute the anomaly score for each
data point in the dataset. In the NS algorithm, the process involved
selecting the top 100 images with the highest anomaly scores. These
images were then labelled through the astronomaly interface to
establish a new ranking order based on user input. From this new
ranking, the top 100 sources were identified and labelled, excluding
those already labelled in previous iterations. This iterative process
continued until a total of 500 sources were labelled and used for
training. For the DR algorithm, the first 100 sources were manually
labelled based on iForest scores, and the sources were then sorted by
acquisition scores. The retraining procedure was repeated iteratively
until 500 sources had been labelled and trained upon. Throughout
this process, the accuracy of each labelling iteration was assessed to
evaluate the performance and effectiveness of the algorithms.
Figure 2 illustrates typical sources in the dataset. Each row cor-
responds to a different label, ranging from 0 to 5, which would be
assigned to the source during the active learning process. For in-
stance, the first row contains examples of artefacts, masked sources,
and low signal-to-noise ratio sources, all of which would receive
a score of 0. Because the labelling process is subjective, having a
fixed labelling scheme aids in producing a self-consistent labelled
evaluation set.
Figure 2. Examples of sources in the dataset with their labels. The first row
contains sources such as artefacts, masked sources and low SNR sources. At
the bottom are sources that would be considered to be interesting anomalies
and contain sources such as galaxy mergers, strong gravitational lenses and
other sources that are not readily identifiable. The angular scale bar within
each image represents 5 arcseconds.
4 RESULTS
To evaluate the performance of anomaly detection methods, partic-
ularly in scenarios where the dataset contains an unknown number
of interesting anomalies, we employ two types of plots: a recall plot
and a Uniform Manifold Approximation and Projection (UMAP) plot
(McInnes et al. 2020).
The recall plot, as depicted in Figure 3, serves to illustrate how
the anomaly detection algorithm ranks sources from most to least
anomalous based on their anomaly scores. In this plot, the x-axis
represents the position of the sources in the ranked list, while the y-
axis represents the recall for the specific class of interest. Essentially,
the y-value increases when a source belongs to the class of interest.
A steeper slope at the beginning of the plot indicates better perfor-
mance, as it means that a user has to search through fewer sources
to find interesting anomalies. This ranking of sources is crucial be-
cause, in the astronomaly interface, the sources on the left would
be displayed first for labelling when using the score-based ranking.
Other ordering methods, such as random ordering, are also available
in the interface.
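The recall curve itself is simply a normalised cumulative sum over the ranked labels; a sketch:

```python
import numpy as np

def recall_curve(ranked_labels):
    """Recall as a function of rank. `ranked_labels` marks the class of
    interest (1) along the list ordered from most to least anomalous;
    the curve rises each time an interesting source is encountered."""
    ranked_labels = np.asarray(ranked_labels, dtype=float)
    return np.cumsum(ranked_labels) / ranked_labels.sum()
```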
To visualise the high-dimensional features in a two-dimensional
space, the UMAP technique was used in this work (McInnes et al.
2020). UMAP is a nonlinear dimensionality reduction technique that
retains the topological structure of the data. This means it can capture
both global and local relationships among data points. In a UMAP
plot, clusters are groups of points that are close together in feature
space, suggesting that the data points within the clusters share sim-
ilarities or patterns that differentiate them from other data points.
UMAP can also highlight outliers, which are data points that sig-
nificantly differ from the majority of the data. Outliers may appear
as isolated points far from any clusters or as points located on the
edges of clusters but not tightly grouped with them. By embedding
high-dimensional features into a two-dimensional space, UMAP vi-
sualisations, like the one shown in Figure 5, help provide valuable
insights into the underlying data structure and the performance of the
anomaly detection methods.
4.1 Evaluation Set
The comparison of the NS, DR, and iForest-only anomaly detection
methods is illustrated in Figure 3. The plot shows how many of the
known anomalies in the evaluation set were detected as a function
of the sample size. The evaluation set consists of a total of 184
anomalies including the 87 lenses injected from a known catalogue.
The performance of iForest on the evaluation set yielded somewhat
surprising results, as it detected only 15 anomalies among the top 500
sources. However, when the DR approach was applied and trained
with 100 labels, it detected 51 anomalies, while the NS method
found 45. Both active learning methods demonstrated a significant
improvement in anomaly detection compared to using iForest alone.
Further increasing the number of labels to 500 resulted in even better
anomaly detection performances, with both active learning methods
detecting 84 anomalies within the top 500 sources.
To gain a deeper understanding of why iForest struggled to detect
a significant number of anomalies, a more detailed analysis was con-
ducted on the sources and their associated scores. Figure 4 presents
a normalised plot of the sources in the evaluation set, categorised
into three distinct classes: artefacts, galaxy merger candidates, and
known lenses, as these were the sources of particular interest. To
directly compare classes with very different numbers of objects, we
normalise the recall in each class to a scale of 0 to 1. This plot
reveals the reason for iForest’s poor performance: sources with a
high anomaly score are predominantly artefacts. While technically
anomalies, these sources may not be particularly interesting from a
scientific perspective.
Figure 4 provides additional evidence of the enhanced detection
rates achieved through the application of the NS algorithm. The
galaxy mergers and gravitational lensing candidates have been shifted
higher in the order (to the left and upwards), while the artefacts have
been moved lower (down and to the right). This outcome is quite im-
portant, as it demonstrates that AL can rapidly remove artefacts that
may have been missed by automated pipelines, allowing a scientist
to focus their attention on interesting anomalies.
Figure 3. Performance of the neighbour score and direct regression active
learning algorithms on the evaluation set. The algorithms were applied in
iterations of 100 labels, with the results from 100 and 500 labels illustrated.
Both algorithms have comparable performance.

Figure 4. Recall of three types of anomaly in the evaluation set. The dashed
lines in the plot represent the initial results from iForest, while the solid lines
represent the results obtained from the NS algorithm trained with 200 labels.
The plot has been normalised to emphasise the performance gains of active
learning. Active learning improves the detection rates of interesting sources
while decreasing the impact of unwanted artefacts.

Figure 5 presents a UMAP plot for the evaluation set, where all
sources were labelled and could therefore be classified into different
“classes” based on their scores. The most prevalent class, represented
by a Score of 1, typically includes “normal” galaxies with no spe-
cific interest. All other classes formed distinct sub-clusters within the
UMAP plot, showing minimal overlap with each other. Of special in-
terest are the classes associated with artefacts (Score 0) and the most
interesting anomalies (Score 5). These two classes display dense
clusters, and the artefacts are notably well-separated from all other
sources. First, this demonstrates the capability of image representa-
tions to effectively extract features that distinguish classes from one
another. Secondly, this offers valuable insights into the challenges
encountered when using iForest as a standalone anomaly detection
method. The interesting anomalies are located on the edge of the
overall structure of the plot, but the artefacts are much more likely to
be detected as anomalies by iForest given the position and distance
of their sub-cluster.

Figure 5. UMAP plot of the evaluation set. The different scores/labels show
that subclusters are formed within the feature space, but that they are
surrounded by the more common, uninteresting sources. It is interesting to
note that the artefacts present, represented in pink, form a relatively distinct
cluster. The anomalies show a similar pattern, with a very dense cluster
formed.

These sub-clusters highlight the importance of
AL algorithms, which identify these regions of interest in the feature
space and are crucial for eliminating artefacts.
The feature space in Figure 5 is markedly different from that of
Walmsley et al. (2022), where most of the interesting anomalies were
deep in the centre of the plot. This is likely due to the fact that the
Galaxy Zoo data represents a specific subset of DECaLS with very
different properties. This explains why iForest, combined with the
NS algorithm, works well here but failed to find interesting anomalies
in the Galaxy Zoo data with the CNN-based features.
All of these classes are interspersed among the common sources,
those with a Score of 1, indicating that the Score 1 class is quite
diverse and shares features with all of the other classes. The class with a Score
of 3 is closest to the interesting anomalies, which is likely due to
the similarities between spiral galaxies with moderately interesting
morphologies (which constitute most of the Score 3 class) and the
galaxy merger candidates in the Score 5 class. The Score 4 class
contains too few sources to draw clear observations about it.
4.2 Application On A Large Scale
The NS algorithm was used in the application on the main subset,
which consists of 3 884 404 unlabelled sources. The volume of this
subset forms the main challenge of this work as astronomaly has
not been applied on such a scale before. The same approach as in the
evaluation set was followed, excluding the use of the DR algorithm
due to its computational demands and limited discernible benefits.
See Appendix A for more details. Each active learning iteration of the
NS algorithm was done with 2000 labels until 10 000 sources were
labelled. The top 2000 sources were investigated and fully labelled in
each iteration. This entire process only took the lead author several
hours using astronomaly’s interactive interface.
The results are presented in Figure 6, showing the anomalies and
their ranks among the top 2000 sources. The left panel includes
labelled sources while the right panel shows only the new or unseen
anomalies, demonstrating the power of the algorithm to detect new
anomalies. The line for 0 labels, corresponding to iForest only, is not
visible because it detected only 1 interesting anomaly in the top 2000
sources. However, this does not mean that iForest failed to detect
anomalies. In fact, most (1763) of the top 2000 sources initially detected
by iForest were artefacts or masked sources: obvious anomalies in
the dataset, but not of interest. The plots demonstrate the necessity of
active learning, as more labels lead to a higher number of interesting
sources within the top 2000. However, as the panel on the right in
Figure 6 shows, the number of new anomalies seen in the top 2000
drops sharply when more than 4000 labels are used. This suggests
a point of diminishing returns, where adding more labels would
not result in more anomalies being found when inspecting the
same number of sources. In such an instance, discovering additional
anomalies would require investigating a larger number of sources
rather than increasing the number of labels.
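The recall-versus-rank curves of Figure 6 amount to counting how many known anomalies appear among the top-N ranked sources; the right panel simply excludes already-labelled anomalies from the count. A small helper of our own (not astronomaly code) illustrates the computation:

```python
import numpy as np

def recall_at_rank(scores, is_anomaly, max_rank=2000):
    """Cumulative fraction of true anomalies recovered in the top-N
    ranked sources, for N = 1 .. max_rank."""
    order = np.argsort(scores)[::-1]                 # highest anomaly score first
    hits = np.cumsum(np.asarray(is_anomaly)[order])  # anomalies seen so far
    total = max(int(np.sum(is_anomaly)), 1)          # avoid division by zero
    return hits[:max_rank] / total

# toy example: 10 ranked sources, of which 3 are true anomalies
scores     = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.4, 0.5, 0.6, 0.05])
is_anomaly = np.array([1,   0,   1,   0,   0,   0,   0,   0,   1,   0])
curve = recall_at_rank(scores, is_anomaly, max_rank=5)
```

To reproduce the right-hand panel, entries of `is_anomaly` corresponding to sources labelled in earlier iterations would be set to zero before calling the helper.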
4.3 Follow-up Investigations
The 10 000 labels from the main subset were scored between 0 and
5 using the labelling scheme described in Section 3 and illustrated
in Figure 2. The majority of the sources (4861) received a score of
0, indicating that most were artefacts or masked sources. The next
largest group (2408), which received a score of 1, comprised the common
galaxy types. These were followed by 1648 sources given a score of 5,
indicating a large number of interesting anomalies. The remaining labels
were distributed among the scores of 2 (519), 3 (288) and 4 (276), which
represent galaxies with slightly disturbed morphology.
The 1648 anomalies were analysed and 18 unclassified sources
(Figure 7), 8 gravitational lens candidates (Figure 8) and 1609 galaxy
merger candidates (Figure 9) were detected. Of the 1648 sources
labelled as anomalies, further investigations determined that 13
MNRAS 000, 1–15 (2023)
V. Etsebeth
Figure 6. Recall as a function of rank for increasing numbers of labels. The plot on the left shows the number of anomalies including those that have been
previously labelled, whereas the plot on the right excludes the labelled anomalies, showing the unlabelled sources the algorithm determines as being the most
likely to be interesting. It is clear that the number of “new” anomalies found in the top 2000 increases as more labels are used for training. However, there is a
point of diminishing returns as the number of “new” anomalies decreases rapidly when more than 4000 labels are used. It should also be noted that the line for
0 labels is not visible on either plot since there was only one source detected.
of these sources were either artefacts, uninteresting sources
mislabelled as lens candidates, or sources that form part of another
detected source and can therefore be considered duplicate
detections. In an attempt to identify these anomalies, they were cross-
matched with the Simbad database (Wenger et al. 2000) with different
cross-matching distances and using all of the available datasets on
Simbad. A total of 1209 matches at 30 arcseconds and 792 matches
at 10 arcseconds were obtained, generally corresponding to already
labelled galaxies. However, no definitive dataset was available to
confirm the nature of the detected anomalies, and manual follow-up
would be necessary, as even a 10 arcsecond offset can lead to
potential mismatches.
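Cross-matching at a given radius reduces to computing on-sky angular separations; in practice a library such as astropy or astroquery would query Simbad directly, but the underlying check can be sketched self-containedly (the toy catalogue entry below is hypothetical):

```python
import numpy as np

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (Vincenty formula) between sky positions
    given in degrees, returned in arcseconds."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    dra = ra2 - ra1
    num = np.hypot(np.cos(dec2) * np.sin(dra),
                   np.cos(dec1) * np.sin(dec2)
                   - np.sin(dec1) * np.cos(dec2) * np.cos(dra))
    den = np.sin(dec1) * np.sin(dec2) + np.cos(dec1) * np.cos(dec2) * np.cos(dra)
    return np.degrees(np.arctan2(num, den)) * 3600.0

def count_matches(anomalies, catalogue, radius_arcsec):
    """Number of anomalies with at least one catalogue entry within the radius."""
    return int(sum(
        np.any(angular_sep_arcsec(ra, dec, catalogue[:, 0], catalogue[:, 1])
               <= radius_arcsec)
        for ra, dec in anomalies
    ))

# toy example: a (hypothetical) catalogue entry 5 arcsec north of source U1
anomalies = [(209.7286, 29.5764)]
catalogue = np.array([[209.7286, 29.5764 + 5.0 / 3600.0]])
n10 = count_matches(anomalies, catalogue, 10.0)   # matched at 10 arcsec
n3  = count_matches(anomalies, catalogue, 3.0)    # unmatched at 3 arcsec
```

Tightening the radius from 30 to 10 arcsec is what drops the match count from 1209 to 792 in the text: nearby but distinct galaxies stop counting as matches.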
Sources with Highly Unusual Morphology
We searched for matches in Simbad (Wenger et al. 2000) for the 18
unidentified sources. Table 2 summarises the source locations and
any known identifiers with a link to the Simbad entry. Almost half of
the sources have no matches or identifications in any other datasets.
The sources themselves can be seen in Figure 7 along with their
labels.
Here we summarise our initial investigations of the sources, based
on information found in Simbad and the Data Aggregation Service7.
• U1 - This appears to be a ring-shaped starburst galaxy (Toba
et al. 2014) at a spectroscopic redshift of 0.077 (Ahumada et al. 2020).
The northern extension may be a tidal tail or interacting galaxy, but
may also be coincident as it has an imprecise photometric redshift of
0.11.
• U2 - This puzzling object has detectable emission in radio
(NVSS, Condon et al. 1998), infrared (2MASS, Skrutskie et al. 2006),
and ultraviolet (GALEX, Martin et al. 2005). Two competing spec-
troscopic redshifts are available, 0.071 (GLADE, Dálya et al. 2018)
7 https://das.datacentral.org.au/das
Table 2. Information about the 18 initially unidentified anomalous sources
detected in the main set. The second and third columns show the right as-
cension and declination in degrees, respectively. The fourth column lists any
known source names of the target from other surveys or catalogues if they
were matched. Blank entries are unmatched sources.
Entry RA Declination Identifiers
U1 209.7286 29.5764 2MASS J13585490+2934356
U2 342.9047 17.8460 2MASX J22513724+1750457
U3 60.7336 -15.2434 LEDA 913772
U4 42.2626 3.2043
U5 27.0134 -21.6656 ESO 543-16
U6 22.1821 -2.3705
U7 69.8371 -50.5307 ESO 202-45
U8 31.6507 -28.0090 2dFGRS TGS226Z042
U9 50.7173 -11.8873
U10 310.1343 1.8177
U11 63.0540 -24.9667
U12 36.2380 -27.2902 2dFGRS TGS230Z092
U13 210.4254 33.8215 NVSS J140141+334937
U14 212.8158 0.1169 SDSS J141116.31+000654.9
U15 46.9290 -14.1055 LEDA 928927
U16 55.5332 -19.8353
U17 106.5698 68.5793
U18 32.2221 -0.9785 SDSS J020853.45-005841.1
and 0.2 (Milliquas, Flesch 2023). The latter seems to correspond to
the bright red source which could be a coincident quasar (given the
selection criteria of Milliquas). A detailed analysis would be required
to determine which of these sources are associated and if the ring
is a lens or an unusually red ring galaxy with a bright patch of star
formation.
• U3 - This source, apparently half red and half blue, has the same
visual appearance in PanSTARRS (Chambers et al. 2016) indicating
Astronomaly at Scale
Figure 7. The anomalies shown here are difficult to identify with a quick
visual inspection. They need further investigations to determine their nature
and origin. Table 2 lists more information on these anomalies. The angular
scale bar within each image represents 10 arcseconds.
its appearance is not due to a processing error. Unfortunately, only
photometric redshifts from DECaLS are available with an implied
redshift of 0.1120 for the red source and 0.1050 for the bright blue
bulge in the centre. The uncertainties are too large to determine if
these are coincident sources or not. The entire source has associated
ultraviolet emission from GALEX and the red part is clearly detected
in 2MASS. We have no immediate explanation for the dual-colour
nature of this source. It could be two coincident galaxies with a
remarkable chance alignment, interacting galaxies of very different
colours and nature or something else entirely.
• U4 - This source has a star-forming ring and two apparent cores
that may be merging. A photometric redshift of 0.0775 (GLADE,
Dálya et al. 2018) is available for one of the cores. The nearby galaxy,
LEDA 213095, has a photo-z of 0.0734 (GLADE, Dálya et al. 2018)
meaning it could potentially be interacting and triggering the star
formation.
• U5 - A highly disturbed, star-forming ring galaxy. Not surpris-
ingly, this object is bright in radio (NVSS, Condon et al. 1998)
and ultraviolet (GALEX, Martin et al. 2005). Only the bright ring
has spectroscopic information, with a redshift of 0.0459 (6dF Jones
et al. 2009). The neighbouring source has a negative photo-z from
DECaLS. It is thus not clear if these galaxies are interacting or coin-
cident. At the bottom of the cutout, a faint third galaxy can be seen
for which no redshift information is available.
• U6 - This object appears to be a lens but has the unusual fea-
ture of two foreground objects, neither of which is in the centre.
A competing explanation is that the pair of galaxies is disturbing a
third galaxy, completely disrupting its morphology. The left and right
foreground galaxies have photometric redshifts of 0.3117 ± 0.1321
and 0.1224 ± 0.0377 respectively which suggests that one of them
may be coincident. However, given the complexity of the system and
the large uncertainty of the redshift of the left source, we cannot be
certain they are unrelated. The entire system is visible in ultraviolet
(GALEX, Martin et al. 2005) and the two foreground sources are
faintly detected in infrared (2MASS, Skrutskie et al. 2006).
• U7 - Tidal tail due to interaction with a neighbouring galaxy.
Both galaxies have a redshift of 0.0371 (GLADE, Dálya et al. 2018),
although a redshift error is not listed so they may still be coincident
sources.
• U8 - It is difficult to determine if the two red galaxies are
associated with the oddly shaped blue galaxy. The redshift estimates
for the parts of this system vary significantly, although the DECaLS
photometric redshifts place the red galaxies at a similar redshift
(0.22) to the spectroscopic redshift of the blue galaxy (6dF Jones
et al. 2009). However, the group-finding algorithm of Eke et al.
(2004) places these galaxies in a group at redshift 0.11. It seems
likely that it is the group interaction that has warped the morphology
of the blue galaxy triggering star-formation, but careful data analysis
is needed to confirm this.
• U9 - The two galaxies seen here have a similar photometric
redshift in the DECaLS catalogue of 0.17, but the blue region in the
middle registers a very different redshift of 0.34. It is possible that
these galaxies are simply coincident or they may in fact be interacting
and creating a star-forming region between them which is not well-
estimated by photo-z algorithms.
• U10 - With very different redshift estimates for each part of
this system, these sources could very well be coincident. However,
it seems quite a dramatic alignment of sources so some amount of
interaction and star-formation may be more plausible. Additional
spectroscopic observations would be needed to determine the nature
of this interesting group.
• U11 - This unusual system also has disagreeing photometric
redshifts so it may be a chance alignment, although it visually appears
to be a merging system.
• U12 - This unusual group of galaxies has been detected with
the group-finding algorithm of Eke et al. (2004) at redshift 0.2137.
• U13 - This system was identified as a possible compact group
in Zheng & Shen (2020) at redshift 0.02640. The main galaxy is also
the host of the supernova SN 2012T (Asiago Supernova Catalogue,
Barbon et al. 1999).
• U14 - Known group of galaxies at redshift 0.1615 (Eke et al.
2004).
• U15 - With three different photometric redshift estimates in this
Table 3. The gravitational lens candidates that have been identified in the top
10 000 anomalies. The second and third columns show the right ascension and
declination in degrees, respectively. The last column indicates whether the
candidates have been confirmed to be a lens, a lens candidate or a candidate
that has not been matched to any other catalogue yet.
Entry RA Declination Information
L1 35.2352 -7.7199 Confirmed lens - [More et al. (2012)]
L2 27.9503 -32.6199 Candidate - [Jacobs et al. (2019)]
L3 61.6016 -26.7733 Candidate - [Jacobs et al. (2019)]
L4 128.1546 13.5797 Candidate - [Shu et al. (2017)]
L5 340.2492 -52.7542 Candidate - [Diehl et al. (2017)]
L6 78.3564 -30.8416 Candidate - Previously undetected
L7 9.7307 7.3230 Candidate - Previously undetected
L8 60.1041 -16.3973 Candidate - Previously undetected
group, these are likely to be coincident galaxies although the usual
caveats about photo-z uncertainties apply.
• U16 - All three photometric redshifts from DECaLS for this
system are very different so again, they are either coincident galaxies
or have poorly estimated redshifts.
• U17 - Likely chance alignment due to differing photometric
redshifts.
• U18 - Disagreeing spectroscopic redshift estimates, 0.19623
and 0.17436, suggest these may be coincident galaxies.
Gravitational Lens Candidates
Figure 8 and Table 3 show the strong gravitational lens candidates
that have been identified in the top 10 000 anomalies. The source
labelled L1 has been cross-matched, identified and confirmed to be
a strong gravitational lens. The sources L2 through L5 have been
cross-matched with other catalogues and identified as strong lens
candidates thanks to the combined strong lens catalogue created in
Grespan et al. (in prep.). The remaining sources are suspected strong
lenses based on visual characteristics within the cutouts, but have
not been listed in any known catalogue. It should be noted that only
fairly obvious lenses were labelled as interesting and there may be
many more candidates in the list of anomalies that could potentially
be identified by an expert in the field.
Confirming the nature of these sources is challenging without
significant additional analysis and spectroscopic follow-up. We
found that the photometric redshift information available to us
does not appear to be particularly reliable for these lensed systems.
For instance, the DECaLS photometric redshifts for the sources in
L1, which is a confirmed lens system, place the lens at a higher
redshift than the background source, which is obviously incorrect.
Of particular interest would be to confirm if the system U6 of Figure
7 is actually a lensed system and if the two sources in the foreground
are coincidental or in fact part of the lens. Similarly, the pair of
sources in L8 would need spectroscopic information to determine if
they are both involved in lensing the background source or not.
Galaxy Merger Candidates
Figure 9 shows some of the more striking examples of mergers de-
tected with astronomaly. The 1609 galaxy merger candidates were
compared with the Catalog of Morphologically Identified Merging
Galaxies (Hwang & Chang 2009), but the sky coverage of this cata-
logue, 422 deg2, is significantly smaller than that of DECaLS, 14 000
deg2, so not many matches were expected. In addition, the different
Figure 8. The strong gravitational lens candidates detected in the top 10 000
anomalies. More information about these sources can be found in Table 3.
The angular scale bar within each image represents 5 arcseconds.
data cuts applied make a direct comparison challenging. However, 6
matches were found at a cross-matching distance of 30 arcseconds.
These 6 sources were visually confirmed to match the sources in the
catalogue. While spectroscopic follow-up would be needed to further
investigate the rest of the sources shown in Figure 9, the presence
of significant tidal streams and interaction suggests the majority of
these are high probability merger candidates.
These results show that astronomaly could be used to build a
large, albeit incomplete, merger catalogue. Additionally, a second
round of active learning could be used to reduce the number of
mergers in the top-ranked sources and highlight rarer types of anomalies.
Figure 9. Some of the galaxy merger candidates found in the top 10 000 sources of the main set that were identified and labelled as anomalies. The images
displayed are those that were visually the most interesting of the 1609 merger candidates. The angular scale bar within each image represents 10 arcseconds.
Table 4. Coordinates of the galaxy merger candidates shown in Figure 9. The
Entry columns indicate the corresponding image as shown in Figure 9 and
the following columns show the right ascension and declination in degrees
respectively.
Entry RA Declination Entry RA Declination
M1 11.4865 32.2940 M25 333.0850 0.5605
M2 12.0642 -25.6886 M26 333.2532 21.9719
M3 12.4649 17.7756 M27 335.2230 13.4425
M4 123.6086 37.2613 M28 36.9431 26.5896
M5 127.6318 18.2050 M29 40.4523 -49.0052
M6 134.2013 30.8611 M30 42.9094 -16.6571
M7 138.9657 13.5864 M31 43.6074 -64.1671
M8 147.5532 -5.6929 M32 44.8704 -14.2910
M9 168.9463 15.8239 M33 46.3564 -19.4730
M10 192.1130 15.5824 M34 49.0504 -12.1633
M11 205.4201 13.5041 M35 50.3167 -28.3164
M12 213.6448 24.4104 M36 60.4281 -23.5613
M13 22.8853 -22.3706 M37 63.3138 -14.6525
M14 222.3588 23.3494 M38 64.5453 -31.2488
M15 229.9712 2.6200 M39 68.7329 -40.0342
M16 239.7994 20.7477 M40 70.2079 -44.9443
M17 24.0000 -11.7047 M41 72.2531 -33.3201
M18 25.4929 -13.8470 M42 76.0434 -57.2689
M19 251.9828 10.5277 M43 79.3557 -48.3955
M20 254.6314 58.9370 M44 80.1525 -38.6625
M21 28.6783 27.3285 M45 81.0061 -32.2876
M22 30.6765 -2.5885 M46 81.8048 -31.4080
M23 318.6995 -2.1876 M47 83.3110 -39.4453
M24 330.9685 -1.6943 M48 85.2283 -20.3304
5 CONCLUSIONS
Anomaly detection on a large scale is critical for scientific discovery
in current and future astronomical datasets. Computational
challenges abound and, more often than not, supervised methods
are used to detect “new” sources of the same type as those
already identified. The search for novel anomalies requires unsuper-
vised methods and has previously been applied on relatively modest
scales using frameworks like astronomaly. In this work, we have
applied astronomaly at a much larger scale: 3 884 404 galaxies ob-
tained from the DECaLS DR8. This was a test to determine if these
methods could be applied on significantly larger scales and if they
could find interesting sources in the relatively unexplored DECaLS
dataset.
We looked at various options for selecting a DECaLS subset to
study and discovered that the choice of data selection criteria had a
significant impact on the balance between scalability and discovery,
which is important for detecting anomalies. If the selection criteria
are too strict, anomalies might be overlooked. The se-
lection criteria we used impacted the number of gravitational lens
candidates identified within our evaluation set and would conse-
quently constrain the number of such candidates within the larger
dataset used. We were unable to identify a set of cuts that could
reduce the dataset size while retaining all the lens candidates. Ex-
ploring criteria that have less impact on the count of lens candidates
could be a worthwhile avenue for future research. To ensure a thorough
investigation, the input data should be as broad and unfiltered as
possible, while remaining manageable enough in size to analyse.
As the size of the dataset increases, the challenges of large data
volumes, storage, and computational complexity become more im-
portant. One of the main challenges that we experienced was the
transfer of data from the host server to a local computer, which took
several weeks as the data had to be validated too. As the size and
complexity of datasets increase, such as in the case of the LSST and
the SKA, the traditional approach of moving the data to the compute
nodes becomes impractical and costly. Bringing the compute to the
data is therefore essential for anomaly detection in big datasets.
Preprocessing can also have a significant impact on the anomalies
detected. Following Walmsley et al. (2021), we greyscaled the images
before feature extraction through a straightforward averaging of the
three optical channels. However, it is worth noting that using estab-
lished band weightings, such as those available in OpenCV8, could
potentially offer a more suitable approach for creating the greyscale
images (Etsebeth 2020). Furthermore, our use of the basic sigma
clipping procedure, as described in Walmsley et al. (2021), proved ef-
fective in reducing noise but at times could not accommodate nearby
bright sources within the cutouts. Alternative techniques, like the
sigma clipping and masking procedure used in Lochner Bassett
(2021), may provide improved results by ensuring the removal of all
sources that are not part of the target object.
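A minimal sketch of this preprocessing step, assuming a plain channel average for the greyscale conversion and an iterative clip of bright pixels (a simplified stand-in for the Walmsley et al. 2021 procedure, not their exact code):

```python
import numpy as np

def preprocess(cutout, sigma=3.0, iterations=3):
    """Greyscale an (H, W, 3) cutout by averaging the three optical bands,
    then iteratively clip pixels brighter than mean + sigma * std.
    Simplified stand-in for the procedure described in Walmsley et al. (2021);
    weighted band combinations (e.g. OpenCV-style luminance weights)
    would replace the plain mean below."""
    grey = np.asarray(cutout, dtype=float).mean(axis=-1)   # plain channel average
    clipped = grey.copy()
    for _ in range(iterations):
        threshold = clipped.mean() + sigma * clipped.std()
        clipped = np.minimum(clipped, threshold)           # suppress bright outliers
    return clipped

# toy cutout: flat background with one very bright contaminating pixel
cutout = np.zeros((32, 32, 3))
cutout[0, 0, :] = 100.0
clean = preprocess(cutout)
```

Note the limitation mentioned above: a simple global clip suppresses isolated bright pixels well, but a bright neighbouring source occupying many pixels shifts the mean and standard deviation, so it survives clipping; masking approaches address this.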
For feature extraction, we showed that using the pre-trained CNN
from Walmsley et al. (2021) to obtain representations of images
works extremely well for DECaLS data, further demonstrating the
value of CNNs as general-purpose feature extractors. This adaptabil-
ity reduces the need for expensive labelling and allows the reuse of
networks for other, unsupervised tasks. It also highlights the value
of training networks to solve complex classification tasks on large
datasets and publicly releasing the trained network weights and code
for use by the community on other tasks and datasets.
We applied astronomaly using the anomaly detection algorithm,
iForest, to the features extracted using the CNN. iForest scaled well
to the considerable amount of data we had and was able to effectively
identify artefacts, which are anomalies but are not scientifically in-
teresting. The fact that a large number of artefacts were present in the
list of anomalies, despite our selection cuts, highlights the difficulty
of detecting artefacts with automated flagging techniques but also
the value of unsupervised methods in detecting artefacts that were
missed. However, iForest alone could not differentiate between inter-
esting and less interesting anomalies. We found that active learning,
making use of a relatively small amount of human labelling, is criti-
cal to success in anomaly detection for large, uncurated astronomical
datasets. Using astronomaly’s interface, it only took one person a
few hours to label enough data to enable significant discoveries.
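The scoring step can be illustrated with scikit-learn's IsolationForest (Liu et al. 2008; Pedregosa et al. 2011); the synthetic feature vectors below are a stand-in for the CNN representations, and the block is a sketch rather than the pipeline's actual configuration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# stand-in for CNN feature vectors: 1000 "ordinary" galaxies
# plus 10 clearly outlying sources appended at the end
features = np.vstack([rng.normal(0.0, 1.0, size=(1000, 8)),
                      rng.normal(6.0, 1.0, size=(10, 8))])

iforest = IsolationForest(n_estimators=100, random_state=0).fit(features)
anomaly_score = -iforest.score_samples(features)   # higher = more anomalous
ranked = np.argsort(anomaly_score)[::-1]           # most anomalous first

# the injected outliers (indices >= 1000) should dominate the top of the ranking
n_outliers_in_top10 = int(np.sum(ranked[:10] >= 1000))
```

This also illustrates the paper's central caveat: iForest ranks by statistical unusualness alone, so artefacts and masked sources, being highly unusual, top the list until active learning re-weights the ranking towards scientifically interesting sources.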
Figure 6 shows that active learning algorithms can be used to
enhance the performance of anomaly detection algorithms by priori-
tising the most relevant anomalies and filtering out less interesting
ones. We compared two different active learning approaches and
found similar performance (Figure 3). We chose the neighbour score
due to computational challenges with the direct regression method
at a larger scale. While more computationally efficient Gaussian pro-
cess methods are available, the tests on the evaluation set suggested
that the extra implementation effort might not justify potential per-
formance gains. This may not be the case for other datasets (e.g.
Walmsley et al. 2021).
The application of anomaly detection, in conjunction with the NS
active learning method using a total of 10 000 labels, on the dataset
of 3 884 404 sources, identified 1635 interesting anomalies in the top
2000 sources. Of these, 8 gravitational lens candidates were iden-
tified, 5 of which are listed as candidates in other catalogues. In
addition, 1609 sources were identified that contain galaxies exhibit-
ing some signs of a gravitational merger event. Finally, 18 sources,
shown in Figure 7, were found that were unstudied to the best of our
8 https://docs.opencv.org/4.x/de/d25/imgproc_color_
conversions.html
current knowledge. These unusual sources vary in morphology and
require additional investigation in order to identify their nature. They
include ring galaxies exhibiting strange colours and morphology, a
source that is half red and half blue, a potential strongly lensed system
with a pair of sources acting as the lens, several known interacting
groups and some sources that are either interacting or coincidental
alignments. Moreover, it is important to note that these sources were
all contained within only the top 2000 most anomalous sources after
applying 10 000 labels, suggesting that many more interesting sources
remain to be identified further down the ranking.
Our results show that the modern anomaly detection techniques
included in astronomaly scale well to large datasets and are capa-
ble of rapidly detecting scientifically interesting anomalies. As the
number and quality of anomalies detected can be affected by selec-
tion cuts, these should be avoided as far as possible by leveraging
computationally intensive unsupervised frameworks running on re-
mote data centres. This work paves the way for scientific discovery
with anomaly detection in large datasets, such as those expected from
the Vera C. Rubin Observatory and the Square Kilometre Array.
6 ACKNOWLEDGEMENTS
The authors would like to personally thank Aritra Ghosh for the input
into identifying the unusual anomalies detected in this work.
VE and ML acknowledge support from the South African Ra-
dio Astronomy Observatory and the National Research Foundation
(NRF) towards this research. Opinions expressed and conclusions
arrived at, are those of the authors and are not necessarily to be
attributed to the NRF.
MW is a Dunlap Fellow and acknowledges funding from the
Science and Technology Facilities Council (STFC) Grant Code
ST/R505006/1.
This paper includes data that has been provided by AAO Data
Central (datacentral.org.au).
The Legacy Surveys consist of three individual and complemen-
tary projects: the Dark Energy Camera Legacy Survey (DECaLS;
Proposal ID #2014B-0404; PIs: David Schlegel and Arjun Dey), the
Beijing-Arizona Sky Survey (BASS; NOAO Prop. ID #2015A-0801;
PIs: Zhou Xu and Xiaohui Fan), and the Mayall 𝑧-band Legacy Sur-
vey (MzLS; Prop. ID #2016A-0453; PI: Arjun Dey). DECaLS, BASS
and MzLS together include data obtained, respectively, at the Blanco
telescope, Cerro Tololo Inter-American Observatory, NSF’s NOIR-
Lab; the Bok telescope, Steward Observatory, University of Arizona;
and the Mayall telescope, Kitt Peak National Observatory, NOIR-
Lab. Pipeline processing and analyses of the data were supported by
NOIRLab and the Lawrence Berkeley National Laboratory (LBNL).
The Legacy Surveys project is honoured to be permitted to conduct
astronomical research on Iolkam Du’ag (Kitt Peak), a mountain with
particular significance to the Tohono O’odham Nation.
NOIRLab is operated by the Association of Universities for Re-
search in Astronomy (AURA) under a cooperative agreement with
the National Science Foundation. LBNL is managed by the Regents
of the University of California under contract to the U.S. Department
of Energy.
This project used data obtained with the Dark Energy Camera
(DECam), which was constructed by the Dark Energy Survey (DES)
collaboration. Funding for the DES Projects has been provided by the
U.S. Department of Energy, the U.S. National Science Foundation,
the Ministry of Science and Education of Spain, the Science and
Technology Facilities Council of the United Kingdom, the Higher
Education Funding Council for England, the National Center for
Supercomputing Applications at the University of Illinois at Urbana-
Champaign, the Kavli Institute of Cosmological Physics at the Uni-
versity of Chicago, Center for Cosmology and Astro-Particle Physics
at the Ohio State University, the Mitchell Institute for Fundamental
Physics and Astronomy at Texas A&M University, Financiadora de
Estudos e Projetos, Fundacao Carlos Chagas Filho de Amparo a
Pesquisa do Estado do Rio de Janeiro, Conselho Nacional de
Desenvolvimento Cientifico e Tecnologico and the Minis-
terio da Ciencia, Tecnologia e Inovacao, the Deutsche Forschungs-
gemeinschaft and the Collaborating Institutions in the Dark Energy
Survey. The Collaborating Institutions are Argonne National Labo-
ratory, the University of California at Santa Cruz, the University of
Cambridge, Centro de Investigaciones Energeticas, Medioambien-
tales y Tecnologicas-Madrid, the University of Chicago, University
College London, the DES-Brazil Consortium, the University of Ed-
inburgh, the Eidgenossische Technische Hochschule (ETH) Zurich,
Fermi National Accelerator Laboratory, the University of Illinois at
Urbana-Champaign, the Institut de Ciencies de l’Espai (IEEC/CSIC),
the Institut de Fisica d’Altes Energies, Lawrence Berkeley National
Laboratory, the Ludwig Maximilians Universitat Munchen and the
associated Excellence Cluster Universe, the University of Michigan,
NSF’s NOIRLab, the University of Nottingham, the Ohio State Uni-
versity, the University of Pennsylvania, the University of Portsmouth,
SLAC National Accelerator Laboratory, Stanford University, the Uni-
versity of Sussex, and Texas A&M University.
BASS is a key project of the Telescope Access Program (TAP),
which has been funded by the National Astronomical Observatories
of China, the Chinese Academy of Sciences (the Strategic Prior-
ity Research Program “The Emergence of Cosmological Structures”
Grant # XDB09000000), and the Special Fund for Astronomy from
the Ministry of Finance. The BASS is also supported by the Exter-
nal Cooperation Program of Chinese Academy of Sciences (Grant
# 114A11KYSB20160057), and Chinese National Natural Science
Foundation (Grant # 12120101003, # 11433005).
The Legacy Survey team makes use of data products from the
Near-Earth Object Wide-field Infrared Survey Explorer (NEOWISE),
which is a project of the Jet Propulsion Laboratory/California Insti-
tute of Technology. NEOWISE is funded by the National Aeronautics
and Space Administration.
The Legacy Surveys imaging of the DESI footprint is supported
by the Director, Office of Science, Office of High Energy Physics
of the U.S. Department of Energy under Contract No. DE-AC02-
05CH1123, by the National Energy Research Scientific Comput-
ing Center, a DOE Office of Science User Facility under the same
contract; and by the U.S. National Science Foundation, Division of
Astronomical Sciences under Contract No. AST-0950945 to NOAO.
We acknowledge the use of the ilifu cloud computing facility
– www.ilifu.ac.za, a partnership between the University of Cape
Town, the University of the Western Cape, the University of Stel-
lenbosch, Sol Plaatje University and the Cape Peninsula University
of Technology. The Ilifu facility is supported by contributions from
the Inter-University Institute for Data Intensive Astronomy (IDIA – a
partnership between the University of Cape Town, the University of
Pretoria and the University of the Western Cape), the Computational
Biology division at UCT and the Data Intensive Research Initiative
of South Africa (DIRISA).
DATA AVAILABILITY
The data used in this paper from the DECaLS survey are publicly
available. Catalogues of potential merger candidates and other types
of sources can be made available on request to the authors.
REFERENCES
Ahumada R., et al., 2020, ApJS, 249, 3
Almeida A., et al., 2023, The Eighteenth Data Release of the Sloan
Digital Sky Surveys: Targeting and First Spectra from SDSS-V
(arXiv:2301.07688)
Barbon R., Buondí V., Cappellaro E., Turatto M., 1999, A&AS, 139, 531
Baron D., Poznanski D., 2017, MNRAS, 465, 4530
Breiman L., 2001, Machine Learning, 45, 5
Breunig M. M., Kriegel H.-P., Ng R. T., Sander J., 2000, SIGMOD Rec., 29,
93–104
Chambers K. C., et al., 2016, PASP, 128, 104502
Condon J. J., Cotton W. D., Greisen E. W., Yin Q. F., Perley R. A., Taylor
G. B., Broderick J. J., 1998, AJ, 115, 1693
Dálya G., et al., 2018, MNRAS, 479, 2374
Debosscher J., Sarro L. M., Aerts C., Cuypers J., Vandenbussche B., Garrido
R., Solano E., 2007, Astronomy & Astrophysics, 475, 1159
Dey A., et al., 2019, The Astronomical Journal, 157, 168
Diehl H. T., et al., 2017, ApJS, 232, 15
Eke V. R., et al., 2004, MNRAS, 348, 866
Etsebeth V., 2020, Master’s thesis, University of the Western Cape, Cape
Town, South Africa, http://hdl.handle.net/11394/9028
Flesch E. W., 2023, arXiv e-prints, p. arXiv:2308.01505
Giles D., Walkowicz L., 2019, MNRAS, 484, 834
Huang X., et al., 2020, The Astrophysical Journal, 894, 78
Huang X., Storfer C., Gu A., Ravi V., Pilon A., et al 2021, The Astrophysical
Journal, 909, 27
Hwang C.-Y., Chang M.-Y., 2009, The Astrophysical Journal Supplement
Series, 181, 233
Ivezić Ž., et al., 2019, ApJ, 873, 111
Jacobs C., et al., 2019, ApJS, 243, 17
Jones D. H., et al., 2009, MNRAS, 399, 683
Liaw A., Wiener M., 2002, R News, 2, 18
Lintott C. J., et al., 2008, MNRAS, 389, 1179
Lintott C., et al., 2011, MNRAS, 410, 166
Liu F. T., Ting K. M., Zhou Z.-H., 2008, in Proceedings of the 2008
Eighth IEEE International Conference on Data Mining. ICDM ’08.
IEEE Computer Society, USA, p. 413–422, doi:10.1109/ICDM.2008.17
Lochner M., Bassett B., 2021, Astronomy and Computing, 36, 100481
Mao Y.-Y., Geha M., Wechsler R. H., Weiner B., Tollerud E. J., Nadler E. O.,
Kallivayalil N., 2021, The Astrophysical Journal, 907, 85
Martin D. C., et al., 2005, ApJ, 619, L1
Martinazzo A., Espadoto M., Hirata N. S. T., 2020, Self-supervised Learning
for Astronomical Image Classification (arXiv:2004.11336)
Massey P., Neugent K. F., Levesque E. M., 2019, The Astronomical Journal,
157, 227
McInnes L., Healy J., Melville J., 2020, UMAP: Uniform Manifold Approx-
imation and Projection for Dimension Reduction (arXiv:1802.03426)
Metcalf R. B., et al., 2019, Astronomy & Astrophysics, 625, A119
More A., Cabanac R., More S., Alard C., Limousin M., Kneib J. P., Gavazzi
R., Motta V., 2012, ApJ, 749, 38
Pearson K., 1901, The London, Edinburgh, and Dublin Philosophical Maga-
zine and Journal of Science, 2, 559
Pedregosa F., et al., 2011, Journal of Machine Learning Research, 12, 2825
Petrosian V., 1976, ApJ, 210, L53
Rasmussen C. E., Williams C. K., 2006, Gaussian processes for machine
learning. MIT press
Shlens J., 2014, arXiv
Shu Y., et al., 2017, ApJ, 851, 48
Skrutskie M. F., et al., 2006, AJ, 131, 1163
Slijepcevic I. V., Scaife A. M. M., Walmsley M., Bowles M., Wong O. I.,
Shabala S. S., White S. V., 2023, Radio Galaxy Zoo: Building a multi-
purpose foundation model for radio astronomy with self-supervised learn-
ing (arXiv:2305.16127)
Solarz A., Bilicki M., Gromadzki M., Pollo A., Durkalec A., Wypych M.,
2017, Astronomy & Astrophysics, 606, A39
Soroka A., Meshcheryakov A., Gerasimov S., 2021, Morphological classifica-
tion of astronomical images with limited labelling (arXiv:2105.02958)
Sridhar S., et al., 2020, The Astrophysical Journal, 904, 69
Storey-Fisher K., Huertas-Company M., Ramachandra N., Lanusse F., Leau-
thaud A., Luo Y., Huang S., Prochaska J. X., 2021, Monthly Notices of
the Royal Astronomical Society, 508, 2946
Tan M., Le Q. V., 2020, EfficientNet: Rethinking Model Scaling for Convo-
lutional Neural Networks (arXiv:1905.11946)
Taylor M., 2015, TOPCAT’s TAP Client (arXiv:1512.06567)
The Astropy Collaboration et al., 2013, A&A, 558, A33
The Astropy Collaboration et al., 2018, The Astronomical Journal, 156, 123
The Astropy Collaboration et al., 2022, The Astrophysical Journal, 935, 167
The Dark Energy Survey Collaboration, 2005, The Dark Energy Survey
(arXiv:astro-ph/0510346)
Toba Y., et al., 2014, ApJ, 788, 45
Walmsley M., et al., 2019, Monthly Notices of the Royal Astronomical Soci-
ety, 491, 1554
Walmsley M., et al., 2021, Monthly Notices of the Royal Astronomical Soci-
ety, 509, 3966
Walmsley M., et al., 2022, Monthly Notices of the Royal Astronomical Soci-
ety, 513, 1581
Walmsley M., et al., 2023, Journal of Open Source Software, 8, 5312
Wenger M., et al., 2000, Astronomy and Astrophysics Supplement Series,
143, 9
Wright E. L., et al., 2010, The Astronomical Journal, 140, 1868
York D. G., et al., 2000, AJ, 120, 1579
Zheng Y.-L., Shen S.-Y., 2020, ApJS, 246, 12
de Vaucouleurs G., 1948, Annales d’Astrophysique, 11, 247
Ćiprijanović A., et al., 2021, Monthly Notices of the Royal Astronomical
Society, 506, 677
APPENDIX A: COMPUTATIONAL RESOURCES
In this paper, our goal was to test astronomaly on a large scale,
which poses significant computational challenges. This appendix
provides the details of the computational resources that we used for
our analysis.
DECaLS and Storage
The coadded image stack for DECaLS (https://www.legacysurvey.org/dr8/description/) is stored in files totalling 45 TB, which is far too large to transfer and host locally. Cutouts are used instead, as this allows greater freedom over the data selected and lets the user determine how large each cutout should be. With nearly 4 million sources at 150x150 pixels per image, the storage required for the image data alone reaches nearly 150 GB, and the catalogue, features, and other files created during the pipeline contribute an additional 100 GB. While not significant in terms of storage, transferring this amount of data takes a significant amount of time, and pre-processing several million sources requires a large number of read-and-write operations. If the images are processed and the output is saved in a different location, the required storage doubles.
MNRAS 000, 1–15 (2023)
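Cutouts like these are typically retrieved from the public Legacy Surveys cutout service. The sketch below builds a request URL for one source; the endpoint and parameter names (layer, pixscale, size) are assumptions based on the public viewer API, not details taken from the paper.

```python
# Hypothetical sketch: build a cutout request URL for the public
# Legacy Surveys viewer service. Endpoint and parameter names are
# assumptions, not taken from the paper.
BASE = "https://www.legacysurvey.org/viewer/cutout.jpg"

def cutout_url(ra, dec, size=150, layer="ls-dr8", pixscale=0.262):
    """Return a cutout URL for a source at (ra, dec) in degrees."""
    return (f"{BASE}?ra={ra}&dec={dec}&layer={layer}"
            f"&pixscale={pixscale}&size={size}")

print(cutout_url(180.0, 0.0))
```

In practice such URLs would be fetched in bulk with retries, since, as noted above, failure rates during transfer are substantial.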
Computational Times and Other Requirements
Obtaining the data is straightforward in terms of accessibility, but transferring it is extremely time-consuming. Obtaining cutouts for 3 884 404 sources took several weeks, with substantial failure rates and server maintenance impacting the process. The feature extraction process using the CNN is not very memory intensive, but does consume a large number of computational hours. The computational times and memory requirements for the different steps of the pipeline are listed below.
Feature Extraction: A graphics processing unit (GPU) is highly recommended for deep learning, as it would significantly speed up this step; for this work, however, only central processing units (CPUs) were used. The features file is almost 40 GB in size (saved in Parquet format) and requires a similar amount of RAM to read in, which almost immediately rules out any local computing platform.
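The quoted file size is consistent with a simple back-of-envelope estimate, assuming the 3 884 404 x 1280 feature matrix is stored as 64-bit floats:

```python
# Back-of-envelope check of the feature file size quoted above,
# assuming float64 storage for the 3 884 404 x 1280 feature matrix.
n_sources = 3_884_404
n_features = 1280
bytes_per_value = 8  # float64

size_gb = n_sources * n_features * bytes_per_value / 1e9
print(f"{size_gb:.1f} GB")  # ~39.8 GB, consistent with "almost 40 GB"
```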
CPU times: user 23h 1min 21s
Dimensionality Reduction: Applying PCA to this file, which has dimensions of 3 884 404 by 1280, is surprisingly quick on a CPU, but memory intensive.
CPU times: 2h 46min 30s
Max memory: 228.64 GB
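The memory peak arises because standard PCA typically holds the full matrix in memory. One way to bound the peak is an incremental variant; the sketch below uses scikit-learn's IncrementalPCA on synthetic data standing in for the real 3 884 404 x 1280 feature matrix, as an illustration rather than the pipeline's actual implementation:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Small synthetic stand-in for the real feature matrix; the true data
# are far larger (3 884 404 x 1280).
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 128))

# IncrementalPCA fits in fixed-size batches, so peak memory is set by
# the batch size rather than by the full matrix.
ipca = IncrementalPCA(n_components=20, batch_size=200)
reduced = ipca.fit_transform(features)
print(reduced.shape)  # (1000, 20)
```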
Isolation Forest: iForest uses small subsets of the input dataset in an ensemble of decision trees, which significantly reduces memory usage. Applied to the features reduced to a lower dimension with PCA:
CPU times: 10min 17s
Max memory: between 2 and 5 GB
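The low memory footprint follows directly from iForest's subsampling, since each tree is grown on only a small random subset of the data. A minimal scikit-learn sketch, with synthetic data standing in for the real features and illustrative parameter values rather than those used in the paper:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the PCA-reduced feature matrix; the real one
# has millions of rows.
rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 30))

# Each tree is grown on a small subsample (max_samples), which is why
# iForest's memory footprint stays low even for very large inputs.
clf = IsolationForest(n_estimators=100, max_samples=256, random_state=0)
clf.fit(X)
scores = clf.score_samples(X)  # lower scores indicate stronger anomalies
print(scores.shape)
```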
However, if iForest is applied to the full set of features, the memory requirement increases significantly.
CPU times: 1h 5min 25s
Max memory: oscillating between 38 and 91 GB
Neighbour score: Memory peaks at 11 GB during retraining; computational times vary between 20 and 30 minutes.
Direct regression: Failed at every attempt, with memory usage exceeding the upper limit available at the time (232 GB), so no computational time is known.
APPENDIX B: DATA SELECTION CUTS
The selection criteria in Section 2.1 are used to select a subset of the DECaLS dataset via TOPCAT (Taylor 2015), using a Table Access Protocol (TAP) query. This type of query is written in the Astronomical Data Query Language (ADQL) as follows (note that “AND” requires a condition to hold in every band, while “OR” requires it in at least one band):
allmask_g=0 AND allmask_r=0 AND allmask_z=0
rchisq_g>1 OR rchisq_r>1 OR rchisq_z>1
snr_g>0 AND snr_r>0 AND snr_z>0
flux_g>7 OR flux_r>7 OR flux_z>7
shapedev_r>3 OR shapeexp_r>3
The DECaLS data are available via a TAP query at
https://datalab.noirlab.edu/ls/dataAccess.php. The following are more detailed descriptions of the various DECaLS catalogue entries used in this work.
No point sources: This criterion removes all sources that cannot be resolved by the telescope; such sources appear point-like and their inherent structure is not identifiable.
ALLMASK: These are masked sources, typically stellar in nature. Sources with a value greater than 0 in any of the three bands (g, r and z) are not considered.
Reduced chi-squared (rchisq) values: The “profile-weighted χ² of model fit normalised by the number of pixels” in g, r and z. This indicates how well the model fits the source, although, as seen in Mao et al. (2021), the models do not fit as well as expected. Various values of this parameter were visually inspected using hundreds of images, with the conclusion that the value in at least one band should be greater than 1; lower values correspond to sources without clearly visible structure. No upper limit was imposed, because numerous sources identified during the inspection have high values for these parameters but are still well resolved.
Signal-to-noise ratio (SNR): The standard signal-to-noise ratio is required to be greater than 0 in all bands, because sources with “negative” signal-to-noise ratios exist in the catalogue and are mainly artefacts. This restriction removes a significant number of artefacts from the dataset.
Flux values: The dataset was further restricted by excluding sources whose flux falls below 7 nanomaggies in every band, as such sources are too dim to show any clear structure; this was established from visual inspection of the images.
Shapedev and shapeexp: The half-light radius of the de Vaucouleurs model and the half-light radius of the exponential model, respectively. These are used to set the cutout size. Thousands of images of sources were inspected for various values of shapedev and shapeexp, ranging from 0 to 200. For low values (below 3), the sources are faint and it becomes difficult to identify any structure within them.
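The cuts described above can be applied directly to a downloaded catalogue table. A minimal pandas sketch, assuming the comparison operators are “>” as described in the text, with toy values standing in for the real catalogue columns:

```python
import pandas as pd

# Toy catalogue rows using the column names from the selection cuts.
cat = pd.DataFrame({
    "allmask_g": [0, 0, 1], "allmask_r": [0, 0, 0], "allmask_z": [0, 0, 0],
    "rchisq_g": [2.0, 0.5, 3.0], "rchisq_r": [0.8, 0.4, 2.0], "rchisq_z": [1.5, 0.9, 1.2],
    "snr_g": [5.0, 3.0, 8.0], "snr_r": [4.0, -1.0, 6.0], "snr_z": [6.0, 2.0, 7.0],
    "flux_g": [10.0, 3.0, 20.0], "flux_r": [8.0, 5.0, 15.0], "flux_z": [6.0, 2.0, 12.0],
    "shapedev_r": [4.0, 1.0, 5.0], "shapeexp_r": [2.0, 0.5, 6.0],
})

# AND cuts must hold in every band; OR cuts in at least one.
mask = (
    (cat.allmask_g == 0) & (cat.allmask_r == 0) & (cat.allmask_z == 0)
    & ((cat.rchisq_g > 1) | (cat.rchisq_r > 1) | (cat.rchisq_z > 1))
    & (cat.snr_g > 0) & (cat.snr_r > 0) & (cat.snr_z > 0)
    & ((cat.flux_g > 7) | (cat.flux_r > 7) | (cat.flux_z > 7))
    & ((cat.shapedev_r > 3) | (cat.shapeexp_r > 3))
)
selected = cat[mask]
print(len(selected))  # only the first toy row passes every cut
```

Here the second row fails the SNR cut (negative snr_r) and the third is rejected by ALLMASK, mirroring the behaviour of the ADQL query above.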
This paper has been typeset from a TEX/LATEX file prepared by the author.