21cm cosmology with machine learning (Review)
1. 21cm cosmology with machine learning
Hayato Shimabukuro(島袋隼⼠)
(Yunnan university & Nagoya university)
@Statistical analysis of random fields in cosmology (2024/3/4-3/6)
6. 21cm line emission
21cm radiation: a neutral hydrogen atom emits the 21cm line (1.4 GHz) through the hyperfine transition between the triplet and singlet spin states of the proton-electron system.
Neutral hydrogen is a good tracer of the IGM from the dark ages through the EoR.
Brightness temperature:
$$
\delta T_b = \frac{T_S - T_\gamma}{1+z}\left(1 - e^{-\tau_\nu}\right)
\simeq 27\, x_{\rm HI}\,(1+\delta_m)\left(\frac{H}{dv_r/dr + H}\right)\left(1-\frac{T_\gamma}{T_S}\right)\left(\frac{1+z}{10}\,\frac{0.15}{\Omega_m h^2}\right)^{1/2}\left(\frac{\Omega_b h^2}{0.023}\right)\ \mathrm{[mK]}
$$
Red : cosmology Blue : astrophysics
8. 21cm line emission
We can map the distribution of HI in the IGM with the 21cm line.
Liu & Shaw (2020)
21cm global signal: the sky-averaged 21cm line signal as a function of redshift.
To describe the 21cm signal statistically…
11. 21cm global signal?
(Bowman et al. 2018)
•In 2018, the EDGES group reported the detection of the 21cm global signal. If true, this is the first detection of the high-redshift 21cm line! But...
(figure from Singh's slide)
The reported signal cannot be explained by standard cosmology and astrophysics!
SARAS3 did not detect the signal (Singh+2022).
Under debate! Exotic physics? Mis-calibration? Unknown systematics?
13. 21cm line emission
We can map the distribution of HI in the IGM with the 21cm line.
To describe the 21cm signal statistically…
21cm line power spectrum:
$$
\langle \delta \tilde{T}_b(\mathbf{k})\, \delta \tilde{T}_b(\mathbf{k}')\rangle = (2\pi)^3\, \delta_D(\mathbf{k} + \mathbf{k}')\, P_{21}(k)
$$
Liu & Shaw (2020)
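The definition above can be sketched numerically. This toy estimator bins FFT modes of a periodic box in |k|; the field (white noise), box size, and grid are illustrative assumptions, not the actual simulation behind the talk.

```python
# Toy estimator of P(k) = <|delta T_b(k)|^2> / V for a periodic box,
# obtained by binning FFT modes in |k|.
import numpy as np

n, L = 32, 100.0                          # grid cells per side, box length [Mpc]
rng = np.random.default_rng(1)
field = rng.normal(size=(n, n, n))        # stand-in for delta T_b(x)

fk = np.fft.fftn(field) * (L / n) ** 3    # discrete FT with volume element dV
pk3d = np.abs(fk) ** 2 / L ** 3           # |delta T_b(k)|^2 / V

k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)
kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
kmag = np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)

# Spherically average: bin every mode by |k| (k = 0 mode is excluded).
edges = np.linspace(kmag[kmag > 0].min(), kmag.max(), 16)
idx = np.digitize(kmag.ravel(), edges)
pk = np.array([pk3d.ravel()[idx == i].mean() for i in range(1, len(edges))])
```

For a white-noise field the estimate is flat at the cell volume (L/n)^3; replacing `field` with a simulated brightness-temperature box gives the 21cm power spectrum.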
14. Astrophysics from 21cm upper limits
(HERA collaboration 2022b; Shimabukuro+ 2023)
Astrophysics with the 21cm signal is around the corner!?
•Currently operating telescopes place upper limits on the 21cm power spectrum and constrain astrophysical parameters.
18. Application of ML to 21cm studies
Hey, ChatGPT! Tell me about the application of ML in the context of 21cm studies.
Seems correct!?
19. Artificial Neural Network (ANN)
•By training the network on a training dataset, an ANN can approximate any function that associates input and output values: y = f(x)
•The trained network is then applied to unknown data (test data) for prediction: y_ANN = f(x_test)
•The ANN architecture consists of an input layer, hidden layers, and an output layer. Each layer has neurons.
→ a nonlinear regression problem
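A minimal sketch of the idea above (the toy target sin(x) and the network size are my assumptions, not the talk's setup): train an ANN on (x, y) pairs, then apply it to unseen test inputs.

```python
# Minimal ANN regression sketch: learn y = f(x) from a training set,
# then predict on unseen test data. Toy target: f(x) = sin(x).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 2.0 * np.pi, size=(1000, 1))
y_train = np.sin(x_train).ravel()

# Input layer -> two hidden layers of neurons -> output layer.
ann = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=0)
ann.fit(x_train, y_train)

# Apply the trained network to test data: y_ANN = f(x_test).
x_test = np.linspace(0.0, 2.0 * np.pi, 50).reshape(-1, 1)
y_ann = ann.predict(x_test)
```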
24. Statistical challenge in 21cm cosmology
Cosmology vs. 21cm cosmology
We can learn astrophysics if the EoR parameters are constrained.
25. Traditional approach
EoR parameters (input) → semi-numerical simulation (e.g. 21cmFAST) → 21cm power spectrum (output)
Likelihood calculation → MCMC (Markov Chain Monte Carlo) → obtain posterior for parameters
•For the likelihood calculation, we need to run a simulation for every parameter set.
This takes too much time for MCMC!
Greig & Mesinger (2015)
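The bottleneck can be seen in a toy Metropolis-Hastings loop (a stand-in power-law model replaces the 21cmFAST run; all numbers are illustrative): every step of the chain requires one full "simulation".

```python
# Toy Metropolis-Hastings: each likelihood evaluation calls the forward
# "simulation", which is why MCMC over a real simulator is so costly.
import numpy as np

k = np.linspace(0.1, 1.0, 10)

def simulate_ps(theta):
    """Stand-in for one (slow) semi-numerical simulation run."""
    return theta[0] * k ** theta[1]        # toy power-spectrum model

rng = np.random.default_rng(0)
data = simulate_ps(np.array([2.0, -1.5])) + rng.normal(0.0, 0.05, size=10)

def log_like(theta):
    # One full simulation per likelihood evaluation -- the bottleneck.
    return -0.5 * np.sum((simulate_ps(theta) - data) ** 2 / 0.05 ** 2)

samples, theta = [], np.array([1.0, -1.0])
ll = log_like(theta)
for _ in range(5000):                      # 5000 simulator calls!
    prop = theta + rng.normal(0.0, 0.05, size=2)
    ll_prop = log_like(prop)
    if np.log(rng.uniform()) < ll_prop - ll:   # Metropolis acceptance
        theta, ll = prop, ll_prop
    samples.append(theta)
chain = np.array(samples)
```

With a real simulator that takes minutes per run, 5000 calls is already hours; real chains need far more.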
28. EoR parameters (input) → Emulator → 21cm power spectrum (output)
Likelihood calculation → MCMC (Markov Chain Monte Carlo) → obtain posterior for parameters
•We construct an ANN that associates the parameters with the 21cm power spectrum. It bypasses running the simulations.
•We obtain posteriors with the same accuracy as using the semi-numerical simulation.
Schmit+ (2018)
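A hedged sketch of the emulator idea (the toy power-law "simulator", the log10 target, and the network size stand in for 21cmFAST and the actual architecture): fit an ANN on precomputed (parameters, power-spectrum) pairs once, then call the cheap surrogate inside the likelihood instead of the simulator.

```python
# Emulator sketch: learn the map EoR parameters -> 21cm power spectrum
# from precomputed simulations, then use the fast surrogate in MCMC.
import numpy as np
from sklearn.neural_network import MLPRegressor

k = np.linspace(0.1, 1.0, 10)

def simulate_ps(theta):
    """Stand-in for one run of a semi-numerical simulation (e.g. 21cmFAST)."""
    return theta[0] * k ** theta[1]        # toy power spectrum

# One-off training set: simulations over the prior range of the parameters.
rng = np.random.default_rng(0)
thetas = rng.uniform([0.5, -2.0], [3.0, -0.5], size=(2000, 2))
log_ps = np.array([np.log10(simulate_ps(t)) for t in thetas])

# The emulator learns parameters -> log10 P21(k) (multi-output regression).
emu = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
emu.fit(thetas, log_ps)

# Inside the MCMC likelihood, emu.predict replaces the simulator call.
ps_emu = emu.predict(np.array([[2.0, -1.5]]))[0]
```

Training in log10 tames the dynamic range of the power spectrum; after the one-off training cost, each likelihood evaluation is a millisecond-scale network call.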
32. Parameter estimation (inverse problem)
(input) 21cm PS → (output) EoR parameters
[Figure: scatter plots of ANN-predicted vs. true values for Rmfp [Mpc], ζ, and Tvir [10³ K], with points lying close to the diagonal.]
(Shimabukuro & Semelin 2017)
•Our ANN returns the EoR parameters for a given 21cm power spectrum.
•If the parameters obtained by the ANN lie on the solid line, the ANN successfully returns the correct parameters.
•The ANN seems to work well.
(See also Choudhury+2022)
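The inverse direction can be sketched the same way (toy power-law spectra; the real inputs are simulated 21cm power spectra and the outputs are ζ, Rmfp, Tvir): train a network from power spectra back to parameters and compare predictions against the truth, as in the diagonal plots above.

```python
# Inverse-problem sketch: ANN from (toy) 21cm power spectra to parameters.
import numpy as np
from sklearn.neural_network import MLPRegressor

k = np.linspace(0.1, 1.0, 10)
rng = np.random.default_rng(0)

# Training set: parameters and their (log) toy power spectra.
thetas = rng.uniform([0.5, -2.0], [3.0, -0.5], size=(2000, 2))
log_ps = np.array([np.log10(t[0] * k ** t[1]) for t in thetas])

ann = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
ann.fit(log_ps, thetas)                   # power spectrum in, parameters out

# Held-out draws: predicted vs. true points should hug the diagonal.
test_thetas = rng.uniform([0.5, -2.0], [3.0, -0.5], size=(100, 2))
test_ps = np.array([np.log10(t[0] * k ** t[1]) for t in test_thetas])
pred = ann.predict(test_ps)
```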
34. Posterior inference with machine learning
•We usually DO NOT evaluate the uncertainty of the machine learning itself; the ANN just returns point estimates.
•In MCMC, we need to calculate the likelihood (and prior) to obtain the posterior of the parameters:
p(θ|t₀) ∝ ℒ(t₀|θ) p(θ)   (posterior ∝ likelihood × prior)
Q1. Can we obtain the posterior with direct ANN parameter estimation?
Q2. We usually assume a Gaussian, diagonal form for the likelihood. Is this choice reasonable?
39. Posterior inference with machine learning
(Zhao+ 2022a)
•Zhao+ (2022) proposed a "likelihood-free" approach (DELFI, Density Estimation Likelihood-Free Inference), i.e. simulation-based inference, for 21cm studies. They consider a conditional density distribution instead of the likelihood: p(θ|t₀) ∝ ℒ(t₀|θ) p(θ)
Data: 21cm image
Summaries: EoR parameters (Tvir, ζ)
Q1. Can we obtain the posterior with direct ANN parameter estimation?
42. Posterior inference with machine learning
(Zhao+ 2022a)
•Mixture density networks (MDN) (Bishop 1994)
•Masked Autoencoder for Density Estimation (MADE) (Papamakarios+ 2017)
(See also Alsing+2019)
•We train the neural networks with the training dataset {θ, t} and obtain the conditional density p(t|θ) based on simulations; MCMC sampling then gives us the probability distribution.
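A much-simplified stand-in for this pipeline (the real work uses an MDN or MADE; here the learned p(t|θ) is an MDN reduced to a single Gaussian component with constant width, and the forward model, prior range, and network size are all toy assumptions):

```python
# Crude conditional-density estimation: model p(t|theta) ~ N(m(theta), s^2)
# from simulated {theta, t} pairs, then MCMC-sample the posterior of theta.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def simulate(theta):
    """Toy forward model: summary t for parameter theta, with noise."""
    return theta ** 2 + rng.normal(0.0, 0.1, size=np.shape(theta))

# 1) Training dataset {theta, t} from simulations over the prior [0, 2].
theta_train = rng.uniform(0.0, 2.0, size=5000)
t_train = simulate(theta_train)

# 2) Learn the conditional density: mean from a network, width from residuals.
m = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
m.fit(theta_train.reshape(-1, 1), t_train)
s = float(np.std(t_train - m.predict(theta_train.reshape(-1, 1))))

# 3) MCMC over theta, using the learned density in place of a likelihood.
t0 = 1.0                                   # "observed" summary (true theta = 1)

def log_post(th):
    if not 0.0 <= th <= 2.0:               # flat prior on [0, 2]
        return -np.inf
    return -0.5 * ((t0 - m.predict([[th]])[0]) / s) ** 2

samples, th = [], 0.5
lp = log_post(th)
for _ in range(3000):
    prop = th + rng.normal(0.0, 0.1)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        th, lp = prop, lp_prop
    samples.append(th)
posterior = np.array(samples[1000:])
```

An MDN generalizes step 2 by predicting the weights, means, and widths of several Gaussian components, so p(t|θ) can be non-Gaussian and θ-dependent in shape.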
43. Posterior from 21cm image and PS
(Zhao+ 2022a)
•We can directly compare the posterior obtained from the 21cm image map with the posterior obtained from the 21cm PS with MCMC.
→ The 21cm image map can provide tighter constraints on the EoR parameters than the 21cm PS.
47. Exploring the likelihood
Prelogovic & Mesinger (2023)
Q2. We usually assume a Gaussian, diagonal form for the likelihood. Is this choice reasonable?
•If we include a covariance matrix in the Gaussian likelihood, the parameter inference improves.
•A neural density estimator (NDE) with a Gaussian mixture provides the best constraints on the parameters.
→ A non-Gaussian likelihood including the covariance matrix is better than a diagonal Gaussian likelihood for parameter inference from the 21cm power spectrum.
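The diagonal-vs-full-covariance contrast can be made concrete (the covariance here is synthetic; a real analysis would estimate it from many simulated realizations of the 21cm power spectrum):

```python
# Diagonal vs. full-covariance Gaussian log-likelihood for a data vector d
# given a model mu: dropping the off-diagonal terms mis-weights the bins.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n = 5                                     # e.g. number of k-bins

# Synthetic positive-definite covariance with off-diagonal correlations.
a = rng.normal(size=(n, n))
cov = a @ a.T + n * np.eye(n)

mu = np.zeros(n)                          # model prediction for the data vector
d = multivariate_normal.rvs(mean=mu, cov=cov, random_state=1)

ll_full = multivariate_normal.logpdf(d, mean=mu, cov=cov)
ll_diag = multivariate_normal.logpdf(d, mean=mu, cov=np.diag(np.diag(cov)))
# Only ll_full is consistent with how the (correlated) data were generated;
# the diagonal version keeps the variances but ignores the correlations.
```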
52. Recovering one statistic from another
•The ANN can associate statistics with other statistics. Can we recover one statistic from another with an ANN?
Statistics A: 21cm power spectrum → Statistics B: bubble size distribution (BSD)
Shimabukuro+ (2022)
54. Recovering one statistic from another
Statistics A: 21cm power spectrum → Statistics B: 21cm global signal
(Shimabukuro+ in prep; preliminary)
55. Foreground problem
Chapman & Jelic (2019)
•The 21cm signal is buried in very intense foreground radiation (e.g. Galactic foregrounds).
Foreground: ~ a few to 10 K; 21cm signal: ~ a few to 10 mK → 3-4 orders of magnitude!
A very challenging issue!
Suggestions:
•Foreground removal
•Foreground avoidance
56. Foreground removal with GPR
•Recently, Gaussian Process Regression (GPR) has been employed for LOFAR data (Mertens+ 2018, 2020):
d = f_fg + f_21 + n,   d ∼ 𝒩(m(ν), K(ν, ν′))
The probability distribution is assumed to be Gaussian, characterized by its covariance (and mean).
•Hothi+ (2020) reported that GPR performs better than GMCA and FastICA for foreground removal (on a simulated sky).
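A minimal illustration of the idea, not the LOFAR pipeline (toy spectra, and scikit-learn's GP in place of the actual frequency-covariance model): the foreground is spectrally smooth along frequency ν, so a GP with a long coherence scale absorbs it, and the residual approximates the rapidly fluctuating 21cm component plus noise.

```python
# GPR foreground-removal sketch along frequency: fit the smooth foreground
# with a long-length-scale GP; the residual approximates f_21 + n.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
nu = np.linspace(100.0, 150.0, 200).reshape(-1, 1)    # frequency [MHz]

fg = 5000.0 * (nu.ravel() / 100.0) ** -2.7            # smooth foreground [mK]
sig21 = 10.0 * np.sin(nu.ravel() / 2.0)               # toy 21cm fluctuation [mK]
noise = rng.normal(0.0, 2.0, size=200)
d = fg + sig21 + noise                                # d = f_fg + f_21 + n

# A long length scale lets the GP capture only the spectrally smooth part;
# the WhiteKernel absorbs the fast fluctuations instead.
kernel = RBF(length_scale=30.0, length_scale_bounds=(20.0, 100.0)) \
    + WhiteKernel(noise_level=50.0)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(nu, d)
residual = d - gpr.predict(nu)                        # ~ f_21 + n
```

The crux, as in the real analyses, is the choice of frequency covariance: a coherence scale that is too short would absorb the 21cm signal itself.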
61. Take-away messages
•The machine learning approach is currently used in 21cm cosmology.
•ML is mainly used for constructing emulators and for direct parameter estimation in the context of 21cm cosmology.
•ML is also being applied to other 21cm studies (e.g. foreground removal).
21cm cosmology + machine learning has big potential. Any ideas??