Optimal Estimations of Photometric Redshifts and SED Fitting Parameters
By Julia Avez
Hunter College
In order to study the evolution of the universe, we strive to obtain the best possible estimates of photometric redshifts, as well as Spectral Energy Distribution (SED) fitting parameters. These estimates tell us about important physical properties of galaxies, such as their age, stellar mass, and star formation rate.
Redshift is significant because it traces the expansion of the universe. As the universe continues to expand, the light traveling from an astronomical object is stretched toward longer wavelengths, or redshifted.
In our research, we work with the public photometric redshift code EAZY and compare its accuracy against the spectroscopic redshifts provided by the CANDELS survey for 1338 astronomical objects. Why compare the two? Both cover the same parts of the sky (at least the part we use for our data does), and spectroscopic redshifts are more accurate. We want to use them to improve our photometric redshift estimates, because photometry is less expensive and less time-consuming than spectroscopy. It would be unrealistic (and virtually impossible) to use spectroscopy to measure the redshift over all areas of the sky.
Spectroscopy is based on the emission and absorption lines of photons. It measures the intensity of radiation (from a galaxy, for example) as a function of wavelength. Because atoms and molecules have unique spectra, we can use telescopes to measure them and determine the color, chemical composition, and physical properties (such as temperature and redshift) of astronomical objects. Spectroscopy is selective, because each resolution element only “accepts” photons in a very restricted range (4-5 Angstroms); this is why building up a spectrum can be very time-consuming. Photometry, on the other hand, estimates the redshift from the brightness of the object as seen through broad filters. It is much less selective, gathering photons over hundreds or thousands of Angstroms around each emission line, and therefore needs far less time than spectroscopy to collect the same number of photons. This is the main reason we use photometry, even though it yields a much coarser spectral shape, whereas spectroscopy is very precise.
Because we cannot point telescopes for extremely long amounts of time at every galaxy (most of which are too faint and far away to measure spectra for), we use the easier method of photometry, sampling the spectrum at known wavelengths through filters.
In order to compare the results gathered from the CANDELS survey against EAZY, we created catalogs that contain the photometric fluxes and errors seen in each filter, as well as the spectroscopic redshifts. The flux is the number of photons passing through a specific band, or filter; it tells us the luminosity of the galaxy, which in turn helps us estimate its distance.
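As an illustration, a minimal sketch of reading such a catalog with Python and numpy is given below; the toy rows, column layout, and band count are hypothetical placeholders, not the actual CANDELS catalog format.

# Minimal sketch of loading a photometric catalog: object id, fluxes,
# flux errors, and spectroscopic redshift. The toy rows below stand in
# for the real CANDELS catalog, whose format differs.
import numpy as np
from io import StringIO

# Two toy rows: id, 3 band fluxes, 3 flux errors, spectroscopic redshift
toy_catalog = StringIO("""\
1  12.3  8.1  5.4   0.6  0.5  0.4   0.73
2   4.2  3.9  2.8   0.3  0.3  0.2   1.41
""")

data   = np.loadtxt(toy_catalog)
obj_id = data[:, 0]
fluxes = data[:, 1:4]
errors = data[:, 4:7]
spec_z = data[:, 7]
print("Loaded %d objects in %d bands" % (len(obj_id), fluxes.shape[1]))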
Just as important as the flux in each filter is the error that accompanies it, since it tells us how far the measurement may deviate from the true flux. For this reason, we tested the CANDELS galaxy catalogs at different levels of systematic uncertainty.
We found that the photometric redshift values from EAZY closely match the spectroscopic CANDELS redshifts, catalogued with the appropriate errors, of course.
Accounting for systematic uncertainties was the next big step in our research. To make sure our uncertainties are realistic, we model the systematic uncertainties as fixed percentages of the flux. Varying the percent error in certain bands brings us closer to our ideal coverage of the 68% confidence region (one sigma of the width of the Gaussian we are modeling) and the 95% confidence region (two sigma). I varied the three bands separately, one at a time, to isolate the effect of each. The bands are ultraviolet-to-optical, near-infrared, and IRAC.
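One common way to implement such a floor, sketched below, is to add the percentage-of-flux systematic in quadrature to the catalog error; the grouping into a single band and the 5% value are illustrative, not the settings used in our trials.

# Sketch: add a percentage-of-flux systematic uncertainty to the catalog
# errors in quadrature. The 5% value below is a hypothetical example.
import numpy as np

def add_systematic(flux, err, frac):
    # total error = sqrt(random error^2 + (frac * flux)^2)
    return np.sqrt(err**2 + (frac * np.abs(flux))**2)

# Toy fluxes and errors in one band group
flux = np.array([10.0, 25.0, 4.0])
err  = np.array([0.5, 1.0, 0.3])
print(add_systematic(flux, err, 0.05))   # with a 5% systematic floor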
I ended up doing a total of 13 trials. I managed to match the 68% region very well, but not the 95% region, having reached a limit due to the outliers. After varying the three bands many times, it became evident that it no longer made a difference how much I varied them, and I was not able to get past 89-91% in the 95% region in the best case.
The table below illustrates our trial and error process for the photometric redshift.
Trial      Objects in 68% region (%)    Objects in 95% region (%)
  1                   61                           85
  2                   64                           87
  3                   81                           94
  4                   71                           91
  5                   74                           91
  6                   70                           91
  7                   70                           91
  8                   69                           91
  9                   60                           87
 10                   64                           88
 11                   68                           90
 12                   68                           89
 13                   68                           89
We can see that trial 11 is our best trial: its coverage of the 68% region matches the ideal value exactly, while staying close to ideal in the 95% region. Of course, to get this best case we isolated the effect of the outliers by not including them in the total number of objects (1338). We did this by setting up a new counter that only counts the objects that are not outliers, which reduced the count of non-outliers to 1232 in trial 11.
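The counting is sketched below: for each non-outlier galaxy we check whether the spectroscopic redshift falls inside the 68% and 95% confidence intervals of the photometric redshift. The outlier criterion used here (|Δz|/(1 + z_spec) > 0.15) is a common convention assumed only for illustration, and the toy interval limits stand in for the values read from the EAZY output.

# Sketch of the coverage counter: count non-outlier galaxies whose
# spectroscopic redshift lies inside the 68% / 95% photo-z intervals.
# The 0.15 outlier cut is an assumed convention, not from the report.
import numpy as np

def coverage(z_spec, z_phot, lo68, hi68, lo95, hi95):
    dz = np.abs(z_phot - z_spec) / (1.0 + z_spec)
    keep = dz <= 0.15                              # non-outliers only
    n_keep = keep.sum()
    in68 = ((z_spec >= lo68) & (z_spec <= hi68) & keep).sum()
    in95 = ((z_spec >= lo95) & (z_spec <= hi95) & keep).sum()
    return n_keep, 100.0 * in68 / n_keep, 100.0 * in95 / n_keep

# Toy example with three galaxies (the third is an outlier)
z_spec = np.array([0.50, 1.20, 2.00])
z_phot = np.array([0.52, 1.15, 3.50])
lo68, hi68 = np.array([0.45, 1.00, 3.30]), np.array([0.60, 1.30, 3.70])
lo95, hi95 = np.array([0.40, 0.90, 3.10]), np.array([0.70, 1.40, 3.90])
print(coverage(z_spec, z_phot, lo68, hi68, lo95, hi95))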
Next, we used our best trial and set up the probability distribution of the redshift, p(z), as a function of the redshift grid values, z. This gave us the probability distribution functions for the photometric redshift, which we use as SED fitting parameters. We then fixed the redshift errors at these values, and will use a code (such as Professor Acquaviva's SpeedyMC) to do the SED fitting. It is our hope that this will help improve our models of UV/optical/IR galaxies, and that fitting their SEDs will tell us more about their physical properties.
We then created a routine called probdist.py, which opens 'filename' for reading (where filename stands for each of the files in OUTPUT that end in .pz, i.e. the redshift probability distributions of the 1338 astronomical objects). It writes out a corresponding file, candels.txt, using the z and pz values from filename. This lets us graph p(z), the redshift probability distribution for each of the 1338 objects, or galaxies, versus the redshift. The candels.txt files should ideally match the .pz files, which tabulate the probability as a function of redshift.
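A minimal sketch of what such a routine might look like is shown below, assuming the .pz files are whitespace-separated tables with the redshift in column 1 and p(z) in column 4 (as described in the next paragraph); the directory and output naming are only illustrative.

# Sketch of a probdist.py-like routine: read each .pz file from OUTPUT,
# keep the redshift (column 1) and p(z) (column 4), and write them to a
# matching candels.txt file (e.g. 1.pz -> 1candels.txt).
import glob
import numpy as np

for pzfile in sorted(glob.glob("OUTPUT/*.pz")):
    data = np.loadtxt(pzfile)
    z  = data[:, 0]                     # column 1: redshift grid
    pz = data[:, 3]                     # column 4: probability p(z)
    outname = pzfile.replace(".pz", "candels.txt")
    np.savetxt(outname, np.column_stack([z, pz]))
    print("wrote", outname)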
We check what the graph looks like using gnuplot, plotting only columns 1 and 4 of the .pz files (z and z_p, which correspond to z and pz, respectively). An example of the correspondence between a candels.txt file and a .pz file is graphed below with gnuplot, for the first galaxy (1.pz and 1candels.txt). They do match up ideally, with the .pz distribution drawn as a line and the .txt values plotted as stars.
We try to fit this redshift probability distribution with a Gaussian, which we do by finding its mean and standard deviation. For this, I created a module that computes the average, called average.py. We take the average of the z values, which gives an estimate of the mean; it serves as our initial guess for the center of the curve when we fit. The module stdev.py excludes redshift probability values smaller than 0.001 to give us better estimates, which also trims the data we use down to the part with the desired probability values.
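The sketch below shows one way these estimates could be computed; weighting the redshift grid by p(z) is an assumption made here for illustration, while the 0.001 threshold follows the description above.

# Sketch of average.py / stdev.py-style estimates of the mean and width
# of a redshift probability distribution, dropping values of p(z) below
# 0.001. The p(z) weighting is an illustrative assumption.
import numpy as np

def pz_mean_std(z, pz, threshold=1e-3):
    keep = pz >= threshold                   # drop very low probabilities
    z, pz = z[keep], pz[keep]
    mean = np.average(z, weights=pz)         # initial guess for the center
    var  = np.average((z - mean) ** 2, weights=pz)
    return mean, np.sqrt(var)

# Toy p(z), roughly Gaussian and centered at z = 1.0
z  = np.linspace(0.0, 3.0, 301)
pz = np.exp(-0.5 * ((z - 1.0) / 0.1) ** 2)
print(pz_mean_std(z, pz))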
We then use curve_fit to optimize these parameters and fit the distribution to a Gaussian. The fit returns popt, the vector containing the optimized amplitude, mean, and standard deviation. We assign a, x0, and sigma from popt (a is the normalization, i.e. the amplitude or height; x0 is the average, or center of the curve; and sigma is the standard deviation, or width of the curve). These values are used in our Gaussian module, which encodes the equation of a Gaussian (stdev.py calls this module, prints popt, and provides us with the values of a, x0, and sigma).
We then use these values in the equation of a Gaussian, a function of the form f(x) = a * exp(-(x - b)^2 / (2*c^2)), where a is the amplitude, b is the average, and c is the standard deviation, in gnuplot. Of course, all of this information is also written to the file we called 4colampavgstdev.txt, and can be looked up there; it contains an index, followed by the amplitude, average, and standard deviation of each galaxy.
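The fit itself is sketched below using scipy's curve_fit; popt holds the optimized a, x0, and sigma. The synthetic p(z) and initial guesses are only for illustration.

# Sketch of the Gaussian curve fit: curve_fit returns popt, the vector
# of optimized parameters (amplitude a, center x0, width sigma).
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2.0 * sigma ** 2))

# Toy redshift probability distribution
z  = np.linspace(0.0, 3.0, 301)
pz = 0.9 * np.exp(-0.5 * ((z - 1.2) / 0.08) ** 2)

p0 = [pz.max(), np.average(z, weights=pz), 0.1]   # initial guesses
popt, pcov = curve_fit(gaussian, z, pz, p0=p0)
a, x0, sigma = popt
print("a = %.3f, x0 = %.3f, sigma = %.3f" % (a, x0, sigma))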
The curve fit did not work for every single galaxy. Galaxy 6 in particular had a negative amplitude, and Galaxies 11 and 566 had negative averages. Although we looked at each of these “negative” galaxies individually, we could not figure out why the amplitude or averages came out negative; this is likely due to a systematic error.
There was a degeneracy, or two possible solutions, for galaxies 88, 684, 991, and 1203. The graph of the curve fit for galaxy 88 can be seen below. When we plug the values of a, x0, and sigma for this galaxy into gnuplot, we can see just how far the distribution for this galaxy is from a Gaussian (the fit in dotted lines is what it should ideally look like).
This graph can be compared to the curve fit for 1.pz below, where the actual distribution is much closer to the ideal Gaussian curve fit shown by the dotted line.
Having some worthy distributions grouped together with other very “off” distributions was no longer appropriate, so we needed to select a set of the best ones to work with. In our final stage, we worked with the “golden galaxies”, so called because they have the best values: a signal-to-noise ratio (flux/error) of at least 8 in at least 13 bands. The signal-to-noise cut of 8 was chosen because it leaves us with a sizable sample: 585 of the 1338 galaxies are part of the gold sample. I copied the file of probability distributions so as to have the best possible redshift probability distributions for these 585 galaxies, and then reran the Gaussian-fitting module (originally used on all 1338 galaxies) on just the gold sample. The proof that this worked well is that every object was fit by the curve, with the exception of number 304 (corresponding to 684.pz), the only one of the degenerate objects included in the gold sample. There is nothing wrong with this, though; the error in that curve fit is due to the degeneracy, i.e. two possible yet plausible solutions.
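The selection itself is easy to sketch: keep the galaxies whose flux/error ratio is at least 8 in at least 13 bands. The toy flux and error arrays below stand in for the real catalog values.

# Sketch of the "golden galaxy" selection: S/N >= 8 in at least 13 bands.
import numpy as np

def gold_sample(fluxes, errors, snr_cut=8.0, min_bands=13):
    # Boolean mask of objects passing the signal-to-noise selection
    snr = np.divide(fluxes, errors, out=np.zeros_like(fluxes), where=errors > 0)
    n_good_bands = (snr >= snr_cut).sum(axis=1)
    return n_good_bands >= min_bands

# Toy example: 2 objects, 14 bands each; only the first passes the cut
fluxes = np.vstack([np.full(14, 80.0), np.full(14, 10.0)])
errors = np.full((2, 14), 5.0)
print(gold_sample(fluxes, errors))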
We have made much progress in our research since we started. We computed the optimal estimation of errors for all of the bands, and then isolated the outliers to further improve our results. We then used our best trial to create probability distribution functions for the photometric redshift, which we use as SED fitting parameters. Fitting the galaxies with a Gaussian for a better curve fit, as well as reducing the sample to include only the golden galaxies, paves the way for future improvement of our estimation of errors.
Acknowledgements
Professor Viviana Acquaviva – Research Mentor and Programming Advisor
References
http://stackoverflow.com/questions/14459340/gaussian-fit-with-scipy-optimize-curve-fit-in-python-with-wrong-results
http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
http://stackoverflow.com/questions/17403371/how-can-i-fix-valueerror-too-many-values-to-unpack-in-python
http://en.wikipedia.org/wiki/Gaussian_function
http://scipyscriptrepo.com/wp/?p=76
EAZY: http://www.astro.yale.edu/eazy/