Emission Line Objects in Integral Field Spectroscopic Datacubes
Master's Thesis¹
Edmund Christian Herenz²
Humboldt-Universität zu Berlin
November 9, 2011

¹ Supervisor: Lutz Wisotzki (Leibniz Institute for Astrophysics Potsdam)
² Email: cherenz@aip.de
Astronomy compels the soul to look upwards
and leads us from this world to another.
Plato
Chapter 1
Introduction
Major progress in astronomy is often related to the use of new instruments and techniques. For example, the Hubble Space Telescope (HST) has dramatically changed our view of the universe since its launch in 1990 (Dalcanton 2009). Major contributions to our understanding of the universe have also been made with the current generation of ground-based telescopes, like ESO's Very Large Telescope (VLT), which went into scientific operation in 1999 (Renzini 2009). An advantage of ground-based telescopes is that old instruments can cost-effectively be replaced with new ones, thus enabling new scientific opportunities. One of those 2nd-generation instruments for the VLT will be the Multi Unit Spectroscopic Explorer (MUSE), a wide-field integral field spectrograph (IFS) operating in the visible part of the electromagnetic spectrum (McDermid et al. 2008; Bacon et al. 2009; Bacon et al. 2010).
One of the main and most challenging scientific drivers that motivated the construction of this instrument concerns the physical properties of faint Lyman-α (Lyα) emitting galaxies (LAEs) at high redshift, and how this type of galaxy is related to other high-redshift galaxy populations. As is often the case in photon-starved astronomy, most of the open questions are related to limitations of current instrumentation, and some will be answered when MUSE goes online. To provide the necessary physical background, Sect. 1.1 reviews some aspects of observational high-redshift galaxy science and concludes by raising these open questions.
Hyper-spectral datasets produced by IFSs are, due to their 3-dimensional nature, different from those conventionally produced in optical astronomy (i.e. classical images or spectra). Unprecedented by any IFS before, MUSE will produce 90000 moderate-resolution optical spectra simultaneously in a spatially contiguous field of view (FOV) of 1 arcmin2. Such datasets will enable new types of analyses, coupling the discovery potential of astronomical imaging with high-quality spectroscopy. Some of the questions raised in Sect. 1.1 will be answered by these analyses, which, however, require new methods and data-processing techniques. The major goal of this Master's Thesis is to present, implement and test a method capable of automatically finding emission line objects (ELOs) in such datasets. The method will further be tested on simulated MUSE observations. I will present the aim and the structure of this work in more detail in Sect. 1.3.
1.1 The Discovery of High-Redshift Galaxies
One of the primary goals of astrophysics is to understand the formation and evolution
of galaxies (Shapley 2009). There are many open questions surrounding the connection between high-redshift galaxies and their counterparts in the local universe. This situation may seem unsettling at first; however, one has to recall that observational studies of high-redshift galaxies became possible only ∼15 years ago with the then-new generation of ground- and space-based instruments. Preceding this period, the only objects that could be studied at significant look-back times were luminous quasars and radio galaxies. These are rare objects tracing the most massive structures in the universe. Back then, no normal star-forming galaxy was known that remotely approached the redshifts of these objects, a situation which has changed dramatically. Now, the known number of star-forming galaxies within the first billion years of the universe surpasses that of known quasars in this epoch. Until recently, the galaxy with the highest known (reliable, i.e. spectroscopically confirmed) redshift was a Lyα emitter (LAE) at z = 6.69 (IOK-1; Iye et al. 2006; Ota et al. 2008, see also Fig. 1.3). Then, but only for a short time, the record holder was again a quasar, at z = 7.085 (Mortlock et al. 2011), which has recently been surpassed by a high-redshift galaxy at z = 7.109 (Vanzella et al. 2011).¹ Note that the last two of these discoveries were made during the work on this thesis!
This “expansion of exploration” (Ellis 2008) we are witnessing is a huge achievement. Nevertheless, sizeable samples at these very high redshifts (z ≳ 6.5), which would enable us to draw firm physical conclusions, do not exist. At lower redshifts, large samples have already been constructed and global evolutionary trends have been identified. Before reviewing some of these (Sect. 1.2), I will give an overview of two common methods that have been used to identify high-redshift star-forming galaxies. Since it is more relevant for this thesis, I will focus on the narrow-band imaging technique that targets the emission line of Lyα-emitting galaxies (Sect. 1.1.1), but since galaxies found by color-selection criteria also show Lyα emission in some cases, I briefly explain the so-called “Lyman Break” technique as well (Sect. 1.1.2). The connection between these two classes of objects is still a matter of open debate.
It is known that the methods used so far do not produce complete samples in terms of physical quantities, e.g. star-formation rate or stellar mass. Indeed, resolving this issue is one of the tasks to be tackled with future observational techniques, such as wide-field integral field spectroscopic surveys with MUSE, to the preparation of which this work will (hopefully) contribute.
1.1.1 Emission Line Selection with Narrow Band Imaging
The search for galaxies in the early universe was stimulated by a seminal theoretical study of Partridge and Peebles (1967). Aiming at an assessment of possible detection signatures of galaxies in a young evolutionary phase, their calculations showed that galaxies must have been much brighter in the past than they are now. Furthermore, the authors showed that 6-7% of the total flux would be converted to Lyα radiation. This is inferred from the idea that young galaxies must contain a large number of young, hot (short-lived) stars, which produce a high number of hydrogen-ionizing photons. These photons then ionize the interstellar hydrogen, which in turn recombines due to the low densities present in this medium. As a result of this recombination cascade, 2/3 of the total radiation at wavelengths shorter than λ = 912 Å (10% of the bolometric luminosity, see also Sect. 1.1.2) would be converted into Lyα line radiation. This means that the flux density in the Lyα line will be much higher than in the surrounding continuum (Fig. 1.1). In retrospect, their quantitative conclusion that 5 min of integration time on a D = 2 m telescope equipped with a narrow-band filter would suffice to separate these objects from the sky background was overoptimistic: it took three decades until the first galaxy was found by virtue of its strong Lyα emission (Hu and McMahon 1996, Fig. 1.2) in a narrow-band imaging campaign. This is mainly due to the fact that the prediction assumed a monolithic gravitational collapse as the galaxy-formation process, i.e. all mass was assembled at once and thus all stars were formed in an instantaneous starburst. Furthermore, the resonant nature of the Lyα line makes this emission susceptible to even small amounts of dust in the interstellar medium.

¹ There also exists a claim to have spectroscopically identified a galaxy at z = 8.6 (Lehnert et al. 2010). However, the published spectrum is very noisy and the identification remains highly controversial.
The main idea of narrow-band imaging campaigns targeting emission-line galaxies is to compare narrow-band (∆λ ∼ 10−100 Å) images with off-band or broad-band images, the latter often obtained in “Deep Field” programs of particularly dark regions of the sky (e.g. the Great Observatories Origins Deep Survey - GOODS, Dickinson et al. 2003). In the simplest case, emission lines from galaxies with a spectrum like the one presented in Fig. 1.1 will give a conspicuous excess in the narrow-band filters, while being faint or even undetected in neighboring broad-band filters (examples of such detections are shown in Figs. 1.2 and 1.3). More complicated criteria can be formulated when more filters are available (Fig. 1.4). Furthermore, the formulation of such criteria is often aided by theoretically calculated model spectra.
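As an illustration, the simplest excess criterion described above can be sketched in a few lines of code. The color cut and detection limit below are invented, illustrative values; real surveys calibrate such thresholds with model spectra, and the function name is ours:

```python
import numpy as np

# Toy narrow-band excess selection: flag objects whose broad-band minus
# narrow-band color indicates an emission line inside the narrow-band filter.
# The color cut (1.2 mag) and the narrow-band detection limit (25.0 mag)
# are illustrative values only, not taken from any particular survey.
def select_excess_objects(m_nb, m_bb, color_cut=1.2, m_nb_limit=25.0):
    """Return a boolean mask of narrow-band excess candidates."""
    m_nb = np.asarray(m_nb, dtype=float)
    m_bb = np.asarray(m_bb, dtype=float)
    detected = m_nb < m_nb_limit          # detected in the narrow band
    excess = (m_bb - m_nb) > color_cut    # narrow-band flux excess
    return detected & excess

# Three toy objects: a pure continuum source, an emission-line object,
# and a source below the narrow-band detection limit.
mask = select_excess_objects(m_nb=[24.0, 23.0, 26.5], m_bb=[24.1, 25.0, 27.0])
print(mask)  # → [False  True False]
```

Only the second object is both detected in the narrow band and significantly brighter there than in the broad band, mimicking the spectrum of Fig. 1.1.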
Almost all LAEs known to date were found via the narrow-band imaging technique. But imaging alone provides just a catalog of plausible LAE candidates. Usually, due to the pressure on observing time at large telescopes, only a small subset of these candidates can subsequently be spectroscopically confirmed or rejected. These spectroscopic follow-ups are usually used to estimate the statistical completeness and the level of contamination of the photometrically obtained sample. The fraction of spectroscopically confirmed LAEs thus is just ∼ 20% up to z ≃ 6.5, and there exists only one unique robust spectroscopic detection of a narrow-band selected object at z > 6.5 (Ota et al. 2008; the photometric appearance of this object is presented in Fig. 1.3).
The huge observational efforts carried out in the last decade have so far resulted in ∼ 2000 photometrically detected LAEs at 3 ≲ z ≲ 6.5 (reviewed in Taniguchi et al. (2003) and Herenz (2009), the latter being my Bachelor's Thesis). Above z ≃ 6.5, observations are more challenging, since the Lyα line shifts into the near-infrared domain, where the night sky is intrinsically brighter. Nevertheless, ∼ 300 LAEs are currently known with z ≳ 6.5 (e.g. Ouchi et al. 2010; Kashikawa et al. 2011). The observations become even more challenging at higher z, since then Lyα shifts near the 1.1 µm band gap of Si, so conventional Si-based CCD detectors cannot be used. The situation is further complicated by the reduced flux due to the increasing distance of these targets. Nevertheless, in recent years several wide-field IR sensors have been deployed at telescopes, which now also start being used for LAE surveys, but the samples obtained so far are not statistically robust (see studies by Hibon et al. 2010, 2011, which contain only ∼ 10 objects each).
Besides the already-mentioned problem of having only a small subset of the detected emitters spectroscopically confirmed, there are several other problems and limitations connected to the narrow-band imaging technique. To mention a few important ones:
Figure 1.1: Semi-quantitative prediction from Partridge and Peebles (1967) of the expected spectra of primeval galaxies. Depicted is the most extreme case, where all ionizing photons would have been converted into Lyα radiation, resulting in a line emission containing 6-7% of the total flux. See also Fig. 1.5.
Figure 1.2: From Hu and McMahon (1996): Comparison between a narrow-band image (left) and a broad-band image (right), each showing a 3.5′′ × 3.5′′ region around the quasar BR2237-0607. The total integration time for the narrow-band image was 20.8 h on a 2.2 m telescope, and for the broad-band image 1 h on the 10 m Keck telescope. All but 3 objects detected in the narrow-band image have counterparts in the broad-band image. Subsequent spectroscopy (4 h integration on the 10 m Keck telescope) of these 3 objects revealed that one object is a foreground [OII] emitter (LA10), while the other two objects are indeed LAEs at z = 4.55.
Figure 1.3: Thumbnail (20′′ × 20′′) images of the z ∼ 7 LAE candidates IOK-1 and IOK-2 from Ota et al. (2008). Shown are the broad-band filters B, V, R, i', z' and the narrow-band filters NB816, NB921 and NB973 (λcentral = 8160 Å, 9196 Å and 9755 Å, respectively; ∆λ ∼ 200 Å). This figure illustrates the simplest form of the narrow-band imaging search technique for Lyα emitters. The selection criterion was a non-detection in all broad- and narrow-band filters except NB973. Model calculations by the authors showed that only z ∼ 7 Lyα-emitting galaxies would pass this criterion. In a 876 arcmin2 field this was the case for only the two objects shown. In a subsequent campaign, spectra of both objects were taken on the Subaru 8 m Telescope (integration time: 8.5 h for the brighter object IOK-1 and 11 h for IOK-2). After analysis of the spectra, only IOK-1 could successfully be confirmed (measured line flux FLyα = 2 × 10^−17 erg s−1 cm−2, corresponding to a Lyα luminosity of LLyα ∼ 10^43 erg s−1 assuming the current concordance cosmology). This object was (until recently) the galaxy with the highest known redshift.
Figure 1.4: Sketch explaining the narrow-band imaging technique for z = 5.7 Lyα emitters. A simplified spectrum of a hypothetical LAE is shown with the black curve. The dashed curves represent different filters (V, R and i are broad-band filters and NB816 is the narrow-band filter at 8159 Å). This emitter would produce a strong detection in the NB816 filter, while the measured flux density in the i and R filters will be relatively small. Moreover, the measured flux density is higher in the i than in the R band. Finally, there will be no flux detected in the V band. These ideas form the basis of the selection criteria used in NB imaging surveys for LAEs.
• Since the necessary signal-to-noise ratios for the detection of faint emission-line objects in narrow-band images can only be reached in atmospheric bands where the telluric emission (or absorption) is minimal, the technique is limited to these atmospheric “windows”.
• Only a small range in redshift, typically ∆z ≲ 0.1, can be studied with each narrow-band filter. This means that even if these surveys image large regions of the sky (usually several hundred arcmin2 for the deep surveys, and several thousand arcmin2 for the shallow surveys), the corresponding surveyed volume still is not very large. Combined with the above point, this also means that a large fraction of redshift space has not been studied at all - only a few narrow slices.
• The limiting flux (i.e. the flux down to which these samples are believed to be complete) reached in these studies still allows one to study only the brightest of these emitters. The “faint end” of this population still remains a mystery.
• The photometric criteria are tailored to search for strong line emission, which means that only galaxies above a certain equivalent width are selected; thus, weak-lined, continuum-dominated galaxies are systematically missed.
• Line-flux estimates from narrow-band images are not correct for emitters that fall in the wings of the narrow-band filter, which generally has a non-square band-pass. Furthermore, these non-square band-passes complicate statistical analyses of the samples.
These (and more) issues related to narrow-band searches for LAEs still hamper our understanding of this important population of high-redshift galaxies, but they can be overcome with future deep and wide integral field spectroscopic surveys.
1.1.2 Color Selection (Lyman Break Technique)
With the new class of large telescopes (and HST) arriving in the mid-90s the redshift
boundary to which galaxies could be observed was moved from z ∼ 1 to z ∼ 6. The
primary observational technique most prominently used to gather a large number of
high-redshift galaxies is based on the expected colors of these sources (Giavalisco 2002).
As explained in the last section, Partridge and Peebles (1967) predicted that strong Lyα emission could be a prominent feature of young galaxies. The underlying assumption was a non-uniform distribution of neutral hydrogen in the model galaxy (optically thick clouds with 10-100 atoms/cm3) that reprocesses the ionizing radiation into Lyα. Nevertheless, they also noted that if the distribution of the hydrogen were uniform, most of the ionizing radiation would escape, and the spectral appearance would not differ from a superposition of the stellar spectra. Such a spectrum would be dominated by young, hot stars. Their semi-quantitative prediction of such a spectrum is shown in Fig. 1.5, with a strong absorption of flux at wavelengths shorter than 912 Å. They concluded, however, that their primeval galaxy would be too faint in surface brightness to be observable at high redshifts if no conversion of ionizing photons into Lyα photons took place. This pessimistic prediction is a consequence of their monolithic-collapse model: since the starburst would have been initiated as the galaxy collapsed, the radius of the galaxy would be very large, i.e. despite the high mass in this model the surface brightness would still be low. As noted above, the now well-established hierarchical galaxy formation model predicts the opposite scenario for galaxies in the early universe (mostly compact, low-mass systems).

Figure 1.5: Semi-quantitative prediction from Partridge and Peebles (1967) of the expected spectrum of a primeval galaxy under the assumption of a uniform hydrogen distribution in their toy-model galaxy. In this case the spectrum would simply be a superposition of all stellar spectra, and the flux would be dominated by the young O- and B-type stars of Teff ≈ 30,000 K. Partridge & Peebles approximated this as a black-body spectrum of this temperature. The hydrogen ionization edge at 912 Å forms the “Lyman Break”, an absorption feature in the atmospheres of massive stars. In Fig. 1.1 all radiation at wavelengths shorter than the Lyman Break had been converted into Lyα flux under the assumption that the neutral hydrogen is distributed in optically thick clouds with 10-100 atoms/cm3.
The strong absorption feature below 912 Å (i.e. at E > 13.6 eV) is called the Lyman Break. It originates mainly from hydrogen in the stellar atmospheres of short-lived O- and B-type stars, as depicted in Fig. 1.5, but for real starburst galaxies at high redshifts two other effects contribute to making this break even more pronounced: absorption by interstellar neutral hydrogen and by intervening hydrogen clouds in the intergalactic medium. Moreover, the latter effect is also responsible for a drop in the spectrum at rest-frame wavelengths shorter than λLyα = 1216 Å because of the resonant nature of the Lyα line.
Using a set of suitable broad-band filters, one can now search for galaxies with characteristic flux ratios (termed color indices) originating from the expected spectral appearance of the galaxy, especially the Lyman Break. This is shown exemplarily for galaxies at z = 3.3 and z = 4.3 in Fig. 1.6. Galaxies found with this technique are called Lyman break galaxies (LBGs) or dropouts (since they disappear in the pass-bands at shorter wavelengths). The quantitative formulation of the photometric selection criteria is aided by model spectra of stellar populations in combination with prescriptions for internal dust reddening and intergalactic hydrogen absorption. Furthermore, by using different filter sets in different combinations, the method can be tuned to select galaxies in different redshift intervals.
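To make the logic of such color cuts concrete, here is a minimal sketch of a two-color dropout criterion in the spirit of the Un, G, R selection discussed here. Both numerical thresholds are invented for illustration; real criteria are calibrated with model spectra including dust reddening and intergalactic hydrogen absorption:

```python
# Toy two-color Lyman-break ("dropout") selection in the U_n, G, R bands.
# A strong U_n - G break together with a blue G - R continuum color selects
# z ~ 3 candidates (cf. Fig. 1.6). Both thresholds are illustrative values.
def is_un_dropout(un, g, r, break_cut=1.5, continuum_cut=0.5):
    """True if the object shows a strong U_n - G break and a blue G - R color."""
    return (un - g) > break_cut and (g - r) < continuum_cut

print(is_un_dropout(un=26.5, g=24.5, r=24.3))  # strong break: selected
print(is_un_dropout(un=24.6, g=24.5, r=24.3))  # flat spectrum: not selected
```

For higher redshifts the same logic is shifted to redder bands, e.g. replacing (Un, G, R) by (G, R, I) for z ∼ 4.3 candidates.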
Deep and Medium-Deep Field campaigns employing the Lyman Break technique usually yield ∼ 1000 − 10000 high-redshift (2 ≲ z ≲ 6) galaxy candidates in areas of ∼ 10-100 arcmin2.

(a) z = 3.3 (b) z = 4.3
Figure 1.6: Graphical illustration of the idea behind the Lyman Break technique (from Giavalisco 2002): In both panels the transmission curves of the Un (blue), G (green), R (red) and I (pink) filter bands from a ground-based survey are shown, and the black curve shows a redshifted model spectrum of a galaxy at z = 3.3 (left) and z = 4.3 (right). The dashed black line shows the calculated model spectrum without the effect of intergalactic hydrogen absorption included. Galaxies at z = 3.3 are characterized by a “drop-out” in the Un band, while appearing redder when comparing the R and the G filters. Galaxies at z = 4.3 show a similar behavior, but here the Un, G and R filter set is replaced by G, R and I.

Ongoing and future surveys with wide-field IR instruments will improve the number statistics at higher redshifts. For example, utilizing only the first epoch of observations of the Hubble Ultra Deep Field with the new “Wide Field Camera 3”, installed on the Hubble Space Telescope during the last servicing mission, Bouwens et al. (2010) extended the drop-out technique to the near-infrared, finding 5 sources² with z ∼ 8 − 8.5.
As is the case with LAEs selected by narrow-band imaging, the Lyman-break color-selection technique gives only a list of plausible candidates for high-redshift galaxies. To confirm the detections, measure accurate redshifts, and also demonstrate the reliability of the photometric selection criteria, spectroscopic follow-up observations of sub-samples of the candidate list are a must. Currently, ∼ 1000 spectroscopic confirmations of LBGs exist, most of them at z ≲ 3 (Appenzeller 2009).
There are several other problems related to observations of this population of high-
redshift sources. To mention only a few:
• An accurate determination of the redshift from the photometric information is not
possible.
• Star-forming galaxies also produce significant amounts of dust that redden the spectral appearance, which in turn affects the probability of a galaxy making it into a photometrically selected LBG sample. Furthermore, the wavelength-dependent dust attenuation is not well known at these high redshifts, and the assumptions usually used in the formulation of the selection criteria still need to be tested.
² Recently, one of these sources was even claimed to have been spectroscopically confirmed (Lehnert et al. 2010), but as explained in Footnote 1, this confirmation remains highly controversial.
• No generally accepted standard for formulating the selection criteria exists, making it difficult to compare the samples created by different groups.
1.2 Galaxy Evolution at High Redshifts: Global Trends
Having obtained sizeable samples of galaxies at high redshifts with the methods described in the last two sections, it is now possible to compare their observed properties with those of local galaxies. Moreover, disentangling the complicated processes that are involved in the formation of galaxies requires the identification of evolutionary trends with redshift. In this section I summarize some of the main results that have been obtained so far.
The morphological distribution of galaxies in the local universe can be put into the “Hubble Classification Scheme”, which basically distinguishes between spiral and elliptical galaxies. In addition, there exist galaxies that do not fit into this scheme: highly irregular structures that most likely result from interactions between two (or more) galaxies. Since the angular sizes of high-redshift galaxies are small, the best
results regarding the morphological properties of these galaxies come from studies with
the Hubble Space Telescope (see e.g. recent results utilizing the new Wide-Field Camera
3 on the Hubble Space Telescope by Law et al. 2011, Fig. 1.7). The observational
result of these studies is that above z ≈ 2 the fraction of regular galaxies rapidly decreases, whereas the fraction of highly irregular galaxies, as well as of multi-component systems, increases (Fig. 1.7). At z ≈ 4 almost no galaxies are found that resemble structures classifiable in the Hubble scheme (Beckwith et al. 2006). This, in
combination with the smaller physical extents of the galaxies, is in good agreement with
the hierarchical build-up scenario predicted from theoretical cold-dark matter models.
Figure 1.7: Examples of rest-frame optical morphologies of galaxies at z ∼ 2 − 3 from a sample by Law et al. (2011). Shown is a visual classification into 3 types: Type I ≙ regular single nucleated source, Type II ≙ two or more distinct nucleated sources, Type III ≙ highly irregular objects. More than half (58%) of the objects in this sample belong to Type II or III. The thumbnails are 3′′×3′′ in size.
One of the most fundamental statistical functions that can be compared with theoretical models of galaxy formation is the luminosity function (LF) Φ(L) (see Johnston 2011 for a contemporary historical and methodological review). This distribution describes the relative number of galaxies with different luminosities, such that the observed number dN of galaxies within a luminosity range from L to L + dL within a volume interval dV is given by

dN = Φ(L) dL dV .  (1.1)

The units of Φ(L) are comoving Mpc−3 (or, if no explicit value for the Hubble parameter H0 is assumed, h3 Mpc−3). For studies of galaxy formation and evolution the luminosity function is usually parameterized by an analytic representation that was introduced by Schechter (1976), namely

Φ(L) dL = φ∗ (L/L∗)^α exp(−L/L∗) dL/L∗ ,  (1.2)
where the parameter α < 0 defines the faint-end slope³ and φ∗ is the normalization factor defining the overall density of galaxies (per comoving Mpc3) at the characteristic turnover point L∗. At L∗, Eq. (1.2) changes from the faint-end power-law behavior to the bright-end exponential cut-off. Transforming Φ(L) via

L/L∗ = 10^(−0.4 (M − M∗))  (1.3)
to absolute magnitudes, a unit which is convenient when expressing luminosity densities measured in broad-band filters, yields

Φ(M) = 0.4 ln(10) φ∗ × 10^(0.4 (M∗ − M)(α+1)) × exp(−10^(0.4 (M∗ − M))) .  (1.4)
Much observational effort has been devoted to measuring the parameters φ∗, L∗ (or M∗) and α of the high-redshift galaxy populations introduced in Sects. 1.1.1 and 1.1.2 in order to constrain their evolution with z. A redshift dependence of φ∗ would imply a number evolution, and one of L∗ a luminosity evolution of the population. A negative (positive) evolution of α would imply that the fraction of low-luminosity galaxies increases (decreases).
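To illustrate how Eq. (1.4) behaves around the turnover point, the following sketch evaluates it numerically. The parameter values (M∗ = −21, α = −1.7, φ∗ = 10^−3 Mpc−3) are round numbers chosen for illustration only, not measurements from any survey:

```python
import numpy as np

# Schechter function in absolute magnitudes, Eq. (1.4). The default
# parameters are round, illustrative values.
def schechter_mag(M, M_star=-21.0, alpha=-1.7, phi_star=1e-3):
    """Phi(M): galaxies per magnitude per comoving Mpc^3."""
    x = 10.0 ** (0.4 * (M_star - M))    # x = L / L*, via Eq. (1.3)
    return 0.4 * np.log(10.0) * phi_star * x ** (alpha + 1.0) * np.exp(-x)

M = np.array([-22.0, -21.0, -18.0])
phi = schechter_mag(M)
# Brighter than M* (M = -22) the exponential cut-off suppresses the counts;
# fainter than M* (M = -18) the power law takes over and phi rises.
```

At M = M∗ the expression reduces to Φ(M∗) = 0.4 ln(10) φ∗ e^−1, independent of α, which provides a convenient sanity check for any implementation.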
There is consensus in the literature that the Lyα line luminosity function of LAEs seems not to evolve between redshifts z ∼ 3 and z ∼ 6; however, so far only the bright end of this function has been measured (Herenz 2009). A complicating factor is that different groups used different selection methods for their samples, hindering easy comparisons between them. Nevertheless, a recent observational narrow-band imaging effort in the Subaru Deep Field by Ouchi et al. (2010) created the currently largest photometric LAE sample (207 objects) at z = 6.6. In comparison to their previous LF determinations at z = 3.1, z = 3.7 and z = 5.7 in the same field (Ouchi et al. 2008), this study shows a decrease in characteristic luminosity, while the overall density seems to remain constant. Since the faint-end distribution of LAEs could not be constrained, the slope α was arbitrarily set to a constant value for the determination of this result.
Contrary to the LAE LF, an evolution of the Schechter function parameters with
redshift has been observed for the UV continuum of Lyman-break selected galaxies.
³ Note that sometimes in the literature a different sign convention is used for α in Eq. (1.2).
Here the characteristic luminosity rises towards earlier cosmic epochs (i.e. from z ∼ 1 to z ∼ 4), before it decreases again at higher redshift (z ≳ 4) (Bouwens et al. 2007). This evolution of the characteristic luminosity is summarized in Fig. 1.8. Since the errors are still large, the evolution of α and φ∗ is less clear.

Figure 1.8: From Bouwens et al. (2007): Evolution of the characteristic UV luminosity, expressed as M∗. Different symbols and colors correspond to different survey campaigns. The red points were determined by Bouwens et al. (2007) using the drop-out technique explained in Sect. 1.1.2. The black lines are empirically calibrated models.

From Fig. 1.8 it can be seen that, according to Eq. (1.3), the galaxies at z ≈ 3.5 (corresponding to a look-back time of
∼ 10 Gyr) were on average brighter by a factor of ∼ 15 than they are now. Bouwens et al. (2007) find that the rise from z ∼ 8 to z ∼ 6 (corresponding to a time interval of ≈ 0.3 Gyr) fits the hierarchical build-up predicted by the current concordance cosmology not only qualitatively but also quantitatively. The effect that the brightening halts and turns over at z ≈ 3.5, so that the galaxies become on average fainter with time until now, is less well understood. It might be related to the depletion of cold-gas reservoirs in galaxies, which furthermore might happen preferentially in high-mass galaxies, thus leading to a “downsizing” (Cowie et al. 1996) where star formation would move from high-mass galaxies to lower-mass systems.
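The factor of ∼ 15 can be checked directly from Eq. (1.3): a change ΔM in the characteristic magnitude corresponds to a luminosity ratio of 10^(0.4 ΔM). The value ΔM = 2.9 mag below is chosen purely to reproduce the quoted factor and is not a measured quantity:

```python
# Eq. (1.3) rearranged: a magnitude difference Delta_M corresponds to a
# luminosity ratio of 10**(0.4 * Delta_M).
def luminosity_ratio(delta_M):
    """Luminosity ratio L1/L2 for a magnitude difference M2 - M1 = delta_M."""
    return 10.0 ** (0.4 * delta_M)

# Delta_M = 2.9 mag yields a ratio of ~14.5, i.e. the "factor of ~15"
# quoted for the brightening of M* out to z ≈ 3.5.
print(round(luminosity_ratio(2.9), 1))  # → 14.5
```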
Indeed, luminosity can be used as a tracer for the star formation rate (SFR) in a
galaxy. Two calibrations are commonly used for the high-redshift galaxy populations
introduced in Sect. 1.1.1 and 1.1.2 (Kennicutt 1998), one based on the UV-continuum
radiation,

SFR [M⊙ yr−1] = 1.4 × 10^−28 Lν [erg s−1 Hz−1] ,  (1.5)

and one based on the Lyα line radiation⁴,

SFR [M⊙ yr−1] = 10^−42 LLyα [erg s−1] .  (1.6)
Using these calibrators, one finds that presently known Lyα-selected galaxies have typical SFRs of 1-10 M⊙ yr−1, while the (typically UV-bright) LBGs have SFRs of 10-100 M⊙ yr−1, sometimes even more.
⁴ For Eq. (1.6), the relation between the rate of ionizing photons and the star-formation rate from Kennicutt (1998) has been converted into a Lyα luminosity under the assumption that 2/3 of all ionizing photons are converted into Lyα radiation.
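As a numerical illustration of Eqs. (1.5) and (1.6), the following sketch converts round, illustrative luminosities (chosen to fall in the ranges quoted above, not measured values) into star-formation rates:

```python
# SFR calibrations of Eqs. (1.5) and (1.6) (Kennicutt 1998).

def sfr_from_uv(L_nu):
    """SFR in Msun/yr from the UV continuum luminosity L_nu [erg/s/Hz]."""
    return 1.4e-28 * L_nu

def sfr_from_lya(L_lya):
    """SFR in Msun/yr from the Lyman-alpha line luminosity [erg/s]."""
    return 1e-42 * L_lya

# A typical LAE with L_Lya ~ 1e42 erg/s gives SFR ~ 1 Msun/yr; a luminosity
# of ~1e43 erg/s (as measured for IOK-1, Fig. 1.3) gives ~10 Msun/yr.
print(sfr_from_lya(1e42), sfr_from_lya(1e43))
# A UV-bright LBG with L_nu ~ 1e29 erg/s/Hz gives SFR ~ 14 Msun/yr.
print(sfr_from_uv(1e29))
```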
Several uncertainties are involved in the derivation and application of these calibrations, but they provide a handle on the cosmic star-formation history in the form of the star-formation rate density. Combining the results obtained with Eq. (1.5) or Eq. (1.6) (and with other calibrators for different galaxy populations) with the derived number density of galaxies in samples at different redshifts yields the global star-formation rate density ρ̇SFR (usually quoted in M⊙ yr−1 Mpc−3) at different cosmic epochs. Although there is a huge scatter in the relation ρ̇SFR(z), partly related to the mentioned uncertainties, the results clearly indicate that half of the stellar mass we observe today was in place at z ≈ 2 (corresponding to a look-back time of t ∼ 9 Gyr). A typical plot showing the star-formation rate density obtained with various calibrators is shown in Fig. 1.9. The rapid decline of the star-formation rate density at lower look-back times appears to be consistent with the “downsizing” effect found in the evolution of the LF.
Figure 1.9: Compilation of cosmic star-formation history measurements obtained at different redshifts, from Ellis (2008). Different colors correspond to different SFR calibrations. In particular, the blue symbols show SFRs derived via Eq. (1.5) from galaxies found with the LBG technique. The implication of this plot is that half of the stars we observe today were in place at z ∼ 1 (obtained by integrating along the parametric fit indicated by the dashed region).
1.2.1 Some Open Questions
As reported in the last section, the observational effort carried out over the last ∼ 15 years clearly reveals the presence of global evolutionary trends in galaxy formation with cosmic time. Nevertheless, there are some drawbacks when relying on results that are solely derived from photometrically selected samples.
Since LAEs as well as LBGs are star-forming galaxies, there must be some overlap between these two populations. However, studying the physical parameters of LAEs which are important ingredients to constrain this relationship (e.g. stellar masses and ages of the stellar populations) is difficult, since these galaxies mostly have faint continua. These faint continua are only measurable with broad-band photometry in the deepest fields (i.e. the Hubble Ultra Deep Field). The area of these fields, however, is too small to be effective for narrow-band searches, since those searches only probe tiny redshift slices. However, only data in this parameter space would allow one to tightly constrain the overlap between these populations.
Furthermore, when comparing the properties of the Lyα emission found in spectra of LBGs with those of LAEs, LBGs with equivalent widths (EW) as large as those of the LAEs are not found. On the other hand, a large fraction of LBGs show small EWs that are usually missed in LAE samples. Why do objects with faint UV continua usually show strong Lyα emission, whereas emitters with bright continua lack this emission? Are Lyα emitters an early evolutionary stage in galaxy evolution, containing the youngest, most metal-poor stars?
The problem in measuring large EWs is that these galaxies usually have very faint continua, so the error bars on the EW measurement are very large. The fraction of extremely high-EW LAEs that cannot be explained by simple stellar populations is still unknown, with conflicting results reported in the literature. However, these are probably the most interesting objects to study at high redshift.
As explained, the LF of LBGs shows evolution that is consistent with the hierarchical scenario (buildup from z ∼ 6 to z ∼ 4), whereas the LAE luminosity function shows no
apparent evolution. Strangely, when constructing a UV LF of LAEs (only possible for
those objects that have significant continuum counterpart detections, thus for the small
fraction of LAEs that have low EWs) the trend is reversed, implying that the LAEs are
more dominant in the UV at z = 6 than at z = 3 (Ouchi et al. 2008). This implies that the fraction of LAEs among LBGs increases from z = 3 to z = 6. But having narrow-band images available at only three wavelength slices clearly limits the quantitative analysis of this behavior.
Current observations of the LAE LF only constrain the bright end, while the faint end remains observationally unexplored. Evolutionary trends that are suspected
from the above considerations might show up as an evolution of this faint end. But this
exploration is not possible with current observational capabilities.
Clearly, LAEs are cosmic lighthouses that enable detailed studies of galaxy formation, but they still remain a separate class of high-redshift objects. Resolving the above issues will lead to an understanding that might enable a unification of the LAEs with other high-redshift galaxy populations. This is one of the key scientific drivers behind the MUSE wide-field integral field spectrograph, which will see first light at the VLT in 2013. Several surveys (shallow, medium-deep and deep) are planned that will yield several thousand LAEs, sampled continuously in the redshift range z ≈ 3 to z ≈ 6.5 and probing down to the faint end of the luminosity function.
1.3 Aim and Structure of this Work
In this Master Thesis I will present a method to automatically find and catalog emission
lines in wide-field IFS data. Such a method is needed for blind LAE surveys with future generations of IFS instruments. Currently, in the scientific preparation for the second-generation VLT instrument MUSE, simulated data for such surveys is being generated. I use such simulated data for a planned shallow survey with MUSE
(texp. = 2 h) of the Hubble Ultra Deep Field (HUDF) to develop the method.
This thesis is structured as follows: Chapter 2 will provide some background information on integral field spectroscopy in general (Sect. 2.1) before focusing on the MUSE instrument (Sect. 2.2). This instrument is currently under construction and, when operational, will be the most powerful IFS in the world. The 3D data products produced by MUSE are different from what is conventionally used in astronomy. The “Flexible Image Transport System” (FITS) digital file format, in which these data products will be stored, is the de facto standard in astronomy, and its specification also allows the storage of IFS data. This matter will be explained in Sect. 2.3, where I describe
the discovery of some errors related to conversions between celestial and FITS array
coordinates. Appendix A.1 will supplement the discussion of Sect. 2.3 by deriving the
necessary conversion formulas needed to resolve these errors.
The simulated MUSE observations and how they were created are briefly explained in Chapter 3. Currently, simulations are done with a software package called QSim, which mimics the final data product of such an observing campaign with the instrument. Sect. 3.1 introduces the software and the simulation input for observations in the HUDF. In Sect. 3.2 I will give a brief outlook on the next step of the simulations, which involves a numerical model of the instrument.
After these two introductory chapters I then present in Chapter 4 the method that
I developed and implemented, which automatically and efficiently searches for emission
line objects in wide-field integral field spectroscopic datasets. The method is based on
the matched-filtering approach (Sect. 4.3), but several preparatory steps are necessary
before the matched filter can be applied to the data: These are explained in Sects. 4.1
and 4.2.
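The core idea of the matched-filtering approach mentioned above can be illustrated in one dimension: the spectrum is cross-correlated with a template line profile and normalized by the propagated noise, yielding a detection-significance spectrum. The following sketch is purely illustrative (the function name and the Gaussian template are my own choices, not the thesis implementation):

```python
import numpy as np

def matched_filter_1d(data, variance, template):
    """Cross-correlate a noisy spectrum with a line template.

    Returns a significance spectrum: at each position, the inner
    product of the inverse-variance weighted data with the shifted
    template, normalized by the noise of the filter output.
    (Illustrative sketch only -- not the thesis implementation.)
    """
    t = template / np.sqrt(np.sum(template**2))   # unit-norm template
    num = np.correlate(data / variance, t, mode="same")
    den = np.sqrt(np.correlate(1.0 / variance, t**2, mode="same"))
    return num / den

# toy example: a weak Gaussian emission line injected into white noise
rng = np.random.default_rng(1)
x = np.arange(500)
line = 5.0 * np.exp(-0.5 * ((x - 250) / 3.0) ** 2)
data = line + rng.normal(0.0, 1.0, x.size)
var = np.ones_like(data)
tmpl = np.exp(-0.5 * (np.arange(-10, 11) / 3.0) ** 2)

snr = matched_filter_1d(data, var, tmpl)
print(int(np.argmax(snr)))   # the filtered spectrum peaks near index 250
```

The filter boosts features whose shape matches the template while suppressing uncorrelated noise, which is why the peak of the filtered spectrum falls near the injected line position.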
The purpose of the current work is to develop and to test the methods that will be
used on real datasets, which are expected to come from the instrument in 2013. Since
the simulation input is known, in Chapter 5 I will “close the loop”, meaning that I compare the output of my analysis to the simulation input. I will present first tests of objects recovered by my method compared to the input data.
Finally, Chapter 6 will summarize and conclude this work. Since only first (but promising) test results are presented in the previous chapter, I will also give an outlook on further tests that need to (and will) be carried out.
This thesis is accompanied by an extensive Appendix describing the software prototype that I wrote to implement the presented method (Appendix A.4). This software was used to generate the first catalog of significant emission-line source detections in the simulations for the planned shallow survey with MUSE in the HUDF. The full catalog, which has been used for first comparisons with the input data of the simulations in Chapter 5, is shown in Appendix A.3. Furthermore, the software makes use of a vectorized algorithm for spectral convolution to efficiently perform matched filtering in the large dataset. This algorithm is presented in Appendix A.2. Appendix A.1, already mentioned, details aspects introduced in Sect. 2.3. The software as well as the catalog in machine-readable format are available on request.
Chapter 2
Integral Field Spectroscopy with
MUSE
This chapter introduces the concept of integral field spectroscopy (Sect. 2.1) and gives an
overview of the MUSE integral field spectrograph (Sect. 2.2), which will be operational
at the VLT in 2013.
Combining spatial and spectral information in one dataset, integral field spectro-
scopic data is more complex than traditional datasets produced in optical astronomy
(i.e. images or spectra). A common file format used in astronomy is the Flexible Image
Transport System (FITS) (Wells et al. 1981), which provides a powerful mechanism to
store n-dimensional data arrays and thus is well suited for application in integral field
spectroscopy. For the analysis of IFS datasets stored in FITS, basic knowledge of the specification of this format is needed. For this reason, Sect. 2.3 will give a detailed overview of the mechanism that the FITS format uses to store spatial and spectral information simultaneously.
Currently, in the process of “Dry Runs” for planned observations with MUSE (ex-
plained in Chapter 3), the expected MUSE FITS data-products are generated artificially.
The basis for these simulations are published catalogs for deep fields (particularly galaxy
catalogs of the Hubble Ultra Deep Field - Beckwith et al. 2006; Coe et al. 2006). These
catalogs give positional information in fixed equatorial coordinates (right-ascension α
and declination δ), which have to be transformed into the array coordinates of FITS for
the simulations. I found out that this transformation was done erroneously, thus Sect.
2.3 will explain in some detail how celestial coordinates are encoded in FITS arrays.
This discussion is supplemented in Appendix A.1, where I give a detailed derivation of the transformation formulas between FITS array coordinates and physical coordinates and vice versa.
2.1 Integral Field Spectroscopy
Integral field spectroscopy (IFS), often also called 3D spectroscopy1, is an astronomical
observing technique that encodes spatial and spectral information in one exposure. The
data product of an IFS exposure, usually called datacube, thus is a three-dimensional
data array, with two spatial axes and a third axis in spectral direction. An information
1 Although, according to Allington-Smith (2006), “2D spectroscopy” or “3D imaging” would be more appropriate.
element at the point (x, y, z) in the datacube is called a “voxel” (short for volume pixel).
Each voxel stores a scalar quantity related to the flux density Fλ at a specific wavelength λ and spatial coordinate (x, y). An information element in the spatial (x, y) plane (i.e. the equivalent of a pixel in conventional imaging) contains a spectrum at this position and is called a “spaxel”.
Overcoming the paradigm of doing imaging and spectroscopy separately, IFS is a
very efficient way of doing astronomical observations. There are even analyses which
would not be possible (or at least very expensive in observing time) without IFS, such
as creating 2D maps of various physical parameters of galaxies (e.g. Weilbacher et al.
2011). In surveys where targets for follow-up spectroscopy are usually selected by multi-band imaging, one could in principle benefit from IFS, especially in searches for emission-line galaxies such as LAEs. As explained in Sect. 1.1, one here typically uses
a narrow-band filter in combination with broadband photometry. However, blind IFS
emission-line surveys are not very practical with the current generation of IFUs, since
they have only a small FOV and are limited in throughput. The future IFS MUSE,
described in the next section, is especially designed to overcome these handicaps.
Compared to imaging, current IFS instruments cover only small areas on the sky. As examples, Table 2.1 lists the existing instruments at the VLT with integral field spectroscopic capabilities, together with their wavelength coverage ∆λ, spectral resolution R = λ/∆λ, FOVs and spatial sampling ∆s. While SINFONI is a pure IFS instrument working in the near-infrared, FLAMES and VIMOS are multi-purpose instruments providing an IFS mode. A complete list of all IFS instruments currently in operation at
major astronomical facilities worldwide can be found online at the IFS Wiki2 (Westmo-
quette et al. 2009).
The discovery potential of IFS in blind searches for LAEs was shown by van Breukelen et al. (2005), using the VIMOS instrument on the VLT in low-resolution mode (R ≈ 200). VIMOS currently provides the largest FOV on an 8 m telescope, but with relatively coarse spatial sampling and at low efficiency. Despite these instrumental limitations, they found 14 bright LAEs (i.e. LLyα ≳ 1042 erg s−1) in their 9 h exposure of a ∼ 1 × 1 arcmin2 FOV (obtained by combining 4 neighboring pointings; furthermore, rectangular regions around spaxels containing emission from continuum emitters were masked out). A redshift interval of 2.3 ≤ z ≤ 4.6 was covered, with gaps in this interval due to sky-subtraction problems. The limiting line flux was on average Fline ≈ 1.8 × 10−17 erg
2 URL: http://ifs.wikidot.com
Instrument ∆λ [nm] / R FOV ∆s
VIMOS 400 - 1150 / 200 - 3000 27′′× 27′′ 0.33′′× 0.33′′
54′′× 54′′ 0.67′′× 0.67′′
SINFONI 1100-2450 / 1500 - 4000 0.8′′× 0.8′′ 0.0125′′× 0.0255′′
3′′× 3′′ 0.05′′× 0.10′′
8′′× 8′′ 0.125′′× 0.25′′
FLAMES 370 - 950 / 5600 - 46000 12′′× 7′′ 0.52′′× 0.25′′
6′′× 4.2′′ 0.3′′× 0.3′′
Table 2.1: Current IFS Instruments at the VLT
s−1 cm−2. The whole observing program for this pilot study took 4 nights (one pointing per night). MUSE will reach deeper than this flux limit in a 2 h exposure, covering the same FOV at once and at much higher spatial resolution.
Figure 2.1: Schematic sketch of the main IFS techniques (adapted from Allington-Smith
2006)
There are different techniques which IFS instruments use to produce a spectrum of
each spatial section in their FOV. The three main techniques in use are sketched in
Figure 2.1 (Allington-Smith and Content 1998; Allington-Smith 2006). Lenslet arrays
were the first realization of IFS-instruments (e.g. the IFU Tiger on the 3.6m Canada-
France-Hawaii telescope). Here an array of lenses is used to segment the focal plane.
The pupil images produced by these lenses are then fed into the spectrograph, and
give short spectra on the CCD. By rotating the dispersion axis against the symmetry
axis of the spectrograph input, overlap between the spectra is avoided (Fig. 2.1 - top).
A modification of this technique is to couple the lenses with fibers. The ends of the
fibers are then reformatted onto a pseudo-slit that is dispersed by the spectrograph
(Fig. 2.1 - middle), so that dead space on the detector is minimized. This so-called fiber-lenslet array technique is currently the most common IFS technique used in astronomy (Westmoquette et al. 2009). Examples of such instruments are the PMAS IFU, installed at the Cassegrain focus of the 3.5 m telescope at Calar Alto Observatory (Spain), and the already mentioned VIMOS IFU on the VLT.
A different realization is the image slicer technique. Here mirrors are used to seg-
ment the image into several thin horizontal slices. A secondary mirror then reformats
these slices so that they lie on top of each other. This stack of slices is then dispersed
by the spectrograph (Fig. 2.1 - bottom). The image slicer technique brings advantages in throughput and further minimizes dead space on the detector. Since the spatial segmentation is done only in one dimension, the spatial information along the slices is maintained. This design allows maximum spatial and spectral resolution. An example
of an IFU putting the image slicer technique into practice is SINFONI at the VLT (see
Table 2.1), which operates in the near infrared. The future VLT instrument MUSE,
described in the next section, will employ this technique in the optical.
2.2 MUSE - Overview of Instrument Characteristics &
Performances
Figure 2.2: Artist's rendition of MUSE at the VLT Nasmyth platform (from Bacon et al. 2009). Visible are the tubes of the 24 spectrographs, the tanks of the cryogenic system and, behind the tanks, the housings that contain the electronics.
The Multi Unit Spectroscopic Explorer (Bacon et al. 2004; McDermid et al. 2008;
Bacon et al. 2006; Bacon et al. 2009; Bacon et al. 2010), or short MUSE3, is a 2nd
generation VLT Instrument to be installed on Unit Telescope 4 (UT4 “YEPUN”) of the
Very Large Telescope (VLT) at Cerro Paranal (Chile). The instrument, planned and
built by a consortium consisting of six European institutes4 and ESO, will be operational
at the VLT in 2013. It will be the most powerful integral field spectrograph (IFS) going
into operation up to date. A computer-aided drawing of the envisioned final appearance
of the instrument at the VLTs Nasmyth B Platform is presented in Figure 2.2.
MUSE is an optical IFS making use of the image slicer principle. It is designed
to have a spatial sampling of 0.2×0.2 arcsec2 in a field-of-view (FOV) of 1×1 arcmin2,
thus giving 90000 spectra per exposure5. Each of these 90000 spaxels will contain ∼ 4000 spectral pixels (“voxels”, short for volume pixels) in a wavelength range from 480 nm to 930 nm, corresponding to a spectral resolution of R ≈ 1750 at the beginning and
3 When the concept for this instrument was originally presented, it was dubbed MIFS: Mega Integral Field Spectrograph for the VLT (Bacon et al. 2002).
4 CRAL (Lyon), IAG (Göttingen), AIP (Potsdam), NOVA (Leiden), LATT (Toulouse) & ETH (Zürich)
5 In addition to this so-called “wide-field mode”, MUSE will also provide a “narrow-field mode”, enabling the possibility to sample a FOV of 7.5×7.5 arcsec2 at 0.025×0.025 arcsec2 resolution.
R ≈ 3750 at the end of the wavelength range6. This means that in total there will be ∼360 million voxels in a datacube resulting from an observation with MUSE.
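These numbers can be verified with a quick back-of-the-envelope calculation (a sketch; the assumption that the variance array doubles the volume is mine, based on the data products described in Sect. 2.3):

```python
# Back-of-the-envelope check of the numbers quoted above:
n_spaxels = 300 * 300        # 1x1 arcmin^2 field at 0.2" x 0.2" sampling
n_spectral = 4000            # 480-930 nm at ~0.13 nm per spectral pixel
n_voxels = n_spaxels * n_spectral
print(n_voxels)              # 360000000, i.e. ~360 million voxels

# rough datacube volume, assuming 32-bit (4-byte) voxels for the data
# array plus an equally large variance array (my assumption):
size_gb = 2 * n_voxels * 4 / 1e9
print(round(size_gb, 1))     # ~2.9 GB
```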
Such a large number of information elements can of course not be provided by one detector alone, thus MUSE will consist of 24 identical IFUs. Fore optics, placed in the Nasmyth focus of the telescope, will first de-rotate, split and magnify the 1 × 1 arcmin2 field into 24 rectangular sub-fields (top left panel of Fig. 2.3) of 2.5×60 arcsec2. Each of these sub-fields will then be fed into one of the 24 identically manufactured IFUs. The slicer in these IFUs, consisting of a slicer stack (discontinuous mirrors with different tilts that literally cut the sub-fields into minislits, rearranged into staggered rows - see Fig. 2.4 for a photo), several lenses and masks, slices the sub-fields into 48 slices (top right panel of Fig. 2.3). These slices form the entrance plane of the spectrograph. The dispersive element in each spectrograph then produces the spectra, which are picked up by the detector (a CCD in a cryogenic vacuum environment).
A full description of the design of the IFUs, giving all technical and optical details, can be found in Laurent et al. (2006) and Laurent et al. (2010). The main point is that the whole system was designed in a way that, except for some parts of the calibration system (Kelz et al. 2006) and the de-rotator, there are essentially no moving parts in MUSE. This ensures the high stability that is needed for long exposures, which will be done e.g. as a series of 1 h exposures over several nights. Another crucial ingredient in the design was a high system efficiency, meaning that a large fraction of the photons arriving at the telescope also contributes to the measured flux. First laboratory measurements show that the instrument is well within its science-driven specifications (Laurent et al. 2010).
After science and calibration exposures have been obtained at the telescope, the end product is a set of n × 24 CCD images of 4000×4000 pixels each, where n is the number of exposures taken of a field (bottom panel, Fig. 2.3). It is then the task of the data-reduction software to build from this set of n × 24 CCD images the final ∼300×300×4000 datacube, which will be used in astrophysical analyses (Weilbacher et al. 2008).
6 There is also an extended mode, resulting from the removal of a second-order filter in the spectrograph. In this extended mode the blue end of the wavelength range is at 465 nm.
Figure 2.3: Sketch of the image splitting and slicing principle of MUSE. The 1×1 arcmin2 FOV will be split into 24 subfields, which will then be fed into 24 IFUs working with the image slicer principle.
(Image courtesy of P. Weilbacher & P. B¨ohm, AIP)
Figure 2.4: Photo of a prototype of a slicer stack, similar to the one that will be used in each of the 24 MUSE IFUs. This is essentially a discontinuous mirror that literally cuts the input 2.5×60 arcsec2 FOV into minislits in a staircase arrangement.
(Image Source: MUSE web-site http://muse.univ-lyon1.fr/)
2.3 Representation of Integral Field Spectroscopic Data in
the FITS File Format
The Flexible Image Transport System7 (FITS) is the de facto standard file format in
astronomy. Since its introduction (Wells et al. 1981) it has been used by astronomers
in all wavelength bands, from radio to X-ray data. An update to the FITS standard (FITS version 3.0) has been published recently (Pence et al. 2010). The format provides a natural way to store n-dimensional data arrays together with relevant metadata, which makes it well suited for application in IFS. Since the keyword format that is used to store the metadata is very flexible, it is easy to build one's own extensions based on FITS. Another advantage of the format is that FITS interfaces exist for the most common programming languages, providing access to such files (i.e. loading, manipulating and saving). Examples of such interfaces are cfitsio for the C programming language (Pence 1999) or pyfits for Python (Barrett and Bridgman 1999). Note that FITS arrays historically are 1-indexed (similar to the indexing in FORTRAN), so special care has to be taken when using those arrays with 0-indexed programming languages like C or Python.
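A tiny helper makes the two conventions explicit: a 1-indexed FITS voxel coordinate maps to a 0-indexed, axis-reversed index in the array a pyfits-style reader returns. The function name is my own and not part of any library:

```python
def fits_to_numpy_index(x, y, z):
    """Map a 1-indexed FITS voxel coordinate (along NAXIS1=x, NAXIS2=y,
    NAXIS3=z) to the 0-indexed index of the array returned by a
    pyfits-style reader, which stores the axes in reversed order.
    (Illustrative helper, not part of any library.)
    """
    return (z - 1, y - 1, x - 1)

# the first FITS voxel (1, 1, 1) is array element [0, 0, 0], and the
# last voxel of a 301 x 301 x 3463 cube is element [3462, 300, 300]
print(fits_to_numpy_index(1, 1, 1))
print(fits_to_numpy_index(301, 301, 3463))
```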
There are also disadvantages to storing IFS data in basic FITS fashion (i.e. by methods described only in the FITS standard). For instance, some IFSs have non-square spaxels (e.g. SINFONI, see Table 2.1), which might be distributed in non-square arrays over the FOV. Also, the filling factor of the spaxels in the FOV does not necessarily need to be 100 % (e.g. for the VIRUS-P instrument it is 30 % - Adams et al. 2011). In this
case spatial resampling from the original CCD data to the 3D datacube will result in
information loss. To address such issues the FITS extension Euro3D was created (Kissler-
Patig et al. 2004). However, in this thesis only simulated data for planned observations
with the MUSE instrument, produced by the QSim software, will be analyzed. This
data does not make use of the Euro3D extension, because IFUs based on the image
slicer technique do not benefit from it.
In general, a FITS file is built from one or more header-data units (HDUs). An HDU contains a header and the associated n-dimensional array filled with values.8 The header is built from standardized keyword-value pairs, plus an optional comment. Encoded in these keyword-value pairs, the header contains the metadata relating the contents of the HDU's array to the physical world. This can include information on what instrument was used, at which point in time the observations were taken and which values are measured in what type of units. Especially important in the astronomical context is
the information on which part of the sky observations were taken and how to relate
measurements in the grid of the array to physical coordinates.
The FITS file produced by QSim consists of 4 HDUs. The first is a primary HDU, containing only a header, which stores information on the parameters that were used in the simulation. HDU 2 contains the datacube, with flux stored in 10−20 erg s−1 cm−2 ˚A−1, and HDU 3 contains the associated variances. HDU 4 contains a white
7 The term “Transport System” reflects that the format was originally developed to transport astronomical images on magnetic tape. However, the format has evolved over time and is now capable of handling astronomical data of varying complexity.
8 FITS files can also be used to store tables; e.g. the QSim scene file described in Sect. 3.1 is a FITS table. Table HDUs differ significantly from the data HDUs discussed here; see Pence et al. (2010) for more information.
24
light image of the simulated field, i.e. a 2D representation of the datacube collapsed in the spectral dimension (see also Sect. 4.2.1).
A typical header from simulated IFS data for MUSE is shown in Fig. 2.5, where in
each line the keyword is shown on the left, the value assigned to that keyword is shown
right next to it, separated by a =, and the optional comment is shown next to the value,
separated by a /. How such a header relates the stored values in the HDU to the physical
world is now explained in more detail.
The keyword SIMPLE in line 1 states that this is a standard FITS HDU. Lines 2 and 3 indicate that this FITS HDU stores a 3-dimensional (NAXIS = 3) array with values in 32-bit floating-point arithmetic9. In line 7 the logical value T of the EXTEND keyword indicates that this is the primary HDU of a FITS file with more than one HDU. In this case each of the HDUs can get a name, which is stored in the value of the EXTNAME keyword. The NAXISn keywords in lines 4-6 define the number of elements in each dimension n = 1, 2, 3 of the array. According to the FITS standard, these keywords must appear in increasing order of n. So in the example header of Fig. 2.5 we have an HDU with a 301 × 301 × 3463 array. Note that pyfits sorts the array dimensions the other way around, in reversed order. In lines 16-18 the CTYPEn keywords define what physical meaning is assigned to these dimensions. Here the first 2 dimensions are the spatial axes (i.e. right ascension α ˆ= RA---TAN and declination δ ˆ= DEC--TAN) in the gnomonic10,11 projection of the celestial sphere in FK5 (J2000) coordinates (line 31), which will be explained in detail in Appendix A.1. The third axis corresponds to wavelength λ (measured in air), indicated by the value AWAV. Lines 23 to 25 give the units (CUNITn keywords) corresponding to these axes.
The formalism by which specific (x, y, z) coordinates of the array are converted into (α, δ, λ) tuples makes use of the key-value pairs in lines 9-14, 19-22 and 26-30 of the header given in Fig. 2.5. How “world coordinates” (in the sense of measurable quantities in some physical parameter space) are converted to and from the array coordinates of FITS is defined in Greisen and Calabretta (2002) (basic formalism), Calabretta and Greisen (2002) (application of the formalism to spatial coordinates on the celestial sphere) and Greisen et al. (2006) (application of the formalism to spectral coordinates). The conventions provided there have recently been incorporated into the official FITS standard (Pence et al. 2010). The formalism itself is detailed in Appendix A.1.
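To make the two-step mechanism concrete, the following sketch applies the linear CD-matrix step and the inverse gnomonic (TAN) deprojection to the header values of Fig. 2.5. It is a simplified illustration of the Calabretta & Greisen prescription (the function name is my own; the full derivation is given in Appendix A.1):

```python
import math

def pix2world_tan(px, py, crpix, crval, cd):
    """Convert a 1-indexed FITS pixel (px, py) into (RA, Dec) in degrees
    for a RA---TAN / DEC--TAN header: a linear CD-matrix step followed
    by the inverse gnomonic projection about the reference point.
    Sketch after Calabretta & Greisen (2002); not the QSim code.
    """
    # 1) linear step: offsets from CRPIX through the CD matrix give the
    #    intermediate world coordinates (xi, eta), here in radians
    dx, dy = px - crpix[0], py - crpix[1]
    xi = math.radians(cd[0][0] * dx + cd[0][1] * dy)
    eta = math.radians(cd[1][0] * dx + cd[1][1] * dy)
    # 2) inverse gnomonic (TAN) deprojection about (CRVAL1, CRVAL2)
    a0, d0 = math.radians(crval[0]), math.radians(crval[1])
    rho2 = xi * xi + eta * eta
    dec = math.asin((math.sin(d0) + eta * math.cos(d0)) / math.sqrt(1.0 + rho2))
    ra = a0 + math.atan2(xi, math.cos(d0) - eta * math.sin(d0))
    return math.degrees(ra), math.degrees(dec)

# reference values taken from the header in Fig. 2.5
crpix = (150.0, 150.0)
crval = (53.19060746442399, -27.79323102191726)
cd = [[-4.128582363763301e-05, -3.717392257549213e-05],
      [-3.717392257549213e-05, 4.128582363763301e-05]]

# at the reference pixel the reference coordinates are recovered
print(pix2world_tan(150.0, 150.0, crpix, crval, cd))
```

At the reference pixel (CRPIX1, CRPIX2) the reference values (CRVAL1, CRVAL2) are recovered, and a one-pixel offset corresponds to the ≈ 0.2 arcsec sampling encoded in the CD matrix.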
Normally, astronomers do not have to deal with these coordinate transformations themselves; software libraries like WCSTools12 (Mink 1999) provide convenient functions for such operations on FITS files. However, in the case of the MUSE simulations with QSim (explained in more detail in Sect. 3.1), it is convenient
9 The “-” in line 2 of the example header in Fig. 2.5 specifies that the floating-point representation is according to the “IEEE Standard for Binary Floating-Point Arithmetic for microprocessor systems” (Pence et al. 2010).
10 The name originates from the Greek word gnomon (engl. indicator). A gnomon is the part of the sundial that casts the shadow. In the gnomonic projection, which is centered on some location on the Earth, the angles between the meridians separated by 15◦ are the same as the hour markings of a sundial at this location (Snyder 1993). The gnomonic projection is also often used for maps of the Earth's polar regions.
11 MUSE, with a FOV of 1×1 arcmin2 in the Nasmyth focus of a Ritchey-Chrétien telescope, naturally samples the celestial sphere in a gnomonic projection (Zacharias 2001).
12 WCSTools can be used in the Python scripting language via the module astlib, which can be downloaded from http://astlib.sourceforge.net/.
1 SIMPLE = T / conforms to FITS standard
2 BITPIX = -32 / array data type
3 NAXIS = 3 / number of array dimensions
4 NAXIS1 = 301
5 NAXIS2 = 301
6 NAXIS3 = 3463
7 EXTEND = T
8 EXTNAME = ’DATA ’ / extension name
9 CRVAL3 = 480.0 / Start wavelength
10 CRPIX3 = 1 / Start wavelength in pixel
11 CRVAL1 = 53.19060746442399 / Start x spatial coord
12 CRPIX1 = 150 / Start x spatial coord
13 CRVAL2 = -27.79323102191726 / Start y spatial coord
14 CRPIX2 = 150 / Start y spatial coord in pixel
15 BUNIT = ’10**(-20)*erg/s/cm**2/Angstrom’ / Flux Unit
16 CTYPE1 = ’RA---TAN’ / Coord Type
17 CTYPE2 = ’DEC--TAN’ / Coord Type
18 CTYPE3 = ’AWAV ’ / Coord type
19 CD1_1 = -4.128582363763301E-05
20 CD1_2 = -3.717392257549213E-05
21 CD2_1 = -3.717392257549213E-05
22 CD2_2 = 4.128582363763301E-05
23 CUNIT3 = ’nm ’ / Wavelength units
24 CUNIT1 = ’deg ’ / Spatial units
25 CUNIT2 = ’deg ’ / Spatial units
26 CD1_3 = 0
27 CD2_3 = 0
28 CD3_1 = 0
29 CD3_2 = 0
30 CD3_3 = 0.13
31 EQUINOX = 2000 / Standard FK5 (years)
Figure 2.5: FITS Header for simulated MUSE data
to have transformation formulas between world coordinates and FITS array coordinates at hand, because the input for the simulations is derived in part from catalogs from the literature, with positions of objects in fixed equatorial coordinates α and δ, and the header of the final FITS files is written after the simulations. It is therefore necessary to understand the inner workings of the mechanism to avoid errors in that process.
This is especially the case because I found that wrong transformation formulas between world and FITS array coordinates had been used in early versions of the MUSE dry-run datacubes produced by QSim. The error can be seen when comparing images of the HUDF with collapsed dry-run datacubes (white-light images, see also Sect. 4.2.1). To illustrate the errors made in the simulations, the B435-band image of the HUDF (Fig. 2.6) is compared to two white-light images (i.e. 2D images generated from the simulated datacube, where the content of each spaxel is summed up in the spectral dimension) of simulated MUSE datacubes of the same region with erroneous input (Figs. 2.7(a) & 2.7(b)).
In Fig. 2.7(a) a wrong transformation law between equatorial and pixel coordinates has been used: instead of the correct transformation, only a linear scaling from the reference value (αp, δp) ↔ (x0, y0) in the center of the field was applied. Notice in Fig. 2.7(a) that this causes almost no error in declination, but the error in right ascension grows more pronounced when moving away from the center.
In order to resolve this issue, I derived the correct transformation formulas (see Appendix A.1) between FITS pixel (or, for IFS data, spaxel) coordinates and world coordinates, strictly following the rules provided in Calabretta and Greisen (2002) for the gnomonic projection13.
These formulas were incorporated into the current version of the QSim simulations. In Fig. 2.7(b) the result of this simulation is shown. However, the objects derived from the catalog that acts as input for QSim are also stored as datacubes in FITS (see Sect. 3.1). These datacubes do not use the sign convention specified in Calabretta and Greisen (2002), which is chosen such that east is left and north is up on the celestial sphere. Since a different handedness of the coordinates was chosen in the input files, all extended objects in Fig. 2.7(b) are mirrored along their horizontal (east-west) axis compared to the original image. Note, however, that they are now in the right position.
Finally, in Fig. 2.8, magnified versions of Fig. 2.6, Fig. 2.7(a) & Fig. 2.7(b) are shown. Note again that the bright visible galaxies are in the correct positions when comparing Fig. 2.8(c) with Fig. 2.8(a), but the objects are mirrored along the east-west axis in the QSim-simulated cube compared to the original HUDF image. This will be fixed in the next iteration of the simulations, but requires modification of the headers of ∼ 3×104 FITS files (the input files used for QSim, see Sect. 3.1 in the next chapter).
13 The relevant formulas for the transformation of MUSE/QSim data, provided in Eqs. (A.20), (A.21), (A.11) and (A.12) in the Appendix, were cross-checked with the WCSTools routines wcs2pix() (converts world coordinates to pixel coordinates) and pix2wcs() (converts pixel to world coordinates) on a given HDU. These routines produce essentially the same results.
[Figure 2.7, two panels with RA (J2000) / Dec (J2000) axes: (a) simple linear transformation; (b) wrong sign convention in input]
Figure 2.7: Examples of erroneous spatial transformations between world coordinates and FITS array coordinates used in early versions of the QSim software. Same graticule as in Fig. 2.6. Top panel: an erroneous simple linear transformation from equatorial to pixel (spaxel) coordinates was applied. Bottom panel: the correct formulas for the spatial transformation were applied, but a wrong sign convention was used in the input FITS files. Magnified regions are shown in Fig. 2.8.
(a) H-UDF B-Band
(b) simple linear transformation (c) wrong sign convention in input
Figure 2.8: 36″ × 36″ cutouts of Fig. 2.6 (top panel), Fig. 2.7(a) (middle panel) and
Fig. 2.7(b) (bottom panel), centered on (α, δ) = (3h 32m 45.6s, −27° 47′ 30″). As in
Fig. 2.6, the H-UDF B-band image has been down-sampled to MUSE resolution.
Chapter 3
Dry Runs for MUSE
The MUSE IFS introduced in Sect. 2.2 will start observing at the telescope in 2013.
Observations with this instrument will produce data products of high complexity and
volume compared to the traditional data products (spectra and images) produced in
optical astronomy. Thus, besides the ongoing integration of the instrument, a lot of effort
is being spent on simulating the final output MUSE will produce once it is operational at
Paranal. The analysis of such a simulation is called a “Dry Run”.
A distinction has to be made between the final data products which will be worked on
by the astronomer in the astrophysical analysis of the observation and the raw exposures,
the former being datacubes in FITS representation as detailed in Sect. 2.3. These
datacubes will be generated from the raw exposures (24 CCD images, one from each
spectrograph of the instrument) processed by the data reduction software (DRS) pipeline
(development led by P. Weilbacher of AIP; for an overview of the pipeline see e.g.
Weilbacher et al. 2008).
A complete dry run would consist of simulating an observation of some astrophysical
scene (e.g. observation of a deep field), taking into consideration all the optical effects
of the instrument and of the atmosphere, employing an instrument numerical model.
Such a simulation would produce as output the raw exposure CCD images. For deep
observations several exposures of the scene need to be accumulated. The next step is
then to reduce and combine these simulated raw exposures using the DRS pipeline, to
produce a final datacube of the simulated observation. In the final step this datacube
gets analyzed and the loop is closed by comparing the output of the analysis with the
astrophysical input. A brief overview of the instrument numerical model is given in Sect.
3.2.
However, given the complexity of this process and the fact that the DRS and the INM
are still under development, a shortcut is to directly simulate the final reduced datacube,
making some reasonable assumptions about atmospheric and instrumental effects.
For the generation of such mock datacubes the software QSim was developed by
the MUSE P.I. Roland Bacon.
In this work, I will analyze QSim-produced datacubes of mock observations in the
Hubble Ultra Deep Field. Sect. 3.1 therefore presents not only an overview of the
QSim software, but also of the input scene used for this particular simulation.
After searching this datacube for emission line objects with the methods described
in Chapter 4, the detections will finally be compared with the input data presented
in this section, to quantify success rates etc. and thus close the dry run loop
(see Chapter 5).
3.1 QSim - Generating MUSE Datacubes from Astrophysical Input
QSim is a software simulation package written in Python by R. Bacon of Lyon University
(Principal Investigator of MUSE). The aim of QSim is to produce mock datacubes
resulting from MUSE observations. Output datacubes from QSim are stored as FITS
files (Sect. 2.3) containing 4 HDUs. HDU 1 contains only a header storing the parameters
that were used for the simulation; HDU 2 contains the 3-dimensional array in which the
flux is stored (in units of 10−20 erg s−1 cm−2 Å−1) and the corresponding header (similar
to the one displayed in Fig. 2.5); HDU 3 contains the variances associated with HDU 2;
and HDU 4 contains a white-light image (i.e. a collapsed representation of the datacube
where the flux is summed up along the spectral direction, see also Sect. 4.2.1). QSim
output does not necessarily represent exactly one MUSE field-of-view (FOV); smaller or
larger regions on the sky can also be simulated (the simulated FOV). In the latter case
the output datacubes can become very large and might have to be split into several
subcubes representing typical MUSE FOVs (see Sect. 4.1).
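The HDU layout just described can be read, for instance, with astropy.io.fits (a sketch under the assumption that astropy is available; the function name is illustrative, not part of QSim):

```python
import numpy as np
from astropy.io import fits

def load_qsim_cube(filename):
    """Return (flux, variance, white) arrays from a QSim output file.

    HDU layout as described in the text: HDU 1 = parameter header only,
    HDU 2 = flux cube, HDU 3 = variances, HDU 4 = white-light image.
    astropy indexes HDUs from 0, hence the indices 1, 2, 3 below.
    """
    with fits.open(filename) as hdul:
        flux = np.asarray(hdul[1].data, dtype=np.float64)
        variance = np.asarray(hdul[2].data, dtype=np.float64)
        white = np.asarray(hdul[3].data, dtype=np.float64)
    return flux, variance, white
```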
Several instrumental effects are taken empirically into account to make the simu-
lations quite realistic. These effects include the point-spread function (PSF, see also
Sect. 4.3.2) and the line-spread function (LSF, see also Sect. 4.3.3) of the instrument.
Furthermore, contributions by photon and readout noise as well as the dark current
of the detector are calculated.
Various observational parameters can be defined by the user running a QSim simu-
lation. These are the integration time, the number of exposures, the airmass, the seeing
and the number of days since new moon. One can also choose the operation mode in
which the observations are performed, i.e. narrow- or wide-field mode and normal or
extended wavelength range (the modes are described in Sect. 2.2).
The input-objects for the astrophysical scene that will be simulated need to be pro-
vided. In the current version of QSim three types of input objects can be placed in
the simulated FOV: Point-source objects (PS objects) and two types of galaxies - UDF
Galaxies (UD objects) and SG Galaxies (SG objects). PS objects basically consist only
of a spectrum, stored at higher resolution than MUSE (so it can
be convolved with the MUSE LSF). The choice of two types of galaxies was motivated
by the fact that the Dry Runs for the high-redshift science cases are performed in the
Hubble Ultra Deep Field. In this region of the sky, excellent photometric broad-band
information is available for numerous galaxies at redshifts 0 < z < 4, but the number
of known objects declines with higher redshift. However, beginning at z = 2.8, Lyα
emitters will be visible in MUSE observations of this field. Due to their faint continua,
however, these objects are undetected in broad-band photometry. For these reasons,
simulated objects were placed above redshift 2.8 in the QSim simulation of the Hubble
Ultra Deep Field (HUDF), while below z = 2.8 the spectral appearance of the objects is
generated from the photometric information available in the field. Since both object
types have different file structures, two different input classes are necessary. The z < 2.8
UD objects are small FITS datacubes at higher spatial and spectral resolution than
MUSE, while the z > 2.8 SG objects stem from the “MareNostrum Galaxy Formation
Figure 3.1: Redshift Distribution of QSim Scene of the HUDF (green - UD Objects,
yellow - SG objects).
Simulation” (Yepes et al. 2008). The FITS file structure of the SG type input objects
is very complex and even variable. In Fig. 3.1 the redshift distribution of the SG and
UD objects in the QSim Scene of the HUDF is shown. The distribution of UD objects
follows the observed redshift distribution of galaxies in the HUDF (Coe et al. 2006),
while the distribution of SG-type objects is theoretically motivated.
The placement of the objects in the simulated FOV is controlled by a scene file which
is stored as a FITS table. Mandatory entries in this table must provide the relative X and
Y coordinate (in arc seconds) on the QSim grid, the object type and the filename of the
input file (UD objects also need a position angle). Additionally, informative properties
of the input sources can be added to the table, so that it can act as a reference for later
comparison with the output of analyses performed on the simulated datacube.
Table 3.1 lists the parameters that are present in the scene file that was used for the
dry runs of MUSE observations in the HUDF. In total there are 34980 objects listed
in the scene table for the simulation of the HUDF, of which 42 are of PS type (i.e.
spectra of the few stars of type M and later that were found in the HUDF; Pirzkal et al.
2005), 14309 are of UD type and 20629 are of SG type. In Fig. 3.2 the relative X and Y
positions of the input objects are shown.
QSim simulations of the HUDF were run with various observational parameters. In
this thesis I will work with a dry run that simulates a shallow survey in the HUDF
with 2h exposure time (2×1h exposures combined) at 1.0 arcsec seeing, 6 days after new
moon at an airmass of 1.2. The total simulated area is 3′ × 3′, indicated by the thick
black frame in Fig. 3.2. A white-light image of the simulation can be seen in Fig. 4.1 in
the next chapter.
3.2 Instrument Numerical Model
As described in Sect. 2.2, the data that is directly produced at the telescope is a set
of exposures (24 CCD images of 4000 × 4000 pixels each) and calibration images. With
the help of the data-reduction pipeline this data set will then be transformed into the
datacubes that will be used for the astrophysical analyses and which are the subject of
this thesis.
This data reduction is a quite complex process (see Weilbacher et al. 2008 for all the
steps involved) and needs to be tested. For this purpose an instrument numerical model
Figure 3.2: Spatial distribution of objects in the QSim scene of the HUDF. SG, UD and
PS type objects are indicated by yellow, green and red dots respectively. The relative
XQSim and YQSim coordinates are calculated from the relative X and Y coordinates stored
in the scene table by division by 0.2, thus numbering the spatial resolution elements
of MUSE from the center of the simulated field. The thin dashed lines indicate 9 MUSE
FOVs of 1′ × 1′ that would fit into the simulated region. In my analysis I have cut the
whole simulation into these 9 subfields (Sect. 4.1).
(INM) needs to simulate the set of 24 CCD images of 4000 × 4000 pixels that are collected in an
observation of a particular astrophysical scene. Moreover the corresponding calibration
exposures need to be simulated. To make these simulations as realistic as possible, one
essentially has to simulate the instrument's response to some simulated region on the sky
as well as to the instrument's calibration unit. The INM takes into account all the optical
effects of the instrument and of the earth's atmosphere; it thus basically simulates the
last meters of the wavefronts from space arriving at the detector. To perform such a
complex task in a computationally efficient way, the INM uses a Fourier transform
approach (Jarno et al. 2010).
Currently the INM is not in a final state, i.e. it cannot simulate a full FOV of an
astrophysical scene like the HUDF with all 24 integral-field units of MUSE. Nevertheless,
it can produce simple scenes with a few objects included. One raw CCD image of such
a simulation is shown in Fig. 3.3. The rich line spectrum that can be seen in this
figure consists of emission lines from the night sky. The white dots of various shapes and sizes
are hits by cosmic rays that have been accumulated in this simulated 1 hour exposure
on the CCD. It is the task of the DRS to deal with such effects in a way that the final
datacubes will be affected by those in the least possible way.
The full dry runs will consist of simulations of an astrophysical scene like the HUDF
under defined observing conditions with the INM. Then the DRS will create datacubes
from this dataset, which will in turn be analyzed. Closing the loop, the results of this
analysis can then be compared with the input of the simulation. Currently, the loop with
QSim is smaller: the DRS is taken fully out of the game and the effects of the instrument
and the night sky are simulated empirically. Nevertheless, QSim already makes it
possible to create and test the analysis methods on the final data product. This enables
optimization of the DRS in the full dry runs, so that the necessary tools to optimally
address the scientific questions that motivated the construction of the instrument will
be available when it sees “first light” at Paranal.
Column  Description
ID      Object number
Type    Object type (i.e. PS, UD or SG)
Name    Object name
File    File name of the input FITS file
Xpos    Relative X coordinate (in arcsec)
Ypos    Relative Y coordinate (in arcsec)
Orient  Position angle (in degrees)
z       Redshift z
alpha   Right ascension α (in degrees)
delta   Declination δ (in degrees)
vmag    V-band AB magnitude
imag    I-band AB magnitude
Flya    Lyman-α flux FLyα (in erg s−1 cm−2)
EWlya   Rest-frame Lyα equivalent width (in Å)
nline   Number of emission lines
size    Approximate object size (in arcsec)
Table 3.1: List of columns in the QSim scene table of the HUDF simulation
Figure 3.3: Numerical instrument model simulation of raw CCD data for one MUSE
CCD (wavelength increases upwards). One exposure will consist of 24 such frames.
The separation into 4 segments arises because the CCD is read out via 4 ports; the areas
containing no flux are over-scan regions.
Chapter 4
Building a Catalog of Emission
Line Objects in (simulated) IFS
Data
Several steps need to be performed in order to get from the IFS datacubes to a catalog
of emission line objects (ELOs). In this chapter I will present an approach that is based
on matched filtering (e.g. Das 1991).
But before running any algorithms on big scenes simulated with QSim, it is necessary
to cut those cubes into smaller pieces, that can easily be digested by normal workstation
hardware (Sect. 4.1).
Detecting ELOs by using S/N as a discriminator between detection and non-detection
is problematic with objects that have a bright continuum. So the next step in the method
described here is to mask out such objects (Sect. 4.2.2). To achieve this, 2D image
representations of the datacube need to be calculated, which can be done by collapsing
the datacube along the spectral axis (Sect. 4.2.1).
After continuum objects have been removed from the datacube, matched filtering
can be applied. This will enhance the signal-to-noise ratio (S/N) of emission lines, while
smearing out noise-dominated regions in the datacube, thus minimizing false detections.
More details including an introduction to matched filtering will be provided in Sect. 4.3.
The processed datacubes are then used to build a catalog of ELOs (Sect. 4.4), which,
in combination with the original datacubes, is finally used to measure the fluxes of
the emission lines (Sect. 4.5).
As the final product of the process described here one gets a catalog of emission lines
with their spatial and spectral positions (x, y, λ) in the datacube, as well as measurements
of their fluxes. The IDs of the catalog are grouped according to spatial position. This
ensures that an object showing several emission lines gets only one ID.
4.1 Cutting an IFS Cube into Smaller Pieces
The datacubes simulated by QSim (Sect. 3.1) can be bigger than the FOV of MUSE.
Due to the large size of these QSim cubes (e.g. 21 GB for a QSim simulation of the
HUDF), it is recommended to cut them spatially into smaller subcubes. These subcubes
can then be loaded at once into the memory of typical workstations (i.e. machines
with 4 to 8 GB of RAM), allowing analyses to be performed very efficiently.
When cutting a cube into smaller pieces, care has to be taken that the WCS infor-
mation in the FITS headers of the subcubes' HDUs is updated, i.e. one has to choose
a reference pixel in the new cube (CRPIX1 and CRPIX2 keywords of the subcube's HDU
header, as described in Sect. 2.3), find its position (x, y) in the original cube, then
get its WCS coordinates via Eqs. (A.11) and (A.12) and store those coordinates in the
subcube's header (CRVAL1 and CRVAL2 keywords).
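This bookkeeping can be sketched as follows; the header is modelled as a plain dict of FITS keywords, and a simplified small-angle transformation stands in for Eqs. (A.11) and (A.12) (a sketch, not the actual implementation):

```python
import math

def subcube_header(parent, x0, y0):
    """Return updated WCS keywords for a subcube whose pixel origin is
    (x0, y0) in the parent cube (0-based offsets, FITS 1-based CRPIX)."""
    hdr = dict(parent)
    # Keep CRPIX at (1, 1) in the subcube and recompute CRVAL there.
    dx = x0 + 1 - parent["CRPIX1"]
    dy = y0 + 1 - parent["CRPIX2"]
    xi = parent["CD1_1"] * dx + parent["CD1_2"] * dy
    eta = parent["CD2_1"] * dx + parent["CD2_2"] * dy
    hdr["CRPIX1"], hdr["CRPIX2"] = 1.0, 1.0
    hdr["CRVAL2"] = parent["CRVAL2"] + eta
    hdr["CRVAL1"] = parent["CRVAL1"] + xi / math.cos(
        math.radians(parent["CRVAL2"]))
    return hdr
```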
In the future it might also be desirable to cut out regions of MUSE FOV sized
datacubes (∼ 2.5 GB). For example, if one wants to perform analyses on one galaxy
in a MUSE cube, performing the operations on the whole cube would produce lots of
computational overhead. For this purpose I wrote a small program cut-out-cube.py,
which is described in Appendix A.4.1.
For my analyses on the shallow-survey HUDF QSim scene I decided to cut the
3×3 arcmin2 simulated cube into nine 1×1 arcmin2 pieces (subcubes). Thus each subcube
mimics a typical MUSE FOV (Fig. 4.1).
4.2 Masking Continuum Objects in IFS Datacubes
In order to mask the continuum objects from the datacube, these objects need to be
identified first. This can be most easily done in a 2-dimensional representation of the
datacube, which has been collapsed along the wavelength axis. The collapsing of IFS
datacubes is described in more detail in Sect. 4.2.1.
Since identifying objects in 2D images is a common problem in astronomy, there are
several tools available for this task. SExtractor (Bertin and Arnouts 1996) is a widely
used and popular software for 2D object identification (Shore 2009). The program
creates catalogs with magnitudes, fluxes, positions and several geometric parameters
(radii, eccentricity etc.) from astronomical images. When working on an image, the
program needs to be provided with several input parameters, the most important ones
being the detection threshold, i.e. the S/N value above which pixels enter the detection
routine, and the minimum number of connected pixels that form an object. Apart from
the catalogs, SExtractor can also produce segmentation maps. These maps are stored
as FITS files in which pixels belonging to objects are set to values ≥ 1.¹
One approach to masking the continuum objects from the datacube is to use these
segmentation maps produced by SExtractor. Another approach would be to create a
mask with a certain S/N threshold in the collapsed datacube.
When building statistical samples of ELOs, it should be noted that the ELOs which
will be detected after masking are objects with no significant continuum detections in
the datacube. On the other hand, when searching for emission line objects using narrow-
band images usually an equivalent-width (EW) criterion is adopted such that objects
above a certain EW are regarded as line emitters. Masking out objects with detectable
continuum will thus also remove some objects that have strong emission lines and
would formally be classified as line emitters.
One weakness of the masking approach is that the spaxels which are masked out
are lost, thus the survey area and volume for certain line emission objects decreases.
However, as will be shown, even with conservative masking criteria the fraction of
¹The values stored in SExtractor segmentation maps are actually the catalog IDs of the output
catalog.
Figure 4.1: White-Light image of QSim Shallow-Survey simulation in the HUDF - the
whole area simulated is 3×3 arcmin2 on the sky. For further analyses it was cut into
nine 1×1 arcmin2 subcubes (labeled 1-9) as indicated by the grid. A compass indicating
north and east in equatorial coordinates is shown in the top right. Each subcube then
contains 300×300=90000 spaxels, each spaxel with 3463 spectral information elements.
Thus these subcubes represent the typical information content and size of one MUSE
FOV.
total survey area remaining is still above 85 %. Nevertheless, in the future it would be
desirable to have a method that could subtract the light from the continuum objects.
This would make those spaxels occupied by continuum emission again usable for the
detection of emission line objects that share the same position on the sky with back-
ground continuum objects (or even foreground continuum objects, in case emission line
flux is not severely attenuated by the foreground object). A method that might prove
useful in this respect could make use of analytic light profile models of galaxies. Such
profiles can be modeled with the GALFIT software (Peng et al. 2002). To improve the
accuracy of such models, one could use photometric broadband data from archives, e.g.
the HUDF broad-band images for the (simulated) survey area analyzed in this work. In
a second step one would then have to fit those models in each wavelength slice of the
MUSE datacube. This would provide a spatial-spectral model of the galaxies' emission,
which could be subtracted from the datacube. However, this process is very complex
and involves many pitfalls. One of them is that the spatial appearance of a galaxy might
be wavelength dependent (see Fig. 4.2(a) for an example, but Fig. 4.2(b) for a
counterexample). Due to these difficulties it did not seem feasible to follow this approach
within the time constraints set for this master thesis. Nevertheless future work should
be invested to clarify how much one could gain by following that route.
In this work I explore both simple masking approaches (S/N & SExtractor). Details
of my implementations, examples and a quantitative analysis on the lost fraction of
survey area are presented in Sect. 4.2.2.
4.2.1 Collapsing an IFS Cube
The collapsing is performed by the program cube-collapse.py, which is described in
Appendix A.4.2. The collapsing of the IFS datacube is done in two different ways, namely:
• Simple Mean: The simple mean is calculated via

F^{s}_{(x,y)} = \frac{1}{N_z} \sum_z F_{(x,y,z)} \; . \quad (4.1)

Here N_z is the number of spectral elements in the datacube and F_{(x,y,z)} is the flux
value stored in the voxel at position (x, y, z) of the datacube.
• Variance Weighted Mean: Since we have for each voxel (x, y, z) not only the
signal F_{(x,y,z)} but also the associated noise σ_{(x,y,z)} stored in different HDUs of the
QSim simulated MUSE datacubes, it is possible to calculate the variance weighted
mean F^{vw}_{(x,y)} for each spaxel:

F^{vw}_{(x,y)} = \frac{\sum_z F_{(x,y,z)} \, \sigma^{-2}_{(x,y,z)}}{\sum_z \sigma^{-2}_{(x,y,z)}} \quad (4.2)
The variance weighted mean minimizes the resulting variance by weighting fluxes
with high S/N more strongly than those with low S/N. In practice this means
that the contribution to the noise from telluric emission lines in the collapsed cube
will be reduced. For measuring continuum fluxes with high S/N over the full
wavelength range, this weighting scheme is optimal. However, this statistic should
not be applied to narrow wavelength ranges, since it introduces a bias in such
pseudo narrow-band images, meaning that voxels with strong line flux could be
(a) Example of strongly wavelength-dependent morphology
(b) Example of wavelength-independent morphology
Figure 4.2: Illustration of morphological dependence on wavelength for two galaxies
inside QSim simulated MUSE datacubes. The narrow band regions (center and left
panel) are centered around strong emission lines.
underrepresented in this weighting scheme. In fact, it will always be biased towards
high signal-to-noise values, which however can be very useful for constraining
continuum properties of objects in the datacubes.
Associated with these collapsed flux statistics, two other collapsed representations are
calculated, namely:
• Variance Weighted Noise: The error of the variance weighted mean given in
Eq. (4.2) is calculated via

\sigma^{vw}_{(x,y)} = \left( \sum_z \sigma^{-2}_{(x,y,z)} \right)^{-1/2} \; . \quad (4.3)
• Signal to Noise: The signal-to-noise ratio S/N of the mean continuum image of
Eq. (4.1) is calculated via

\left( \frac{S}{N} \right)_{(x,y)} = \frac{F^{s}_{(x,y)}}{N_z^{-1} \sqrt{\sum_z \sigma^{2}_{(x,y,z)}}} \; . \quad (4.4)

The denominator of this equation is the noise σ_{(x,y)} of the simple mean.
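The collapse statistics of Eqs. (4.1)-(4.4) can be sketched with NumPy, assuming the flux and noise cubes are ordered (z, y, x) as delivered by common FITS readers (function and variable names are illustrative, not the actual cube-collapse.py implementation):

```python
import numpy as np

def collapse(flux, sigma):
    """Return (simple mean, variance weighted mean, variance weighted
    noise, continuum S/N) images for a flux cube and its noise cube."""
    nz = flux.shape[0]
    mean = flux.sum(axis=0) / nz                          # Eq. (4.1)
    w = sigma ** -2.0                                     # inverse variances
    vw_mean = (flux * w).sum(axis=0) / w.sum(axis=0)      # Eq. (4.2)
    vw_noise = w.sum(axis=0) ** -0.5                      # Eq. (4.3)
    noise = np.sqrt((sigma ** 2).sum(axis=0)) / nz        # noise of the mean
    return mean, vw_mean, vw_noise, mean / noise          # Eq. (4.4)
```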
The 2D representation of the datacube given by the variance weighted mean in
Eq. (4.2) is used for the SExtractor runs, while the S/N in Eq. (4.4) is used for direct
masking. The choice of the variance weighted mean as SExtractor input is motivated
by the fact that in the region of the HUDF the galaxies are very faint, so the
noise in the datacubes will be dominated by the sky spectrum and not by the objects
themselves. This is demonstrated in Fig. 4.3, where the flux Fλ and the associated noise
for one spaxel of a continuum-bright object are shown. Clearly, the object's signal leaves
only minor imprints on the noise spectrum.
4.2.2 Mask Creation
The collapsed datacube is used to mask out continuum objects, since the detection
algorithm applied to find emission line objects would produce erroneous results if the
continuum emitters remained in the cube. Two different approaches are employed -
the first relying on the S/N of the mean continuum image (S/N approach), the second
using a segmentation map produced by SExtractor (SExtractor approach).
For the S/N approach one has to use an S/N image of the cube as given in Eq. (4.4)
and choose an S/N cut. All voxels in spaxels with a continuum S/N value above this
cut will be masked out. In order to create such a mask from a collapsed datacube,
I wrote the utility signal_to_noise_mask.py (described in Appendix A.4.3). For the
SExtractor approach one has to make the same choice, the corresponding parameter
being called DETECT_THRESH in SExtractor; but additionally the minimum number of con-
nected pixels above that threshold must be set (the 8-connected neighborhood relation
is used in SExtractor, i.e. pixels can be connected vertically, horizontally or diagonally;
pixels in the collapsed image correspond to spaxels of the datacube). Another difference is the way
SExtractor determines the S/N of a source. The noise image is created artificially by
applying a low-pass filter to the input image (in our case the variance weighted mean,
42
[Figure 4.3: two-panel plot of Fλ (top) and σλ (bottom), both in units of
10−20 erg s−1 cm−2 Å−1, versus λ from 500 to 900 nm.]
Figure 4.3: Spectrum (top) and noise (bottom) of a bright continuum galaxy in the QSim
simulated shallow HUDF datacube. The spectrum & noise shown is from the brightest
spaxel of the galaxy shown in Fig. 4.2(b).
Eq. 4.2). Also, for the detection of objects, the input image can be convolved on the fly
with selectable filters. For a full description of SExtractor and all its options
and parameters see Bertin and Arnouts (1996) and the SExtractor manual². In order
to use SExtractor to generate a mask from a collapsed datacube I wrote the wrapper
script prep-sex.py (described in Appendix A.4.4).
The masks created in both approaches are stored as datacubes in 8-bit FITS files³,
with masked voxels set to 0 and unmasked voxels set to 1 (mask cube). While this
representation is not memory efficient, it has the advantage that simple pointwise mul-
tiplication of the array elements of the mask cube with the original datacube results
in the masked datacube. Additionally, in future refinements of the masking process it
might be useful to mask out additional voxels. In this case such a mask cube created
here could form the basic building block for further refinements of the masking
process.
To illustrate the visual appearance of these masks, Fig. 4.4 shows the collapsed S/N
image according to Eq. (4.4) of subcube 1 (lower-left area in Fig. 4.1). In Fig. 4.5
masks for this cube created with the S/N approach are shown - Fig. 4.5(b) for an S/N cut
of 2.4 and Fig. 4.5(a) for an S/N cut of 3.0. Fig. 4.6 shows masks created for this cube
with the SExtractor approach - Fig. 4.6(a) with a detection threshold (SExtractor
parameter DETECT_THRESH) of 2.4σ and Fig. 4.6(b) with a detection threshold of 2.0σ;
in both cases the minimum number of contiguous pixels for a detection was set to 5, and
all other SExtractor parameters were kept at their default values.
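In a SExtractor configuration file, the settings discussed above would appear roughly as follows (a hypothetical excerpt using the 2.4σ threshold and 5-pixel minimum area from the text; DETECT_THRESH, DETECT_MINAREA and the CHECKIMAGE_* keywords are standard SExtractor configuration parameters, the output file name is illustrative):

```text
# Excerpt of a SExtractor configuration for mask creation
DETECT_THRESH    2.4           # detection threshold in background sigmas
DETECT_MINAREA   5             # minimum number of connected pixels
CHECKIMAGE_TYPE  SEGMENTATION  # write the segmentation map ...
CHECKIMAGE_NAME  seg.fits      # ... to this file
```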
In comparison masks created with the SExtractor approach look much cleaner than
those produced with the S/N approach. Especially when lowering the DETECT_THRESH
parameter from 2.4 to 2.0 (Fig. 4.7, left), one sees that in the SExtractor mask several
²Available online at http://www.astromatic.net/software/sextractor
³The minimum number of bits to represent a data value in a FITS array is 8 (Pence et al. 2010).
(a) (S/N)cut = 3.0 (b) (S/N)cut = 2.4
Figure 4.5: Masks of subcube 1 created with the S/N approach (black pixels correspond
to spaxels that will be masked out in the datacube).
(a) DETECT_THRESH = 2.4 (b) DETECT_THRESH = 2.0
Figure 4.6: As Fig. 4.5, but with masks created with the SExtractor approach.
Diff. SExtractor Mask | Diff. S/N Mask
Figure 4.7: Difference in the masks created for subcube 1 when lowering the
DETECT_THRESH (for the SExtractor approach, left) or the S/N cut (for the S/N approach,
right) from 2.4 to 2.0 (black dots represent spaxels that additionally get masked at the
lower DETECT_THRESH or S/N cut, respectively).
new small objects get masked out and additionally the regions around the already masked
objects grow slightly. On the other hand, lowering the S/N-cut from 2.4 to 2.0 (Fig. 4.7
- right) produces more masked out noise peaks (single spaxels) and the regions around
already masked-out regions grow only in a very fragmentary way. Thus, without further
processing, the S/N masks are not very usable, especially for lower (conservative)
S/N cuts.
Another caveat with the continuum masks in both approaches is that some galaxies
contain extended emission line regions (e.g. the galaxy shown in Fig. 4.2(b): while the
overall morphology does not depend on wavelength, the galaxy grows in radius when
seen in narrow bands centered around strong emission lines, compared to the radius
seen in the average continuum image). These regions, if not masked out, would create
detections of emission line regions associated with galaxies visible in the continuum. With
the presented method I want to search for previously unknown galaxies based on their
emission line signature. In this regard, detections of such regions would be
“false” positives.
In order to deal with the problems of noisy continuum masks in the S/N-approach and
extended line emission undetected in the collapsed cube representations I implemented
a post-processing step of the mask. This method, which is similar to one which was
adapted by L. Wisotzki for processing of digitized photographic plates (L. Wisotzki,
priv. comm.), is described best by the term “mask evolution”. Inspired by “Conway’s
Game of Life” (e.g. McIntosh 2010), a simple set of rules is applied to masked (i.e.
with pixel value 0) or unmasked (with pixel value 1) pixels in one evolution cycle. But
contrary to Conway's Game, here during one evolution cycle pixels can either only be
created (grow cycle) or only be destroyed (shrink cycle); both processes cannot occur
simultaneously in one cycle. Similar to the classic rule set, the 8-connected topology is
employed. The rules are then, that in a grow (shrink) cycle an unmasked (masked) pixel
46