SlideShare a Scribd company logo
1 of 83
Visel2009_.pdf
ARTICLES
ChIP-seq accurately predicts
tissue-specific activity of enhancers
Axel Visel
1
*, Matthew J. Blow
1,2
*, Zirong Li
3
, Tao Zhang
2
, Jennifer A. Akiyama
1
, Amy Holt
1
, Ingrid Plajzer-Frick
1
,
Malak Shoukry
1
, Crystal Wright
2
, Feng Chen
2
, Veena Afzal
1
, Bing Ren
3
, Edward M. Rubin
1,2
& Len A. Pennacchio
1,2
A major yet unresolved quest in decoding the human genome is
the identification of the regulatory sequences that control
the spatial and temporal expression of genes. Distant-acting
transcriptional enhancers are particularly challenging to
uncover because they are scattered among the vast non-coding
portion of the genome. Evolutionary sequence constraint can
facilitate the discovery of enhancers, but fails to predict when
and where they are active in vivo. Here we present the results
of chromatin immunoprecipitation with the enhancer-associated
protein p300 followed by massively parallel sequencing,
and map several thousand in vivo binding sites of p300 in
mouse embryonic forebrain, midbrain and limb tissue. We tested
86 of these sequences in a transgenic mouse assay, which in
nearly all cases demonstrated reproducible enhancer activity in
the tissues that were predicted by p300 binding. Our results
indicate that in vivo mapping of p300 binding is a highly
accurate means for identifying enhancers and their associated
activities, and suggest that such data sets will be useful to
study the role of tissue-specific enhancers in human biology and
disease on a genome-wide scale.
The initial sequencing of the human genome1,2, complemented
by
effective computational and experimental strategies for
mammalian
gene discovery3,4, has resulted in a virtually complete list of
protein-
coding sequences. In contrast, the genomic location and
function of
regulatory elements that orchestrate gene expression in the
develop-
ing and adult body remain more obscure, hindering studies of
their
contribution to developmental processes and human disease.
Evolutionary constraint of non-coding sequences can predict the
location of enhancers in the genome5–12, but does not reveal
when
and where these enhancers are active in vivo. Furthermore, it
has been
suggested that a substantial proportion of regulatory elements is
not
sufficiently conserved to be detectable by comparative genomic
methods13–16.
Chromatin immunoprecipitation coupled to massively parallel
sequencing (ChIP-seq) has been shown to enable genome-wide
map-
ping of protein binding and epigenetic marks17–22. The ChIP-
seq
approach is dependent on the cross-linking of proteins to
specific
DNA elements, followed by antibody enrichment of the protein–
DNA
complexes, and high-throughput sequencing of the recovered
DNA
fragments. In principle, ChIP-seq using an antibody specific for
an
enhancer-binding protein could provide a conservation-
independent
approach for the identification of candidate enhancer sequences.
The acetyltransferase and transcriptional coactivator p300 is a
near-ubiquitously expressed component of enhancer-associated
protein assemblies and is critically required for embryonic
develop-
ment23–27. In homogeneous cell preparations, p300 has been
shown to
be associated with enhancers28,29, but these in vitro studies
provided
access only to subsets of enhancers that are active in a given
cell type
under culture conditions, providing limited insight into their in
vivo
function. In the present study, we have determined the genome-
wide
occupancy of p300 in forebrain, midbrain and limb tissue
isolated
directly from developing mouse embryos. Using a transgenic
mouse
reporter assay, we show that p300 binding in these embryonic
tissues
predicts with high accuracy not only where enhancers are
located in
the genome, but also in what tissues they are active in vivo.
Depending
on the tissue type, the success rate of predicting forebrain,
midbrain
and limb enhancers was between 5- and 16-fold increased
compared
to previous studies in which such enhancers were discovered by
com-
parative genomics10,11.
Genome-wide mapping of p300 in tissues
To generate genome-wide maps of p300 binding in vivo, we
micro-
dissected forebrain, midbrain and limb tissue from more than
150
embryonic day 11.5 (E11.5) mouse embryos and performed
ChIP
directly from these tissue samples using a p300 antibody (Fig.
1).
Immunoprecipitated DNA fragments were analysed using
massively
parallel sequencing and the resulting 36 base pair (bp) sequence
reads
were aligned to the reference mouse genome17,30.
After appropriate quality filtering, between 2.4 and 3.6 million
aligned reads obtained from each of the tissue samples were
used
to identify regions of the genome with significant enrichment in
p300-associated DNA sequences, hereafter referred to as
‘peaks’
owing to their appearance in genome-wide density plots17
(Supple-
mentary Table 1). Using an estimated false-discovery rate
(FDR)
threshold of ,0.01, we identified 2,543, 561 and 2,105 peaks
from
forebrain, midbrain and limb respectively (Supplementary
Tables
2–4). Most peaks were located at least 10 kb from transcript
start sites
(Supplementary Fig. 1). The smaller number of peaks from
midbrain
is probably due to variability in the efficiency of enrichment by
ChIP
(Supplementary Fig. 2). Re-sampling of subsets of data suggests
that
the major p300-binding sites from these three tissues have been
dis-
covered, whereas with increased sequencing coverage it is
anticipated
that further binding sites can be identified that are occupied
only in
smaller subsets of cells within each tissue (Supplementary Fig.
3).
Although most genomic regions with in vivo p300 binding were
identified by peaks in a single tissue, there were 386 regions at
which
peaks were observed in two tissues, and 21 regions at which
p300
peaks were observed in all three tissues (Supplementary Fig. 4).
*These authors contributed equally to this work.
1
Genomics Division, MS 84-171, Lawrence Berkeley National
Laboratory, Berkeley, California 94720, USA.
2
US Department of Energy Joint Genome Institute, Walnut
Creek, California
94598, USA. 3Ludwig Institute for Cancer Research, University
of California San Diego (UCSD) School of Medicine, La Jolla,
California 92093, USA.
Vol 457 | 12 February 2009 | doi:10.1038/nature07730
854
Macmillan Publishers Limited. All rights reserved©2009
www.nature.com/nature
www.nature.com/nature
www.nature.com/doifinder/10.1038/nature07730
p300 predicts enhancer activity patterns
To test directly whether p300 binding in developing mouse
tissues is
indicative of enhancer activity in vivo, we selected 86 regions
with a
p300 peak in at least one of the tissues for analysis in
transgenic mice,
comprising a total of 122 individual predictions of enhancer
activity in
specific tissues (Supplementary Table 5). These elements were
selected
blind to the identity of genes near which they are located,
showed a
wide range of evolutionary conservation with other vertebrate
species
(see Methods) and approximately reflect the genome-wide
distri-
bution properties of p300 peaks among intronic and intergenic
regions, as well as their distances relative to known genes
(Supple-
mentary Fig. 1).
We cloned the human genomic sequences orthologous to these
enhancer candidate regions into an enhancer reporter vector and
generated transgenic mice as previously described10,31. For
each of
the 86 candidate enhancers, several independent transgenic
embryos
(average of n 5 8) were assessed for reproducible reporter gene
expression. A pattern was considered reproducible if the same
anatom-
ical structure was stained in three or more embryos. In almost
all cases,
this minimum threshold was exceeded and reproducible reporter
staining in forebrain, midbrain or limb was present on average
in more
than 80% of the embryos obtained per construct (Supplementary
Table 5).
First we determined whether p300 binding was predictive of
repro-
ducible in vivo enhancer activity regardless of their tissue
specificity.
Considering peaks from each of the three p300 data sets
separately, 55
out of 63 (87%) forebrain predictions, 30 out of 34 (88%)
midbrain
predictions and 22 out of 25 (88%) limb predictions were active
enhan-
cers in vivo at E11.5 as defined by reproducible LacZ staining
(Fig. 2,
grey and coloured bars). Overall, 87% (75 out of 86) of the
tested
elements were reproducible enhancers at E11.5. This compares
with
a success rate for predicting enhancers of 47% (246 out of 528)
from
our previous studies in which elements were identified on the
basis of
their extreme evolutionary conservation and tested using the
same
transgenic mouse assay10,11. Thus, the rate of false-positive
predictions
using p300 ChIP-seq was more than fourfold lower than that
with
extreme evolutionary conservation (13% compared to 53%
previously;
P 5 4.2 3 10
210
, Fisher’s exact test).
We next determined the accuracy with which p300 binding
predicts
the tissue in which enhancer activity will occur. Of the 63
tested elem-
ents that overlapped a forebrain p300 peak, 49 (78%) were
found to
have reproducible enhancer activity in the developing forebrain
(Fig. 2,
blue). Similarly, 28 out of 34 (82%) tested elements identified
by
midbrain p300 enrichment (Fig. 2, red), and 20 out of 25 (80%)
tested
elements identified by limb p300 enrichment (Fig. 2, green),
were
confirmed to be active in the predicted tissue. The 86 tested
elements
included 32 sequences that were identified by p300 binding in
more
than one tissue. Of these, 27 out of 32 (84%) were active in at
least one
of the predicted tissues, and 22 sequences (69%) perfectly
recapitu-
lated the predicted expression patterns (Supplementary Table
6).
To assess the degree of enrichment of enhancer activities in
predicted
tissues, we compared the relative frequency of enhancers for
each of the
three tissues examined here with a background set of 528
previously
tested sequences predicted to be developmental enhancers on
the basis
of extreme sequence constraint that were not associated with a
prior
tissue specificity prediction10,11. For example, whereas
forebrain enhan-
cers account for only 16% (86 out of 528) of the tested elements
iden-
tified through comparative approaches, 78% (49 out of 63) of
elements
predicted by forebrain p300 peaks were found to be active
enhancers in
the forebrain (Fig. 2). Forebrain predictions are therefore
fivefold
enriched in forebrain enhancers compared with enhancers
identified
through comparative approaches (P , 1 3 10
222
). Similarly, we
observed a sixfold enrichment of midbrain enhancers (P , 1 3 10
211
)
and a 16-fold enrichment of limb enhancers (P , 1 3 10
218
) at mid-
brain and limb p300 peaks, respectively. Representative
examples of
enhancers identified by ChIP-seq are shown in Fig. 3 and
detailed anno-
tations and reproducibility across transgenic mice for all
elements tested
in this study can be found at http://enhancer.lbl.gov (ref. 32).
Taken
together, these results indicate that p300 peaks are a highly
accurate
predictor of in vivo enhancers and their spatial activity patterns.
Forebrain
3,629,292 reads
2,453 peaks
Midbrain
3,530,316 reads
561 peaks
Limb
2,419,480 reads
2,105 peaks
mb
fb
li
1 mm
Microdissection
li
Map reads
to genome
p300
anti-p300
ChIP:
Massively
parallel
sequencing
Identify
peaks
Figure 1 | Tissue dissection boundaries, overview of the ChIP-
seq approach
and summary of p300 results. Tissue dissection boundaries are
indicated in a
representative unstained E11.5 mouse embryo. For each sample,
tissue was
pooled from more than 150 embryos and ChIP-seq was
performed with a p300-
antibody. Reads obtained for each of the three tissues that
unambiguously
aligned to the reference mouse genome were used to define
peaks (FDR , 0.01).
A more comprehensive overview of sequencing and mapping
results is provided
in Supplementary Table 1. fb, forebrain; li, limb; mb, midbrain.
Co
ns
er
va
tio
n
(n
=
5
28
)
Co
ns
er
va
tio
n
(n
=
5
28
)
Co
ns
er
va
tio
n
(n
=
5
28
)
Fo
re
br
ai
n
p3
00
(n
=
6
3)
M
id
br
ai
n
p3
00
(n
=
3
4)
Li
m
b
p3
00
(n
=
2
5)
P
ro
p
o
rt
io
n
o
f
in
v
iv
o
e
n
h
a
n
c
e
rs
(a
m
o
n
g
t
e
st
e
d
e
le
m
e
n
ts
)
0%
20%
40%
60%
80%
100%
Any reproducible
enhancer activity
Pattern includes
or restricted to:
**
* * *
*
forebrain
midbrain
limb
Figure 2 | p300 binding accurately predicts enhancers and their
tissue-
specific activity patterns. Bar height indicates the frequency of
in vivo
enhancers (reproducible at E11.5) that are active in any tissue
(grey bars and
coloured bars) and the fraction of enhancers in which the
pattern includes or
is restricted to reproducible forebrain (blue bars), midbrain (red
bars) or
limb activity (green bars). In each case, candidate elements
predicted by
p300 peaks in forebrain, midbrain or limb were compared to the
frequency
of the respective pattern in a background set of 528 previously
tested
sequences identified through extreme evolutionary conservation
(combined
data sets from refs 10 and 11). The component activities of
elements
predicted to be active in several tissues were counted
separately.
*P , 0.00005, Fisher’s exact test, one-tailed.
NATURE | Vol 457 | 12 February 2009 ARTICLES
855
Macmillan Publishers Limited. All rights reserved©2009
http://enhancer.lbl.gov
Most p300-bound regions are conserved
Previous studies have indicated a positive correlation between
enhancer activity during development and non-coding sequence
conservation6,8–11,33, but it has also been suggested that not
all regu-
latory elements in vertebrate genomes are under detectable
evolu-
tionary constraint13–16. To test whether p300 binding in E11.5
tissues
is generally associated with evolutionarily constrained non-
coding
sequences, we determined if ChIP-seq reads are overall enriched
at
previously identified extremely conserved non-coding
sequences9,11.
We observed strong enrichment of p300 ChIP-seq reads at these
conserved sequences, but not at random sites or exons (Fig. 4
and
Supplementary Table 7). Vice versa, between 86% and 91% of
the
p300 peaks overlap sequences that are under evolutionary
constraint
in vertebrates34, compared to less than 30% of size-matched
random
regions (P , 1 3 10
2172
, Fisher’s exact test; Supplementary Fig. 5).
Using a more stringent constraint threshold score, we observed
that
between 10% and 21% of peaks are highly constrained,
compared to
1% of random regions (P , 1 3 10
282
). These results indicate that
most p300 peaks in the investigated tissues are under
evolutionary
constraint and support a global enrichment of p300 in highly
con-
served non-coding regions of the genome previously correlated
with
developmental enhancers.
Correlation with gene expression patterns
To examine the correlation of p300-enriched regions in
embryonic
tissues with the transcriptional regulation of neighbouring
genes, we
compared the genomic distribution of p300 peaks in E11.5
forebrain
with gene expression data from this tissue. Using high-density
micro-
arrays, we identified a set of 885 genes that are overexpressed
in fore-
brain at E11.5 compared to whole embryos (Supplementary
Table 8).
When we compared the genomic position of these forebrain
genes to
the genome-wide distribution of 2,453 forebrain-derived p300
peaks,
we observed that the intervals 90 kb up- and downstream of
their pro-
moters are 2.4-fold enriched overall in p300-binding sites (P ,
0.05,
Fig. 5a). In total, 14% of all forebrain p300 peaks are located
within
101 kb from a promoter of a forebrain-overexpressed gene. The
most
20
2
20
2
20
2
Limb p300
Conservation
Reproducibility
In vivo
LacZ pattern
b
a
5/5 4/4 11/11 8/8 5/6 9/11
Midbrain p300
Forebrain p300 ** * *
* * *
*
5 kb
Enhancer 1334 1301 1383 901 1327 1278
Figure 3 | Examples of successful prediction of in vivo
enhancers by p300
binding in embryonic tissues. a, Coverage by extended p300
reads in
forebrain (blue), midbrain (red) and limb (green). Asterisks
indicate
significant (FDR , 0.01) p300 enrichment in chromatin isolated
from the
respective tissue. Multispecies vertebrate conservation plots
(black) were
obtained from the UCSC genome browser50. Grey boxes
correspond to
candidate enhancer regions. Numbers at the right indicate
overlapping
extended reads. b, Representative LacZ-stained embryos with in
vivo
enhancer activity at E11.5. Reproducible staining in forebrain,
midbrain and
limb is indicated by arrows. Numbers show the reproducibility
of LacZ
reporter staining. Additional embryos obtained with each
construct and
genomic coordinates are available using the enhancer ID in the
bottom
portion of a at the Vista Enhancer Browser32.
50,000
random sites in
genome
50,000
conserved
non-coding
elements
0 0
–2
0
kb
+2
0
kb
–2
0
kb
+2
0
kb
R
e
la
ti
ve
p
3
0
0
c
o
ve
ra
g
e
Forebrain p300
Midbrain p300
Limb p300
30,000
internal
exons
0
–2
0
kb
+2
0
kb
1.2
1.0
1.4
1.6
1.8
*
*
*
Median element or exon
size (124 bp)
Figure 4 | In all tissues examined, p300 is enriched at highly
conserved
non-coding regions. We used a genome-wide set of 50,000
extremely
constrained non-coding sequences identified in human-mouse-
rat genome
alignments11 to assess the correlation between p300 enrichment
and non-
coding sequence conservation. Even though only subsets of the
constrained
non-coding elements are expected to be active enhancers in any
given
embryonic tissue, we observe strong enrichment in p300 binding
in all three
tissues compared to input DNA. *P , 1 3 10
2100
, Fisher’s exact test.
Relative p300 coverage near random sites and internal exons is
shown for
comparison. Brown bars indicate the median sizes of conserved
elements or
exons (124 bp in both cases). For further details, see
Supplementary Table 7.
ARTICLES NATURE | Vol 457 | 12 February 2009
856
Macmillan Publishers Limited. All rights reserved©2009
pronounced enrichment (4.8-fold, P , 0.01) was observed within
10 kb
up- and downstream of promoters of forebrain-specifically
expressed
genes. In contrast, forebrain peaks are not enriched near genes
over-
expressed in other parts of the body (Fig. 5b, Supplementary
Table 9).
Near genes that were overexpressed fivefold or more in the
fore-
brain, even higher enrichment of forebrain peaks was observed
(11-fold enrichment within 10 kb from promoters, data not
shown).
We found similar enrichment of limb-derived p300 peaks near
limb-
overexpressed genes (Supplementary Fig. 6 and Supplementary
Tables
10 and 11). These observations are consistent with the
sequences bound
by p300 in the forebrain or limb of day 11.5 embryos being
enhancers
that drive the expression of adjacent genes in these tissues at
this time
point.
Discussion
In the present study, we have determined the genome-wide
distribution
of the transcriptional coactivator protein p300 (ref. 23) using
ChIP-
seq17 directly from developing mouse tissues. Notably,
enrichment of
p300 in different mouse tissues correctly predicted the spatial
enhancer
activities of human non-coding sequences in 80% of cases
tested in a
transgenic mouse assay, whereas absence of p300 enrichment
corre-
lated in 93% of cases with absence of enhancer activity in the
respective
tissue (Supplementary Table 5). The few elements that did not
drive
reporter gene expression in the tissue predicted by p300 ChIP-
seq may
represent cases in which the function of regulatory elements has
diverged between the mouse sequences identified by ChIP-seq
and
the human orthologous regions tested in the transgenic mouse
assay.
In support of this hypothesis, we observed several cases in
which the
non-coding p300-bound region from mouse, but not the
orthologous
human sequence, had reproducible enhancer activities as
predicted by
p300 ChIP-seq from mouse tissues (data not shown). Taken
together,
the present approach provides a markedly improved specificity
for
locating enhancers in the human genome compared to
conservation-
based methods10,11 and also predicts their in vivo activity
patterns with
higher accuracy than motif-based computational methods
available at
present (for example, refs 35, 36).
Most p300-binding regions identified in developing mouse
tissues
are under detectable evolutionary constraint. They typically
overlap
conserved non-coding sequences whose length (median of 113
bp) far
exceeds that of an individual transcription factor binding site,
suggest-
ing the presence of larger functional modules. In cell culture-
based
chromatin studies, a sizeable fraction of non-coding regions in
the
human genome was found to be functional yet not
constrained13,14.
This apparent discrepancy might be due to differences in
evolutionary
constraint between enhancers active in developing tissues
compared
to those in individual cell types, but highlights the intrinsic
challenge
of inferring in vivo functionality from studies in cell culture.
A generalized picture of the epigenetic marks and proteins
assoc-
iated with different types of functional non-coding elements has
started to emerge from genome-wide chromatin
studies13,18,28,37–41.
We can now begin to use these signatures to unravel gene
regulation
on a genomic scale in the context of living organisms. The
highly
specific approach for identification of developmental enhancers
and
their activity patterns presented here represents a step in this
direction.
Complementary in-vivo-derived genomic data sets may be
produced
in the future, covering additional embryonic stages, anatomical
regions and subregions, and perhaps considering extra
molecular
markers28,42–45. Focused experiments informed by such
insights will
expedite studies of the genome-wide activity dynamics of
enhancers in
developmental, physiological and pathological processes.
METHODS SUMMARY
Embryonic forebrain, midbrain and limb tissue was isolated
from mouse
embryos at E11.5. Cross-linking, chromatin isolation,
sonication and immuno-
precipitation using an anti-p300 antibody were performed as
previously
described40,46. ChIP DNA was further sheared by sonication,
end-repaired,
ligated to sequencing adapters and amplified by emulsion PCR
as previously
described47. Gel-purified amplified ChIP DNA between 300
and 500 bp was
sequenced on the Illumina Genome Analyzer II platform to
generate 36-bp reads.
Sequence reads were aligned to the mouse reference genome
(mm9) using
BLAT48. Uniquely aligned reads were extended to 300 bp in the
39 direction and
used to determine the read coverage at individual nucleotides at
25-bp intervals
throughout the mouse genome. p300-enriched regions (peaks)
with an esti-
mated FDR of # 0.01 were identified by comparison with a
random distribution
of the same number of reads. Candidate peaks mapping to
repetitive regions
were removed as probable artefacts.
Candidate regions for transgenic testing were selected based on
ChIP-seq
results and cover a wide spectrum of conservation. Enhancer
candidate regions
were amplified by PCR from human genomic DNA and cloned
into an Hsp68-
promoter-LacZ reporter vector as previously described6,31.
Transgenic mouse
embryos were generated and evaluated for reproducible LacZ
activity at E11.5
as previously described6.
Total RNA from E11.5 whole embryos and forebrain tissue was
hybridized to
GeneChip Mouse Genome 430 2.0 arrays (Affymetrix) and
analysed according to
the manufacturer’s recommendations. Forebrain- and whole-
embryo-enriched
genes were identified as having at least 2.5-fold greater
expression in one data set
compared with the other, and a minimum signal intensity of
100. Limb-enriched
genes were identified by comparison with publicly available
wild-type E11.5
proximal hindlimb gene expression data (Gene Expression
Omnibus (GEO)
series GSE10516, samples GSM264689, GSM264690 and
GSM264691)49.
Full Methods and any associated references are available in the
online version of
the paper at www.nature.com/nature.
Received 16 October; accepted 18 December 2008.
1. Venter, J. C. et al. The sequence of the human genome.
Science 291, 1304–1351
(2001).
2. Lander, E. S. et al. Initial sequencing and analysis of the
human genome. Nature
409, 860–921 (2001).
Distance to nearest transcript start site (kb)
**
**
**
** ** ** **
**
Distance to nearest transcript start site (kb)
3.0
3.5
2.5
2.0
1.5
1.0
0.5
0.0
3.0
3.5
2.5
2.0
1.5
1.0
0.5
0.0
21–311–11 11–21 31–41 41–51 51–61 61–71 71–81 81–91 91–
101
21–311–11 11–21 31–41 41–51 51–61 61–71 71–81 81–91 91–
101
a
b
P
ro
p
o
rt
io
n
o
f
p
e
a
ks
o
r
si
te
s
(%
)
P
ro
p
o
rt
io
n
o
f
p
e
a
ks
o
r
si
te
s
(%
)
885 forebrain overexpressed genes
495 forebrain underexpressed genes
Figure 5 | p300 peaks are enriched near genes that are expressed
in the
same tissue. We compared the genome-wide distribution of
p300-enriched
regions in forebrain tissue at E11.5 with microarray expression
data for
forebrain at the same stage. Eight-hundred-and-eighty-five
genes were
forebrain-specifically overexpressed, and 495 genes were
underexpressed
relative to whole embryo RNA at the selected thresholds.
Promoters (defined
as 1 kb upstream and downstream of transcription start sites)
were excluded
from the analysis. Blue bars denote comparison to 2,435
forebrain-derived
peaks, grey bars denote comparison to 2,435 random sites. a,
Ten-kilobase
bins up to 91 kb away from forebrain-overexpressed genes were
significantly
enriched in forebrain p300 peaks. b, No peak enrichment was
observed for
forebrain-underexpressed genes. Error bars indicate the 90%
confidence
interval on the basis of 1,000 iterations of randomized
distribution.
*P , 0.05, **P , 0.01, both one-tailed.
NATURE | Vol 457 | 12 February 2009 ARTICLES
857
Macmillan Publishers Limited. All rights reserved©2009
www.nature.com/nature
3. Burge, C. & Karlin, S. Prediction of complete gene structures
in human genomic
DNA. J. Mol. Biol. 268, 78–94 (1997).
4. Okazaki, Y. et al. Analysis of the mouse transcriptome based
on functional
annotation of 60,770 full-length cDNAs. Nature 420, 563–573
(2002).
5. Marshall, H. et al. A conserved retinoic acid response
element required for early
expression of the homeobox gene Hoxb-1. Nature 370, 567–571
(1994).
6. Nobrega, M. A., Ovcharenko, I., Afzal, V. & Rubin, E. M.
Scanning human gene
deserts for long-range enhancers. Science 302, 413 (2003).
7. de la Calle-Mustienes, E. et al. A functional survey of the
enhancer activity of
conserved non-coding sequences from vertebrate Iroquois
cluster gene deserts.
Genome Res. 15, 1061–1072 (2005).
8. Woolfe, A. et al. Highly conserved non-coding sequences are
associated with
vertebrate development. PLoS Biol. 3, e7 (2005).
9. Prabhakar, S. et al. Close sequence comparisons are
sufficient to identify human
cis-regulatory elements. Genome Res. 16, 855–863 (2006).
10. Pennacchio, L. A. et al. In vivo enhancer analysis of human
conserved non-coding
sequences. Nature 444, 499–502 (2006).
11. Visel, A. et al. Ultraconservation identifies a small subset of
extremely constrained
developmental enhancers. Nature Genet. 40, 158–160 (2008).
12. Holland, L. Z. et al. The amphioxus genome illuminates
vertebrate origins and
cephalochordate biology. Genome Res. 18, 1100–1111 (2008).
13. ENCODE Project Consortium. Identification and analysis of
functional elements in
1% of the human genome by the ENCODE pilot project. Nature
447, 799–816
(2007).
14. Margulies, E. H. et al. Analyses of deep mammalian
sequence alignments and
constraint predictions for 1% of the human genome. Genome
Res. 17, 760–774
(2007).
15. Cooper, G. M. & Brown, C. D. Qualifying the relationship
between sequence
conservation and molecular function. Genome Res. 18, 201–205
(2008).
16. McGaughey, D. M. et al. Metrics of sequence constraint
overlook regulatory
sequences in an exhaustive analysis at phox2b. Genome Res. 18,
252–260 (2008).
17. Robertson, G. et al. Genome-wide profiles of STAT1 DNA
association using
chromatin immunoprecipitation and massively parallel
sequencing. Nature
Methods 4, 651–657 (2007).
18. Mikkelsen, T. S. et al. Genome-wide maps of chromatin
state in pluripotent and
lineage-committed cells. Nature 448, 553–560 (2007).
19. Robertson, A. G. et al. Genome-wide relationship between
histone H3 lysine 4
mono- and tri-methylation and transcription factor binding.
Genome Res. 18,
1906–1917 (2008).
20. Cuddapah, S. et al. Global analysis of the insulator binding
protein CTCF in
chromatin barrier regions reveals demarcation of active and
repressive domains.
Genome Res. 19, 24–32 (2009).
21. Wederell, E. D. et al. Global analysis of in vivo Foxa2-
binding sites in mouse adult
liver using massively parallel sequencing. Nucleic Acids Res.
36, 4549–4564
(2008).
22. Valouev, A. et al. Genome-wide analysis of transcription
factor binding sites
based on ChIP-Seq data. Nature Methods 5, 829–834 (2008).
23. Eckner, R. et al. Molecular cloning and functional analysis
of the adenovirus E1A-
associated 300-kD protein (p300) reveals a protein with
properties of a
transcriptional adaptor. Genes Dev. 8, 869–884 (1994).
24. Eckner, R., Yao, T. P., Oldread, E. & Livingston, D. M.
Interaction and functional
collaboration of p300/CBP and bHLH proteins in muscle and B-
cell
differentiation. Genes Dev. 10, 2478–2490 (1996).
25. Yao, T. P. et al. Gene dosage-dependent embryonic
development and proliferation
defects in mice lacking the transcriptional integrator p300. Cell
93, 361–372
(1998).
26. Merika, M., Williams, A. J., Chen, G., Collins, T. &
Thanos, D. Recruitment of CBP/
p300 by the IFNb enhanceosome is required for synergistic
activation of
transcription. Mol. Cell 1, 277–287 (1998).
27. Maston, G. A., Evans, S. K. & Green, M. R. Transcriptional
regulatory elements in
the human genome. Annu. Rev. Genomics Hum. Genet. 7, 29–59
(2006).
28. Heintzman, N. D. et al. Distinct and predictive chromatin
signatures of
transcriptional promoters and enhancers in the human genome.
Nature Genet. 39,
311–318 (2007).
29. Xi, H. et al. Identification and characterization of cell type-
specific and ubiquitous
chromatin regulatory structures in the human genome. PLoS
Genet. 3, e136
(2007).
30. Mouse Genome Sequencing Consortium. Initial sequencing
and comparative
analysis of the mouse genome. Nature 420, 520–562 (2002).
31. Kothary, R. et al. A transgene containing lacZ inserted into
the dystonia locus is
expressed in neural tube. Nature 335, 435–437 (1988).
32. Visel, A., Minovitsky, S., Dubchak, I. & Pennacchio, L. A.
VISTA Enhancer
Browser—a database of tissue-specific human enhancers.
Nucleic Acids Res. 35,
D88–D92 (2007).
33. Cheng, Y. et al. Transcriptional enhancement by GATA1-
occupied DNA segments
is strongly associated with evolutionary constraint on the
binding site motif.
Genome Res. 18, 1896–1905 (2008).
34. Siepel, A. et al. Evolutionarily conserved elements in
vertebrate, insect, worm, and
yeast genomes. Genome Res. 15, 1034–1050 (2005).
35. Hallikas, O. et al. Genome-wide prediction of mammalian
enhancers based on
analysis of transcription-factor binding affinity. Cell 124, 47–
59 (2006).
36. Pennacchio, L. A., Loots, G. G., Nobrega, M. A. &
Ovcharenko, I. Predicting tissue-
specific enhancers in the human genome. Genome Res. 17, 201–
211 (2007).
37. Kim, T. H. et al. A high-resolution map of active promoters
in the human genome.
Nature 436, 876–880 (2005).
38. Boyle, A. P. et al. High-resolution mapping and
characterization of open chromatin
across the genome. Cell 132, 311–322 (2008).
39. Schones, D. E. & Zhao, K. Genome-wide approaches to
studying chromatin
modifications. Nature Rev. Genet. 9, 179–191 (2008).
40. Barrera, L. O. et al. Genome-wide mapping and analysis of
active promoters in
mouse embryonic stem cells and adult organs. Genome Res. 18,
46–59 (2008).
41. Chen, X. et al. Integration of external signaling pathways
with the core
transcriptional network in embryonic stem cells. Cell 133,
1106–1117 (2008).
42. Kwok, R. P. et al. Nuclear protein CBP is a coactivator for
the transcription factor
CREB. Nature 370, 223–226 (1994).
43. Ogryzko, V. V., Schiltz, R. L., Russanova, V., Howard, B.
H. & Nakatani, Y. The
transcriptional coactivators p300 and CBP are histone
acetyltransferases. Cell 87,
953–959 (1996).
44. Agalioti, T. et al. Ordered recruitment of chromatin
modifying and general
transcription factors to the IFN-b promoter. Cell 103, 667–678
(2000).
45. Ge, K. et al. Transcription coactivator TRAP220 is required
for PPARc2-
stimulated adipogenesis. Nature 417, 563–567 (2002).
46. Li, Z. et al. A global transcriptional regulatory role for c-
Myc in Burkitt’s lymphoma
cells. Proc. Natl Acad. Sci. USA 100, 8164–8169 (2003).
47. Blow, M. J. et al. Identification of the source of ancient
remains through genomic
sequencing. Genome Res. 18, 1347–1353 (2008).
48. Kent, W. J. BLAT—the BLAST-like alignment tool.
Genome Res. 12, 656–664
(2002).
49. Krawchuk, D. & Kania, A. Identification of genes controlled
by LMX1B in the
developing mouse limb bud. Dev. Dyn. 237, 1183–1192 (2008).
50. Karolchik, D. et al. The UCSC Genome Browser Database:
2008 update. Nucleic
Acids Res. 36, D773–D779 (2008).
Supplementary Information is linked to the online version of the
paper at
www.nature.com/nature.
Acknowledgements We wish to thank R. Hosseini and S.
Phouanenavong for
technical support, and J. Rubenstein, J. Long, J. Choi and Y.
Zhu for help with
microarray experiments. This work was performed under the
auspices of the US
Department of Energy’s Office of Science, Biological and
Environmental Research
Program and by the University of California, Lawrence
Berkeley National
Laboratory under contract no. DE-AC02-05CH11231, Lawrence
Livermore
National Laboratory under contract no. DE-AC52-07NA27344,
and Los Alamos
National Laboratory under contract no. DE-AC02-06NA25396.
L.A.P. and E.M.R.
were supported by the Berkeley-PGA, under the Programs for
Genomic
Applications, funded by National Heart, Lung, & Blood
Institute, and L.A.P. by the
National Human Genome Research Institute. A.V. was
supported by an American
Heart Association postdoctoral fellowship. B.R. was supported
by grants from the
National Human Genome Research Institute and the Ludwig
Institute for Cancer
Research.
Author Information Reprints and permissions information is
available at
www.nature.com/reprints. All ChIP-seq data sets described in
this study have
been deposited at the National Center for Biotechnology
Information (NCBI) in the
Gene Expression Omnibus (GEO) database under accession
number GSE13845.
Correspondence and requests for materials should be addressed
to L.A.P.
([email protected]).
ARTICLES NATURE | Vol 457 | 12 February 2009
858
Macmillan Publishers Limited. All rights reserved©2009
www.nature.com/nature
www.nature.com/reprints
mailto:[email protected]
METHODS
Tissue dissection and chromatin immunoprecipitation.
Embryonic forebrain,
midbrain and limb tissue was isolated from timed-pregnant CD-
1 strain mouse
embryos at E11.5 by microdissection in cold PBS along the
anatomical bound-
aries indicated in Fig. 1. Tissue samples were cross-linked (1%
formaldehyde,
10 mM NaCl, 100 mM EDTA, 50 mM EGTA, 5 mM HEPES, pH
8.0) for 15 min at
room temperature. Cross-linking was terminated by the addition
of 125 mM
glycine and cells were dissociated in a glass douncer.
Chromatin isolation, soni-
cation and immunoprecipitation were performed as previously
described40,46. In
brief, 1 mg of sonicated chromatin (OD260) was incubated with
10 mg of anti-
body (rabbit polyclonal anti-p300 (C-20), Santa Cruz
Biotechnology) coupled to
IgG magnetic beads (Dynal Biotech) overnight at 4 uC. The
magnetic beads were
washed eight times with RIPA buffer (50 mM HEPES, pH 8.0, 1
mM EDTA,
1% NP-40, 0.7% DOC and 0.5 M LiCl, supplemented with
complete protease
inhibitors from Roche Applied Science), and washed once with
TE buffer
(10 mM Tris, pH 8.0, 1 mM EDTA). After washing, bound DNA
was eluted at
65 uC in elution buffer (10 mM Tris, pH 8.0, 1 mM EDTA and
1% SDS) for 10
min and incubated at 65 uC overnight to reverse cross-links.
After the reversal of
cross-linking, immunoprecipitated DNA was treated
sequentially with protei-
nase K and RNase A, and desalted using the QIAquick PCR
purification kit
(Qiagen).
Amplification and Illumina sequencing of ChIP DNA. ChIP
DNA was quan-
tified by Qubit assay HS kit. Approximately 0.1 ng of each
ChIP DNA sample was
sheared using Sonicator XL2020 (Misonix) with a microplate
horn for 10 min at
55% power output and 90% amplitude. Sheared ChIP DNA
extract was end-
repaired using the End-It DNA End-Repair Kit (Epicentre).
Illumina adapters
(56 bp and 34 bp) were ligated using T4 DNA ligase (5 U ml
21
, Fermentas) and
recovered using a MinElute Reaction Cleanup Kit (Qiagen).
Linker ligated ChIP
DNA was amplified by emulsion PCR for 40 cycles as
previously described47.
Amplified ChIP DNA between 300 and 500 bp was gel purified
on 2% agarose
and sequenced on the Illumina Genome Analyzer II according to
the manufac-
turer’s instructions except that emulsion PCR-amplified DNA
containing the GA2
sequencing adaptor was applied directly to the cluster station
for bridge amplifica-
tion. The resulting flow-cell was sequenced for 36 cycles to
generate 36-bp reads.
Processing of Illumina sequence data. Unfiltered 36-bp Illumina
sequence reads
were aligned to the mouse reference genome (NCBI build 37,
mm9) using BLAT48
with optional parameters (minScore 5 20, minIdentity 5 80,
stepSize 5 5). BLAT
was performed in parallel on a sge-cluster. For each read, the
two highest-scoring
alignments were compared and reads were rejected as repetitive
unless the score of
the best alignment was at least two greater than that of the
second best alignment.
The remaining reads were further filtered to reject those with a
BLAT alignment
score ,21, with .1 bp insertion or deletion, or with .2 unaligned
bases at the start
of the read. Finally, reads with identical start sites in the mouse
genome were
considered likely to be duplicate sequences arising as an
artefact of sample amp-
lification or sequencing, and were counted only once. The
remaining reads were
classed as uniquely aligned to the mouse genome.
Uniquely aligned reads were extended to 300 bp in the 39
direction to account
for the average length of size-selected p300 ChIP fragments
used for sequencing.
These extended read coordinates were used to determine the
read coverage at
individual nucleotides at 25 bp intervals throughout the mouse
genome. This
data was used to produce coverage plots for visualization in the
UCSC genome
browser.
To identify p300-enriched regions (peaks), we compared the
observed fre-
quency of coverage depths with those expected from a random
distribution of
the same number of reads generated computationally as
described previously17.
In brief, the probability of observing a peak with a coverage
depth of at least H
reads is given by a sum of Poisson probabilities as:
1{
PH {1
k~0
e{ll
k
k!
in which l is the average genome-wide coverage of extended
reads given by: (read
length 3 number of aligned reads)/alignable genome length. To
estimate the
alignable genome length, one million randomly selected 36-
base-polymers from
the mouse genome were realigned to the mouse genome using
the same align-
ment and filtering scheme as for reads. A total of 77.3% of 36-
base-polymers
were uniquely mapped back to the mouse genome, resulting in
an alignable
genome length of 2.107 Gb.
For each sample, we determined the read coverage depth at
which the observed
frequency of sites with that coverage exceeded the expected
frequency by a factor of
100 (FDR # 0.01). Candidate peaks were identified as sites in
which the coverage
exceeded this threshold, and peak boundaries were extended to
the nearest flanking
positions at which read coverage fell below two reads. All
consecutive regions of
enrichment separated by regions of continuous coverage greater
than two reads
were merged into a single peak. Candidate peaks mapping to
chr_random contigs,
centromeric regions, telomeric regions, segmental duplications,
satellite repeats,
ribosomal RNA repeats or regions of .70% repeat sequence, and
those coinciding
with enriched regions in the control sample (input DNA) were
removed as probable
artefacts due to misalignment of heterochromatic sequences that
are not at present
represented in the mouse reference genome sequence. The
remaining peaks
represent high-confidence p300-enriched regions and putative
enhancers with
activity in specific tissues.
Annotation of p300 ChIP-seq read data sets with respect to
nearby genes
(UCSC known genes50), internal exons (mouse RefSeq51 exons
.30 kb from
the ends of transcripts) and conserved non-coding sequences
(top 50,000 con-
strained non-coding human-mouse-rat conserved elements
identified using
GUMBY with R-ratio parameter R 5 50; refs 9, 11) was
performed using
Galaxy52 and custom Perl scripts. Annotation of p300-enriched
regions with
respect to UCSC known genes and vertebrate phastCons
elements34 was per-
formed using custom Perl scripts.
Transgenic mouse enhancer assay. Candidate regions for
transgenic testing
were selected based on ChIP-seq results. Peaks for which human
orthologous
regions could not be unambiguously established and those
without detectable
conservation in opossum53 were excluded from transgenic
testing. Thus, the
tested peaks cover a wide spectrum of conservation, but are
overall more con-
strained than all peaks identified genome-wide (median score of
457 for all peaks
versus 626 for tested peaks). Enhancer candidate regions
(average size of 2.4 kb)
were amplified by PCR from human genomic DNA (Clontech)
and cloned into
an Hsp68-promoter-LacZ reporter vector upstream of an Hsp68-
promoter
coupled to a LacZ reporter gene as previously described6,31.
Candidate sequences
were not cloned in any particular orientation, effectively
resulting in randomized
insert orientation among the test constructs. Genomic
coordinates of amplified
regions are reported in Supplementary Table 5. Transgenic
mouse embryos were
generated by pronuclear injection and F0 embryos were
collected at E11.5 and
stained for LacZ activity as previously described6. Only
patterns that were
observed in at least three different embryos resulting from
independent trans-
genic integration events of the same construct were considered
reproducible (see
Supplementary Table 5). To account for minor variation in
separating forebrain
from midbrain during tissue dissection, forebrain and midbrain
p300 peaks were
also considered correct predictions if the reproducible in vivo
pattern was located
in the forebrain/midbrain boundary region, whereas absence of a
p300 peak was
only considered a false-negative prediction if the reproducible
in vivo pattern
clearly extended beyond the boundary region.
Microarrays. Tissue was isolated from timed-pregnant CD-1
strain mouse
embryos at E11.5. Forebrains were further subdivided into basal
telencephalon
(subpallium), dorsal telencephalon (pallium), and diencephalon,
which were
processed separately in subsequent steps. For comparison,
whole embryos (lit-
termates) were collected. All samples were collected, processed
and hybridized in
duplicate. Total RNA was extracted using Trizol reagent
(Invitrogen). Synthesis
of complementary RNA, hybridization to GeneChip Mouse
Genome 430 2.0
arrays (Affymetrix) and analysis of hybridization results was
performed accord-
ing to the manufacturer’s recommendations. For each sample,
the average
expression value from duplicates was used for downstream
analyses.
Forebrain-enriched genes were defined as those with expression
at least 2.5-fold
greater expression in at least one of the three forebrain regions
compared with
the whole embryo, and with a minimum signal intensity of 100.
Whole embryo-
enriched genes are defined as those with at least 2.5-fold greater
expression in the
whole embryo than in each of the three forebrain regions, and a
minimum signal
intensity of 100. Distances between p300 peaks and the 59 end
of Affymetrix
consensus complementary DNA sequences from mouse MOE430
(A and B)
aligned to the mouse reference genome (mm9) were used to
determine the
closest forebrain-enriched and whole embryo-enriched genes
(Supplementary
Tables 8 and 9). The same procedure was used to analyse the
correlation of limb
p300 peaks with limb gene expression, except that limb
expressed genes were
identified by comparison of publicly available wild-type E11.5
proximal hind-
limb gene expression data (GEO series GSE10516, samples
GSM264689,
GSM264690 and GSM264691)49, with the whole embryo gene
expression data
generated in the present study (Supplementary Tables 10 and
11).
Animal work. All animal work was performed in accordance
with protocols
reviewed and approved by the Lawrence Berkeley National
Laboratory Animal
Welfare and Research Committee.
51. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference
sequences (RefSeq): a
curated non-redundant sequence database of genomes,
transcripts and proteins.
Nucleic Acids Res. 35, D61–D65 (2007).
52. Giardine, B. et al. Galaxy: a platform for interactive large-
scale genome analysis.
Genome Res. 15, 1451–1455 (2005).
53. Mikkelsen, T. S. et al. Genome of the marsupial
Monodelphis domestica reveals
innovation in non-coding sequences. Nature 447, 167–177
(2007).
doi:10.1038/nature07730
Macmillan Publishers Limited. All rights reserved©2009
www.nature.com/doifinder/10.1038/nature07730
www.nature.com/nature
www.nature.com/natureTitleAuthorsAbstractGenome-wide
mapping of p300 in tissuesp300 predicts enhancer activity
patternsMost p300-bound regions are conservedCorrelation with
gene expression patternsDiscussionMethods
SummaryReferencesMethodsTissue dissection and chromatin
immunoprecipitationAmplification and Illumina sequencing of
ChIP DNAProcessing of Illumina sequence dataTransgenic
mouse enhancer assayMicroarraysAnimal workMethods
ReferencesFigure 1 Tissue dissection boundaries, overview of
the ChIP-seq approach and summary of p300 results.Figure 2
p300 binding accurately predicts enhancers and their tissue-
specific activity patterns.Figure 3 Examples of successful
prediction of in vivo enhancers by p300 binding in embryonic
tissues.Figure 4 In all tissues examined, p300 is enriched at
highly conserved non-coding regions.Figure 5 p300 peaks are
enriched near genes that are expressed in the same tissue.
questions for assignmnet-2.docx
questions for assignmnet-1.docx
Lee2007_paper1.pdf
Discovery of growth hormone-releasing hormones
and receptors in nonmammalian vertebrates
Leo T. O. Lee*, Francis K. Y. Siu*, Janice K. V. Tam*, Ivy T.
Y. Lau*, Anderson O. L. Wong*, Marie C. M. Lin†,
Hubert Vaudry‡, and Billy K. C. Chow*§
Departments of *Zoology and †Chemistry, University of Hong
Kong, Pokfulam Road, Hong Kong, China; and ‡Institut
National de la Santé et de la
Recherche Médicale U-413, Laboratory of Cellular and
Molecular Neuroendocrinology, European Institute for Peptide
Research (Institut Fédératif
de Recherches Multidisciplinaires sur les Peptides 23),
University of Rouen, 76821 Mont-Saint-Aignan, France
Communicated by Roger Guillemin, The Salk Institute for
Biological Studies, La Jolla, CA, December 12, 2006 (received
for review August 28, 2006)
In mammals, growth hormone-releasing hormone (GHRH) is the
most important neuroendocrine factor that stimulates the release
of growth hormone (GH) from the anterior pituitary. In
nonmam-
malian vertebrates, however, the previously named GHRH-like
peptides were unable to demonstrate robust GH-releasing activi-
ties. In this article, we provide evidence that these GHRH-like
peptides are homologues of mammalian PACAP-related peptides
(PRP). Instead, GHRH peptides encoded in cDNAs isolated
from
goldfish, zebrafish, and African clawed frog were identified.
More-
over, receptors specific for these GHRHs were characterized
from
goldfish and zebrafish. These GHRHs and GHRH receptors
(GHRH-
Rs) are phylogenetically and structurally more similar to their
mammalian counterparts than the previously named GHRH-like
peptides and GHRH-like receptors. Information regarding their
chromosomal locations and organization of neighboring genes
confirmed that they share the same origins as the mammalian
genes. Functionally, the goldfish GHRH dose-dependently acti-
vates cAMP production in receptor-transfected CHO cells as
well as
GH release from goldfish pituitary cells. Tissue distribution
studies
showed that the goldfish GHRH is expressed almost exclusively
in
the brain, whereas the goldfish GHRH-R is actively expressed
in
brain and pituitary. Taken together, these results provide
evidence
for a previously uncharacterized GHRH-GHRH-R axis in
nonmam-
malian vertebrates. Based on these data, a comprehensive evolu-
tionary scheme for GHRH, PRP-PACAP, and PHI-VIP genes in
rela-
tion to three rounds of genome duplication early on in
vertebrate
evolution is proposed. These GHRHs, also found in flounder,
Fugu,
medaka, stickleback, Tetraodon, and rainbow trout, provide re-
search directions regarding the neuroendocrine control of
growth
in vertebrates.
molecular evolution � genome duplication � pituitary
adenylate
cyclase-activating polypeptide
Growth hormone-releasing hormone (GHRH), also knownas
growth hormone-releasing factor, was initially isolated
from pancreatic tumors causing acromegaly (1, 2), and hypo-
thalamic human GHRH was shown to be identical to the one
isolated from the pancreas tumor (3). Thereafter, the sequences
of GHRHs were determined in various vertebrate species and in
protochordates (4). In mammals, GHRH is mainly expressed and
released from the arcuate nucleus of the hypothalamus (5). The
primary function of GHRH is to stimulate GH synthesis and
secretion from anterior pituitary somatotrophs via specific in-
teraction with its receptor, GHRH-R (5). In addition, GHRH
activates cell proliferation, cell differentiation, and growth of
somatotrophs (6 – 8). There are many other reported activities
of
GHRH such as modulation of appetite and feeding behavior,
regulation of sleeping (9, 10), control of jejunal motility (11),
and
increase of leptin levels in modest obesity (12).
In mammals, GHRH and pituitar y adenylate cyclase-
activating polypeptide (PACAP) are encoded by separate genes:
GHRH is encoded with a C-peptide with no known function,
whereas PACAP and PACAP-related peptide (PRP) are present
in the same transcript (13, 14). In nonmammalian vertebrates
and protochordates, GHRH and PACAP were believed to be
encoded by the same gene and hence processed from the same
transcript and prepropolypeptide (15). It was suggested that
mammalian GHRH was evolved as a consequence of a gene
duplication event of the PACAP gene which occurred just
before
the divergence that gave rise to the mammalian lineage (4, 16).
Before this gene duplication event, PACAP was the true phys-
iological regulator for GH release while its function was pro-
gressively taken over by the GHRH-like peptides encoded with
PACAP after the emergence of tetrapods (16). The GHRH-like
peptide was later evolved to PRP in mammals, and the second
copy of the ancestral GHRH-like/PACAP gene, after gene
duplication, became the physiological GHRH in mammals (16).
This hypothetical evolutionary scheme of GHRH and PACAP
genes previously proposed could successfully accommodate and
explain most of the information available. However, by data
mining of the genomic sequences from several vertebrates,
genomic sequences encoding for putative GHRHs and
GHRH-Rs from amphibian (Xenopus laevis) and teleost (ze-
brafish) were found. The corresponding proteins share a higher
degree of sequence identity with mammalian GHRH and
GHRH-R, when compared with the previously identified
GHRH-like peptides and receptors. In this report, by analyzing
the phylogeny, chromosomal location, and function of these
ligands and receptors, a previously uncharacterized evolutionary
scheme for GHRH, PRP-PACAP, and PHI-vasoactive intestinal
polypeptide (VIP) genes in vertebrates is proposed. More im-
portantly, the discovery of mammalian GHRH homologues by in
silico analysis in other fish provides new research directions
regarding the neuroendocrine control of growth in vertebrates.
Results
Predicted Amino Acid Sequences of Fish and Amphibian
GHRHs.
Recently, several nonmammalian vertebrate genome databases
of
avian [Gallus gallus (chicken)], amphibian [Xenopus tropicalis
(Af-
rican clawed frog)], and fish [Danio rerio (zebrafish), Takifugu
rubripes (Fugu), and Tetraodon nigroviridis (pufferfish)]
species
were completely or partially released. Using these valuable re-
Author contributions: L.T.O.L., H.V., and B.K.C.C. designed
research; L.T.O.L., F.K.Y.S.,
J.K.V.T., I.T.Y.L., and A.O.L.W. performed research; M.C.M.L.
contributed new reagents/
analytic tools; L.T.O.L., F.K.Y.S., J.K.V.T., I.T.Y.L.,
A.O.L.W., and B.K.C.C. analyzed data; and
L.T.O.L., H.V., and B.K.C.C. wrote the paper.
The authors declare no conflict of interest.
Abbreviations: GH, growth hormone; GHRH, growth hormone-
releasing hormone;
GHRH-R, GHRH receptor; PACAP, pituitary adenylate cyclase-
activating polypeptide;
PAC1-R, PACAP receptor; PHI, peptide histidine isoleucine;
PRP, PACAP-related peptide;
PRP-R, PACAP-related peptide receptor; VIP, vasoactive
intestinal polypeptide.
Data deposition: The sequences reported in this paper have been
deposited in the GenBank
database (accession nos. DQ991243–DQ991247 and ABJ55977–
ABJ55981).
§To whom correspondence should be addressed. E-mail:
[email protected]
This article contains supporting information online at
www.pnas.org/cgi/content/full/
0611008104/DC1.
© 2007 by The National Academy of Sciences of the USA
www.pnas.org�cgi�doi�10.1073�pnas.0611008104 PNAS �
February 13, 2007 � vol. 104 � no. 7 � 2133–2138
B
IO
C
H
EM
IS
TR
Y
http://www.pnas.org/cgi/content/full/0611008104/DC1
http://www.pnas.org/cgi/content/full/0611008104/DC1
sources, we performed bioinformatics analyses to look for
previ-
ously uncharacterized GHRHs and GHRH-Rs in amphibian and
fish. Putative genes for GHRH were predicted from X.
tropicalis,
goldfish (gfGHRH), and zebrafish (zfGHRH), and with the help
of
these sequences, we successfully cloned their full-length
cDNAs.
[supporting information (SI) Figs. 6 – 8]. By aligning all known
vertebrate GHRH precursor sequences (SI Fig. 9), it was found
that
only the N-terminal region (1–27) of GHRHs is conserved.
Human
shares 81.5% and 74.1% sequence identity with
goldfish/zebrafish
and X. laevis GHRHs, respectively (SI Fig. 10). Within the first
7 aa,
there is only one amino acid substitution in goat (position 1),
human (position 1), mouse (position 1), and Xenopus (position
2). These observations clearly show that a strong selective
pressure has acted to preserve the GHRH sequence in evolution,
indicating that this peptide plays important functions from fish
to mammals. It should be noted that within the first 27 aa, the
previously named GHRH-like peptides share much lower level
of sequence identity with these fish and mammalian GHRHs
(51.9 –59.3% for gfPRPsalmon, and 37– 44.4% for
gfPRPcatfish;
SI Fig. 10). Instead, these GHRH-like peptides are structurally
homologous to mammalian PRPs (Fig. 1A).
Cloning of GHRH-Rs from Zebrafish and Goldfish. Putative
GHRH-R
cDNAs were cloned from zebrafish and goldfish (zfGHRH-R
and
gfGHRH-R) (see SI Figs. 11 and 12). The amino acid sequences
of
these GHRH-R receptors share 78.4% identity among
themselves,
and 51.1% and 49.7%, respectively, with the human GHRH-R
(SI
Fig. 13). Kyte-Doolittle hydrophobicity plots of these receptors
indicated the presence of seven hydrophobic transmembrane re-
gions that are conserved in all class IIB receptors (SI Fig. 14).
In
addition, the eight cysteine residues in the N termini of fish and
human GHRH-Rs, which presumably structurally maintain the
ligand binding pocket (17), are identical. These receptors also
contain the basic motifs in the third endoloop, indicating that
they
can also couple to the cAMP pathway, similar to other members
in
the same PACAP/glucagon receptor family (18).
A comparison of the fish GHRH-R with other class IIB
receptors
(SI Fig. 13) clearly shows that fish GHRH-Rs, sharing 64.8 –
78.4%
sequence identity, are distinct from the previously identified
GHRH-like receptors (PRP-Rs). The sequence identity between
goldfish GHRH-R and PRP-R is 39.7%, which is not different
(36.4 –38.3%) from other goldfish receptors in the same gene
family, including PAC1-R, VPAC1-R, and PHI-R. The sequence
identity shared by goldfish, avian, and mammalian GHRH-Rs is
48.9 –51.1%, which is significantly higher than any other
receptors
in the same class IIB family, whether in goldfish or in human.
Phylogenetic Analysis of the Novel GHRH and GHRH-Rs.
Based on the
previously uncharacterized GHRH sequences, a revised
phyloge-
netic tree for GHRH, GHRH-like, and PRP peptides was con-
structed by using PACAP as the out-group (Fig. 1 A). The
identified
GHRH peptides from fish and amphibian were grouped together
with all other previously cloned or predicted GHRH sequences
from mammals to fish. Interestingly, the previously named
GHRH-
like peptides were phylogenetically closer to mammalian PRPs
and
resembled to form a distinct branch together. Within the GHRH
grouping, GHRHs clustered together according to their class
with
the exception of rodents. Within all PRPs, avian, amphibian,
and
fish PRPsalmon-like (formerly GHRHsalmon-like) clustered to
form a sub-branch, whereas, the fish PRPcatfish-like peptides
formed a different group. Apparently, there are two forms of
PRPs
in fish, but only the salmon-like PRPs are observed in avian and
amphibian species. Catfish-like PRP, therefore, is uniquely
present
in fish and hence must have evolved by gene duplication after
the
split that gave rise to tetrapod and ray-finned fish.
Similarly, the discovered gfGHRH-R and zfGHRH-R, and
the predicted GHRH-R from chicken and Fugu, branched
together with mammalian GHRH-Rs, whereas the fish PRP-Rs
(formerly GHRH-like receptor) formed a separate branch (Fig.
1B). These data confirmed that, together with GHRHs, recep-
tors that are structurally closely related to mammalian
GHRH-Rs also exist in other classes. It is interestingly to note
that PRP-R might have been lost in mammals (see below). As
the
function of PRP in nonmammalian vertebrates is not known, the
implications of losing PRP-R in mammals is unclear.
Chromosomal Synteny of GHRH and GHRH-R Genes.
According to the
latest genome assembly versions, the locations, gene structures,
and
organizations of neighboring genes of GHRHs and GHRH-Rs in
zebrafish (zebrafish assembly Version 4, Zv4) and Xenopus (X.
tropicalis 4.1) were determined (SI Fig. 15). All exon-intron
splice
junctions agree with the canonical GT/AG rule. Both fish and
Fig. 1. Phylogenetic analysis of GHRH, PRP, and GHRH-like
peptide (1–27),
GHRH-Rs, and PRP-Rs. Predicted peptide and receptor
sequences are marked
with asterisks. (A) PACAP sequences were used as an out-
group. (B) Receptors
identified in this report are underlined.
2134 � www.pnas.org�cgi�doi�10.1073�pnas.0611008104
Lee et al.
http://www.pnas.org/cgi/content/full/0611008104/DC1
http://www.pnas.org/cgi/content/full/0611008104/DC1
http://www.pnas.org/cgi/content/full/0611008104/DC1
http://www.pnas.org/cgi/content/full/0611008104/DC1
http://www.pnas.org/cgi/content/full/0611008104/DC1
http://www.pnas.org/cgi/content/full/0611008104/DC1
http://www.pnas.org/cgi/content/full/0611008104/DC1
http://www.pnas.org/cgi/content/full/0611008104/DC1
http://www.pnas.org/cgi/content/full/0611008104/DC1
http://www.pnas.org/cgi/content/full/0611008104/DC1
amphibian GHRH genes have four exons and the mature
peptides
are located in exon 3. In mammals, GHRH genes also have four
exons, whereas the mature peptides are found in exon 2 instead
of
exon 3. Also, exon 2 and its encoded sequences in frog and
goldfish
are different and are unique in their own species (SI Fig. 9).
This
information indicated that, although there were frequent exon
rearrangements, the mature peptide-encoding exon, in terms of
sequence and structure, have been relatively well preserved
during
evolution. Finally, the GHRH genes are also structurally
distinct
from PRP-PACAP genes (SI Fig. 15), in which the three exons
encoding cryptic peptide, PRP and PACAP are arranged in tan-
dem. Thus, the organizations of GHRH and PRP-PACAP genes
also indicate that they have different origins in evolution.
Fig. 2 summarizes the genomic locations of GHRH and
GHRH-R in various vertebrates. For the GHRH genes, the
nearest
neighboring genes of GHRH are RPN2 and MANBAL in human
and chicken. In fact, the RPN2 genes are found also in similar
locations in all species analyzed except zebrafish (Fig. 2 A).
Other
than RPN2, EPB41L1, C20orf4, DLGAP4, MYL9, and
CTNNBL1
are also closely linked to the GHRH gene loci in human and
frog.
In Fugu and Xenopus, BPI and EPB41L1 genes are both linked
to
GHRH gene, whereas in Fugu and zebrafish, the CDK5RAP1
gene
is found close to the GHRH gene locus. These kinds of
similarities
in chromosomal syntenic linkage were not observed when
compar-
ing the GHRH and PRP-PACAP genes in vertebrate genomes,
again confirming the idea that two different genes encoding for
GHRH and PACAP exist from fish to mammal.
Regarding the receptor, the GHRH-R and PAC1-R gene loci are
in close proximity in human, chicken, zebrafish, and Fugu (Fig.
2B),
indicating that they are homologues in different species.
Interest-
ingly, we found that the PRP-R locus is also linked to PAC1-R
gene
in chicken. Despite our efforts, no PRP-R gene could be found
in
mammalian genomes (human, chimpanzee, mouse, rat, and
rabbit),
and hence it is likely that the PRP-R gene was lost after the
divergence that gave rise to the mammalian lineage.
Comparison of Fish GHRH and PRP in Activating GHRH-R and
PRP-R. To
confirm the functional identity of fish GHRH and its receptor,
zfGHRH-R-transfected CHO cells were exposed to 100 nM of
the
various related peptides (Fig. 3A). Both fish and frog (1–27)
GHRHs were able to stimulate cAMP accumulation above back-
ground levels (no peptide or pcDNA3.1-transfected cells).
Graded
concentrations of fishGHRH and xGHRH induced dose-
dependent stimulation of zfGHRH-R with EC50 values of 3.7 �
10�7 and 8.4 � 10�7 M, respectively, the fish peptide being
2.3 folds
more efficacious than xGHRH (Fig. 3B). In contrast, gfPRP-
salmon-like and gfPRPcatfish-like did not induce cAMP produc-
tion in zfGHRH-R-transfected cells. Interestingly, in gfPRP-R-
transfected cells, gfPRPsalmon-like, fishGHRH, and xGHRH
could all stimulate intracellular cA MP production dose-
dependently with EC50 values of gfPRPsalmon-like 5.8 � 10�9
M � fishGHRH 1.8 � 10�7M � xGHRH 4.8 � 10�7 M (Fig.
3C).
To test whether fish GHRH and gfPRPsalmon-like can exert a
regulatory role at the pituitary level to modulate GH secretion,
goldfish pituitary cells were challenged with graded
concentrations
of goldfish GHRH and PRP. A 4-h incubation of cells with
goldfish
GHRH induced a dose-related increase in GH release, the mini-
mum effective dose being 10 nM (Fig. 3D) whereas
gfPRPsalmon-
like, at concentrations up to 1 �M was totally inactive (Fig.
3E).
Tissue Distribution of GHRH and GHRH-R in Goldfish and X.
laevis. The
physiological relevance of GHRH and its cognate receptor in
goldfish was tested by measuring their transcript levels using
real-time PCR (Fig. 4). GHRH mRNA was detected mostly in
the
brain, whereas GHRH-R mRNA was found in both brain and
pituitary. The expression patterns of these genes in the brain-
pituitary axis further support the functional role of the
previously
uncharacterized GHRH acting to stimulate GH release from the
anterior pituitary.
Discussion
The Newly Identified GHRHs, but Not GHRH-Like Peptides,
Are Mam-
malian GHRH Homologues. The primordial gene for the
PACAP/
VIP/glucagon gene family arose �650 million years ago, and
through exon duplication and insertion, gene duplication, point
mutation, and exon loss, the family of peptides has developed
into
the forms that are now recognized. Based on sequence
comparison,
phylogenetic study and chromosomal locations of GHRH and
GHRH-R genes in vertebrates, we here show that the previously
named GHRH-like peptides are homologues of mammalian
PRPs,
and hence should be renamed as PRP from now on. Moreover,
the
tissue distribution of these PRPs in the fish brain is different
when
compared with mammalian GHRHs (19). Functionally, PRPs
were
unable to stimulate zfGHRH-R, as well as incapable of
activating
GH release from fish pituitary as shown in this and previous
studies
(20 –22).
More importantly, the present report demonstrates the presence
of authentic GHRH peptides in nonmammalian vertebrates. In
addition to goldfish and zebrafish, several ray-finned fish
GHRHs
were identified in medaka, strickland, Fugu, Tetraodon, f
lounder,
Fig. 2. Chromosomal locations of GHRH and GHRH-R in
various vertebrate
species. Genes adjacent to GHRH and GHRH-R in different
genomes are
shown. The genes are named according to their annotation in the
human
genome. GHRH and GHRH-R genes are boxed.
Lee et al. PNAS � February 13, 2007 � vol. 104 � no. 7 �
2135
B
IO
C
H
EM
IS
TR
Y
http://www.pnas.org/cgi/content/full/0611008104/DC1
http://www.pnas.org/cgi/content/full/0611008104/DC1
and rainbow trout (see SI Fig. 16) by in silico analysis. The
mature
GHRH sequences are almost identical in these eight fish
species,
with only one substitution in rainbow trout and medaka GHRHs
at
position 25 (S) and 26 (I), respectively. Moreover, the peptide
cleavage sites (R and GKR) are also conserved. This
information
clearly suggests an important physiological function of GHRH
in
such diversified fish species and shows that the discovered
GHRHs
are homologues of mammalian GHRHs. In parallel with the
evolution of GHRHs, previously uncharacterized GHRH-Rs that
are homologous to their mammalian counterparts in structure,
function, chromosomal localization, and tissue distribution are
also
found in fish. We have therefore shown that, unlike what was
hypothesized in previous reports (15, 16) both GHRH and its
receptor genes actually existed before the split for tetrapods and
ray-finned fish.
Another interesting observation is the existence of PRP-
specific receptors from fish (23–25) to chicken, suggesting that
PRP potentially is a physiological regulator in these species.
However, despite our efforts, we were unable to find the PRP-R
gene in any of the mammalian genome databases (in total, 12
databases were analyzed), including the almost completed hu-
man and mouse genomes, indicating that PRP-R (but not PRP)
was lost in the mammalian lineage. A possible explanation is
that
PRP-R was lost in a chromosomal translocation event. As shown
in Fig. 3B, in chicken, GHRH-R, PAC1-R, PRP-R, and
VPAC1-R genes are clustered in the same chromosomal region,
whereas, in mammals, VPAC1-R is translocated to a different
chromosome and the DNA sequences between VPAC1-R and
PAC1-R containing the PRP-R, somehow disappeared.
Implications of the Newly Discovered GHRHs on our
Understanding of
the Evolution of GHRH and PRP-PACAP Genes. The ‘‘one-to-
four’’ rule
is the most widely accepted and prevalent model to explain the
evolution of many vertebrate genes. This model is based on the
‘‘two
round genome duplication’’ hypothesis (26). According to this
2R
hypothesis, a first genome duplication occurred as early as in
deuterostomes and this was followed by a second round of
genome
duplication, that took place before the origin of gnathostomes,
resulting in having four copies of the genome. Based on this
hypothesis and our findings, we propose a scheme for the
evolution
of GHRH and PRP-PACAP genes in vertebrates (Fig. 5). In the
beginning, the ancestral chordates possessed one copy of the
PACAP-like gene which encoded a single peptide, and by exon
duplication, the PRP-PACAP ancestral gene was formed. As a
result of two rounds of genome duplication (1R and 2R), the
ancestral gene produced four paralogous genes, but only three
of
them, PRP-PACAP, PHI-VIP, and GHRH genes, persisted in
later
lineages. One of the duplicated copies could have been lost via
a
Fig. 3. Functional assays. (A) Intracellular cAMP accumulation
in CHO-zfGHRH-R cells stimulated with 100 nM peptides: fish
GHRH, Xenopus GHRH, goldfish
PRPsalmon-like, goldfish PRPcatfish-like, goldfish PHI,
goldfish glucagon, zebrafish GIP, human PACAP-27, and
human PACAP-38. (B and C) Effect of graded
concentrations (from 10�11 to 10�5 M) of various GHRHs and
gfPRPs on cAMP accumulation in CHO-zfGHRH-R cells (B)
and CHO-gfPRP-R cells (C). Values represent
means � SE from three independent experiments each in
duplicate. (D and E) Effect of graded concentrations (from
10�11 to 10�6 M) of fishGHRH (D) and
gfPRPsalmon-like (E) on GH release from cultured goldfish
pituitary cells. Cells were incubated for 4 h with fishGHRH (D;
n � 12) or gfPRPsalmon-like (E; n � 4).
Values represent means � SEM from at least two experiments
performed in duplicate.
Fig. 4. Relative abundance of GHRH (A) and GHRH-R (B)
transcripts in
different tissues in goldfish. A relative abundance of 1 was set
arbitrarily for
both genes in the brain. Data are from at least three experiments
performed
in duplicate. Values are expressed as means � SEM.
2136 � www.pnas.org�cgi�doi�10.1073�pnas.0611008104
Lee et al.
http://www.pnas.org/cgi/content/full/0611008104/DC1
global process called diploidization (27–29). Although there are
two
highly conserved PRP-PACAP genes in fish and tunicates, they
are
produced neither by 1R nor 2R, and this will be discussed later.
The earliest PRP-PACAP genes were identified in protochor-
dates (Chelyosoma productum) (15), in which two PRP-PACAP
genes with little differences in gene structure were found.
Proto-
chordates are invertebrates that split from vertebrates before the
two rounds of genome duplication. Therefore, if these two PRP-
PACAP genes existed before 1R and 2R, there should be eight
PRP-PACAP paralogues in vertebrates. Also, the tunicate PRP-
PACAP genes share very high levels of similarity among
themselves
in terms of gene structure and peptide sequence. Hence, these
duplicated copies of PRP-PACAP genes are the result of a more
recent gene duplication event that occurred after the initial
diver-
gence. Interestingly, in the phylogenetic tree, the tunicate PRP-
like
peptide belongs neither to GHRH nor to PRP groupings. The N
termini of these peptides are similar to PACAP, whereas the C
termini share higher similarity with PRP. Therefore, it is likely
that
these tunicate PRP-like peptides are evolved from the same an-
cestral sequences of PRP and/or PACAP.
Based on our model, the evolution of GHRH, PRP-PACAP, and
PHI-VIP genes could be explained by using the concept of
dupli-
cation– degeneration– complementation (30, 31). After the first
round of genome duplication (1R), the ancestral PRP-PACAP
gene duplicated into two and each of them independently
mutated.
One of the genes had mutations mainly in the PRP-like region,
thus
giving rise to the PRP-PACAP gene with the function of the
ancestral gene mostly preserved in the PACAP domain.
Together
with PRP’s changes, a duplicated PACAP receptor evolved to
interact specifically with PRP. Although we have no
information
regarding the function of PRP from fish to birds, the presence
of a
specific receptor that is expressed in a tissue-specific manner
strongly suggests that this peptide exerts some physiological
role. As
the primary function was secured by the conservation of
PACAP,
in the other copy, PRP is free to mutate to GHRH while PACAP
sequences were lost by exon deletion, producing the GHRH
gene
in vertebrates. After 2R, the duplicated gene copy for PRP-
PACAP
was free to mutate and hence to acquire new functions to
become
PHI-VIP, and consistently, PHI and VIP are similar to PRP and
PACAP, respectively. Until now, we have not been able to find
a
second GHRH homologue in vertebrates, and we thus propose
that
the duplicated GHRH gene may have been deleted or lost in the
process of genome rediploidization.
The final question is the origin of the two forms of PRP,
salmon-like and catfish-like, and correspondingly two PRP-
PACAP
genes, in fish. We named these PRPcatfish-like and salmon-like
as
there are clearly two branches in the phylogenetic tree in which
the
avian and amphibian PRPs are more similar to fish PRPsalmon-
like. Based on previous reports and sequence analysis, we found
that, in fact, all bony fish possess a PRPsalmon-like gene,
supporting
the idea that PRPsalmon-like is the ancestral form, whereas
PRP-
catfish-like existed before the tetrapod/fish split, a duplicated
copy
arising from a more recent duplication event that occurred after
the
appearance of the teleost lineage (32). This is confirmed by the
presence of duplicated PHI-VIP and glucagon genes in the Fugu
and zebrafish genomes, as well as two genes for PAC1-R,
VPAC1-R, and VPAC2-R in fish (33). There is further evidence
supporting the hypothesis of a third round of genome
duplication;
for example, zebrafish possesses seven HOX clusters on seven
different chromosomes, instead of four clusters found in
mammals
(34). Also, there are many duplicated segments in the zebrafish
genome where similar duplication are not found in the
mammalian
Fig. 5. A proposed evolutionary scheme of GHRH, PRP-
PACAP, and PHI-VIP genes with respect to several rounds of
genome duplication. Labeling of peptides
and gene organizations are shown in the key. The genome
duplication events are highlighted by light-blue boxes.
Unknown or unclear paths are marked with
question marks. Time for divergence in millions of years ago
was taken from ref. 41.
Lee et al. PNAS � February 13, 2007 � vol. 104 � no. 7 �
2137
B
IO
C
H
EM
IS
TR
Y
genome. This information suggests that some of the genome was
duplicated in an ancient teleost. Based on this, we hypothesize
that
catfish-like and salmon-like PRP encoding genes were produced
from the actinopterygian-specific genome duplication at �300 –
450
million years ago.
Materials and Methods
Animals and Peptides. Zebrafish and goldfish were purchased
from local fish markets, and X. laevis was bought from Xenopus
I. Peptides including X. laevis GHRH (xGHRH), zebrafish or
goldfish GHRH (fish GHRH), goldfish PRPsalmon-like (previ-
ously gfGHRHsalmon-like) (24), and goldfish PRPcatfish-like
(previously gfGHRHcatfish-like) (24) were synthesized by the
Rockefeller University.
Data Mining and Phylogenetic Analysis. By searching genome
data-
bases (Ensembl genome browse and NCBI database) of chicken
(Gallus gallus), Xenopus (X. tropicalis), zebrafish (Danio
rerio), and
Fugu (Fugu rubripes), putative GHRH and GHRH-R sequences
were found. These sequences were used for phylogenetic
analyses
by MEGA3.0 to produce neighbor-joining trees with Dayhoff
matrix (35). Predicted protein sequences were then aligned by
using
ClustalW. Sequence-specific or degenerate primers were
designed
accordingly for cDNA cloning by PCR amplifications.
Molecular Cloning of GHRHs (X. laevis, Zebrafish, and
Goldfish) and
GHRH-Rs (Zebrafish and Goldfish). Total RNAs from brain
(zebrafish
and goldfish) and pituitary (X. laevis) were extracted according
to
the manufacturer’s protocol (Invitrogen, Carlsbad, CA). RACE
(5�
and 3�) was performed by using the 5� and 3� RACE
amplification
kits (Invitrogen). Full-length cDNA clones encompassing the
5� to
3� untranslated regions were produced by PCR with specific
primers
and confirmed by DNA sequencing. Full-length GHRH-R
cDNAs
were subcloned into pcDNA3.1� (Invitrogen) for functional ex-
pression. Primers used in this study are listed in SI Fig. 17.
Tissue Distribution of GHRH and GHRH-R in Goldfish.
Goldfish
GHRH and GHRH-R transcript levels in various tissues were
measured by real-time RT-PCR. First-strand cDNAs from var-
ious tissues of sexually matured goldfish were prepared by
using
the oligo(dT) primer (Invitrogen). GHRH and GHRH-R levels
were determined by using the SYBR Green PCR Master Mix kit
(Applied Biosystems, Foster City, CA) together with specific
primers (sequences listed in SI Fig. 18). Fluorescence signals
were monitored in real-time by the iCycler iQ system (Bio-Rad,
Hercules, CA). The threshold cycle (Ct) is defined as the
fractional cycle number at which the f luorescence reaches
10-fold standard deviation of the baseline (from cycle 2 to 10).
The ratio change in target gene relative to the �-actin control
gene was determined by the 2� Ct method (36).
Functional Studies of Fish GHRH-R and PRP-R. For functional
studies,
zfGHRH-R cDNA in pcDNA3.1 (Invitrogen) was permanently
transfected into CHO cells (American Type Culture Collection,
Manassas, VA) by using the GeneJuice reagent (Novagen,
Darm-
stadt, Germany) and followed by G418 selection (500 �g/ml)
for 3
weeks. Several clones of receptor-transfected CHO cells were
produced by dilution into 96-well plates. After RT-PCR to
monitor
the expression levels of individual clones, the colony with the
highest expression was expanded and used for cAMP assays. A
gfPRP-R-transfected (previously gfGHRH-R) (25) permanent
CHO cell line was also used for comparison in functional
expression
studies.
To measure cAMP production upon ligand activation, 2 days
before stimulation, 2.5 � 105 CHO-zfGHRH-R or CHO-gfPRP-
R
cells were seeded onto six-well plates (Costar). The assay was
performed essentially as described (37) by using the Correlate-
EIA
immunoassay kits (Assay Design, Ann Arbor, MI).
Measurement of GH Release from Goldfish Pituitary Cells.
Goldfish in
late stages of sexual regression were used for the preparation of
pituitary cell cultures. Fish were killed by spinosectomy after
anesthesia in 0.05% tricane methanesulphonate (Syndel, Van-
couver, BC, Canada). Pituitaries were excised, diced into 0.6-
mm
fragments, and dispersed by using the trypsin/DNA II digestion
method (38) with minor modifications (39). Pituitary cells were
cultured in 24-well cluster plates at a density of 0.25 � 106
cells
per well. After overnight incubation, cells were stimulated for
4 h, and culture medium was harvested for GH measurement by
using a RIA previously validated for goldfish GH (40).
Data Analysis. Data from real-time PCR are shown as the means
�
SEM of duplicated assays in at least three independent experi-
ments. All data were analyzed by one-way ANOVA followed by
a
Dunnett’s test in PRISM (Version 3.0; GraphPad, San Diego,
CA).
This work was supported by Hong Kong Government RGC
Grants
HKU7639/07M, HKU7566/06M, and CRCG10203410 (to
B.K.C.C.), and
by the Institut National de la Santé et de la Recherche Médicale
(H.V.).
1. Guillemin R, Brazeau P, Böhlen P, Esch F, Ling N,
Wehrenberg WB (1982) Science
218:585–587.
2. Thorner MO, Perryman RL, Cronin MJ, Rogol AD, Draznin
M, Johanson A, Vale W,
Horvath E, Kovacs K (1982) J Clin Invest 70:965–977.
3. Ling N, Esch F, Böhlen P, Brazeau P, Wehrenberg WB,
Guillemin R (1984) Proc Natl Acad
Sci USA 81:4302– 4306.
4. Sherwood NM, Krueckl SL, McRory JE (2000) Endocr Rev
21:619 – 670.
5. Bloch B, Brazeau P, Ling N, Bohlen P, Esch F, Wehrenberg
WB, Benoit R, Bloom F,
Guillemin R (1983) Nature 301:607– 608.
6. Billestrup N, Swanson LW, Vale W (1986) Proc Natl Acad
Sci USA 83:6854 – 6857.
7. Mayo KE, Hammer RE, Swanson LW, Brinster RL, Rosenfeld
MG, Evans RM (1988) Mol
Endocrinol 2:606 – 612.
8. Lin C, Lin SC, Chang CP, Rosenfeld MG (1992) Nature
360:765–768.
9. Ehlers CL, Reed TK, Henriksen SJ (1986)
Neuroendocrinology 42:467– 474.
10. Obal F, Jr, Alfoldi P, Cady AB, Johannsen L, Sary G,
Krueger JM (1988) Am J Physiol
255:R310 –R316.
11. Bueno L, Fioramonti J, Primi MP (1985) Peptides 6:403–
407.
12. Cai A, Hyde JF (1999) Endocrinology 140:3609 –3614.
13. Mayo KE, Cerelli GM, Lebo RV, Bruce BD, Rosenfeld MG,
Evans RM (1985) Proc Natl
Acad Sci USA 82:63– 67.
14. Hosoya M, Kimura C, Ogi K, Ohkubo S, Miyamoto Y,
Kugoh H, Shimizu M, Onda H,
Oshimura M, Arimura A, et al. (1992) Biochim Biophys Acta
1129:199 –206.
15. McRory J, Sherwood NM (1997) Endocrinology 138:2380 –
2390.
16. Montero M, Yon L, Kikuyama S, Dufour S, Vaudry H
(2000) J Mol Endocrinol 25:157–168.
17. DeAlmeida VI, Mayo KE (1998) Mol Endocrinol 12:750 –
765.
18. Laburthe M, Couvineau A, Gaudin P, Maoret JJ, Rouyer-
Fessard C, Nicole P (1996) Ann
NY Acad Sci 805:94 –109.
19. McRory JE, Parker DB, Ngamvongchon S, Sherwood NM
(1995) Mol Cell Endocrinol
108:169 –177.
20. Montero M, Yon L, Rousseau K, Arimura A, Fournier A,
Dufour S, Vaudry H (1998)
Endocrinology 139:4300 – 4310.
21. Parker DB, Power ME, Swanson P, Rivier J, Sherwood NM
(1997) Endocrinology 138:414–423.
22. Melamed P, Eliahu N, Levavi-Sivan B, Ofir M, Farchi-
Pisanty O, Rentier-Delrue F, Smal
J, Yaron Z, Naor Z (1995) Gen Comp Endocrinol 97:13–30.
23. Chan KW, Yu KL, Rivier J, Chow BK (1998)
Neuroendocrinology 68:44 –56.
24. Wong AO, Li WS, Lee EK, Leung MY, Tse LY, Chow BK,
Lin HR, Chang JP (2000)
Biochem Cell Biol 78:329 –343.
25. Kee F, Ng SS, Vaudry H, Pang RT, Lau EH, Chan SM,
Chow BK (2005) Gen Comp
Endocrinol 140:41–51.
26. Hughes AL (1999) J Mol Evol 48:565–576.
27. Wolfe KH (2001) Nat Rev Genet 2:333–341.
28. Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N,
Mauceli E, Bouneau L, Fischer
C, Ozouf-Costaz C, Bernot A, et al. (2004) Nature 431:946 –
957.
29. Kellis M, Patterson N, Birren B, Berger B, Lander ES
(2004) J Comput Biol 11:319 –355.
30. Force A, Lynch M, Pickett FB, Amores A, Yan YL,
Postlethwait J (1999) Genetics 151:1531–1545.
31. Lynch M, Force A (2000) Genetics 154:459 – 473.
32. Van de Peer Y, Taylor JS, Meyer A (2003) J Struct Funct
Genomics 3:65–73.
33. Cardoso JC, Power DM, Elgar G, Clark MS (2004) J Mol
Endocrinol 33:411– 428.
34. Amores A, Force A, Yan YL, Joly L, Amemiya C, Fritz A,
Ho RK, Langeland J, Prince V,
Wang YL, et al. (1998) Science 282:1711–1714.
35. Kumar S, Tamura K, Nei M (2004) Brief Bioinform 5:150 –
163.
36. Livak KJ, Schmittgen TD (2001) Methods 25:402– 408.
37. Kwok YY, Chu JY, Vaudry H, Yon L, Anouar Y, Chow BK
(2006) Gen Comp Endocrinol
145:188 –196.
38. Chang JP, Cook H, Freedman GL, Wiggs AJ, Somoza GM,
de Leeuw R, Peter RE (1990)
Gen Comp Endocrinol 77:256 –273.
39. Lee EK, Chan VC, Chang JP, Yunker WK, Wong AO (2000)
J Neuroendocrinol 12:311–322.
40. Marchant TA, Fraser RA, Andrews PC, Peter RE (1987)
Regul Pept 17:41–52.
41. Kumar S, Hedges SB (1998) Nature 392:917–920.
2138 � www.pnas.org�cgi�doi�10.1073�pnas.0611008104
Lee et al.
http://www.pnas.org/cgi/content/full/0611008104/DC1
http://www.pnas.org/cgi/content/full/0611008104/DC1

More Related Content

Similar to Visel2009_.pdfARTICLESChIP-seq accurately predictsti.docx

A common rejection module (CRM) for acute rejection across multiple organs
A common rejection module (CRM) for acute rejection across multiple organsA common rejection module (CRM) for acute rejection across multiple organs
A common rejection module (CRM) for acute rejection across multiple organs
Kevin Jaglinski
 
Pells et al [2015] PLoS ONE 10[7] e0131102
Pells et al [2015] PLoS ONE 10[7] e0131102Pells et al [2015] PLoS ONE 10[7] e0131102
Pells et al [2015] PLoS ONE 10[7] e0131102
Steve Pells
 

Similar to Visel2009_.pdfARTICLESChIP-seq accurately predictsti.docx (20)

Mazalouskas_2015
Mazalouskas_2015Mazalouskas_2015
Mazalouskas_2015
 
Systemic analysis of data combined from genetic qtl's and gene expression dat...
Systemic analysis of data combined from genetic qtl's and gene expression dat...Systemic analysis of data combined from genetic qtl's and gene expression dat...
Systemic analysis of data combined from genetic qtl's and gene expression dat...
 
A Comparative Encyclopedia Of DNA Elements In The Mouse Genome
A Comparative Encyclopedia Of DNA Elements In The Mouse GenomeA Comparative Encyclopedia Of DNA Elements In The Mouse Genome
A Comparative Encyclopedia Of DNA Elements In The Mouse Genome
 
In search of tissue specific regulators in periodontium - a bioinformatic ap...
In search of tissue specific regulators in periodontium  - a bioinformatic ap...In search of tissue specific regulators in periodontium  - a bioinformatic ap...
In search of tissue specific regulators in periodontium - a bioinformatic ap...
 
Analisis de la expresion de genes en la depresion
Analisis de la expresion de genes en la depresionAnalisis de la expresion de genes en la depresion
Analisis de la expresion de genes en la depresion
 
A common rejection module (CRM) for acute rejection across multiple organs
A common rejection module (CRM) for acute rejection across multiple organsA common rejection module (CRM) for acute rejection across multiple organs
A common rejection module (CRM) for acute rejection across multiple organs
 
Montgomery expression
Montgomery expressionMontgomery expression
Montgomery expression
 
Austin Neurology & Neurosciences
Austin Neurology & NeurosciencesAustin Neurology & Neurosciences
Austin Neurology & Neurosciences
 
A systematic, data driven approach to the combined analysis of microarray and...
A systematic, data driven approach to the combined analysis of microarray and...A systematic, data driven approach to the combined analysis of microarray and...
A systematic, data driven approach to the combined analysis of microarray and...
 
PIIS0016508514604509
PIIS0016508514604509PIIS0016508514604509
PIIS0016508514604509
 
1110.full
1110.full1110.full
1110.full
 
Grindberg - PNAS
Grindberg - PNASGrindberg - PNAS
Grindberg - PNAS
 
Identification of PFOA linked metabolic diseases by crossing databases
Identification of PFOA linked metabolic diseases by crossing databasesIdentification of PFOA linked metabolic diseases by crossing databases
Identification of PFOA linked metabolic diseases by crossing databases
 
Mouse genome
Mouse genomeMouse genome
Mouse genome
 
P1-01-17_poster
P1-01-17_posterP1-01-17_poster
P1-01-17_poster
 
EVE 161 Winter 2018 Class 13
EVE 161 Winter 2018 Class 13EVE 161 Winter 2018 Class 13
EVE 161 Winter 2018 Class 13
 
Reference 1
Reference 1Reference 1
Reference 1
 
Grant Proposal 2006
Grant Proposal 2006Grant Proposal 2006
Grant Proposal 2006
 
D03011027030
D03011027030D03011027030
D03011027030
 
Pells et al [2015] PLoS ONE 10[7] e0131102
Pells et al [2015] PLoS ONE 10[7] e0131102Pells et al [2015] PLoS ONE 10[7] e0131102
Pells et al [2015] PLoS ONE 10[7] e0131102
 

More from dickonsondorris

Copyright © eContent Management Pty Ltd. Health Sociology Revi.docx
Copyright © eContent Management Pty Ltd. Health Sociology Revi.docxCopyright © eContent Management Pty Ltd. Health Sociology Revi.docx
Copyright © eContent Management Pty Ltd. Health Sociology Revi.docx
dickonsondorris
 
Copyright © Pearson Education 2010 Digital Tools in Toda.docx
Copyright © Pearson Education 2010 Digital Tools in Toda.docxCopyright © Pearson Education 2010 Digital Tools in Toda.docx
Copyright © Pearson Education 2010 Digital Tools in Toda.docx
dickonsondorris
 
Copyright © Jen-Wen Lin 2018 1 STA457 Time series .docx
Copyright © Jen-Wen Lin 2018   1 STA457 Time series .docxCopyright © Jen-Wen Lin 2018   1 STA457 Time series .docx
Copyright © Jen-Wen Lin 2018 1 STA457 Time series .docx
dickonsondorris
 
Copyright © John Wiley & Sons, Inc. All rights reserved..docx
Copyright © John Wiley & Sons, Inc. All rights reserved..docxCopyright © John Wiley & Sons, Inc. All rights reserved..docx
Copyright © John Wiley & Sons, Inc. All rights reserved..docx
dickonsondorris
 
Copyright © by The McGraw-Hill Companies, Inc. The Aztec Accou.docx
Copyright © by The McGraw-Hill Companies, Inc. The Aztec Accou.docxCopyright © by The McGraw-Hill Companies, Inc. The Aztec Accou.docx
Copyright © by The McGraw-Hill Companies, Inc. The Aztec Accou.docx
dickonsondorris
 
Copyright © Cengage Learning. All rights reserved. CHAPTE.docx
Copyright © Cengage Learning.  All rights reserved. CHAPTE.docxCopyright © Cengage Learning.  All rights reserved. CHAPTE.docx
Copyright © Cengage Learning. All rights reserved. CHAPTE.docx
dickonsondorris
 
Copyright © by Holt, Rinehart and Winston. All rights reserved.docx
Copyright © by Holt, Rinehart and Winston. All rights reserved.docxCopyright © by Holt, Rinehart and Winston. All rights reserved.docx
Copyright © by Holt, Rinehart and Winston. All rights reserved.docx
dickonsondorris
 
Copyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docx
Copyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docxCopyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docx
Copyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docx
dickonsondorris
 
Copyright © 2019, American Institute of Certified Public Accou.docx
Copyright © 2019, American Institute of Certified Public Accou.docxCopyright © 2019, American Institute of Certified Public Accou.docx
Copyright © 2019, American Institute of Certified Public Accou.docx
dickonsondorris
 
Copyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docx
Copyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docxCopyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docx
Copyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docx
dickonsondorris
 
Copyright © 2018 Pearson Education, Inc. C H A P T E R 6.docx
Copyright © 2018 Pearson Education, Inc. C H A P T E R  6.docxCopyright © 2018 Pearson Education, Inc. C H A P T E R  6.docx
Copyright © 2018 Pearson Education, Inc. C H A P T E R 6.docx
dickonsondorris
 
Copyright © 2018 Capella University. Copy and distribution o.docx
Copyright © 2018 Capella University. Copy and distribution o.docxCopyright © 2018 Capella University. Copy and distribution o.docx
Copyright © 2018 Capella University. Copy and distribution o.docx
dickonsondorris
 
Copyright © 2018 Pearson Education, Inc.C H A P T E R 3.docx
Copyright © 2018 Pearson Education, Inc.C H A P T E R  3.docxCopyright © 2018 Pearson Education, Inc.C H A P T E R  3.docx
Copyright © 2018 Pearson Education, Inc.C H A P T E R 3.docx
dickonsondorris
 
Copyright © 2018 by Steven Levitsky and Daniel.docx
Copyright © 2018 by Steven Levitsky and Daniel.docxCopyright © 2018 by Steven Levitsky and Daniel.docx
Copyright © 2018 by Steven Levitsky and Daniel.docx
dickonsondorris
 
Copyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docx
Copyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docxCopyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docx
Copyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docx
dickonsondorris
 
Copyright © 2017 Wolters Kluwer Health Lippincott Williams.docx
Copyright © 2017 Wolters Kluwer Health  Lippincott Williams.docxCopyright © 2017 Wolters Kluwer Health  Lippincott Williams.docx
Copyright © 2017 Wolters Kluwer Health Lippincott Williams.docx
dickonsondorris
 
Copyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docx
Copyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docxCopyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docx
Copyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docx
dickonsondorris
 
Copyright © 2017 by University of Phoenix. All rights rese.docx
Copyright © 2017 by University of Phoenix. All rights rese.docxCopyright © 2017 by University of Phoenix. All rights rese.docx
Copyright © 2017 by University of Phoenix. All rights rese.docx
dickonsondorris
 
Copyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docx
Copyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docxCopyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docx
Copyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docx
dickonsondorris
 
Copyright © 2016 Pearson Education, Inc. .docx
Copyright © 2016 Pearson Education, Inc.                    .docxCopyright © 2016 Pearson Education, Inc.                    .docx
Copyright © 2016 Pearson Education, Inc. .docx
dickonsondorris
 

More from dickonsondorris (20)

Copyright © eContent Management Pty Ltd. Health Sociology Revi.docx
Copyright © eContent Management Pty Ltd. Health Sociology Revi.docxCopyright © eContent Management Pty Ltd. Health Sociology Revi.docx
Copyright © eContent Management Pty Ltd. Health Sociology Revi.docx
 
Copyright © Pearson Education 2010 Digital Tools in Toda.docx
Copyright © Pearson Education 2010 Digital Tools in Toda.docxCopyright © Pearson Education 2010 Digital Tools in Toda.docx
Copyright © Pearson Education 2010 Digital Tools in Toda.docx
 
Copyright © Jen-Wen Lin 2018 1 STA457 Time series .docx
Copyright © Jen-Wen Lin 2018   1 STA457 Time series .docxCopyright © Jen-Wen Lin 2018   1 STA457 Time series .docx
Copyright © Jen-Wen Lin 2018 1 STA457 Time series .docx
 
Copyright © John Wiley & Sons, Inc. All rights reserved..docx
Copyright © John Wiley & Sons, Inc. All rights reserved..docxCopyright © John Wiley & Sons, Inc. All rights reserved..docx
Copyright © John Wiley & Sons, Inc. All rights reserved..docx
 
Copyright © by The McGraw-Hill Companies, Inc. The Aztec Accou.docx
Copyright © by The McGraw-Hill Companies, Inc. The Aztec Accou.docxCopyright © by The McGraw-Hill Companies, Inc. The Aztec Accou.docx
Copyright © by The McGraw-Hill Companies, Inc. The Aztec Accou.docx
 
Copyright © Cengage Learning. All rights reserved. CHAPTE.docx
Copyright © Cengage Learning.  All rights reserved. CHAPTE.docxCopyright © Cengage Learning.  All rights reserved. CHAPTE.docx
Copyright © Cengage Learning. All rights reserved. CHAPTE.docx
 
Copyright © by Holt, Rinehart and Winston. All rights reserved.docx
Copyright © by Holt, Rinehart and Winston. All rights reserved.docxCopyright © by Holt, Rinehart and Winston. All rights reserved.docx
Copyright © by Holt, Rinehart and Winston. All rights reserved.docx
 
Copyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docx
Copyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docxCopyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docx
Copyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docx
 
Copyright © 2019, American Institute of Certified Public Accou.docx
Copyright © 2019, American Institute of Certified Public Accou.docxCopyright © 2019, American Institute of Certified Public Accou.docx
Copyright © 2019, American Institute of Certified Public Accou.docx
 
Copyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docx
Copyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docxCopyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docx
Copyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docx
 
Copyright © 2018 Pearson Education, Inc. C H A P T E R 6.docx
Copyright © 2018 Pearson Education, Inc. C H A P T E R  6.docxCopyright © 2018 Pearson Education, Inc. C H A P T E R  6.docx
Copyright © 2018 Pearson Education, Inc. C H A P T E R 6.docx
 
Copyright © 2018 Capella University. Copy and distribution o.docx
Copyright © 2018 Capella University. Copy and distribution o.docxCopyright © 2018 Capella University. Copy and distribution o.docx
Copyright © 2018 Capella University. Copy and distribution o.docx
 
Copyright © 2018 Pearson Education, Inc.C H A P T E R 3.docx
Copyright © 2018 Pearson Education, Inc.C H A P T E R  3.docxCopyright © 2018 Pearson Education, Inc.C H A P T E R  3.docx
Copyright © 2018 Pearson Education, Inc.C H A P T E R 3.docx
 
Copyright © 2018 by Steven Levitsky and Daniel.docx
Copyright © 2018 by Steven Levitsky and Daniel.docxCopyright © 2018 by Steven Levitsky and Daniel.docx
Copyright © 2018 by Steven Levitsky and Daniel.docx
 
Copyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docx
Copyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docxCopyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docx
Copyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docx
 
Copyright © 2017 Wolters Kluwer Health Lippincott Williams.docx
Copyright © 2017 Wolters Kluwer Health  Lippincott Williams.docxCopyright © 2017 Wolters Kluwer Health  Lippincott Williams.docx
Copyright © 2017 Wolters Kluwer Health Lippincott Williams.docx
 
Copyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docx
Copyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docxCopyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docx
Copyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docx
 
Copyright © 2017 by University of Phoenix. All rights rese.docx
Copyright © 2017 by University of Phoenix. All rights rese.docxCopyright © 2017 by University of Phoenix. All rights rese.docx
Copyright © 2017 by University of Phoenix. All rights rese.docx
 
Copyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docx
Copyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docxCopyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docx
Copyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docx
 
Copyright © 2016 Pearson Education, Inc. .docx
Copyright © 2016 Pearson Education, Inc.                    .docxCopyright © 2016 Pearson Education, Inc.                    .docx
Copyright © 2016 Pearson Education, Inc. .docx
 

Recently uploaded

會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
中 央社
 
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MysoreMuleSoftMeetup
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
EADTU
 
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes GuàrdiaPersonalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
EADTU
 

Recently uploaded (20)

8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management
 
Including Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdfIncluding Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdf
 
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
 
Observing-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptxObserving-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptx
 
How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
 
VAMOS CUIDAR DO NOSSO PLANETA! .
VAMOS CUIDAR DO NOSSO PLANETA!                    .VAMOS CUIDAR DO NOSSO PLANETA!                    .
VAMOS CUIDAR DO NOSSO PLANETA! .
 
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
 
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...
 
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdfFICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
 
How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17
 
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
 
Supporting Newcomer Multilingual Learners
Supporting Newcomer  Multilingual LearnersSupporting Newcomer  Multilingual Learners
Supporting Newcomer Multilingual Learners
 
How to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptxHow to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptx
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
 
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes GuàrdiaPersonalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
 
An overview of the various scriptures in Hinduism
An overview of the various scriptures in HinduismAn overview of the various scriptures in Hinduism
An overview of the various scriptures in Hinduism
 
Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community PartnershipsSpring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
 

Visel2009_.pdfARTICLESChIP-seq accurately predictsti.docx

  • 1. Visel2009_.pdf ARTICLES ChIP-seq accurately predicts tissue-specific activity of enhancers Axel Visel 1 *, Matthew J. Blow 1,2 *, Zirong Li 3 , Tao Zhang 2 , Jennifer A. Akiyama 1 , Amy Holt 1 , Ingrid Plajzer-Frick 1 , Malak Shoukry 1 , Crystal Wright
  • 2. 2 , Feng Chen 2 , Veena Afzal 1 , Bing Ren 3 , Edward M. Rubin 1,2 & Len A. Pennacchio 1,2 A major yet unresolved quest in decoding the human genome is the identification of the regulatory sequences that control the spatial and temporal expression of genes. Distant-acting transcriptional enhancers are particularly challenging to uncover because they are scattered among the vast non-coding portion of the genome. Evolutionary sequence constraint can facilitate the discovery of enhancers, but fails to predict when and where they are active in vivo. Here we present the results of chromatin immunoprecipitation with the enhancer-associated protein p300 followed by massively parallel sequencing, and map several thousand in vivo binding sites of p300 in mouse embryonic forebrain, midbrain and limb tissue. We tested 86 of these sequences in a transgenic mouse assay, which in nearly all cases demonstrated reproducible enhancer activity in the tissues that were predicted by p300 binding. Our results indicate that in vivo mapping of p300 binding is a highly accurate means for identifying enhancers and their associated activities, and suggest that such data sets will be useful to
  • 3. study the role of tissue-specific enhancers in human biology and disease on a genome-wide scale. The initial sequencing of the human genome1,2, complemented by effective computational and experimental strategies for mammalian gene discovery3,4, has resulted in a virtually complete list of protein- coding sequences. In contrast, the genomic location and function of regulatory elements that orchestrate gene expression in the develop- ing and adult body remain more obscure, hindering studies of their contribution to developmental processes and human disease. Evolutionary constraint of non-coding sequences can predict the location of enhancers in the genome5–12, but does not reveal when and where these enhancers are active in vivo. Furthermore, it has been suggested that a substantial proportion of regulatory elements is not sufficiently conserved to be detectable by comparative genomic methods13–16. Chromatin immunoprecipitation coupled to massively parallel sequencing (ChIP-seq) has been shown to enable genome-wide map- ping of protein binding and epigenetic marks17–22. The ChIP- seq approach is dependent on the cross-linking of proteins to specific DNA elements, followed by antibody enrichment of the protein– DNA complexes, and high-throughput sequencing of the recovered
  • 4. DNA fragments. In principle, ChIP-seq using an antibody specific for an enhancer-binding protein could provide a conservation- independent approach for the identification of candidate enhancer sequences. The acetyltransferase and transcriptional coactivator p300 is a near-ubiquitously expressed component of enhancer-associated protein assemblies and is critically required for embryonic develop- ment23–27. In homogeneous cell preparations, p300 has been shown to be associated with enhancers28,29, but these in vitro studies provided access only to subsets of enhancers that are active in a given cell type under culture conditions, providing limited insight into their in vivo function. In the present study, we have determined the genome- wide occupancy of p300 in forebrain, midbrain and limb tissue isolated directly from developing mouse embryos. Using a transgenic mouse reporter assay, we show that p300 binding in these embryonic tissues predicts with high accuracy not only where enhancers are located in the genome, but also in what tissues they are active in vivo. Depending on the tissue type, the success rate of predicting forebrain, midbrain and limb enhancers was between 5- and 16-fold increased compared
  • 5. to previous studies in which such enhancers were discovered by com- parative genomics10,11. Genome-wide mapping of p300 in tissues To generate genome-wide maps of p300 binding in vivo, we micro- dissected forebrain, midbrain and limb tissue from more than 150 embryonic day 11.5 (E11.5) mouse embryos and performed ChIP directly from these tissue samples using a p300 antibody (Fig. 1). Immunoprecipitated DNA fragments were analysed using massively parallel sequencing and the resulting 36 base pair (bp) sequence reads were aligned to the reference mouse genome17,30. After appropriate quality filtering, between 2.4 and 3.6 million aligned reads obtained from each of the tissue samples were used to identify regions of the genome with significant enrichment in p300-associated DNA sequences, hereafter referred to as ‘peaks’ owing to their appearance in genome-wide density plots17 (Supple- mentary Table 1). Using an estimated false-discovery rate (FDR) threshold of ,0.01, we identified 2,543, 561 and 2,105 peaks from forebrain, midbrain and limb respectively (Supplementary Tables 2–4). Most peaks were located at least 10 kb from transcript start sites
  • 6. (Supplementary Fig. 1). The smaller number of peaks from midbrain is probably due to variability in the efficiency of enrichment by ChIP (Supplementary Fig. 2). Re-sampling of subsets of data suggests that the major p300-binding sites from these three tissues have been dis- covered, whereas with increased sequencing coverage it is anticipated that further binding sites can be identified that are occupied only in smaller subsets of cells within each tissue (Supplementary Fig. 3). Although most genomic regions with in vivo p300 binding were identified by peaks in a single tissue, there were 386 regions at which peaks were observed in two tissues, and 21 regions at which p300 peaks were observed in all three tissues (Supplementary Fig. 4). *These authors contributed equally to this work. 1 Genomics Division, MS 84-171, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA. 2 US Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA. 3Ludwig Institute for Cancer Research, University of California San Diego (UCSD) School of Medicine, La Jolla, California 92093, USA. Vol 457 | 12 February 2009 | doi:10.1038/nature07730
  • 7. 854 Macmillan Publishers Limited. All rights reserved©2009 www.nature.com/nature www.nature.com/nature www.nature.com/doifinder/10.1038/nature07730 p300 predicts enhancer activity patterns To test directly whether p300 binding in developing mouse tissues is indicative of enhancer activity in vivo, we selected 86 regions with a p300 peak in at least one of the tissues for analysis in transgenic mice, comprising a total of 122 individual predictions of enhancer activity in specific tissues (Supplementary Table 5). These elements were selected blind to the identity of genes near which they are located, showed a wide range of evolutionary conservation with other vertebrate species (see Methods) and approximately reflect the genome-wide distri- bution properties of p300 peaks among intronic and intergenic regions, as well as their distances relative to known genes (Supple- mentary Fig. 1). We cloned the human genomic sequences orthologous to these enhancer candidate regions into an enhancer reporter vector and generated transgenic mice as previously described10,31. For each of
  • 8. the 86 candidate enhancers, several independent transgenic embryos (average of n 5 8) were assessed for reproducible reporter gene expression. A pattern was considered reproducible if the same anatom- ical structure was stained in three or more embryos. In almost all cases, this minimum threshold was exceeded and reproducible reporter staining in forebrain, midbrain or limb was present on average in more than 80% of the embryos obtained per construct (Supplementary Table 5). First we determined whether p300 binding was predictive of repro- ducible in vivo enhancer activity regardless of their tissue specificity. Considering peaks from each of the three p300 data sets separately, 55 out of 63 (87%) forebrain predictions, 30 out of 34 (88%) midbrain predictions and 22 out of 25 (88%) limb predictions were active enhan- cers in vivo at E11.5 as defined by reproducible LacZ staining (Fig. 2, grey and coloured bars). Overall, 87% (75 out of 86) of the tested elements were reproducible enhancers at E11.5. This compares with a success rate for predicting enhancers of 47% (246 out of 528) from our previous studies in which elements were identified on the basis of their extreme evolutionary conservation and tested using the same transgenic mouse assay10,11. Thus, the rate of false-positive
  • 9. predictions using p300 ChIP-seq was more than fourfold lower than that with extreme evolutionary conservation (13% compared to 53% previously; P 5 4.2 3 10 210 , Fisher’s exact test). We next determined the accuracy with which p300 binding predicts the tissue in which enhancer activity will occur. Of the 63 tested elem- ents that overlapped a forebrain p300 peak, 49 (78%) were found to have reproducible enhancer activity in the developing forebrain (Fig. 2, blue). Similarly, 28 out of 34 (82%) tested elements identified by midbrain p300 enrichment (Fig. 2, red), and 20 out of 25 (80%) tested elements identified by limb p300 enrichment (Fig. 2, green), were confirmed to be active in the predicted tissue. The 86 tested elements included 32 sequences that were identified by p300 binding in more than one tissue. Of these, 27 out of 32 (84%) were active in at least one of the predicted tissues, and 22 sequences (69%) perfectly recapitu- lated the predicted expression patterns (Supplementary Table 6).
  • 10. To assess the degree of enrichment of enhancer activities in predicted tissues, we compared the relative frequency of enhancers for each of the three tissues examined here with a background set of 528 previously tested sequences predicted to be developmental enhancers on the basis of extreme sequence constraint that were not associated with a prior tissue specificity prediction10,11. For example, whereas forebrain enhan- cers account for only 16% (86 out of 528) of the tested elements iden- tified through comparative approaches, 78% (49 out of 63) of elements predicted by forebrain p300 peaks were found to be active enhancers in the forebrain (Fig. 2). Forebrain predictions are therefore fivefold enriched in forebrain enhancers compared with enhancers identified through comparative approaches (P , 1 3 10 222 ). Similarly, we observed a sixfold enrichment of midbrain enhancers (P , 1 3 10 211 ) and a 16-fold enrichment of limb enhancers (P , 1 3 10 218 ) at mid-
  • 11. brain and limb p300 peaks, respectively. Representative examples of enhancers identified by ChIP-seq are shown in Fig. 3 and detailed anno- tations and reproducibility across transgenic mice for all elements tested in this study can be found at http://enhancer.lbl.gov (ref. 32). Taken together, these results indicate that p300 peaks are a highly accurate predictor of in vivo enhancers and their spatial activity patterns. Forebrain 3,629,292 reads 2,453 peaks Midbrain 3,530,316 reads 561 peaks Limb 2,419,480 reads 2,105 peaks mb fb li 1 mm Microdissection
  • 12. li Map reads to genome p300 anti-p300 ChIP: Massively parallel sequencing Identify peaks Figure 1 | Tissue dissection boundaries, overview of the ChIP- seq approach and summary of p300 results. Tissue dissection boundaries are indicated in a representative unstained E11.5 mouse embryo. For each sample, tissue was pooled from more than 150 embryos and ChIP-seq was performed with a p300- antibody. Reads obtained for each of the three tissues that unambiguously aligned to the reference mouse genome were used to define peaks (FDR , 0.01). A more comprehensive overview of sequencing and mapping results is provided in Supplementary Table 1. fb, forebrain; li, limb; mb, midbrain. Co ns
  • 17. m e n ts ) 0% 20% 40% 60% 80% 100% Any reproducible enhancer activity Pattern includes or restricted to: ** * * * * forebrain midbrain limb
  • 18. Figure 2 | p300 binding accurately predicts enhancers and their tissue- specific activity patterns. Bar height indicates the frequency of in vivo enhancers (reproducible at E11.5) that are active in any tissue (grey bars and coloured bars) and the fraction of enhancers in which the pattern includes or is restricted to reproducible forebrain (blue bars), midbrain (red bars) or limb activity (green bars). In each case, candidate elements predicted by p300 peaks in forebrain, midbrain or limb were compared to the frequency of the respective pattern in a background set of 528 previously tested sequences identified through extreme evolutionary conservation (combined data sets from refs 10 and 11). The component activities of elements predicted to be active in several tissues were counted separately. *P , 0.00005, Fisher’s exact test, one-tailed. NATURE | Vol 457 | 12 February 2009 ARTICLES 855 Macmillan Publishers Limited. All rights reserved©2009 http://enhancer.lbl.gov Most p300-bound regions are conserved Previous studies have indicated a positive correlation between enhancer activity during development and non-coding sequence
  • 19. conservation6,8–11,33, but it has also been suggested that not all regu- latory elements in vertebrate genomes are under detectable evolu- tionary constraint13–16. To test whether p300 binding in E11.5 tissues is generally associated with evolutionarily constrained non- coding sequences, we determined if ChIP-seq reads are overall enriched at previously identified extremely conserved non-coding sequences9,11. We observed strong enrichment of p300 ChIP-seq reads at these conserved sequences, but not at random sites or exons (Fig. 4 and Supplementary Table 7). Vice versa, between 86% and 91% of the p300 peaks overlap sequences that are under evolutionary constraint in vertebrates34, compared to less than 30% of size-matched random regions (P , 1 3 10 2172 , Fisher’s exact test; Supplementary Fig. 5). Using a more stringent constraint threshold score, we observed that between 10% and 21% of peaks are highly constrained, compared to 1% of random regions (P , 1 3 10 282 ). These results indicate that most p300 peaks in the investigated tissues are under
  • 20. evolutionary constraint and support a global enrichment of p300 in highly con- served non-coding regions of the genome previously correlated with developmental enhancers. Correlation with gene expression patterns To examine the correlation of p300-enriched regions in embryonic tissues with the transcriptional regulation of neighbouring genes, we compared the genomic distribution of p300 peaks in E11.5 forebrain with gene expression data from this tissue. Using high-density micro- arrays, we identified a set of 885 genes that are overexpressed in fore- brain at E11.5 compared to whole embryos (Supplementary Table 8). When we compared the genomic position of these forebrain genes to the genome-wide distribution of 2,453 forebrain-derived p300 peaks, we observed that the intervals 90 kb up- and downstream of their pro- moters are 2.4-fold enriched overall in p300-binding sites (P , 0.05, Fig. 5a). In total, 14% of all forebrain p300 peaks are located within 101 kb from a promoter of a forebrain-overexpressed gene. The most 20
  • 21. 2 20 2 20 2 Limb p300 Conservation Reproducibility In vivo LacZ pattern b a 5/5 4/4 11/11 8/8 5/6 9/11 Midbrain p300 Forebrain p300 ** * * * * * * 5 kb Enhancer 1334 1301 1383 901 1327 1278 Figure 3 | Examples of successful prediction of in vivo enhancers by p300
  • 22. binding in embryonic tissues. a, Coverage by extended p300 reads in forebrain (blue), midbrain (red) and limb (green). Asterisks indicate significant (FDR , 0.01) p300 enrichment in chromatin isolated from the respective tissue. Multispecies vertebrate conservation plots (black) were obtained from the UCSC genome browser50. Grey boxes correspond to candidate enhancer regions. Numbers at the right indicate overlapping extended reads. b, Representative LacZ-stained embryos with in vivo enhancer activity at E11.5. Reproducible staining in forebrain, midbrain and limb is indicated by arrows. Numbers show the reproducibility of LacZ reporter staining. Additional embryos obtained with each construct and genomic coordinates are available using the enhancer ID in the bottom portion of a at the Vista Enhancer Browser32. 50,000 random sites in genome 50,000 conserved non-coding elements 0 0
  • 24. ra g e Forebrain p300 Midbrain p300 Limb p300 30,000 internal exons 0 –2 0 kb +2 0 kb 1.2 1.0 1.4 1.6
  • 25. 1.8 * * * Median element or exon size (124 bp) Figure 4 | In all tissues examined, p300 is enriched at highly conserved non-coding regions. We used a genome-wide set of 50,000 extremely constrained non-coding sequences identified in human-mouse- rat genome alignments11 to assess the correlation between p300 enrichment and non- coding sequence conservation. Even though only subsets of the constrained non-coding elements are expected to be active enhancers in any given embryonic tissue, we observe strong enrichment in p300 binding in all three tissues compared to input DNA. *P , 1 3 10 2100 , Fisher’s exact test. Relative p300 coverage near random sites and internal exons is shown for comparison. Brown bars indicate the median sizes of conserved elements or exons (124 bp in both cases). For further details, see Supplementary Table 7.
  • 26. ARTICLES NATURE | Vol 457 | 12 February 2009 856 Macmillan Publishers Limited. All rights reserved©2009 pronounced enrichment (4.8-fold, P , 0.01) was observed within 10 kb up- and downstream of promoters of forebrain-specifically expressed genes. In contrast, forebrain peaks are not enriched near genes over- expressed in other parts of the body (Fig. 5b, Supplementary Table 9). Near genes that were overexpressed fivefold or more in the fore- brain, even higher enrichment of forebrain peaks was observed (11-fold enrichment within 10 kb from promoters, data not shown). We found similar enrichment of limb-derived p300 peaks near limb- overexpressed genes (Supplementary Fig. 6 and Supplementary Tables 10 and 11). These observations are consistent with the sequences bound by p300 in the forebrain or limb of day 11.5 embryos being enhancers that drive the expression of adjacent genes in these tissues at this time point. Discussion In the present study, we have determined the genome-wide
  • 27. distribution of the transcriptional coactivator protein p300 (ref. 23) using ChIP- seq17 directly from developing mouse tissues. Notably, enrichment of p300 in different mouse tissues correctly predicted the spatial enhancer activities of human non-coding sequences in 80% of cases tested in a transgenic mouse assay, whereas absence of p300 enrichment corre- lated in 93% of cases with absence of enhancer activity in the respective tissue (Supplementary Table 5). The few elements that did not drive reporter gene expression in the tissue predicted by p300 ChIP- seq may represent cases in which the function of regulatory elements has diverged between the mouse sequences identified by ChIP-seq and the human orthologous regions tested in the transgenic mouse assay. In support of this hypothesis, we observed several cases in which the non-coding p300-bound region from mouse, but not the orthologous human sequence, had reproducible enhancer activities as predicted by p300 ChIP-seq from mouse tissues (data not shown). Taken together, the present approach provides a markedly improved specificity for locating enhancers in the human genome compared to conservation- based methods10,11 and also predicts their in vivo activity
  • 28. patterns with higher accuracy than motif-based computational methods available at present (for example, refs 35, 36). Most p300-binding regions identified in developing mouse tissues are under detectable evolutionary constraint. They typically overlap conserved non-coding sequences whose length (median of 113 bp) far exceeds that of an individual transcription factor binding site, suggest- ing the presence of larger functional modules. In cell culture- based chromatin studies, a sizeable fraction of non-coding regions in the human genome was found to be functional yet not constrained13,14. This apparent discrepancy might be due to differences in evolutionary constraint between enhancers active in developing tissues compared to those in individual cell types, but highlights the intrinsic challenge of inferring in vivo functionality from studies in cell culture. A generalized picture of the epigenetic marks and proteins assoc- iated with different types of functional non-coding elements has started to emerge from genome-wide chromatin studies13,18,28,37–41. We can now begin to use these signatures to unravel gene regulation on a genomic scale in the context of living organisms. The highly
  • 29. specific approach for identification of developmental enhancers and their activity patterns presented here represents a step in this direction. Complementary in-vivo-derived genomic data sets may be produced in the future, covering additional embryonic stages, anatomical regions and subregions, and perhaps considering extra molecular markers28,42–45. Focused experiments informed by such insights will expedite studies of the genome-wide activity dynamics of enhancers in developmental, physiological and pathological processes. METHODS SUMMARY Embryonic forebrain, midbrain and limb tissue was isolated from mouse embryos at E11.5. Cross-linking, chromatin isolation, sonication and immuno- precipitation using an anti-p300 antibody were performed as previously described40,46. ChIP DNA was further sheared by sonication, end-repaired, ligated to sequencing adapters and amplified by emulsion PCR as previously described47. Gel-purified amplified ChIP DNA between 300 and 500 bp was sequenced on the Illumina Genome Analyzer II platform to
  • 30. generate 36-bp reads. Sequence reads were aligned to the mouse reference genome (mm9) using BLAT48. Uniquely aligned reads were extended to 300 bp in the 39 direction and used to determine the read coverage at individual nucleotides at 25-bp intervals throughout the mouse genome. p300-enriched regions (peaks) with an esti- mated FDR of # 0.01 were identified by comparison with a random distribution of the same number of reads. Candidate peaks mapping to repetitive regions were removed as probable artefacts. Candidate regions for transgenic testing were selected based on ChIP-seq results and cover a wide spectrum of conservation. Enhancer candidate regions were amplified by PCR from human genomic DNA and cloned into an Hsp68- promoter-LacZ reporter vector as previously described6,31. Transgenic mouse embryos were generated and evaluated for reproducible LacZ activity at E11.5
  • 31. as previously described6. Total RNA from E11.5 whole embryos and forebrain tissue was hybridized to GeneChip Mouse Genome 430 2.0 arrays (Affymetrix) and analysed according to the manufacturer’s recommendations. Forebrain- and whole- embryo-enriched genes were identified as having at least 2.5-fold greater expression in one data set compared with the other, and a minimum signal intensity of 100. Limb-enriched genes were identified by comparison with publicly available wild-type E11.5 proximal hindlimb gene expression data (Gene Expression Omnibus (GEO) series GSE10516, samples GSM264689, GSM264690 and GSM264691)49. Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature. Received 16 October; accepted 18 December 2008. 1. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
  • 32. 2. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001). Distance to nearest transcript start site (kb) ** ** ** ** ** ** ** ** Distance to nearest transcript start site (kb) 3.0 3.5 2.5 2.0 1.5 1.0 0.5 0.0 3.0 3.5
  • 33. 2.5 2.0 1.5 1.0 0.5 0.0 21–311–11 11–21 31–41 41–51 51–61 61–71 71–81 81–91 91– 101 21–311–11 11–21 31–41 41–51 51–61 61–71 71–81 81–91 91– 101 a b P ro p o rt io n o f
  • 35. r si te s (% ) 885 forebrain overexpressed genes 495 forebrain underexpressed genes Figure 5 | p300 peaks are enriched near genes that are expressed in the same tissue. We compared the genome-wide distribution of p300-enriched regions in forebrain tissue at E11.5 with microarray expression data for forebrain at the same stage. Eight-hundred-and-eighty-five genes were forebrain-specifically overexpressed, and 495 genes were underexpressed relative to whole embryo RNA at the selected thresholds. Promoters (defined as 1 kb upstream and downstream of transcription start sites) were excluded from the analysis. Blue bars denote comparison to 2,435 forebrain-derived peaks, grey bars denote comparison to 2,435 random sites. a, Ten-kilobase bins up to 91 kb away from forebrain-overexpressed genes were significantly enriched in forebrain p300 peaks. b, No peak enrichment was observed for
  • 36. forebrain-underexpressed genes. Error bars indicate the 90% confidence interval on the basis of 1,000 iterations of randomized distribution. *P , 0.05, **P , 0.01, both one-tailed. NATURE | Vol 457 | 12 February 2009 ARTICLES 857 Macmillan Publishers Limited. All rights reserved©2009 www.nature.com/nature 3. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997). 4. Okazaki, Y. et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420, 563–573 (2002). 5. Marshall, H. et al. A conserved retinoic acid response element required for early expression of the homeobox gene Hoxb-1. Nature 370, 567–571 (1994). 6. Nobrega, M. A., Ovcharenko, I., Afzal, V. & Rubin, E. M. Scanning human gene deserts for long-range enhancers. Science 302, 413 (2003). 7. de la Calle-Mustienes, E. et al. A functional survey of the enhancer activity of conserved non-coding sequences from vertebrate Iroquois cluster gene deserts.
  • 37. Genome Res. 15, 1061–1072 (2005). 8. Woolfe, A. et al. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 3, e7 (2005). 9. Prabhakar, S. et al. Close sequence comparisons are sufficient to identify human cis-regulatory elements. Genome Res. 16, 855–863 (2006). 10. Pennacchio, L. A. et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature 444, 499–502 (2006). 11. Visel, A. et al. Ultraconservation identifies a small subset of extremely constrained developmental enhancers. Nature Genet. 40, 158–160 (2008). 12. Holland, L. Z. et al. The amphioxus genome illuminates vertebrate origins and cephalochordate biology. Genome Res. 18, 1100–1111 (2008). 13. ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007). 14. Margulies, E. H. et al. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res. 17, 760–774 (2007). 15. Cooper, G. M. & Brown, C. D. Qualifying the relationship between sequence
  • 38. conservation and molecular function. Genome Res. 18, 201–205 (2008). 16. McGaughey, D. M. et al. Metrics of sequence constraint overlook regulatory sequences in an exhaustive analysis at phox2b. Genome Res. 18, 252–260 (2008). 17. Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nature Methods 4, 651–657 (2007). 18. Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007). 19. Robertson, A. G. et al. Genome-wide relationship between histone H3 lysine 4 mono- and tri-methylation and transcription factor binding. Genome Res. 18, 1906–1917 (2008). 20. Cuddapah, S. et al. Global analysis of the insulator binding protein CTCF in chromatin barrier regions reveals demarcation of active and repressive domains. Genome Res. 19, 24–32 (2009). 21. Wederell, E. D. et al. Global analysis of in vivo Foxa2- binding sites in mouse adult liver using massively parallel sequencing. Nucleic Acids Res. 36, 4549–4564 (2008).
  • 39. 22. Valouev, A. et al. Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nature Methods 5, 829–834 (2008). 23. Eckner, R. et al. Molecular cloning and functional analysis of the adenovirus E1A- associated 300-kD protein (p300) reveals a protein with properties of a transcriptional adaptor. Genes Dev. 8, 869–884 (1994). 24. Eckner, R., Yao, T. P., Oldread, E. & Livingston, D. M. Interaction and functional collaboration of p300/CBP and bHLH proteins in muscle and B- cell differentiation. Genes Dev. 10, 2478–2490 (1996). 25. Yao, T. P. et al. Gene dosage-dependent embryonic development and proliferation defects in mice lacking the transcriptional integrator p300. Cell 93, 361–372 (1998). 26. Merika, M., Williams, A. J., Chen, G., Collins, T. & Thanos, D. Recruitment of CBP/ p300 by the IFNb enhanceosome is required for synergistic activation of transcription. Mol. Cell 1, 277–287 (1998). 27. Maston, G. A., Evans, S. K. & Green, M. R. Transcriptional regulatory elements in the human genome. Annu. Rev. Genomics Hum. Genet. 7, 29–59 (2006). 28. Heintzman, N. D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome.
  • 40. Nature Genet. 39, 311–318 (2007). 29. Xi, H. et al. Identification and characterization of cell type- specific and ubiquitous chromatin regulatory structures in the human genome. PLoS Genet. 3, e136 (2007). 30. Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002). 31. Kothary, R. et al. A transgene containing lacZ inserted into the dystonia locus is expressed in neural tube. Nature 335, 435–437 (1988). 32. Visel, A., Minovitsky, S., Dubchak, I. & Pennacchio, L. A. VISTA Enhancer Browser—a database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88–D92 (2007). 33. Cheng, Y. et al. Transcriptional enhancement by GATA1- occupied DNA segments is strongly associated with evolutionary constraint on the binding site motif. Genome Res. 18, 1896–1905 (2008). 34. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005). 35. Hallikas, O. et al. Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell 124, 47–
  • 41. 59 (2006). 36. Pennacchio, L. A., Loots, G. G., Nobrega, M. A. & Ovcharenko, I. Predicting tissue- specific enhancers in the human genome. Genome Res. 17, 201– 211 (2007). 37. Kim, T. H. et al. A high-resolution map of active promoters in the human genome. Nature 436, 876–880 (2005). 38. Boyle, A. P. et al. High-resolution mapping and characterization of open chromatin across the genome. Cell 132, 311–322 (2008). 39. Schones, D. E. & Zhao, K. Genome-wide approaches to studying chromatin modifications. Nature Rev. Genet. 9, 179–191 (2008). 40. Barrera, L. O. et al. Genome-wide mapping and analysis of active promoters in mouse embryonic stem cells and adult organs. Genome Res. 18, 46–59 (2008). 41. Chen, X. et al. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106–1117 (2008). 42. Kwok, R. P. et al. Nuclear protein CBP is a coactivator for the transcription factor CREB. Nature 370, 223–226 (1994). 43. Ogryzko, V. V., Schiltz, R. L., Russanova, V., Howard, B. H. & Nakatani, Y. The transcriptional coactivators p300 and CBP are histone
  • 42. acetyltransferases. Cell 87, 953–959 (1996). 44. Agalioti, T. et al. Ordered recruitment of chromatin modifying and general transcription factors to the IFN-b promoter. Cell 103, 667–678 (2000). 45. Ge, K. et al. Transcription coactivator TRAP220 is required for PPARc2- stimulated adipogenesis. Nature 417, 563–567 (2002). 46. Li, Z. et al. A global transcriptional regulatory role for c- Myc in Burkitt’s lymphoma cells. Proc. Natl Acad. Sci. USA 100, 8164–8169 (2003). 47. Blow, M. J. et al. Identification of the source of ancient remains through genomic sequencing. Genome Res. 18, 1347–1353 (2008). 48. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002). 49. Krawchuk, D. & Kania, A. Identification of genes controlled by LMX1B in the developing mouse limb bud. Dev. Dyn. 237, 1183–1192 (2008). 50. Karolchik, D. et al. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res. 36, D773–D779 (2008). Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
  • 43. Acknowledgements We wish to thank R. Hosseini and S. Phouanenavong for technical support, and J. Rubenstein, J. Long, J. Choi and Y. Zhu for help with microarray experiments. This work was performed under the auspices of the US Department of Energy’s Office of Science, Biological and Environmental Research Program and by the University of California, Lawrence Berkeley National Laboratory under contract no. DE-AC02-05CH11231, Lawrence Livermore National Laboratory under contract no. DE-AC52-07NA27344, and Los Alamos National Laboratory under contract no. DE-AC02-06NA25396. L.A.P. and E.M.R. were supported by the Berkeley-PGA, under the Programs for Genomic Applications, funded by National Heart, Lung, & Blood Institute, and L.A.P. by the National Human Genome Research Institute. A.V. was supported by an American Heart Association postdoctoral fellowship. B.R. was supported by grants from the National Human Genome Research Institute and the Ludwig Institute for Cancer Research. Author Information Reprints and permissions information is available at www.nature.com/reprints. All ChIP-seq data sets described in this study have been deposited at the National Center for Biotechnology Information (NCBI) in the Gene Expression Omnibus (GEO) database under accession number GSE13845.
  • 44. Correspondence and requests for materials should be addressed to L.A.P. ([email protected]). ARTICLES NATURE | Vol 457 | 12 February 2009 858 Macmillan Publishers Limited. All rights reserved©2009 www.nature.com/nature www.nature.com/reprints mailto:[email protected] METHODS Tissue dissection and chromatin immunoprecipitation. Embryonic forebrain, midbrain and limb tissue was isolated from timed-pregnant CD- 1 strain mouse embryos at E11.5 by microdissection in cold PBS along the anatomical bound- aries indicated in Fig. 1. Tissue samples were cross-linked (1% formaldehyde, 10 mM NaCl, 100 mM EDTA, 50 mM EGTA, 5 mM HEPES, pH 8.0) for 15 min at room temperature. Cross-linking was terminated by the addition of 125 mM glycine and cells were dissociated in a glass douncer. Chromatin isolation, soni- cation and immunoprecipitation were performed as previously described40,46. In
  • 45. brief, 1 mg of sonicated chromatin (OD260) was incubated with 10 mg of anti- body (rabbit polyclonal anti-p300 (C-20), Santa Cruz Biotechnology) coupled to IgG magnetic beads (Dynal Biotech) overnight at 4 uC. The magnetic beads were washed eight times with RIPA buffer (50 mM HEPES, pH 8.0, 1 mM EDTA, 1% NP-40, 0.7% DOC and 0.5 M LiCl, supplemented with complete protease inhibitors from Roche Applied Science), and washed once with TE buffer (10 mM Tris, pH 8.0, 1 mM EDTA). After washing, bound DNA was eluted at 65 uC in elution buffer (10 mM Tris, pH 8.0, 1 mM EDTA and 1% SDS) for 10 min and incubated at 65 uC overnight to reverse cross-links. After the reversal of cross-linking, immunoprecipitated DNA was treated sequentially with protei- nase K and RNase A, and desalted using the QIAquick PCR purification kit (Qiagen). Amplification and Illumina sequencing of ChIP DNA. ChIP DNA was quan- tified by Qubit assay HS kit. Approximately 0.1 ng of each ChIP DNA sample was
  • 46. sheared using Sonicator XL2020 (Misonix) with a microplate horn for 10 min at 55% power output and 90% amplitude. Sheared ChIP DNA extract was end- repaired using the End-It DNA End-Repair Kit (Epicentre). Illumina adapters (56 bp and 34 bp) were ligated using T4 DNA ligase (5 U ml 21 , Fermentas) and recovered using a MinElute Reaction Cleanup Kit (Qiagen). Linker ligated ChIP DNA was amplified by emulsion PCR for 40 cycles as previously described47. Amplified ChIP DNA between 300 and 500 bp was gel purified on 2% agarose and sequenced on the Illumina Genome Analyzer II according to the manufac- turer’s instructions except that emulsion PCR-amplified DNA containing the GA2 sequencing adaptor was applied directly to the cluster station for bridge amplifica- tion. The resulting flow-cell was sequenced for 36 cycles to generate 36-bp reads.
  • 47. Processing of Illumina sequence data. Unfiltered 36-bp Illumina sequence reads were aligned to the mouse reference genome (NCBI build 37, mm9) using BLAT48 with optional parameters (minScore 5 20, minIdentity 5 80, stepSize 5 5). BLAT was performed in parallel on a sge-cluster. For each read, the two highest-scoring alignments were compared and reads were rejected as repetitive unless the score of the best alignment was at least two greater than that of the second best alignment. The remaining reads were further filtered to reject those with a BLAT alignment score ,21, with .1 bp insertion or deletion, or with .2 unaligned bases at the start of the read. Finally, reads with identical start sites in the mouse genome were considered likely to be duplicate sequences arising as an artefact of sample amp- lification or sequencing, and were counted only once. The remaining reads were classed as uniquely aligned to the mouse genome. Uniquely aligned reads were extended to 300 bp in the 39 direction to account
  • 48. for the average length of size-selected p300 ChIP fragments used for sequencing. These extended read coordinates were used to determine the read coverage at individual nucleotides at 25 bp intervals throughout the mouse genome. This data was used to produce coverage plots for visualization in the UCSC genome browser. To identify p300-enriched regions (peaks), we compared the observed fre- quency of coverage depths with those expected from a random distribution of the same number of reads generated computationally as described previously17. In brief, the probability of observing a peak with a coverage depth of at least H reads is given by a sum of Poisson probabilities as: 1{ PH {1 k~0 e{ll k
  • 49. k! in which l is the average genome-wide coverage of extended reads given by: (read length 3 number of aligned reads)/alignable genome length. To estimate the alignable genome length, one million randomly selected 36- base-polymers from the mouse genome were realigned to the mouse genome using the same align- ment and filtering scheme as for reads. A total of 77.3% of 36- base-polymers were uniquely mapped back to the mouse genome, resulting in an alignable genome length of 2.107 Gb. For each sample, we determined the read coverage depth at which the observed frequency of sites with that coverage exceeded the expected frequency by a factor of 100 (FDR # 0.01). Candidate peaks were identified as sites in which the coverage exceeded this threshold, and peak boundaries were extended to the nearest flanking positions at which read coverage fell below two reads. All consecutive regions of
  • 50. enrichment separated by regions of continuous coverage greater than two reads were merged into a single peak. Candidate peaks mapping to chr_random contigs, centromeric regions, telomeric regions, segmental duplications, satellite repeats, ribosomal RNA repeats or regions of .70% repeat sequence, and those coinciding with enriched regions in the control sample (input DNA) were removed as probable artefacts due to misalignment of heterochromatic sequences that are not at present represented in the mouse reference genome sequence. The remaining peaks represent high-confidence p300-enriched regions and putative enhancers with activity in specific tissues. Annotation of p300 ChIP-seq read data sets with respect to nearby genes (UCSC known genes50), internal exons (mouse RefSeq51 exons .30 kb from the ends of transcripts) and conserved non-coding sequences (top 50,000 con- strained non-coding human-mouse-rat conserved elements
  • 51. identified using GUMBY with R-ratio parameter R 5 50; refs 9, 11) was performed using Galaxy52 and custom Perl scripts. Annotation of p300-enriched regions with respect to UCSC known genes and vertebrate phastCons elements34 was per- formed using custom Perl scripts. Transgenic mouse enhancer assay. Candidate regions for transgenic testing were selected based on ChIP-seq results. Peaks for which human orthologous regions could not be unambiguously established and those without detectable conservation in opossum53 were excluded from transgenic testing. Thus, the tested peaks cover a wide spectrum of conservation, but are overall more con- strained than all peaks identified genome-wide (median score of 457 for all peaks versus 626 for tested peaks). Enhancer candidate regions (average size of 2.4 kb) were amplified by PCR from human genomic DNA (Clontech) and cloned into an Hsp68-promoter-LacZ reporter vector upstream of an Hsp68-
  • 52. promoter coupled to a LacZ reporter gene as previously described6,31. Candidate sequences were not cloned in any particular orientation, effectively resulting in randomized insert orientation among the test constructs. Genomic coordinates of amplified regions are reported in Supplementary Table 5. Transgenic mouse embryos were generated by pronuclear injection and F0 embryos were collected at E11.5 and stained for LacZ activity as previously described6. Only patterns that were observed in at least three different embryos resulting from independent trans- genic integration events of the same construct were considered reproducible (see Supplementary Table 5). To account for minor variation in separating forebrain from midbrain during tissue dissection, forebrain and midbrain p300 peaks were also considered correct predictions if the reproducible in vivo pattern was located in the forebrain/midbrain boundary region, whereas absence of a
  • 53. p300 peak was only considered a false-negative prediction if the reproducible in vivo pattern clearly extended beyond the boundary region. Microarrays. Tissue was isolated from timed-pregnant CD-1 strain mouse embryos at E11.5. Forebrains were further subdivided into basal telencephalon (subpallium), dorsal telencephalon (pallium), and diencephalon, which were processed separately in subsequent steps. For comparison, whole embryos (lit- termates) were collected. All samples were collected, processed and hybridized in duplicate. Total RNA was extracted using Trizol reagent (Invitrogen). Synthesis of complementary RNA, hybridization to GeneChip Mouse Genome 430 2.0 arrays (Affymetrix) and analysis of hybridization results was performed accord- ing to the manufacturer’s recommendations. For each sample, the average expression value from duplicates was used for downstream analyses. Forebrain-enriched genes were defined as those with expression
  • 54. at least 2.5-fold greater expression in at least one of the three forebrain regions compared with the whole embryo, and with a minimum signal intensity of 100. Whole embryo- enriched genes are defined as those with at least 2.5-fold greater expression in the whole embryo than in each of the three forebrain regions, and a minimum signal intensity of 100. Distances between p300 peaks and the 59 end of Affymetrix consensus complementary DNA sequences from mouse MOE430 (A and B) aligned to the mouse reference genome (mm9) were used to determine the closest forebrain-enriched and whole embryo-enriched genes (Supplementary Tables 8 and 9). The same procedure was used to analyse the correlation of limb p300 peaks with limb gene expression, except that limb expressed genes were identified by comparison of publicly available wild-type E11.5 proximal hind- limb gene expression data (GEO series GSE10516, samples GSM264689,
  • 55. GSM264690 and GSM264691)49, with the whole embryo gene expression data generated in the present study (Supplementary Tables 10 and 11). Animal work. All animal work was performed in accordance with protocols reviewed and approved by the Lawrence Berkeley National Laboratory Animal Welfare and Research Committee. 51. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007). 52. Giardine, B. et al. Galaxy: a platform for interactive large- scale genome analysis. Genome Res. 15, 1451–1455 (2005). 53. Mikkelsen, T. S. et al. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 447, 167–177 (2007). doi:10.1038/nature07730 Macmillan Publishers Limited. All rights reserved©2009 www.nature.com/doifinder/10.1038/nature07730 www.nature.com/nature www.nature.com/natureTitleAuthorsAbstractGenome-wide
  • 56. mapping of p300 in tissuesp300 predicts enhancer activity patternsMost p300-bound regions are conservedCorrelation with gene expression patternsDiscussionMethods SummaryReferencesMethodsTissue dissection and chromatin immunoprecipitationAmplification and Illumina sequencing of ChIP DNAProcessing of Illumina sequence dataTransgenic mouse enhancer assayMicroarraysAnimal workMethods ReferencesFigure 1 Tissue dissection boundaries, overview of the ChIP-seq approach and summary of p300 results.Figure 2 p300 binding accurately predicts enhancers and their tissue- specific activity patterns.Figure 3 Examples of successful prediction of in vivo enhancers by p300 binding in embryonic tissues.Figure 4 In all tissues examined, p300 is enriched at highly conserved non-coding regions.Figure 5 p300 peaks are enriched near genes that are expressed in the same tissue. questions for assignmnet-2.docx questions for assignmnet-1.docx Lee2007_paper1.pdf Discovery of growth hormone-releasing hormones and receptors in nonmammalian vertebrates Leo T. O. Lee*, Francis K. Y. Siu*, Janice K. V. Tam*, Ivy T. Y. Lau*, Anderson O. L. Wong*, Marie C. M. Lin†, Hubert Vaudry‡, and Billy K. C. Chow*§ Departments of *Zoology and †Chemistry, University of Hong Kong, Pokfulam Road, Hong Kong, China; and ‡Institut National de la Santé et de la Recherche Médicale U-413, Laboratory of Cellular and Molecular Neuroendocrinology, European Institute for Peptide
  • 57. Research (Institut Fédératif de Recherches Multidisciplinaires sur les Peptides 23), University of Rouen, 76821 Mont-Saint-Aignan, France Communicated by Roger Guillemin, The Salk Institute for Biological Studies, La Jolla, CA, December 12, 2006 (received for review August 28, 2006) In mammals, growth hormone-releasing hormone (GHRH) is the most important neuroendocrine factor that stimulates the release of growth hormone (GH) from the anterior pituitary. In nonmam- malian vertebrates, however, the previously named GHRH-like peptides were unable to demonstrate robust GH-releasing activi- ties. In this article, we provide evidence that these GHRH-like peptides are homologues of mammalian PACAP-related peptides (PRP). Instead, GHRH peptides encoded in cDNAs isolated from goldfish, zebrafish, and African clawed frog were identified. More- over, receptors specific for these GHRHs were characterized from goldfish and zebrafish. These GHRHs and GHRH receptors (GHRH- Rs) are phylogenetically and structurally more similar to their mammalian counterparts than the previously named GHRH-like peptides and GHRH-like receptors. Information regarding their chromosomal locations and organization of neighboring genes confirmed that they share the same origins as the mammalian genes. Functionally, the goldfish GHRH dose-dependently acti- vates cAMP production in receptor-transfected CHO cells as well as GH release from goldfish pituitary cells. Tissue distribution studies showed that the goldfish GHRH is expressed almost exclusively in
  • 58. the brain, whereas the goldfish GHRH-R is actively expressed in brain and pituitary. Taken together, these results provide evidence for a previously uncharacterized GHRH-GHRH-R axis in nonmam- malian vertebrates. Based on these data, a comprehensive evolu- tionary scheme for GHRH, PRP-PACAP, and PHI-VIP genes in rela- tion to three rounds of genome duplication early on in vertebrate evolution is proposed. These GHRHs, also found in flounder, Fugu, medaka, stickleback, Tetraodon, and rainbow trout, provide re- search directions regarding the neuroendocrine control of growth in vertebrates. molecular evolution � genome duplication � pituitary adenylate cyclase-activating polypeptide Growth hormone-releasing hormone (GHRH), also knownas growth hormone-releasing factor, was initially isolated from pancreatic tumors causing acromegaly (1, 2), and hypo- thalamic human GHRH was shown to be identical to the one isolated from the pancreas tumor (3). Thereafter, the sequences of GHRHs were determined in various vertebrate species and in protochordates (4). In mammals, GHRH is mainly expressed and released from the arcuate nucleus of the hypothalamus (5). The primary function of GHRH is to stimulate GH synthesis and secretion from anterior pituitary somatotrophs via specific in- teraction with its receptor, GHRH-R (5). In addition, GHRH activates cell proliferation, cell differentiation, and growth of somatotrophs (6 – 8). There are many other reported activities of
  • 59. GHRH such as modulation of appetite and feeding behavior, regulation of sleeping (9, 10), control of jejunal motility (11), and increase of leptin levels in modest obesity (12). In mammals, GHRH and pituitar y adenylate cyclase- activating polypeptide (PACAP) are encoded by separate genes: GHRH is encoded with a C-peptide with no known function, whereas PACAP and PACAP-related peptide (PRP) are present in the same transcript (13, 14). In nonmammalian vertebrates and protochordates, GHRH and PACAP were believed to be encoded by the same gene and hence processed from the same transcript and prepropolypeptide (15). It was suggested that mammalian GHRH was evolved as a consequence of a gene duplication event of the PACAP gene which occurred just before the divergence that gave rise to the mammalian lineage (4, 16). Before this gene duplication event, PACAP was the true phys- iological regulator for GH release while its function was pro- gressively taken over by the GHRH-like peptides encoded with PACAP after the emergence of tetrapods (16). The GHRH-like peptide was later evolved to PRP in mammals, and the second copy of the ancestral GHRH-like/PACAP gene, after gene duplication, became the physiological GHRH in mammals (16). This hypothetical evolutionary scheme of GHRH and PACAP genes previously proposed could successfully accommodate and explain most of the information available. However, by data mining of the genomic sequences from several vertebrates, genomic sequences encoding for putative GHRHs and GHRH-Rs from amphibian (Xenopus laevis) and teleost (ze- brafish) were found. The corresponding proteins share a higher degree of sequence identity with mammalian GHRH and GHRH-R, when compared with the previously identified GHRH-like peptides and receptors. In this report, by analyzing the phylogeny, chromosomal location, and function of these
  • 60. ligands and receptors, a previously uncharacterized evolutionary scheme for GHRH, PRP-PACAP, and PHI-vasoactive intestinal polypeptide (VIP) genes in vertebrates is proposed. More im- portantly, the discovery of mammalian GHRH homologues by in silico analysis in other fish provides new research directions regarding the neuroendocrine control of growth in vertebrates. Results Predicted Amino Acid Sequences of Fish and Amphibian GHRHs. Recently, several nonmammalian vertebrate genome databases of avian [Gallus gallus (chicken)], amphibian [Xenopus tropicalis (Af- rican clawed frog)], and fish [Danio rerio (zebrafish), Takifugu rubripes (Fugu), and Tetraodon nigroviridis (pufferfish)] species were completely or partially released. Using these valuable re- Author contributions: L.T.O.L., H.V., and B.K.C.C. designed research; L.T.O.L., F.K.Y.S., J.K.V.T., I.T.Y.L., and A.O.L.W. performed research; M.C.M.L. contributed new reagents/ analytic tools; L.T.O.L., F.K.Y.S., J.K.V.T., I.T.Y.L., A.O.L.W., and B.K.C.C. analyzed data; and L.T.O.L., H.V., and B.K.C.C. wrote the paper. The authors declare no conflict of interest. Abbreviations: GH, growth hormone; GHRH, growth hormone- releasing hormone; GHRH-R, GHRH receptor; PACAP, pituitary adenylate cyclase- activating polypeptide; PAC1-R, PACAP receptor; PHI, peptide histidine isoleucine; PRP, PACAP-related peptide; PRP-R, PACAP-related peptide receptor; VIP, vasoactive
  • 61. intestinal polypeptide. Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. DQ991243–DQ991247 and ABJ55977– ABJ55981). §To whom correspondence should be addressed. E-mail: [email protected] This article contains supporting information online at www.pnas.org/cgi/content/full/ 0611008104/DC1. © 2007 by The National Academy of Sciences of the USA www.pnas.org�cgi�doi�10.1073�pnas.0611008104 PNAS � February 13, 2007 � vol. 104 � no. 7 � 2133–2138 B IO C H EM IS TR Y http://www.pnas.org/cgi/content/full/0611008104/DC1 http://www.pnas.org/cgi/content/full/0611008104/DC1 sources, we performed bioinformatics analyses to look for previ-
  • 62. ously uncharacterized GHRHs and GHRH-Rs in amphibian and fish. Putative genes for GHRH were predicted from X. tropicalis, goldfish (gfGHRH), and zebrafish (zfGHRH), and with the help of these sequences, we successfully cloned their full-length cDNAs. [supporting information (SI) Figs. 6 – 8]. By aligning all known vertebrate GHRH precursor sequences (SI Fig. 9), it was found that only the N-terminal region (1–27) of GHRHs is conserved. Human shares 81.5% and 74.1% sequence identity with goldfish/zebrafish and X. laevis GHRHs, respectively (SI Fig. 10). Within the first 7 aa, there is only one amino acid substitution in goat (position 1), human (position 1), mouse (position 1), and Xenopus (position 2). These observations clearly show that a strong selective pressure has acted to preserve the GHRH sequence in evolution, indicating that this peptide plays important functions from fish to mammals. It should be noted that within the first 27 aa, the previously named GHRH-like peptides share much lower level of sequence identity with these fish and mammalian GHRHs (51.9 –59.3% for gfPRPsalmon, and 37– 44.4% for gfPRPcatfish; SI Fig. 10). Instead, these GHRH-like peptides are structurally homologous to mammalian PRPs (Fig. 1A). Cloning of GHRH-Rs from Zebrafish and Goldfish. Putative GHRH-R cDNAs were cloned from zebrafish and goldfish (zfGHRH-R and gfGHRH-R) (see SI Figs. 11 and 12). The amino acid sequences of these GHRH-R receptors share 78.4% identity among
  • 63. themselves, and 51.1% and 49.7%, respectively, with the human GHRH-R (SI Fig. 13). Kyte-Doolittle hydrophobicity plots of these receptors indicated the presence of seven hydrophobic transmembrane re- gions that are conserved in all class IIB receptors (SI Fig. 14). In addition, the eight cysteine residues in the N termini of fish and human GHRH-Rs, which presumably structurally maintain the ligand binding pocket (17), are identical. These receptors also contain the basic motifs in the third endoloop, indicating that they can also couple to the cAMP pathway, similar to other members in the same PACAP/glucagon receptor family (18). A comparison of the fish GHRH-R with other class IIB receptors (SI Fig. 13) clearly shows that fish GHRH-Rs, sharing 64.8 – 78.4% sequence identity, are distinct from the previously identified GHRH-like receptors (PRP-Rs). The sequence identity between goldfish GHRH-R and PRP-R is 39.7%, which is not different (36.4 –38.3%) from other goldfish receptors in the same gene family, including PAC1-R, VPAC1-R, and PHI-R. The sequence identity shared by goldfish, avian, and mammalian GHRH-Rs is 48.9 –51.1%, which is significantly higher than any other receptors in the same class IIB family, whether in goldfish or in human. Phylogenetic Analysis of the Novel GHRH and GHRH-Rs. Based on the previously uncharacterized GHRH sequences, a revised phyloge- netic tree for GHRH, GHRH-like, and PRP peptides was con- structed by using PACAP as the out-group (Fig. 1 A). The
  • 64. identified GHRH peptides from fish and amphibian were grouped together with all other previously cloned or predicted GHRH sequences from mammals to fish. Interestingly, the previously named GHRH- like peptides were phylogenetically closer to mammalian PRPs and resembled to form a distinct branch together. Within the GHRH grouping, GHRHs clustered together according to their class with the exception of rodents. Within all PRPs, avian, amphibian, and fish PRPsalmon-like (formerly GHRHsalmon-like) clustered to form a sub-branch, whereas, the fish PRPcatfish-like peptides formed a different group. Apparently, there are two forms of PRPs in fish, but only the salmon-like PRPs are observed in avian and amphibian species. Catfish-like PRP, therefore, is uniquely present in fish and hence must have evolved by gene duplication after the split that gave rise to tetrapod and ray-finned fish. Similarly, the discovered gfGHRH-R and zfGHRH-R, and the predicted GHRH-R from chicken and Fugu, branched together with mammalian GHRH-Rs, whereas the fish PRP-Rs (formerly GHRH-like receptor) formed a separate branch (Fig. 1B). These data confirmed that, together with GHRHs, recep- tors that are structurally closely related to mammalian GHRH-Rs also exist in other classes. It is interestingly to note that PRP-R might have been lost in mammals (see below). As the function of PRP in nonmammalian vertebrates is not known, the implications of losing PRP-R in mammals is unclear.
  • 65. Chromosomal Synteny of GHRH and GHRH-R Genes. According to the latest genome assembly versions, the locations, gene structures, and organizations of neighboring genes of GHRHs and GHRH-Rs in zebrafish (zebrafish assembly Version 4, Zv4) and Xenopus (X. tropicalis 4.1) were determined (SI Fig. 15). All exon-intron splice junctions agree with the canonical GT/AG rule. Both fish and Fig. 1. Phylogenetic analysis of GHRH, PRP, and GHRH-like peptide (1–27), GHRH-Rs, and PRP-Rs. Predicted peptide and receptor sequences are marked with asterisks. (A) PACAP sequences were used as an out- group. (B) Receptors identified in this report are underlined. 2134 � www.pnas.org�cgi�doi�10.1073�pnas.0611008104 Lee et al. http://www.pnas.org/cgi/content/full/0611008104/DC1 http://www.pnas.org/cgi/content/full/0611008104/DC1 http://www.pnas.org/cgi/content/full/0611008104/DC1 http://www.pnas.org/cgi/content/full/0611008104/DC1 http://www.pnas.org/cgi/content/full/0611008104/DC1 http://www.pnas.org/cgi/content/full/0611008104/DC1 http://www.pnas.org/cgi/content/full/0611008104/DC1 http://www.pnas.org/cgi/content/full/0611008104/DC1 http://www.pnas.org/cgi/content/full/0611008104/DC1 http://www.pnas.org/cgi/content/full/0611008104/DC1 amphibian GHRH genes have four exons and the mature peptides are located in exon 3. In mammals, GHRH genes also have four
  • 66. exons, whereas the mature peptides are found in exon 2 instead of exon 3. Also, exon 2 and its encoded sequences in frog and goldfish are different and are unique in their own species (SI Fig. 9). This information indicated that, although there were frequent exon rearrangements, the mature peptide-encoding exon, in terms of sequence and structure, have been relatively well preserved during evolution. Finally, the GHRH genes are also structurally distinct from PRP-PACAP genes (SI Fig. 15), in which the three exons encoding cryptic peptide, PRP and PACAP are arranged in tan- dem. Thus, the organizations of GHRH and PRP-PACAP genes also indicate that they have different origins in evolution. Fig. 2 summarizes the genomic locations of GHRH and GHRH-R in various vertebrates. For the GHRH genes, the nearest neighboring genes of GHRH are RPN2 and MANBAL in human and chicken. In fact, the RPN2 genes are found also in similar locations in all species analyzed except zebrafish (Fig. 2 A). Other than RPN2, EPB41L1, C20orf4, DLGAP4, MYL9, and CTNNBL1 are also closely linked to the GHRH gene loci in human and frog. In Fugu and Xenopus, BPI and EPB41L1 genes are both linked to GHRH gene, whereas in Fugu and zebrafish, the CDK5RAP1 gene is found close to the GHRH gene locus. These kinds of similarities in chromosomal syntenic linkage were not observed when
  • 67. compar- ing the GHRH and PRP-PACAP genes in vertebrate genomes, again confirming the idea that two different genes encoding for GHRH and PACAP exist from fish to mammal. Regarding the receptor, the GHRH-R and PAC1-R gene loci are in close proximity in human, chicken, zebrafish, and Fugu (Fig. 2B), indicating that they are homologues in different species. Interest- ingly, we found that the PRP-R locus is also linked to PAC1-R gene in chicken. Despite our efforts, no PRP-R gene could be found in mammalian genomes (human, chimpanzee, mouse, rat, and rabbit), and hence it is likely that the PRP-R gene was lost after the divergence that gave rise to the mammalian lineage. Comparison of Fish GHRH and PRP in Activating GHRH-R and PRP-R. To confirm the functional identity of fish GHRH and its receptor, zfGHRH-R-transfected CHO cells were exposed to 100 nM of the various related peptides (Fig. 3A). Both fish and frog (1–27) GHRHs were able to stimulate cAMP accumulation above back- ground levels (no peptide or pcDNA3.1-transfected cells). Graded concentrations of fishGHRH and xGHRH induced dose- dependent stimulation of zfGHRH-R with EC50 values of 3.7 � 10�7 and 8.4 � 10�7 M, respectively, the fish peptide being 2.3 folds more efficacious than xGHRH (Fig. 3B). In contrast, gfPRP- salmon-like and gfPRPcatfish-like did not induce cAMP produc- tion in zfGHRH-R-transfected cells. Interestingly, in gfPRP-R- transfected cells, gfPRPsalmon-like, fishGHRH, and xGHRH
  • 68. could all stimulate intracellular cA MP production dose- dependently with EC50 values of gfPRPsalmon-like 5.8 � 10�9 M � fishGHRH 1.8 � 10�7M � xGHRH 4.8 � 10�7 M (Fig. 3C). To test whether fish GHRH and gfPRPsalmon-like can exert a regulatory role at the pituitary level to modulate GH secretion, goldfish pituitary cells were challenged with graded concentrations of goldfish GHRH and PRP. A 4-h incubation of cells with goldfish GHRH induced a dose-related increase in GH release, the mini- mum effective dose being 10 nM (Fig. 3D) whereas gfPRPsalmon- like, at concentrations up to 1 �M was totally inactive (Fig. 3E). Tissue Distribution of GHRH and GHRH-R in Goldfish and X. laevis. The physiological relevance of GHRH and its cognate receptor in goldfish was tested by measuring their transcript levels using real-time PCR (Fig. 4). GHRH mRNA was detected mostly in the brain, whereas GHRH-R mRNA was found in both brain and pituitary. The expression patterns of these genes in the brain- pituitary axis further support the functional role of the previously uncharacterized GHRH acting to stimulate GH release from the anterior pituitary. Discussion The Newly Identified GHRHs, but Not GHRH-Like Peptides, Are Mam- malian GHRH Homologues. The primordial gene for the PACAP/ VIP/glucagon gene family arose �650 million years ago, and
  • 69. through exon duplication and insertion, gene duplication, point mutation, and exon loss, the family of peptides has developed into the forms that are now recognized. Based on sequence comparison, phylogenetic study and chromosomal locations of GHRH and GHRH-R genes in vertebrates, we here show that the previously named GHRH-like peptides are homologues of mammalian PRPs, and hence should be renamed as PRP from now on. Moreover, the tissue distribution of these PRPs in the fish brain is different when compared with mammalian GHRHs (19). Functionally, PRPs were unable to stimulate zfGHRH-R, as well as incapable of activating GH release from fish pituitary as shown in this and previous studies (20 –22). More importantly, the present report demonstrates the presence of authentic GHRH peptides in nonmammalian vertebrates. In addition to goldfish and zebrafish, several ray-finned fish GHRHs were identified in medaka, strickland, Fugu, Tetraodon, f lounder, Fig. 2. Chromosomal locations of GHRH and GHRH-R in various vertebrate species. Genes adjacent to GHRH and GHRH-R in different genomes are shown. The genes are named according to their annotation in the human genome. GHRH and GHRH-R genes are boxed.
  • 70. Lee et al. PNAS � February 13, 2007 � vol. 104 � no. 7 � 2135 B IO C H EM IS TR Y http://www.pnas.org/cgi/content/full/0611008104/DC1 http://www.pnas.org/cgi/content/full/0611008104/DC1 and rainbow trout (see SI Fig. 16) by in silico analysis. The mature GHRH sequences are almost identical in these eight fish species, with only one substitution in rainbow trout and medaka GHRHs at position 25 (S) and 26 (I), respectively. Moreover, the peptide cleavage sites (R and GKR) are also conserved. This information clearly suggests an important physiological function of GHRH in such diversified fish species and shows that the discovered GHRHs are homologues of mammalian GHRHs. In parallel with the evolution of GHRHs, previously uncharacterized GHRH-Rs that are homologous to their mammalian counterparts in structure,
  • 71. function, chromosomal localization, and tissue distribution are also found in fish. We have therefore shown that, unlike what was hypothesized in previous reports (15, 16) both GHRH and its receptor genes actually existed before the split for tetrapods and ray-finned fish. Another interesting observation is the existence of PRP- specific receptors from fish (23–25) to chicken, suggesting that PRP potentially is a physiological regulator in these species. However, despite our efforts, we were unable to find the PRP-R gene in any of the mammalian genome databases (in total, 12 databases were analyzed), including the almost completed hu- man and mouse genomes, indicating that PRP-R (but not PRP) was lost in the mammalian lineage. A possible explanation is that PRP-R was lost in a chromosomal translocation event. As shown in Fig. 3B, in chicken, GHRH-R, PAC1-R, PRP-R, and VPAC1-R genes are clustered in the same chromosomal region, whereas, in mammals, VPAC1-R is translocated to a different chromosome and the DNA sequences between VPAC1-R and PAC1-R containing the PRP-R, somehow disappeared. Implications of the Newly Discovered GHRHs on our Understanding of the Evolution of GHRH and PRP-PACAP Genes. The ‘‘one-to- four’’ rule is the most widely accepted and prevalent model to explain the evolution of many vertebrate genes. This model is based on the ‘‘two round genome duplication’’ hypothesis (26). According to this 2R hypothesis, a first genome duplication occurred as early as in deuterostomes and this was followed by a second round of genome duplication, that took place before the origin of gnathostomes,
  • 72. resulting in having four copies of the genome. Based on this hypothesis and our findings, we propose a scheme for the evolution of GHRH and PRP-PACAP genes in vertebrates (Fig. 5). In the beginning, the ancestral chordates possessed one copy of the PACAP-like gene which encoded a single peptide, and by exon duplication, the PRP-PACAP ancestral gene was formed. As a result of two rounds of genome duplication (1R and 2R), the ancestral gene produced four paralogous genes, but only three of them, PRP-PACAP, PHI-VIP, and GHRH genes, persisted in later lineages. One of the duplicated copies could have been lost via a Fig. 3. Functional assays. (A) Intracellular cAMP accumulation in CHO-zfGHRH-R cells stimulated with 100 nM peptides: fish GHRH, Xenopus GHRH, goldfish PRPsalmon-like, goldfish PRPcatfish-like, goldfish PHI, goldfish glucagon, zebrafish GIP, human PACAP-27, and human PACAP-38. (B and C) Effect of graded concentrations (from 10�11 to 10�5 M) of various GHRHs and gfPRPs on cAMP accumulation in CHO-zfGHRH-R cells (B) and CHO-gfPRP-R cells (C). Values represent means � SE from three independent experiments each in duplicate. (D and E) Effect of graded concentrations (from 10�11 to 10�6 M) of fishGHRH (D) and gfPRPsalmon-like (E) on GH release from cultured goldfish pituitary cells. Cells were incubated for 4 h with fishGHRH (D; n � 12) or gfPRPsalmon-like (E; n � 4). Values represent means � SEM from at least two experiments performed in duplicate. Fig. 4. Relative abundance of GHRH (A) and GHRH-R (B) transcripts in different tissues in goldfish. A relative abundance of 1 was set
  • 73. arbitrarily for both genes in the brain. Data are from at least three experiments performed in duplicate. Values are expressed as means � SEM. 2136 � www.pnas.org�cgi�doi�10.1073�pnas.0611008104 Lee et al. http://www.pnas.org/cgi/content/full/0611008104/DC1 global process called diploidization (27–29). Although there are two highly conserved PRP-PACAP genes in fish and tunicates, they are produced neither by 1R nor 2R, and this will be discussed later. The earliest PRP-PACAP genes were identified in protochor- dates (Chelyosoma productum) (15), in which two PRP-PACAP genes with little differences in gene structure were found. Proto- chordates are invertebrates that split from vertebrates before the two rounds of genome duplication. Therefore, if these two PRP- PACAP genes existed before 1R and 2R, there should be eight PRP-PACAP paralogues in vertebrates. Also, the tunicate PRP- PACAP genes share very high levels of similarity among themselves in terms of gene structure and peptide sequence. Hence, these duplicated copies of PRP-PACAP genes are the result of a more recent gene duplication event that occurred after the initial diver- gence. Interestingly, in the phylogenetic tree, the tunicate PRP- like peptide belongs neither to GHRH nor to PRP groupings. The N termini of these peptides are similar to PACAP, whereas the C termini share higher similarity with PRP. Therefore, it is likely
  • 74. that these tunicate PRP-like peptides are evolved from the same an- cestral sequences of PRP and/or PACAP. Based on our model, the evolution of GHRH, PRP-PACAP, and PHI-VIP genes could be explained by using the concept of dupli- cation– degeneration– complementation (30, 31). After the first round of genome duplication (1R), the ancestral PRP-PACAP gene duplicated into two and each of them independently mutated. One of the genes had mutations mainly in the PRP-like region, thus giving rise to the PRP-PACAP gene with the function of the ancestral gene mostly preserved in the PACAP domain. Together with PRP’s changes, a duplicated PACAP receptor evolved to interact specifically with PRP. Although we have no information regarding the function of PRP from fish to birds, the presence of a specific receptor that is expressed in a tissue-specific manner strongly suggests that this peptide exerts some physiological role. As the primary function was secured by the conservation of PACAP, in the other copy, PRP is free to mutate to GHRH while PACAP sequences were lost by exon deletion, producing the GHRH gene in vertebrates. After 2R, the duplicated gene copy for PRP- PACAP was free to mutate and hence to acquire new functions to become PHI-VIP, and consistently, PHI and VIP are similar to PRP and PACAP, respectively. Until now, we have not been able to find
  • 75. a second GHRH homologue in vertebrates, and we thus propose that the duplicated GHRH gene may have been deleted or lost in the process of genome rediploidization. The final question is the origin of the two forms of PRP, salmon-like and catfish-like, and correspondingly two PRP- PACAP genes, in fish. We named these PRPcatfish-like and salmon-like as there are clearly two branches in the phylogenetic tree in which the avian and amphibian PRPs are more similar to fish PRPsalmon- like. Based on previous reports and sequence analysis, we found that, in fact, all bony fish possess a PRPsalmon-like gene, supporting the idea that PRPsalmon-like is the ancestral form, whereas PRP- catfish-like existed before the tetrapod/fish split, a duplicated copy arising from a more recent duplication event that occurred after the appearance of the teleost lineage (32). This is confirmed by the presence of duplicated PHI-VIP and glucagon genes in the Fugu and zebrafish genomes, as well as two genes for PAC1-R, VPAC1-R, and VPAC2-R in fish (33). There is further evidence supporting the hypothesis of a third round of genome duplication; for example, zebrafish possesses seven HOX clusters on seven different chromosomes, instead of four clusters found in mammals (34). Also, there are many duplicated segments in the zebrafish genome where similar duplication are not found in the mammalian
  • 76. Fig. 5. A proposed evolutionary scheme of GHRH, PRP- PACAP, and PHI-VIP genes with respect to several rounds of genome duplication. Labeling of peptides and gene organizations are shown in the key. The genome duplication events are highlighted by light-blue boxes. Unknown or unclear paths are marked with question marks. Time for divergence in millions of years ago was taken from ref. 41. Lee et al. PNAS � February 13, 2007 � vol. 104 � no. 7 � 2137 B IO C H EM IS TR Y genome. This information suggests that some of the genome was duplicated in an ancient teleost. Based on this, we hypothesize that catfish-like and salmon-like PRP encoding genes were produced from the actinopterygian-specific genome duplication at �300 – 450 million years ago. Materials and Methods Animals and Peptides. Zebrafish and goldfish were purchased
  • 77. from local fish markets, and X. laevis was bought from Xenopus I. Peptides including X. laevis GHRH (xGHRH), zebrafish or goldfish GHRH (fish GHRH), goldfish PRPsalmon-like (previ- ously gfGHRHsalmon-like) (24), and goldfish PRPcatfish-like (previously gfGHRHcatfish-like) (24) were synthesized by the Rockefeller University. Data Mining and Phylogenetic Analysis. By searching genome data- bases (Ensembl genome browse and NCBI database) of chicken (Gallus gallus), Xenopus (X. tropicalis), zebrafish (Danio rerio), and Fugu (Fugu rubripes), putative GHRH and GHRH-R sequences were found. These sequences were used for phylogenetic analyses by MEGA3.0 to produce neighbor-joining trees with Dayhoff matrix (35). Predicted protein sequences were then aligned by using ClustalW. Sequence-specific or degenerate primers were designed accordingly for cDNA cloning by PCR amplifications. Molecular Cloning of GHRHs (X. laevis, Zebrafish, and Goldfish) and GHRH-Rs (Zebrafish and Goldfish). Total RNAs from brain (zebrafish and goldfish) and pituitary (X. laevis) were extracted according to the manufacturer’s protocol (Invitrogen, Carlsbad, CA). RACE (5� and 3�) was performed by using the 5� and 3� RACE amplification kits (Invitrogen). Full-length cDNA clones encompassing the 5� to 3� untranslated regions were produced by PCR with specific primers
  • 78. and confirmed by DNA sequencing. Full-length GHRH-R cDNAs were subcloned into pcDNA3.1� (Invitrogen) for functional ex- pression. Primers used in this study are listed in SI Fig. 17. Tissue Distribution of GHRH and GHRH-R in Goldfish. Goldfish GHRH and GHRH-R transcript levels in various tissues were measured by real-time RT-PCR. First-strand cDNAs from var- ious tissues of sexually matured goldfish were prepared by using the oligo(dT) primer (Invitrogen). GHRH and GHRH-R levels were determined by using the SYBR Green PCR Master Mix kit (Applied Biosystems, Foster City, CA) together with specific primers (sequences listed in SI Fig. 18). Fluorescence signals were monitored in real-time by the iCycler iQ system (Bio-Rad, Hercules, CA). The threshold cycle (Ct) is defined as the fractional cycle number at which the f luorescence reaches 10-fold standard deviation of the baseline (from cycle 2 to 10). The ratio change in target gene relative to the �-actin control gene was determined by the 2� Ct method (36). Functional Studies of Fish GHRH-R and PRP-R. For functional studies, zfGHRH-R cDNA in pcDNA3.1 (Invitrogen) was permanently transfected into CHO cells (American Type Culture Collection, Manassas, VA) by using the GeneJuice reagent (Novagen, Darm- stadt, Germany) and followed by G418 selection (500 �g/ml) for 3 weeks. Several clones of receptor-transfected CHO cells were produced by dilution into 96-well plates. After RT-PCR to monitor the expression levels of individual clones, the colony with the highest expression was expanded and used for cAMP assays. A
  • 79. gfPRP-R-transfected (previously gfGHRH-R) (25) permanent CHO cell line was also used for comparison in functional expression studies. To measure cAMP production upon ligand activation, 2 days before stimulation, 2.5 � 105 CHO-zfGHRH-R or CHO-gfPRP- R cells were seeded onto six-well plates (Costar). The assay was performed essentially as described (37) by using the Correlate- EIA immunoassay kits (Assay Design, Ann Arbor, MI). Measurement of GH Release from Goldfish Pituitary Cells. Goldfish in late stages of sexual regression were used for the preparation of pituitary cell cultures. Fish were killed by spinosectomy after anesthesia in 0.05% tricane methanesulphonate (Syndel, Van- couver, BC, Canada). Pituitaries were excised, diced into 0.6- mm fragments, and dispersed by using the trypsin/DNA II digestion method (38) with minor modifications (39). Pituitary cells were cultured in 24-well cluster plates at a density of 0.25 � 106 cells per well. After overnight incubation, cells were stimulated for 4 h, and culture medium was harvested for GH measurement by using a RIA previously validated for goldfish GH (40). Data Analysis. Data from real-time PCR are shown as the means � SEM of duplicated assays in at least three independent experi- ments. All data were analyzed by one-way ANOVA followed by a Dunnett’s test in PRISM (Version 3.0; GraphPad, San Diego, CA).
  • 80. This work was supported by Hong Kong Government RGC Grants HKU7639/07M, HKU7566/06M, and CRCG10203410 (to B.K.C.C.), and by the Institut National de la Santé et de la Recherche Médicale (H.V.). 1. Guillemin R, Brazeau P, Böhlen P, Esch F, Ling N, Wehrenberg WB (1982) Science 218:585–587. 2. Thorner MO, Perryman RL, Cronin MJ, Rogol AD, Draznin M, Johanson A, Vale W, Horvath E, Kovacs K (1982) J Clin Invest 70:965–977. 3. Ling N, Esch F, Böhlen P, Brazeau P, Wehrenberg WB, Guillemin R (1984) Proc Natl Acad Sci USA 81:4302– 4306. 4. Sherwood NM, Krueckl SL, McRory JE (2000) Endocr Rev 21:619 – 670. 5. Bloch B, Brazeau P, Ling N, Bohlen P, Esch F, Wehrenberg WB, Benoit R, Bloom F, Guillemin R (1983) Nature 301:607– 608. 6. Billestrup N, Swanson LW, Vale W (1986) Proc Natl Acad Sci USA 83:6854 – 6857. 7. Mayo KE, Hammer RE, Swanson LW, Brinster RL, Rosenfeld MG, Evans RM (1988) Mol Endocrinol 2:606 – 612. 8. Lin C, Lin SC, Chang CP, Rosenfeld MG (1992) Nature 360:765–768. 9. Ehlers CL, Reed TK, Henriksen SJ (1986) Neuroendocrinology 42:467– 474.
  • 81. 10. Obal F, Jr, Alfoldi P, Cady AB, Johannsen L, Sary G, Krueger JM (1988) Am J Physiol 255:R310 –R316. 11. Bueno L, Fioramonti J, Primi MP (1985) Peptides 6:403– 407. 12. Cai A, Hyde JF (1999) Endocrinology 140:3609 –3614. 13. Mayo KE, Cerelli GM, Lebo RV, Bruce BD, Rosenfeld MG, Evans RM (1985) Proc Natl Acad Sci USA 82:63– 67. 14. Hosoya M, Kimura C, Ogi K, Ohkubo S, Miyamoto Y, Kugoh H, Shimizu M, Onda H, Oshimura M, Arimura A, et al. (1992) Biochim Biophys Acta 1129:199 –206. 15. McRory J, Sherwood NM (1997) Endocrinology 138:2380 – 2390. 16. Montero M, Yon L, Kikuyama S, Dufour S, Vaudry H (2000) J Mol Endocrinol 25:157–168. 17. DeAlmeida VI, Mayo KE (1998) Mol Endocrinol 12:750 – 765. 18. Laburthe M, Couvineau A, Gaudin P, Maoret JJ, Rouyer- Fessard C, Nicole P (1996) Ann NY Acad Sci 805:94 –109. 19. McRory JE, Parker DB, Ngamvongchon S, Sherwood NM (1995) Mol Cell Endocrinol 108:169 –177. 20. Montero M, Yon L, Rousseau K, Arimura A, Fournier A, Dufour S, Vaudry H (1998) Endocrinology 139:4300 – 4310. 21. Parker DB, Power ME, Swanson P, Rivier J, Sherwood NM
  • 82. (1997) Endocrinology 138:414–423. 22. Melamed P, Eliahu N, Levavi-Sivan B, Ofir M, Farchi- Pisanty O, Rentier-Delrue F, Smal J, Yaron Z, Naor Z (1995) Gen Comp Endocrinol 97:13–30. 23. Chan KW, Yu KL, Rivier J, Chow BK (1998) Neuroendocrinology 68:44 –56. 24. Wong AO, Li WS, Lee EK, Leung MY, Tse LY, Chow BK, Lin HR, Chang JP (2000) Biochem Cell Biol 78:329 –343. 25. Kee F, Ng SS, Vaudry H, Pang RT, Lau EH, Chan SM, Chow BK (2005) Gen Comp Endocrinol 140:41–51. 26. Hughes AL (1999) J Mol Evol 48:565–576. 27. Wolfe KH (2001) Nat Rev Genet 2:333–341. 28. Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, et al. (2004) Nature 431:946 – 957. 29. Kellis M, Patterson N, Birren B, Berger B, Lander ES (2004) J Comput Biol 11:319 –355. 30. Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J (1999) Genetics 151:1531–1545. 31. Lynch M, Force A (2000) Genetics 154:459 – 473. 32. Van de Peer Y, Taylor JS, Meyer A (2003) J Struct Funct Genomics 3:65–73. 33. Cardoso JC, Power DM, Elgar G, Clark MS (2004) J Mol Endocrinol 33:411– 428. 34. Amores A, Force A, Yan YL, Joly L, Amemiya C, Fritz A, Ho RK, Langeland J, Prince V, Wang YL, et al. (1998) Science 282:1711–1714. 35. Kumar S, Tamura K, Nei M (2004) Brief Bioinform 5:150 –
  • 83. 163. 36. Livak KJ, Schmittgen TD (2001) Methods 25:402– 408. 37. Kwok YY, Chu JY, Vaudry H, Yon L, Anouar Y, Chow BK (2006) Gen Comp Endocrinol 145:188 –196. 38. Chang JP, Cook H, Freedman GL, Wiggs AJ, Somoza GM, de Leeuw R, Peter RE (1990) Gen Comp Endocrinol 77:256 –273. 39. Lee EK, Chan VC, Chang JP, Yunker WK, Wong AO (2000) J Neuroendocrinol 12:311–322. 40. Marchant TA, Fraser RA, Andrews PC, Peter RE (1987) Regul Pept 17:41–52. 41. Kumar S, Hedges SB (1998) Nature 392:917–920. 2138 � www.pnas.org�cgi�doi�10.1073�pnas.0611008104 Lee et al. http://www.pnas.org/cgi/content/full/0611008104/DC1 http://www.pnas.org/cgi/content/full/0611008104/DC1