Artificial intelligence in the post-deep learning era
Gutell 007.jbc.1984.259.00224
1. THE JOURNALOF BIOLOGICALCHEMISTRY
@ 1984 by The American Society of Biological Chemists, Inc
Vol. 259, No. 1, Issue of January 10, pp. 224-230, 1984
Printed in U,S.A.
The Nucleotide Sequenceof a Rat 18 S Ribosomal Ribonucleic Acid
Gene and a Proposal for theSecondary Structure of 18 S Ribosomal
Ribonucleic Acid*
(Receivedfor publication, July 13, 1983)
Yuen-Ling Chan$, Robin Gutellg, HarryF. NollerglI, and Ira G.WoolSII
From the $Department of Biochemistry, The University of Chicago, Chicago, Illinois 60637 and the SThimann Laboratories,
Universityof Californiaat Santa cruz, Santacruz, California 95064
The nucleotide sequence of a rat 18 S rRNA gene
was determined. The 18 S rRNA encoded in the gene
contains 1874 nucleotides, and the molecular weight
estimated from the sequence is 6.09 X 10’. The se-
quencesof rat andXenopus laevis 18 S rRNAs are very
similar; the only differences of consequence are the
insertions between nucleotides 197 and 206 and be-
tween 258 and 279 of the rat nucleic acid of sequences
rich in guanine and cytosine. A proposal is presented
for the secondary structure of rat 18 S rRNA based on
a comparison with the sequences of 17 other small
ribosomalsubunit RNAs. While the primary sequences
of rat and eubacterial RNAs are different, thededuced
secondary structures are remarkably similar.
Knowledge of the chemistryof the constituentsis required
for an analysis of the structureof ribosomes. Information on
thestructure is, in turn, necessary torelatethe molecular
organization of ribosomes to their function in protein synthe-
sis. Eukaryotic (rat liver)ribosomes contain four molecules
of RNA: 18 S RNA in the40 S subparticle and5, 5.8,and 28
S RNAs in the60 S subparticle. The covalent structuresof 5
S (1)and 5.8 S (2, 3) RNAs from ratliver ribosomes and from
the ribosomes of a number of other eukaryotic species (4)
have beendetermined. Thereis less information on the struc-
ture of the large eukaryotic rRNAs. The sequences of 18 S
rRNAs from Saccharomyces cerevisiae(5)and Xenopus laevis
(6) and of 26 S rRNAs from two speciesof yeast, S. cerevisiae
(7) and Saccharomyces carlsbergensis (8),have been inferred
from the structure of the genes. The sequence of the large
RNAs from the ribosomes of mammalian species had notbeen
determined. The informationis important since a great deal
is known, perhaps more than for any other eukaryoticspecies,
of the chemistryof the othermolecular constituents of mam-
malian ribosomes (9) and of the biochemistry and regulation
of protein synthesis in mammalian cells (10, 11).Moreover,
the most important data for deriving the secondary structure
of ribonucleic acids comesfrom a comparison of the sequence
of the molecule in different species. For all these reasons, we
have determined the covalent structureof a gene for rat 18S
rRNA. We cloned a rat 18 S rRNA gene carried on a phage
(Charon 4A) in plasmidvectors and determined thesequence
of the DNA.
* The costs of publication of this article were defrayed in part by
the payment of page charges. This article must therefore be hereby
marked “aduertisernent” in accordance with 18 U.S.C. Section 1734
solely to indicate this fact.
lISupported by National Institutes of Health Grant GM 17219.
(1 Supported by National Institutes of Health Grants GM 21769
and CA 19265.
Eukaryotic ribosomal RNA genes occur in clusters of tan-
demly repeated transcriptionunits. The mammalian primary
transcript from ribosomal DNA has a sedimentation coeffi-
cient of 45 and contains 18, 5.8, and 28 S rRNAs as well as
transcribed spacerswhich are removed during processing (12).
EXPERIMENTAL PROCEDURES
Materials-DNA from the X phage Charon 4A in which the rat
ribosomal RNA genes had been cloned (13) was kindly provided by
T. D. Sargent and J. Bonner of California Institute of Technology.
Restriction endonucleases and phage T4 DNA ligase were obtained
from Bethesda Research Laboratories or New England Biolabs. [ y -
32P]ATP (approximately 3 X lo3 Ci/mmol) wasfrom Amersham-
Searle, and calf intestinal alkaline phosphatase and phage T4 poly-
nucleotide kinase werefrom Boehringer Mannheim. Digestion of
DNA with restriction enzymes was carried out as recommended by
the supplier.
Preparation of Plasmids ContainingrDNA-The phage Charon 4A,
containing rDNA sequences derived from an EcoRI digest of the rat
genome (13), wasgrown and purified as recommended (14). The
presence of rDNA wasconfirmed by hybridization of rat liver 18,5.8,
and 28 S rRNAs to the recombinant phage DNA. The phage was
dialyzed overnight against 10 mM Tris/HCl, pH 8.0, 50 mM EDTA,
heated at 70 “Cfor 30 min, dialyzed exhaustively against 10mM Tris/
HCI, pH 8.0,5 mM NaCl, 1mM EDTA, and stored at 4 “C.
The rRNA genes were subcloned (15, 16) intothe plasmid pACYC
184 (17).The Charon 4A recombinant DNA (1pg) was digested with
HindIII, and 0.25 pg of pACYC 184 DNA that had been hydrolyzed
with the same restriction enzyme was added. After the HindIII was
inactivated by heating at 65 “C for 15 min, the mixture was cooled,
and sufficient buffer (66 mM Tris/HCl, pH 7.6, 7 mMMgC12, 10 mM
dithiothreitol, 0.4 mM ATP) was added to give a volume of20 rl.
Phage T4 ligasewas added and the sample was incubated at 22 “C
for 2 h. Aliquots were used to transform calcium-treated Escherichia
coli MC 1065.The cellswere grown on Luria-Bertaniagar plates (14)
containing eithertetracycline (10pg/ml) or chloramphenicol (100pg/
ml) to detect transformants. The recombinant plasmid with most of
the 18 S rRNA coding sequence is referred to as pHRR 118. In a
similar manner, an EcoRI insert in Charon 4A containing the 3’ end
of 18S rDNA and most of the 28 S rRNA coding region was cloned
in pACYC 184.This plasmid is referred to aspRRR 228. Thepresence
of the relevant inserts in the two plasmids was confirmed by colony
hybridization (18,19) to32P-labeled18and 28 SrRNAs.The plasmids
were amplified using spectinomycin (17) and isolated (20).
DNA-RNA hybridization-The DNA wasdigested with restriction
enzymes and separated by electrophoresis (0.5-2 V/cm) in agarose
gels (0.8-1.0%) prepared in TBE buffer (50 mM Tris borate, pH 8.3,
50 mM boric acid, 2 mM EDTA). In some cases, ethidium bromide
(0.5 pg/ml) was added to the TBE buffer used for electrophoresis
(21).The DNA wastransferred from the agarose gels to nitrocellulose
paper (22), and the filters were dried in an oven at 80 “C for 2 h.
Hybridization of 32P-labeledrRNA (lo4cpm/cm2of nitrocellulose) to
the DNA fragments was in 50% formamide and 5 X SSC (1 X SSC:
0.015 M sodium citrate, 0.15 M NaCI) incubated overnight at 37 “C.
The filters were washed with 2 X SSC containing heat-treated pan-
creatic ribonuclease (0.001%)and then washed extensively with 2 X
SSC. The filters were dried, and the radioautographs were prepared
at -70 “Cusing intensifying screens.
224
2. The Structure of Rat 18 S rRNA 225
Preparationof Radioactive Nucleic Acids-The DNA or RNA was
maderadioactivebylabeling at the 5' endwithT4polynucleotide
kinase and [y-3ZP]ATPafter removalof phosphate by repeated ad-
dition of 0.1 unit of calf intestine alkalinephosphatase (23). The
reaction mixture was boiled for 1min before phenol extraction.
Isolation of DNAFragments-DNA fragmentsresulting from
digestion with restriction enzymeswere separated by electrophoresis
ineither 1%agarose or 6-8%polyacrylamidegels.Thegelswere
either stainedwith ethidium bromide or radioautographswere made.
The bands containing DNA were cut out of the gel, and the nucleic
acidwasrecoveredfrom the slices by electrophoreticelution into
dialysisbags (21). However,DNA preparedin this way contains
contaminantsthat cause smearing on the gels used to determinethe
sequence. Thus, if the fragment was to be used directlyfor sequence
determination, the DNAwasrecoveredby a modification of the
method recommended by Girvitz et al. (24) in which the orientation
of the dialysis membrane was changed and the DNAwasrecovered
by washing the filter paper and the dialysis membrane with 10 mM
Tris/HCl, pH 7.5, 0.25 M NaCl, 1 mM EDTA.
The strands ofDNA fragmentswere separatedby heating at 90 "C
for 2-3 min in TBE buffer containing 50% dimethyl sulfoxide, 0.05%
bromphenol, 0.05% xylenecyanol. The samplewasfrozenimmedi-
ately andthawedprior to electrophoresiswhichwas as described
above.
Restriction Endonuclease Digestion-Endonuclease sites were de-
terminedusing the multipleenzymes(25) or the partial digestion
methods (26).The fragmentswere separatedby electrophoresis in 5-
8%polyacrylamide gels.
Determination of the Sequence ofDNA and RNA-The sequence
of DNA wasdeterminedusing the method of Maxamand Gilbert (23)
with analysis on 8,10, or 20% polyacrylamidegels(32 x 40 X 0.05
cm) (27). Radioautographs of the gels were prepared at -70 "C with
or without intensifying screens. The sequence of the 5' end of 18 S
rRNAwas determined by both the enzymatic (28-30) and chemical
(31) procedures.
RESULTS AND DISCUSSION
The Determination of the Sequence of 18S rDNA-A pre-
liminaryrestrictionmap of the region of theCharon 4A
recombinant X phage containing the rDNA geneswascon-
structed (Fig. 1).In order to amplify the 18 S rDNA and to
facilitate purification, thegene was subcloned in pACYC 184
at the HindIII and EcoRI sites. Plasmid pHRR118contains
a 5.8-kb' HindIII insert encompassing approximately85% of
the 18 S rRNA gene including the 5' terminus (Fig. 1).The
remainder of the gene was contained in an EcoRI fragmentof
6.4 kb thatwas cloned inpACYC 184to form pRRR 228 (Fig.
1).The latter contains, in addition to the3' end of the 18 S
rRNA gene, the 5.8 S rRNA gene, most of the 28 S rRNA
gene, and two internaltranscribedspacers. It is important
that the 3' end of the HindIII insert (pHRR 118) overlaps
the 5' end of the EcoRI insert (pRRR 228).
A restriction mapof pHRR 118revealed that a combination
of the enzymesBamHI,HindIII, and XhoIwould release two
fragments containing portions of the 18 S rRNA gene (Fig.
1).The entire BamHI-Hind111 fragment (1.07 kb) is located
within the 18 S rRNA gene, whereas the XhoI-BamHI frag-
ment (2.6 kb) contains the 5' end of the gene, the external
transcribed spacer, and partof the nontranscribed spacer. An
analysis of the cleavage patternobtainedwith a series of
restriction endonucleasesidentified enzymes suitablefor gen-
erating fragmentsof a size convenient fora determination of
the nucleotide sequenceand for the productionof overlapping
sequences (Fig. 2). In a similar manner, the 3' endof the 18
S rRNA gene was isolated from pRRR 228 by digesting the
plasmid withEcoRI and XhoI(Fig. 1). This fragment(0.6 kb)
was digested with HpaII to generate subfragments suitable
for a determination of the nucleotide sequence (Fig. 2). The
presence of the 18S rRNA gene in a fragment was sometimes
The abbreviationused is: kb, kilobases.
€COR1
, IKbj , , , , ,I
Xhol XhoI EcoRI
, , ,CHARON 4b
' RECOMBINANT
XhDI
FIG. 1. Restriction endonuclease maps of rat ribosomal
genes cloned in Charon4A and subclonedin pACYC 184. Two
EcoRI fragments from the rat genomewereclonedin the X phage
Charon 4A. The EcoRI fragmenton the left is 9.8 kb and the one on
the right is 6.4 kb.The locationsof the 18, 5.8,and 28 S rRNA genes
weredetermined by Southern blot hybridization. The HindIII frag-
ment was subcloned in pACYC 184to give pHRR 118 and an EcoRI
fragment to give pRRR 228. ETS, external transcribed spacer;ITS,
internal transcribed spacer;NTS, nontranscribed spacer.
confirmed, prior to determinationof the sequence, by South-
ern hybridization (22) tolabeled 18S rRNA. Finally, aUNIX
computer programwas written to search the fragmentswhose
sequence had been determined for additional restriction en-
donuclease sites toprovide overlapping fragments.
The sequence of the 18 S rRNA gene was determined by
the Maxam and Gilbertprocedure (23). The analysisincluded
a determination of the sequence of almost all of both strands
and extensiveoverlaps of portions of the sequences inseparate
fragments (Fig. 2). The order of the fragments was approxi-
mated from the restrictionendonuclease map, from compari-
son with the sequence of X. laevis 18 S rRNA (6), and by
reference to thesequence of the 3' endof rat 18S rRNA (32,
33).
We determined the sequence of the 5' end of 18 S rRNA
by boththe chemical(31) andenzymatic (28-30) methods
(results not shown). This determinationlocated the 5' end of
the gene and confirmed the sequence obtained for the DNA.
The sequence at the 3' end of 18 S rRNAhad beendone
before (32,33). Thesequence of the nucleotides in 18S rRNA
was inferred from the sequence of the gene (Fig. 3). Rat 18S
rRNA has 1874 nucleotides. The molecular weight is approx-
imately 6.09 x lo5(based on the assumption of a molecular
weight of 325 for a nucleotide), somewhat less than the 7 X
lo5 (2154 nucleotides)estimated fromphysiochemical data
(34, 35). The G + C content of rat 18 S rRNA (55.7%) is
higher than thatof X. laevis (53.8%;Ref. 6) andof S. cereuisiae
(45%;Ref. 6).
Comparison of the Sequence of Rat 18 S rRNA with Other
Eukaryotic 18 S rRNAs-When the sequences of rat and X .
laevis 18 S rRNAs were aligned, there were 1752 identities
(the same nucleotide at the same position) of 1822 possible
matches, i.e. 96% identity. Of the nonidentities, 32 (or 26%)
of the totalof 122 nucleotidesare intwo regions,197-206 and
258-279 (see below). It follows thenthatthereare long
stretches of the two rRNAs that are identical. Indeed, it is
the great similarity in the two sequences that enabled us to
use the structure ofX. laevis 18SrRNA to provisionally order
the fragments of the rat gene. It is possible that some of the
scattered singlenucleotidedifferences areactuallyerrors
either in analysisor in the transcribingof the sequence. The
similarity between thesequences of rat and yeast18S rRNA
is not nearlyso great. There isonly 75% identity. While there
are regions of extensive identity, they are interruptedby tracts
3. 226 The Structure of Rat 18 S rRNA
XhO I Srno I Born MI Barn MI
ITS 5.89 ITS
PSt I €COR I X h o I
" " ""
"S" slmnd
"""6
- - ""W
P
-".___)
- f
* - A
4 """"
' c_" "
c_I " - "t_
"C" strond
- -*
-
t _ _ " 3 --" c _ -c(
FIG. 2. Restriction endonuclease map of a rat 18 S rRNA gene and a diagram of the overlapping
fragments whose sequence of nucleotides were determined. The upper portion depicts the restriction
endonuclease sites in therat 18 S rRNA gene that were used to generate fragments (lower portion) for the
determination of the sequence of nucleotides.NTS, nontranscribed spacer, ETS, external transcribed spacer, ITS,
internal transcribed spacer. The numbers indicate the first nucleotide in the restriction endonuclease recognition
sequence counting from the 5' end of the 18 S rRNA gene. Each arrow designates a nucleotide sequence that was
determined. "S"strand is synonymous with the RNA; "C"strand is complementary.
having little similarity (cf. the comparison of yeast and X.
laeuis 18 S rRNAs in Ref. 6). The regions of X.laeuis 18 S
rRNA that are different from the sequence in yeast tend to
be richin G and C. Nonetheless,thesites of nucleotide
methylation are highly conserved (see below also). Finally,
there is little similaritybetween the sequence of E. coli 16 S
rRNA (36) and thatof eukaryotic 18SrRNA (yeast,X. laeuis,
or rat)except in scattered regions and near the3' end (37).
What differences there are in the sequences of rat and X.
laeuis 18SrRNAs occur mainly in the5' region; there are 62
nonidentities in the 400 nucleotides at the 5' terminus and
only 18 in the 3' 400 nucleotides. The difference in the 5'
region of rat andX. laeuis 18S rRNA is accountedfor largely
by 10additional residues at position 197-206 and 22 addi-
tional residues at position 258-279 (Fig. 4). We presume the
acquisition of the additional nucleotides is an evolutionary
change since thetwo tracts do not occur in yeast or X. laeuis
18S rRNAs, but are found in the rabbitnucleic acid (Ref. 38;
Fig. 4). The two insertions are particularly rich in G + C,
84% in rat and72% in rabbit;indeed, they largely account for
the slightly higher G +C content of rat compared toX.laeuis
18 S rRNA. The functional significance, if indeed there is
any, of the insertions is not known.However,variable nu-
cleotide tracts are apparentwhen one comparesX.laeuis and
yeast 18 SrRNAs, and they are G + C-rich also (6); indeed,
it may be true for eukaryotes in general(39).
There is anoligonucleotide stretch (position338-345) pres-
entinrat 18 S rRNAthatisabsentinrabbit.Sincethe
sequence is presentin X. laeuis 18 S rRNA,an organism
phylogenetically far more distant from ratthanrabbit,it
seems more likely to be anerrorin sequencing thanan
evolutionary change. Such errors areknown to be more likely
when one determines the sequence of the RNA, as was the
case for rabbit 18S rRNA (38), rather than theDNA. There
are also some discrepancies in the 3"terminal sequence de-
termined directly from rat18S rRNA (33) and thesequence
obtained here from the DNA (Fig. 5 ) . Once again, we would
ascribe that to the limitations inherent in the formerproce-
dure.
Ribosomal nucleic acidsare methylated. We have not iden-
tified the nucleotides in rat 18 SrRNA that are methylated.
However, the pattern of methylated bases in oligonucleotide
digests from 18SrRNAs from human (HeLa cells), hamster
(BHK/cl3 cells), and mouse (L-cells) could not be distin-
guished(40).Moreover, the methylated nucleotides in 18 S
rRNA from humans (HeLa cells) and from X.laeuis have at
least 95% identity (40). Allof themethylationsitesin X.
laeuis 18SrRNA areconserved in the ratnucleic acids.Thus,
the methylated bases in rat 18 S rRNA are likely to be the
same as those in HeLacells.
A Proposal for theSecondary Structure of Rat 18S rRNA-
An analysis of eubacterial ribosomal RNAs has established
that the mostpowerful and most reliable means available for
deducing secondary structure is comparativesequence analy-
sis (41-45). This methodderives from recognitionof compen-
sating base changes in pairednucleotides in analogous helices
of rRNAs from different organisms. It was originally assumed
that, to a first approximation, the 16 S rRNAs from various
eubacteria were likely to have the same secondary structure.
Now that more than 8 eubacterial 16 SrRNAs are available,
the assumption seems valid (44). However, before extending
this approach to the analysisof eukaryotic 18SrRNAs, which
have 200-300 additional nucleotides, it would seem prudent
to re-examine the assumption. A tentative secondary struc-
ture for rat18SrRNA was derived fromcomparativesequence
analysis (Fig. 6) and compared with E. coli 16 S rRNA (see
the schematic in Fig. 6). The additionalnucleotides in rat 18
S rRNAareaccounted for by three large insertionsand
moderate increases in thesize of four stems.
The secondary structure of rat 18S rRNA (Fig. 6) is based
on the comparative analysisof 17complete sequencesof small
ribosomal subunit RNAs,of which three are from eukaryotic
species (yeast, X . laeuis, and rat), anda RNase TI catalogue
encompassing16S-likeRNAsfrom more than 200 species
(44). Theoverall structural organizations of eubacterial 16 S
and eukaryotic 18 S rRNAsarequitesimilar (Fig. 6). The
difference in sizebetween theeubacterialandeukaryotic
molecules can be attributedtoinsertionsaroundpositions
250, 520, and 680 and extensions of the stems at positions
1120,1430,1570, and1770.By far the largest insertionis that
4. The Structure of Rat 18 S rRNA 227
at position 680,which accounts for 175additional nucleotides.
The foldings of that insertion and of the 25-nucleotide insert
at position 520 were not evident in terms of canonical base
pairs; the sequence of additional eukaryoticrRNAs will prob-
ably be required to solve the secondary structure of these
regions.
Three regions of the secondary structurediffer significantly
PUACCUGGUUG AUCCUGCCAG UAGCAUAUGC UUGUCUCAAA GAUUAAGCCA 50
UGCAUGUCUA AGUACGCACG GCCGGUACAG UGAAACUGCG AAUGGCUCAU 100
UAAAUCAGUU AUGGUUCCUU UGGUCGCUCG CUCCUCUCCU ACUUGGAUAA 1 5 0
CUGUGGUAAU UCUAGAGCUA AUACAUGCCG ACGGGCGCUG ACCCCCCUUC 200
CCGUGGGGGG AACGCGUGCA UUUAUCAGAUCAAAACCAAC CCGGUCAGCC 2 5 0
CCCUCCCGGC UCCGGCCGGG GGUCGGGCGC CCGGCGGCUU UGGUGACUCU 300
AGAUAACCUC GGGCCGAUCG CACGUCCCCG UGGCGGCGAC GACCCAUUCG 350
AACGUCUGCC CUAUCAACUU UCGAUGGUAG UCGCCGUGCC UACCAUGGUG 4 0 0
ACCACGGGUG ACGGGGAAUC AGGGUUCGAU UCCGGAGAGG GAGCCUGAGA 4 5 0
AACGGCUACC ACAUCCAAGG AAGGCAGCAG GCGCGCAAAU UACCCACUCC 500
CGACCCGGGG AGGUAGUGAC GAAAAAUAAC AAUACAGGAC UCUUUCGAGG 550
CCCUGUAAUU GGAAUGAGUC CACUUUAAAU CCUUUAACGA GGAUCCAUUG 600
GAGGGCAAGU CUGGUGCCAG CAGCCGCGGU AAUUCCAGCU CCAAUAGCGU 650
AUAUUAAAGU UGCUGCAGUU AAAAAGCUCG UAGUUGGAUC UUGGGAGCGG 7 0 0
GCGGGCGGUC CGCCGCGAGG CGGCUCAGCG CCCUGUCCCC AGCCCCUGCC 7 5 0
UCUCGGCGCC CCCUCGAUGC UCUUAGCUGA GUGUCCCGCG GGGCCCGAAG 800
CGUUUACUUUGAAAAAAUUA GAGUGUUCAA AGCAGGCCCG AGCCGCCUGG 850
AUACCGCAGC UAGGAAUAAU GGAAUAGGAC CGCGGUUCUA UUUUGUUGGU 9 0 0
UUUCGGAACU GAGGCCAUGA UUAAGAGGGA CGGCCGGGGG CAUUCGUAUU 9 5 0
GCGCCGCUAG AGGUGAAAUU CUUGGACCGG CGCAAGACGA ACCAGAGCGA 1000
AAGCAUUUGC CAAGAAUGUUUUCAUUAAUCAAGAACGAAA GUCGGAGGUU 1050
CGAAGACGAU CAGAUACCGU CGUAGUUCCG ACCAUAAACG AUGCCGACUG 1100
GCGAUGCGGC GGCGUUAUUC CCAUGACCCG CCGGGCAGCU UCCGGGAAAC 1150
CAAAGUCUUU GGGUUCCGGG GGGAGUAUGG UUGCAAAGCUGAAACUUAAA 1200
GGAAUUGACG GAAGGGCACC ACCAGGAGUG GAGCCUGCGG CUUAAUUUGA 1 2 5 0
CUCAACACGGGAAACCUCAC CCGGCCCGGA CACGGACAGG AUUGACAGGU 1 3 0 0
UGAUAGCUCU UUCUCGAUUC CGUGGGUGGU GGUGCAUGGC CGUUCUUAGU 1350
UGGUGGAGCG AUUUGUCUGG UUAAUUCCGA UAACGAACGA GACUCUCGGC 1 4 0 0
AUGCUAACUA GUUACGCGAC CCCCGAGCGG UCGGCGUCCC CCAACUUCUU 1 4 5 0
AGAGGGACAA GUGGCGUUCA GCCACCGAGA UUGAGCAAUA ACAGGUCUGU 1500
GAUGCCCUUA GAUGUCCGGG GCUGCACGCG CGCUACACUG AACUGGUUCA 1 5 5 0
GCGUGUGCCU ACCCUACGCC GGCAGGCGCG GGUAACCCGU UGAACCCCAU 1600
UCGUGAUGGG GAUCGGGGAU UGCAAUUAUU CCCCAUGAAC GAGGAAUUCC 1650
CAGUAAGUGC GGGUCAUAAG CUUGCGUUGA UUAAGUCCCU GCCCUUUGUA 1 7 0 0
CACACCGCCC GUCGCUACUA CCGAUUGGAU GGUUUAGUGA GGCCCUCGGA 1 7 5 0
UCGGCCCCGC CGGGGUCGGC CCACGGCCUU GGCGGAGGCC UGAGAAGACG 1800
GUCGAACUUG ACUAUCUAGAGGAAGUAAAA GUCGUAACAA GGUUUCCGUA 1 8 5 0
GGUGAACCUG CGGAAGGAUC AUUA 1 8 7 4
FIG.3. The sequence of nucleotides in rat 18 S rRNA.
I O 20 30
between eukaryotes and eubacteria. The first is around posi-
tions 120 and 350, which in rat 18 S rRNA can form a series
of loosely paired, interrupted helices, which contrast with the
two stems separated by an internalloop posited for E. coli 16
S rRNA. The flanking structural and sequence similarities
indicate the two eukaryotic loops are analogous to the 122-
142/221-239 region in E. coli 16 S rRNA, but that the eukar-
yotic structure is more irregular. The secondary structure of
this region is well established for eubacterial RNA from
XENOPUS
pUACCUGGUUGAUCCUGCCAGUAGCAUAUGCUUGYCUCAAAGAUUAAGCCARABBIT
pUACCUGGUUGAUCCUGCCAGUAGCAUAUGCUUGUCUCAAAGAUUAAGCCARAT
50pUACCUGGUUGAUCCUGCCAGUAGCAUAUGCUUGUCUCAAAGAUUAAGCCA
XENOPUS
RAT
RABBIT
XENOPUS
RAT
RABB1T
XENOPUS
RAT
RABBIT
XENOPUS
RAT
RABBIT
XENOPUS
RAT
RABBIT
XENOPUS
RAT
RABBIT
XENOPUS
RAT
RABBIT
XENOPUS
RAT
RABBIT
UGC&~&&A AGUACGCACGGCCGGUACAGUGAAACUGCGAAUGGCUCAU
UGCAUGUCUAAGUACGCACGGCCGGUACAGUGAAACUGCGAAUGGCUCAU
100
UGC:U$CUA AGUACGCACG GCCGGUACAG UGAAACUG~GAAUGGCUCAU
ACUUGGAUAA 150
UAAAUCAGUU AUGGUUCCUUUGGUCGCUCGCUCCUCUCCUACUUGGAUAA
UAAAUCAGUU AUGGUUCCUUUGGUCGCUCGCUCCUCUCCUACUUGGAUAA
CUGUGGUAAUUCUAGAGCUAAUAC
CUGUGGUAAUUCUAGAGCUAAUACAUGCCGACGGGCGCUGACCCCCCUUC
CUGUGGUAAU UCUAGAGCUA AUACAUGCCG :CGG*CUG ?cccc@
............&GG bJ=kJGCGUGCA UUUAUCAG~ CAAAACCAUCCGG
CGUGCA UUUAUCAGAU CAAAACCAAC CCGGUCAGCC
CGUGCA UUUAUCAGAU CAAAACCAAC CCEG
C
: : : :::::::::: :::::::::b CCGGCGGCUUUGGUGACUCU 300
CCCUCCCGGCUCCGGCCGGGGGUCGGGCGCCCGGCGGCUUUGGUGACUCU
@CCGGC ~ C B ; c $ q hY Y G G ~ G CRGGC$GCU:UGGUGACUCU
AGAUAACCUC GGGCCGAUCG CACGUCCCCG U ~ G G C G A CG ~ A U U C G
AGAUAACCUC GGGCCGAUCG C A Y G ~ C ~ C GU&G%~ :: :: :::k
350
AGAUAACCUC GGGCCGAUCGCACGUCCCCGUGGCGGCGACGACCCAUUCG
UCUGCC CUAUCAACUU UCGAUGGUDU L#-##xc UACCAUGGUG 400
AACGUCUGCC CUAUCAACUU UCGAUGGUAGUCGCCGUGCCUACCAUGGUG
AACGU~UGCCC:AUCAACUU U C G A U G C ~ GUCGCCGU* UACCAUGGUG
ACCACGGGUGACGGGGA
417
FIG.4. Comparisonof the 5”terminal region of rat, rabbit,
and X. Zuevis 18 S rRNAs. Differences from the sequence of rat
18 S rRNA are indicated by placement of the nucleotide outside of
the box. A colon designatesadeleted nucleotide, and Y a possible
modified pyrimidine in the rabbit sequence.
RAT 18 S rRNA GA-GUGU-GAA-UUGAGUAUGUAGAGGAAGUAAAAGUCGUAAGAAGGUUUCCGUAGGUGAACCUGCGGAAGGAGUCAUUAOH
.. 40 50 60 70 80
t 4 4 t t 4 t t
RAT 18 S rDNA GACG-GUCGAACUUGACUAUCUAGAGGAAGUAAAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGA-UCAUUAOH
RABBIT 18 S rRNA GACG-GUCGAACUUGACUAUCUAGAGGAAGUAAAAGUCGUAAAAGGUUUCCGUAGGUGA ACCUGCGGMGGA-UCAUUAOH
F
X.iaevis 18 S rDNAGACG-AUCAAACUUGACUAU CUAGAGGAAGUAAAACUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGA-UCAUUAOH
4 4
FIG. 5. A comparisonof the 3’-terminal region of rat, rabbit, and X. laevis 18 S rRNAs. The arrows
indicate nucleotidesthat diverge from the sequence of the rat18SrRNA gene. A space indicates that no nucleotide
was found at thatposition.
5. 228 The Structure of Rat 18 S rRNA
y ~ A A * u
2 uc
FIG.6. A proposal for the secondary structure of rat 18 S rRNA. This structure is based on a general
structure for eukaryotic 18 S rRNAs derived from comparative sequence analysis (44). Shaded helices are those
considered to he proven on the basis of comparative sequence analysis, ie. there are two or more compensating
base changes (42). A drawing of the proposed secondary structure for E. coli 16 S rRNA (41-45) is in the lower
right for comparison. The sequences designated a and b have not been assigned secondary structure andbelong, as
indicated,afterpositions 572 and 679, respectively. The dots after position 79 indicate wherefournucleotides
occur in eubacterial16 S rRNA for whichthere areno counterparts in eukaryotic18S rRNA. The separatehairpin
structure in the center shows an alternate base-paired structurefor residues 3-21.
6. The Structure of Rat 18 S rRNA 229
comparative sequence analysis (41-45). The secondmajor
differenceinvolves a presumptive pairingof 564-570and 880-
886 in E. coli, which we relate to 663-66911165-1171 in rat.
There is not yet sufficient evidence from comparison of se-
quences to establish the existence of the stem in eubacteria
owing to conservationof the primary structure.Evidence from
psoralen cross-linking(46) and fromribosomal protein bind-
ing (47, 48) experiments is, however, consistent with a stem
in that region of 16 S rRNA. Here again, it may be the case
that eukaryotic rRNA tends to be more irregular in its sec-
ondarystructure.Thethirdandmoststriking difference
involves the region corresponding to positions approximately
590-650 of the E. coli structure, which appears tobe missing,
or possibly relocated, in 18S rRNA. In eubacteriall6S rRNA
(44),thisdomain includes thebindingsite forribosomal
protein S8 (49, 50). This compound helical region branches
off a position corresponding toresidue 690 of rat 18 S rRNA
(Fig. 6). The large insertionnearposition 680 inrat 18 s
rRNA does not appear to be analogous to theE. coli S8 stem
for two reasons; it is not capable of forming a similar com-
pound helical structure, and its location, although nearby, is
displaced fromthat of the S8 stem.It is of interestthat
eukaryotic 18 S rRNAs do not bindE. coli ribosomal protein
58.'
Moreprovocative arethesimilaritiesineubacterialand
eukaryotic structures. The lengthof stems, the size of loops,
and the sequence of unpaired regions are often identical or
nearly so (Fig. 6).Theunpaired sequences 617-632,1698-
1714, and 1830-1842 are nearly identicalwith their counter-
parts inE. coli and areenclosed by nearly identicalsecondary
structural features. These threesequences have been directly
implicated in the translationalprocess. Kethoxal modification
experiments have shown thata site in the firstof these (617-
632) is protected onlywhen tRNA is bound toribosomes (51),
whereas the lattertwo sequences (1698-1714 and 1830-1842)
contain nucleotides shielded by ribosomalsubunit association
(52). Finally, the wobble base of the anticodon of tRNA can
be cross-linked to position1706 (or its analogue) in eubacter-
ial and eukaryotic ribosomes (53, 54). Thus, it appears that
the portions of the secondary structures of rRNAs that are
highly conserved are likely to be important for function.
Acknowledgments-We are grateful to Jose Olverafortechnical
assistance, to Francis W. H. Leungforwriting the computer pro-
grams, to C. Woese for valuableadvice while we were formulating the
secondary structure of rat 18 S rRNA, and to Joann Kop,Joany
Chou, and Carol J. Muster for helpful discussions. We are indebted
to Thomas D. Sargent and James Bonner for providing the recom-
binant A phage Charon 4A and toMalcolm Casadaban for the plasmid
pACYC 184 and for MC 1065 cells.
1.
2.
3.
4.
5.
6.
7.
8.
REFERENCES
Aoyama, K., Hidaka, S., Tanaka, T., and Ishikawa, K. (1982) J.
Nazar, R. N., Sitz, T. O., and Busch, H. (1975) J. Biol. Chem.
Subrahmanyam, C. S.,Cassidy, R., Busch, H., and Rothblum, L.
Erdmann, V. A. (1980) Nucleic Acids Res. 8,r31-r47
Rubtsov, P. M., Musakhanov, M. M., Zakharyev, V. M., Krayev,
A. S., Skryabin, K. G., and Bayev, A.A. (1980) Nucleic Acids
Res. 8,5779-5794
Salim, M.,and Maden, B. E. H. (1981) Nature (Lond.)291,205-
208
Georgiev, 0.I., Nikolaev, N., Hadjiolov, A. A,, Skryabin, K. G.,
Zakharyev, V. M., and Bayev, A. A. (1981) Nucleic Acids Res.
Veldman, G. M., Klootwijk, J., de Regt, V. C. H. F., Planta, R.
Biochern. (Tokyo)91, 363-367
250,8591-8597
I. (1982) Nucleic Acids Res. 10. 3667-3680
9,6953-6958
* D. Thurlow and R. A. Zimmermann, personal communication.
J., Branlant, C., Krol, A., and Ebel, J.-P. (1981) Nucleic Acids
Res. 9, 6935-6952
9. Wool, I.G. (1979) Annu. Rev. Biochem. 48,719-754
10. Brot, N. (1982) in ProteinBiosynthesis in Eukaryotes (Perez-
11. Jackson, R. J. (1982) in Protein Biosynthesis in Eukaryotes
12. Maden, B. E. H. (1976) Trends Biochern. Sci. 1,196-199
13. Sargent, T. D., Wu, J. R., Sala-Trepat, J. M., Wallace, R. B.,
Reyes, A. A,, and Bonner,J. (1979) Proc. Natl. Acad. Sci. U. S.
14. Davis, R. W., Botstein, D., and Roth, J.R. (1980) A Manml /or
Genetic Engineering, pp. 80-82, Cold Spring Harbor Labora-
tory, Cold Spring Harbor, NY
15. Clarke, L., and Carbon, J. (1975) Proc. Natl. Acad. Sci. U. S. A.
72,4361-4365
16. Bolivar, F., and Backman,K. (1979) Methods Enzyrnol. 68,245-
267
17. Chang, A. C. Y., and Cohen, S. N. (1978) J. Bacterid. 134,1141-
1156
18. Grunstein, M., and Wallis, J. (1979) Methods Enzyrnol. 68, 379-
389
19. Grunstein, M., and Hogness, D. S. (1975) Proc. Nut[.Acad. Sci.
20. Kupersztoch, Y. M.,and Helinski, D. R. (1973)Biochem. Biophys.
21. McDonnell, M. W., Simon, M. N., and Studier, F. W. (1977) J.
22. Southern, E. M. (1975) J. Mol. Biol. 98, 503-517
23. Maxam, A. M., and Gilbert,W. (1980) MethodsEnzymol. 65,
24. Girvitz, S. C., Bacchetti, S., Rainbow, A. J., and Graham, F. L.
25. Danna, K. J. (1980) Methods Enzymol. 65, 449-467
26. Smith, H. O., and Birnstiel, M. L. (1976) Nucleic Acids Res. 3,
27. Sanger, F., and Coulson, A. R. (1978) FEBS Lett. 87, 107-110
28. Donis-Keller, H., Maxam, A.M., and Gilbert, W. (1977) Nucleic
Acids Res. 4, 2527-2538
29. Lockard, R. E., Alzner-Deweerd, B., Heckman, J. E., MacGee, J.,
Tabor, M. W., andRajBhandary, J. L. (1978) Nucleic Acids
Res. 5,37-56
Bercoff, R., ed) pp. 253-273, Plenum Press, New York
(Perez-Bercoff, R. ed) pp. 363-418, Plenum Press, New York
A. 76,3256-3260
U. S. A. 72, 3961-3965
Res. Commun. 54,1451-1459
Mol. Biol. 110, 119-146
499-560
(1980) Anal. Biochem. 106,492-496
2387-2399
30. Donis-Keller, H. (1980) Nucleic Acids Res. 8,3133-3142
31. Peattie, D. A. (1979) Proc. Natl. Acad. Sci. U. S.A. 76, 1760-
32. Alberty, H., Raba, M., and Gross, H.J. (1978) Nucleic Acids Res.
33. Azad, A. A., and Deacon, N. J. (1980)Nucleic Acids Res. 8, 4365-
34. Loening, U. E. (1968) J. Mol. Biol. 38, 355-365
35. Weinberg,R. A., and Penman, S. (1970) J. Mol. Biol. 47, 169-
178
36. Brosius, J.,Palmer, M. L., Kennedy, P.J.,and Noller, H. F.
(1978) Proc. Natl. Acad. Sci. U. S.A. 75, 4801-4805
37. Hagenbuchle, O., Santer, M., Steitz,J. A., and Mans,R. J. (1978)
Cell 13,551-563
38. Lockard, R. E., Connaugton, J. F., and Kumar, A. (1982) Nucleic
Acids Res. 10, 3445-3457
39. Boseley, P. G., Moss, T., Machler, M., Portmann, R., and Birn-
stiel, M. (1979) Cell 17, 19-31
40. Shah, M., Khan, N., Salim, M., and Maden,B. E. H.(1978)
Biochern. J. 169, 531-542
41. Noller, H. F.,and Woese, C. R. (198l)Science (Wash.D. C.) 212,
403-411
42. Fox, G. E.,and Woese, C. R. (1975) Nature (Lord.)256, 505-
507
43. Woese, C. R., Magrum, L. J., Gupta, R., Siegel, R. B., Stahl, D.
and Noller, H. F. (1980) Nucleic Acids Res. 8.2275-2293
A., Kop, J., Crawford, N., Brosius, J., Gutell, R., Hogan, J. J.,
44. Woese, C. R., Gupta, R., Gutell,R. R., and Noller, H. F. (1983)
Microbial. Rev., in press
45. Noller, H.F., Kop, J., Wheaton, V., Brosius, J., Gutell, R.R.,
Kopylov, A.M., Dohme, F., Herr, W., Stahl, D. A., Gupta, R.,
and Woese, C. R. (1981) Nucleic Acids Res. 9, 6167-6189
46. Wollenzien, P. L., and Cantor, C. R. (1982) J. Mol. Biol. 159,
151-166
47. Ehresmann, C., Stiegler, P., Carbon, P., Ungewickell, E., and
Garrett, R. A. (1980) Eur. J. Biochem. 103, 439-446
1764
5,425-434
4376
7. 230 The Structure of Rat 18 S rRNA
48. Cole, M.D., Beer, M., Koller, T., Strycharz, W. A., and Nomura, 52. Chapman, N. M., and Noller, H. F. (1977) J. Mol. Biol. 109,
49. Muller, R., Garrett, R. A., and Noller, H. F. (1979) J. Biol. Chem. 53. Prince, J. B., Taylor, B. H., Thurlow, D. L., Ofengand, J., and
50. Zimmermann, R. A., andSingh-Bergmann, K. (1979) Biochim. 5450-5454
51. Brow, D.A., and Noller, H. F. (1983) J. Mol. Biol. 163,27-46 (1982) Proc. Nutl. Acad. Sci. U. S. A. 79,2817-2821
M. (1978) Proc. Nutl. Acad. Sci. U. S. A. 75,270-274 131-149
254,3873-3878Zimmermann, R.A. (1982) Proc. Nutl. Acad. Sci. U. S. A. 79,
Biophys. Acta 563,422-431 54. Ofengand, J., Gornicki, P., Chakraburtty, K., and Nurse, K.