This document analyzes spliceosomal introns in ribosomal RNA genes of fungi. The key findings are:
1) Analysis of 49 intron insertion sites in fungal rRNA genes found that the conserved flanking sequence is G-intron-G, suggesting this is the proto-splice site where introns are inserted.
2) The exon sequences flanking the introns contain statistically significant information content, indicating the presence of potential regulatory elements.
3) Spliceosomal introns in fungal rRNA genes tend to occur in less conserved regions of the rRNA, whereas group I introns occur in more functionally important, conserved regions.
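The "information content" in point 2 can be illustrated with a minimal sketch. It assumes the standard Shannon measure per alignment column (2 bits minus the column entropy for a four-letter alphabet) with no small-sample correction; the summary does not state the exact method used in the study. The function names and the example sequences are hypothetical and are not data from the paper.

```python
import math
from collections import Counter


def information_content(column):
    """Shannon information content, in bits, of one alignment column of bases."""
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # 2 bits is the maximum for a 4-letter alphabet; no small-sample correction here.
    return 2.0 - entropy


def site_profile(sequences):
    """Per-position information content for equal-length, aligned sequences."""
    return [information_content(col) for col in zip(*sequences)]


# Hypothetical aligned flanking sequences, invented for illustration only.
flanks = ["AGGT", "CGGA", "TGGC", "GGGT"]
profile = site_profile(flanks)
```

A fully conserved position scores 2 bits and a position with all four bases equally frequent scores 0, so conserved flanking bases stand out as peaks in the profile.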
This document proposes a hypothesis for the origin of the three cellular domains of life - Archaea, Bacteria, and Eukarya. It suggests that independent transfers of DNA viruses to existing RNA cells gave rise to the three domains. Each transfer stabilized a different version of the proteins involved in translation. The existence of three different founder DNA viruses also explains why each domain has distinct DNA replication machinery. This model aims to address weaknesses in other models and explain why informational proteins differ across domains.
M. Onrubia and colleagues have published numerous papers, book chapters, and conference presentations on their research into improving the production of taxanes like paclitaxel (taxol) in plant cell cultures. Their publications examine the effects of elicitors like methyl jasmonate and coronatine on taxane production and the relationship between gene expression and metabolite biosynthesis. They have also filed several patents on novel genes that could be used to modulate taxane production in plants for biotechnological applications.
This document summarizes a scientific paper on understanding the contributions of prokaryotes (bacteria and archaea) to the eukaryotic genome through a network approach. The paper characterized six types of evolutionary units, five of which involve mosaic lineages generated by horizontal gene transfer. It introduced terminology based on networks of three nodes ("P3s") and "mosaic P3s" to detect these units. Recognizing these evolutionary relationships beyond vertical descent stimulates rethinking key questions in evolution, like early evolution, novelty origins, and lineage formation. This expands understanding of biological complexity beyond genealogy to additional sources of diversity.
This document summarizes key points from a class on microbial phylogenomics taught by Jonathan Eisen. It discusses reading scientific papers, specifically beginning with the introduction rather than the abstract. It also provides guidance on identifying the big question a field is trying to answer, summarizing the background and limitations of prior work, stating the specific questions authors are addressing, and identifying their experimental approach. The document does not summarize any specific paper.
This document covers DNA and microbes. Its key points are:
1) Microbes are small but there are lots of them, with more cells on Earth than stars in the universe. They play important roles in ecosystems through processes like nitrogen fixation and the carbon cycle.
2) Studying microbial diversity has progressed through four eras, from identifying the bacterial and archaeal domains using rRNA to metagenomic studies of microbial communities without culturing.
3) Sequencing revolutionized the field by allowing classification of microbes and prediction of their functions from genomes. It revealed novel forms of phototrophy and the vast uncultured majority of microbes.
Gutell 123.app environ micro_2013_79_1803 (Robin Gutell)
This document summarizes a study examining the host specificity of Lactobacillus bacteria associated with different hymenopteran (bee and ant) hosts. The researchers compiled nearly full-length 16S rRNA gene sequences of Lactobacillus from public databases and used these to construct phylogenetic trees. They also included shorter 16S sequences from surveys of bacteria associated with sweat bees, fungus-growing ants, and fire ants. The results showed that lactobacilli associated with honey bees and bumble bees are highly host specific, while sweat bees and ants associate with lactobacilli more closely related to those found in diverse environments or vertebrate hosts. The high host specificity seen in corbiculate bees (honey bees
A very general lecture on the Epigenomics Roadmap and its main contributions.
This lecture was composed for the students of "Genomic and Epigenomic Medicine 2015/2016 (15 credits)"
http://www.uu.se/en/admissions/master/selma/Kurser/?kKod=3MG025&lasar=15/16&typ=1
A course of the Master's program in Molecular Medicine at Uppsala University
http://www.uu.se/en/admissions/master/selma/program/?pKod=MBK2M
This document summarizes a study that identified novel molecules involved in axon-glial interactions during peripheral myelination. The researchers:
1) Developed a method to isolate projections ("pseudopods") that Schwann cells extend in response to signals from axons, allowing proteomic analysis of proteins specifically present at axon-glial contacts.
2) Identified major signaling networks and novel proteins, including members of the Prohibitin family, at the glial leading edge contacting axons.
3) Found that genetic deletion of Prohibitin-2 in mice impairs axon-glial interactions and myelination, validating its importance.
This novel method provides insights into molecular organization
1. The document discusses the nature and structure of genes based on research in microbiology and genetics.
2. It describes genes as units of heredity located on chromosomes that direct protein synthesis. Genes are made of DNA and contain multiple sites where mutations can occur.
3. Research has found that genes have a complex internal structure, with many subunits or sites arranged linearly along the DNA molecule. Mutations at different sites can result in different alleles or variants of a gene.
This document summarizes a study that characterized the ecdysone receptor (EcR) gene in the salmon louse (Lepeophtheirus salmonis), an economically important parasite in salmon farming. The researchers isolated and sequenced cDNA of the predicted L. salmonis EcR gene, which encoded a protein highly similar to other arthropod EcRs. In situ analysis showed the EcR transcript is expressed in ovaries, sub-cuticle, and oocytes of adult female lice. Knockdown of EcR using RNA interference terminated egg production, indicating it plays an important role in reproduction and oocyte maturation. This suggests disrupting EcR signaling may provide a way to control louse reproduction and infestation.
This document summarizes a study that mapped 72 dinucleotide microsatellite loci from Drosophila subobscura onto its polytene chromosomes using fluorescence in situ hybridization (FISH). The distribution of microsatellites was not uniform across chromosomes, with higher density in the sex chromosome than in the autosomes. Homologous regions were identified in the D. pseudoobscura and D. melanogaster genomes, supporting conservation of chromosomal elements among Drosophila species but also intrachromosomal rearrangements within lineages. The lack of microsatellite repeats in homologous D. melanogaster sequences suggests convergent evolution for high density in the X chromosome distal region.
This document describes a study that analyzed the mitochondrial large subunit ribosomal RNA (LSU rRNA) sequences and secondary structures of 10 mollusk species. The researchers determined the complete nucleotide sequences and inferred the secondary structure models for each species. They found substantial length variation among taxa, with gastropods having the shortest lengths. Phylogenetic analysis supported monophyly of several taxa. Most notably, they discovered phylogenetic signal in the secondary structure of mollusk rRNA, with some gastropods uniquely lacking stem/loop structures, explaining much of the observed length variation.
This document discusses the organization and segregation of bacterial chromosomes using Escherichia coli as an example. It finds that: (1) two markers on the same chromosome arm colocalize in the same cell half, while markers on opposite arms are in opposite cell halves, (2) duplicated chromosome arms are usually oriented in a tandem repeat configuration, and (3) sister cells are not usually identical after cell division due to rearrangement of chromosome arms.
Gillespie J.J., Johnston J.S., Cannone J.J., and Gutell R.R. (2006).
Characteristics of the nuclear (18S, 5.8S, 28S and 5S) and mitochondrial (12S and 16S) rRNA genes of Apis mellifera (Insecta:Hymenoptera): structure, organization, and retrotransposable elements.
Insect Molecular Biology, 15(5):657-686.
The document describes research on fragmentation of the large subunit ribosomal RNA (LSU rRNA) gene in oyster mitochondrial genomes. Key findings include:
1) The LSU rRNA gene is split into two fragments separated by thousands of nucleotides in three species of oysters.
2) RT-PCR and EST analysis showed the two fragments are transcribed separately in Crassostrea virginica and are not spliced together.
3) Secondary structure models of the fragmented LSU rRNA genes were predicted for C. virginica, C. gigas, and C. hongkongensis based on comparative sequence analysis. This fragmentation represents a novel phenomenon in bilateral metazoan mitochondrial genomes.
1) The document summarizes recent research on the mechanism of ParA-mediated chromosome segregation in bacteria.
2) Key findings include that ParA forms a structure along the bacterial nucleoid that is involved in segregating specific chromosomal loci.
3) A recent study found that in Caulobacter crescentus, ParA forms a narrow structure along the long axis of the cell, and the ParB/parS complex follows the edge of a receding ParA structure, providing evidence this is the mechanism of segregation.
The document discusses mechanisms of prezygotic isolation between the corn- and rice-strains of the moth Spodoptera frugiperda to determine their relative importance and interactions. It investigates potential isolation due to host plant differentiation, differences in sexual communication, and allochronic differentiation in daily rhythms. The most consistent prezygotic barrier is allochronic differentiation, with genetic analysis identifying a major gene underlying the circadian differentiation between strains.
The document describes a meta-analysis of microbial community samples collected by the Earth Microbiome Project (EMP) that used coordinated protocols and analytical methods to explore patterns of diversity at an unprecedented scale. By tracking individual bacterial and archaeal ribosomal RNA gene sequences across multiple studies, the analysis resulted in both a reference database providing global context to DNA sequence data and an analytical framework for incorporating future study data to further characterize Earth's microbial diversity. The meta-analysis found that standardized environmental descriptors and new analytical methods, particularly using exact sequences instead of clustered operational taxonomic units, enabled comparisons across studies and exploration of large-scale ecological patterns.
This document summarizes a study that reconstructed 7,903 bacterial and archaeal genomes from over 1,500 public metagenomes. Key findings include:
- The genomes increase phylogenetic diversity of bacterial and archaeal trees by over 30% and provide first representatives for 17 bacterial and 3 archaeal candidate phyla.
- 245 genomes were recovered from the Patescibacteria superphylum.
- The genomes vary substantially in quality, with 43.5% considered near-complete, 43.8% medium quality, and 12.7% partial.
- The genomes expand representation of underrepresented phyla like Aminicenantes, Gemmatimonadetes, and Lentisphaera
The complete sequences of RNA 4 from cucumber mosaic virus (CMV) strains Ny (subgroup I) and Sn (subgroup II) were determined and compared to other known CMV RNA 4 sequences. The identification of a unique EcoRI site, present only in subgroup-II RNA 4 sequences, provides a simple method for classifying CMV isolates into subgroups I and II. Sequence variation was greater in the untranslated regions of RNA 4 than previously observed, with 74.9% identity between subgroups and 93.6% within subgroup II.
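The EcoRI-based classification can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' protocol: it assumes that the presence or absence of the EcoRI recognition sequence (GAATTC) in the RNA 4 sequence, or its cDNA, is the only criterion. The function names and the example sequences are hypothetical.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence


def find_ecori_sites(sequence):
    """Return 0-based start positions of every EcoRI site in a DNA or RNA sequence."""
    seq = sequence.upper().replace("U", "T")  # treat RNA as its cDNA equivalent
    positions = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        positions.append(start)
        start = seq.find(ECORI_SITE, start + 1)
    return positions


def classify_subgroup(sequence):
    """Assumed rule: an EcoRI site means subgroup II, no site means subgroup I."""
    return "II" if find_ecori_sites(sequence) else "I"


# Hypothetical sequences, invented for illustration only.
with_site = "AUGGAAUUCGG"
without_site = "AUGGACUUCGG"
```

In the laboratory the same test would be a restriction digest of the cDNA; the string search above only mirrors that check on sequence data.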
This study investigates what happens to proteins associated with DNA during chromosome translocation in Bacillus subtilis sporulation. Using fluorescent protein fusions and a mutant that forms two forespores, the authors show that RNA polymerase, chromosome remodeling proteins, and transcription factors are stripped off the chromosome as it is translocated into the forespores. Specifically, they demonstrate that a TetR-GFP fusion bound to an operator array is efficiently removed from the translocating DNA. Additionally, in vitro experiments indicate that the ATPase domain of SpoIIIE can displace RNA polymerase from DNA. These results suggest that SpoIIIE translocates naked DNA and strips associated proteins during chromosome transport, which may play a role in reprogramming gene
This document summarizes a study that used PCR and cloning to analyze the 16S rRNA genes present in a natural marine bacterioplankton population from the Sargasso Sea. Researchers constructed a library of 51 small-subunit rRNA genes and sequenced five unique genes. In addition to genes from known marine Synechococcus and SAR11 lineages, they identified two new classes of genes belonging to alpha- and gamma-proteobacteria, confirming that many planktonic bacteria have not been previously recognized by microbiologists.
RNA localization to the Balbiani body in Xenopus oocytes is regulated by the energy state of the cell and is facilitated by kinesin II. The rate of RNA accumulation in the Balbiani body depends on temperature and intracellular ATP concentration - increasing ATP concentration doubles the localization rate. Inhibition of kinesin II reduces RNA localization to the Balbiani body, and the Xcat-2 RNA recruits kinesin II, indicating it plays a role in this process. The energy state of the cell regulates the rate of RNA transport to the Balbiani body, which involves kinesin II to some extent.
This research article discusses the lateral transfer of group I introns between red and brown algae. The researchers found that a group I intron inserted at position 516 in the small subunit rRNA contained a unique helical insertion in the P5b helix in both bangiophyte red algae and the brown alga Aureoumbra lagunensis, though the host cells are evolutionarily distant. They analyzed the secondary structure and phylogeny of these introns to understand their origin. The highly conserved structure of the insertion suggests it is important functionally, though its specific role is unknown. Their analyses support the scenario that the intron was laterally transferred between red and brown algae after their divergence, rather than being present in
Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, a... (Jonathan Eisen)
This document describes research into using metagenomic data to search for novel lineages in the tree of life. The researchers developed methods to search for deeply branching small subunit rRNA genes in Global Ocean Sampling data, but were unable to robustly identify any novel lineages due to difficulties aligning short, distantly related sequences. They had more success identifying novel branches in the RecA and RpoB gene families. Some novel sequences likely come from unknown viruses or ancient paralogs, while others may represent truly novel cellular lineages not previously characterized. Metagenomic analysis offers potential for discovering major undiscovered branches in the tree of life.
This document summarizes the creation of a database of group I intron secondary structures derived from comparative sequence analysis. Over 200 publicly available group I intron sequences were analyzed to infer their secondary structures. The database aims to collect and refine group I intron structures as more sequences become available. It will make the secondary structure diagrams accessible online through file transfer protocol and the World Wide Web. The database currently contains 219 intron sequences classified into subgroups based on their phylogenetic diversity and cellular location.
This document discusses exon shuffling, which is a mechanism by which new genes can form through the rearrangement of exons from different genes. Exon shuffling was first proposed in 1978 and involves recombination within introns that allows exons to be assorted independently, generating new exon combinations. There are three main types of exon shuffling: exon duplication, insertion, and deletion. Exon shuffling generates genetic variation and mosaic proteins, and it has played a major role in evolution. The mechanisms involved are crossover during sexual recombination and transposon-mediated movements that can cut, paste, or copy and paste exons into new locations.
Female mammals achieve dosage compensation by inactivating one of their two X chromosomes
during development, a process entirely dependent on Xist, an X-linked long noncoding
RNA (lncRNA). At the onset of X chromosome inactivation (XCI), Xist is up-regulated
and spreads along the future inactive X chromosome. Contextually, it recruits repressive
histone and DNA modifiers that transcriptionally silence the X chromosome. Xist regulation is
tightly coupled to differentiation and its expression is under the control of both pluripotency
and epigenetic factors. Recent evidence has suggested that chromatin remodelers accumulate
at the X Inactivation Center (XIC) and here we demonstrate a new role for Chd8 in Xist
regulation in differentiating ES cells, linked to its control and prevention of spurious
transcription factor interactions occurring within Xist regulatory regions. Our findings have a
broader relevance, in the context of complex, developmentally-regulated gene expression.
1. The document describes the development of a new single-cell RNA-seq method called Quartz-Seq that has higher reproducibility and sensitivity than existing methods.
2. Quartz-Seq can quantitatively detect various types of non-genetic cellular heterogeneity and can distinguish different cell types and cell cycle phases of a single cell type.
3. It can also comprehensively reveal gene expression heterogeneity between single cells of the same cell type in the same cell cycle phase.
This document summarizes research on the DNA translocase SpoIIIE in Bacillus subtilis. The key findings are:
1) SpoIIIE transports DNA directionally from the mother cell to the forespore during sporulation.
2) The g-domain of SpoIIIE is necessary for establishing directionality of DNA transport in vivo, but not for mechanical translocation in vitro.
3) SpoIIIE recognizes specific DNA sequences called SRS that are highly skewed on the B. subtilis chromosome, similar to how the related protein FtsK recognizes KOPS sequences. Interaction with SRS sequences regulates the direction of SpoIIIE-mediated DNA transport.
This document provides secondary structure diagrams for large subunit ribosomal RNA sequences from a variety of organisms. It summarizes 40 complete rRNA sequences and 5 partial sequences published as of 1987. The diagrams are presented in a standardized format based on Escherichia coli rRNA secondary structure. Many regions of unknown structure are indicated. The structures were determined using comparative sequence analysis to identify compensating base changes maintaining Watson-Crick base pairing across evolutionary distances. Limitations of this approach include requiring sequences that contain the structural feature and sufficient sequence variation within those sequences to determine the structure.
This document provides evidence that common machinery is utilized by the early and late RNA localization pathways in Xenopus oocytes. It presents four key findings: 1) Early and late pathway RNAs require the same short sequence motifs for localization. 2) Competition assays show early and late RNAs compete for common localization factors in vivo. 3) A late localization factor, Vg RBP/Vera, binds specifically to localization elements of early pathway RNAs. 4) Confocal imaging reveals early RNAs associate with microtubules, suggesting transport plays a role in both pathways. Together, these findings suggest the early and late pathways share basal localization factors throughout oogenesis.
1) The document discusses RNA interference (RNAi), which is a process by which double-stranded RNA regulates gene expression.
2) The discovery of RNAi was made in 1998 by Fire and Mello through experiments in C. elegans showing that double-stranded RNA could efficiently silence gene expression.
3) RNAi involves the RNA-induced silencing complex (RISC) which is activated by double-stranded RNA. RISC then uses one of the RNA strands as a template to find and cleave or degrade the matching mRNA, preventing its translation into protein.
Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for t...Guy Boulianne
This document analyzes the evolutionary history of SARS-CoV-2, the virus that causes COVID-19, using genomic data from related sarbecoviruses found in bats. The key points are:
- Sarbecoviruses undergo frequent recombination, exhibiting spatially structured genetic diversity in China. SARS-CoV-2 itself shows no evidence of being a recombinant of known sarbecoviruses.
- Bayesian analyses estimate the most recent common ancestor of SARS-CoV-2 and its closest known relative, RaTG13, existed between 1948-2009, indicating the lineage has been circulating unnoticed in bats for decades.
- While pangolins or other species may have facilitated transmission to
This document discusses the nature and structure of genes based on research evidence. It makes three key points:
1) Genes are units of heredity located on chromosomes that direct the synthesis of proteins. While contained in the nucleus of eukaryotic cells, some genes are also found in mitochondria, chloroplasts, and plasmids.
2) Genetic research shows the mechanism of transmitting genetic information from parents to offspring is fundamentally similar across life forms, though some variations exist. Nucleic acids, particularly DNA, carry this genetic information.
3) Individual gene loci are complex, composed of many linearly arranged sites where mutation and recombination can occur. The number of sites per locus appears to
The document discusses the complexity of eukaryotic genomes compared to prokaryotic genomes. While eukaryotes are more complex organisms, their larger genome sizes are not solely due to more genes. Eukaryotic genomes contain large amounts of non-coding DNA sequences, including introns within genes and repetitive sequences. A key discovery was that eukaryotic genes contain introns that are removed from mRNA by splicing. Introns account for much of the non-coding DNA in eukaryotic genomes.
1. Molecular phylogenetic analysis uses DNA, RNA, or protein sequences to reconstruct evolutionary relationships between organisms. The extent of differences between homologous sequences is used to measure divergence.
2. Key steps include deciding sequences to examine, determining sequences experimentally, aligning sequences to identify homologous residues, and comparing sequences to determine relationships and construct phylogenetic trees.
3. The 16S rRNA gene is often used because it is universally present and conserved enough to align while also containing rapidly and slowly evolving regions useful for relationships at different timescales.
The document discusses various aspects of genome organization, including:
1. Chromatin assembly begins with the incorporation of histone proteins to form nucleosomes, which are then folded and organized into higher order structures within the nucleus.
2. Genes can be split, overlapping, or pseudogenes. Split genes contain introns that are spliced out, while overlapping genes share nucleotide sequences. Pseudogenes are non-functional copies of genes.
3. Gene families consist of genes related by common ancestry that may be clustered or dispersed throughout the genome. Members can vary in sequence but often retain similar functions.
Muralidhara C., Gross A.M., Gutell R.R., and Alter O. (2011).
Tensor Decomposition Reveals Concurrent Evolutionary Convergences and Divergences and Correlations with Structural Motifs in Ribosomal RNA.
PLoS ONE, 6(4):e18768.
The document summarizes the latest research on localizing the proteins, RNA sites, and ligands on the ribosomal subunits of E. coli. It presents a consensus model integrating data from neutron scattering experiments and immune electron microscopy. For the 30S subunit, 13 proteins, RNA sites including the 3' and 5' ends of 16S RNA and modified nucleotides, and ligand binding sites for puromycin and tRNA are localized on models of the exterior and interface surfaces. The 50S subunit shape is also described. Overall, the summary provides an overview of the progress toward determining the structure and organization of components on the E. coli ribosome subunits.
The document discusses the origin and evolution of the ribosome. It finds:
1) There is no single self-folding RNA segment that defines the small subunit's decoding site, while the large subunit's peptidyl transfer center is defined by one self-folding RNA segment.
2) The proteins contacting the small subunit's decoding site use universally alignable sequence blocks, while the large subunit's contact proteins use bacterial- or archaeal-specific blocks.
3) These differences support an earlier origin for the large subunit's peptidyl transfer center, with the small subunit's decoding site evolving later as an addition to the ribosome. The implications are that a single self-folding
BMC Evolutionary Biology 2003, 3 http://www.biomedcentral.com/1471-2148/3/7
archaeal and bacterial domains due to strong selection for compact genomes. Eukaryotes have maintained their introns because they confer the capacity to create evolutionary novelty through exon shuffling [2]. The introns-early theory predicts that at least some of the extant eukaryotic introns are direct descendants of the primordial sequences in the LUCA [2–5]. The alternate view, "introns-late", suggests that the last common ancestor was intron-free and that spliceosomal introns have originated in eukaryotes from recent invasions by autocatalytic RNAs (e.g., group II introns) or transposable elements [6–9]. The introns-late view is compatible with the now-established role of exon shuffling in creating eukaryotic genes [10]. It is the ancient origin of introns that is primarily called into question.
In this study, we analyzed the putative spliceosomal introns in Euascomycetes (Ascomycota) small subunit (SSU) and large subunit (LSU) ribosomal (r)RNA genes [11,12] to understand how spliceosomal introns of a recent origin (i.e., introns-late) spread to novel genic sites. Statistical methods were used to study the exon sequences flanking 49 different spliceosomal intron insertion sites in Euascomycetes rRNA and show that the introns interrupt the G – intron – G (hereafter, the intron position is shown with –) proto-splice site that pre-existed in the coding region. A proto-splice site is a short sequence motif that has a high affinity for splicing factors and is a preferred site of intron insertion. The proto-splice site (e.g., MAG – R in pre-mRNA genes [13]) need not be perfectly conserved in organisms but is rather a set of nucleotides that, with some statistical uncertainty, shows a non-random sequence pattern at sites flanking introns. It is also conceivable that proto-splice sites may differ between lineages reflecting, for example, differences in how the spliceosome recognizes introns (e.g., exon definition hypothesis [14,15]).
Our analysis using information theory [16] shows that significant information is found in the exons flanking rRNA spliceosomal introns. We also confirm that introns are not randomly distributed in the primary and secondary structure of the SSU and LSU rRNA and that the group I introns are generally found in the highly conserved (i.e., functionally important) regions of these genes, whereas the spliceosomal introns tend to occur in regions of the rRNA that are not as well conserved or are not directly involved in protein synthesis.
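As a rough illustration of what "information" at flanking positions means, the following Python sketch computes a per-column Shannon information value (in bits) for a set of aligned exon flanks. It is only a sketch under stated assumptions: the function name `information_content`, the toy sequences, and the uniform four-letter background are hypothetical, and no small-sample correction is applied. The exact procedure of the analysis cited as [16] is not described in this section and may differ.

```python
import math
from collections import Counter


def information_content(aligned_seqs, alphabet="ACGU"):
    """Per-column information in bits for equal-length aligned sequences.

    Assumes a uniform background over `alphabet` (an assumption, not taken
    from the paper) and applies no small-sample correction.
    """
    h_max = math.log2(len(alphabet))  # 2 bits for four nucleotides
    info = []
    for col in range(len(aligned_seqs[0])):
        counts = Counter(s[col] for s in aligned_seqs if s[col] in alphabet)
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)  # column holds only gaps or unknown symbols
            continue
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
        info.append(h_max - entropy)
    return info


# Hypothetical flanks: columns 1 and 2 are the nucleotides on either side
# of the intron position, i.e. the G – G pattern discussed in the text.
toy_flanks = ["AGGU", "CGGA", "AGGC", "UGGG"]
toy_info = information_content(toy_flanks)
```

In this toy input the two invariant G columns carry the maximum of 2 bits, while a column in which all four nucleotides are equally frequent carries none.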
Results
Analysis of Euascomycetes rRNA Spliceosomal Introns
With our data set of 49 different spliceosomal intron sites (two diatom-specific introns were excluded from this analysis) in the SSU and LSU rRNAs of Euascomycetes (alignment available at http://www.rna.icmb.utexas.edu/ANALYSIS/FUNGINT/ (for registration details please see http://www.rnq.icmb.utexas.edu/cgi-access/access/locked.cgi)), we first tested for the presence of a proto-splice site flanking the introns [12]. In this chi-square analysis, the null hypothesis specified that nucleotide usage in 50 nt of exon sequence upstream and downstream of the different intron insertion sites was random and dependent on the nucleotide composition of Euascomycetes SSU and LSU rRNA sequences in general. Previously, we found evidence for the proto-splice site, AG – G, in Euascomycetes rRNA with the greatest support for the G nucleotides (p < 0.001 [12]). The addition of 18 new Euascomycetes SSU and LSU rRNA insertion sites in the new analysis supports this finding (see Fig. 1) but shows the strongest evidence for a G – G proto-splice site (p < 0.01 [three degrees of freedom]), with the Gs occurring at frequencies of 65% and 61% in the Euascomycetes rRNAs.
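The kind of test described above can be sketched as a chi-square goodness-of-fit comparison between the nucleotide counts observed at one flanking position and the counts expected from a background composition. The Python below is a minimal sketch, not the authors' code: the function names, the uniform background, and the observed counts are all hypothetical placeholders (the actual analysis used the composition of Euascomycetes rRNA as the expectation). The p-value uses the closed-form upper tail of the chi-square distribution for three degrees of freedom, which avoids any third-party dependency.

```python
import math


def chi_square_gof(observed, expected_freqs):
    """Chi-square statistic for observed counts vs. expected frequencies."""
    total = sum(observed.values())
    stat = 0.0
    for nt, freq in expected_freqs.items():
        expected = total * freq
        stat += (observed.get(nt, 0) - expected) ** 2 / expected
    return stat


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


# Hypothetical inputs: a uniform background and made-up counts at one
# flanking position across 30 insertion sites.
background = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}
observed = {"A": 5, "C": 3, "G": 20, "U": 2}

stat = chi_square_gof(observed, background)
p_value = chi2_sf_3df(stat)
```

Four nucleotide categories give three degrees of freedom, which is why the three-degree closed form applies here; a different number of categories would need a general chi-square survival function.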
To address the possibility that we were counting as inde-
pendent events cases where introns may have had a single
origin but then spread into neighboring sites through in-
tron sliding [e.g., [11]], we reran the chi-square analysis
after removal of all introns that were within 5 nt of each
other. This substantially reduced our data set to 30 introns
at the following sites: SSU – 265, 297, 330, 390, 400, 514,
674, 882, 939, 1057, 1071, 1083, 1226, 1514; LSU – 678,
711, 775, 824, 830, 858, 978, 1024, 1054, 1091, 1098,
1849, 1903, 1929, 2076, 2445, but addressed independ-
ence of intron insertion events. This data set showed sig-
nificant support for the AG – G proto-splice site with the
A, G, and G, occurring at frequencies of 50% (chi-square
= 12.56, p = 0.0055), 67% (chi-square = 24.48, p <
0.0000), and 67% (chi-square = 25.35, p < 0.0000), re-
spectively. The AG – G and G – G proto-splice sites oc-
curred in 9 and 15 of these sequences, respectively. The
increase in signal of the AG – G proto-splice site with re-
moval of neighboring (potentially slid) introns is consist-
ent with the idea that intron sliding may over time
obscure the targets originally used for insertion. It should
be noted, however, that this procedure was done by re-
taining the most 5' intron in each set of neighboring inser-
tions and this may not represent the original intron.
Determining the role of intron sliding in creating new lin-
eages of insertions will require a fully resolved Euasco-
mycetes phylogeny (not yet available) that can be used to
map intron gains, losses, and potential slides. The present
data for the 300 – 337 spliceosomal introns, for example,
when mapped on the Euascomycetes tree published in
Bhattacharya et al. [11] shows these introns to be distrib-
uted in at least 4 divergent clades within the Lecanoro-
mycetes. These introns may be related through the sliding
of an ancestral intron but without the presence of one of
these insertions in a non-Euascomycetes fungus or a ro-
bust phylogeny of this lineage, it will not be possible to
unambiguously identify the original site of insertion.
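The filtering step described above can be sketched in Python. This is only an illustration, not the procedure actually used: the function name `retain_five_prime_sites` is hypothetical, and it assumes that neighboring insertions are chained into one set whenever consecutive sites are fewer than 5 nt apart (the text says only "within 5 nt"), with the most 5' site of each set retained. The input positions are the E. coli-numbered sites listed in Table 2. Under this reading, the retained sites agree with the 14 SSU and 16 LSU sites listed in the text.

```python
def retain_five_prime_sites(sites, min_gap=5):
    """Chain sites whose consecutive spacing is < min_gap (an assumed reading of
    "within 5 nt") and keep only the most 5' site of each chain."""
    kept = []
    previous = None
    for site in sorted(sites):
        # A site starts a new chain only if it is far enough from its 5' neighbor.
        if previous is None or site - previous >= min_gap:
            kept.append(site)
        previous = site
    return kept


# Intron positions (E. coli numbering) taken from Table 2.
SSU_SITES = [265, 297, 298, 299, 300, 330, 331, 332, 333, 337, 390, 393,
             400, 514, 674, 882, 883, 939, 1057, 1071, 1083, 1226, 1229, 1514]
LSU_SITES = [678, 681, 711, 775, 776, 777, 780, 783, 784, 786, 787, 824, 830,
             858, 978, 1024, 1054, 1091, 1093, 1098, 1849, 1903, 1929, 2076, 2445]

ssu_kept = retain_five_prime_sites(SSU_SITES)
lsu_kept = retain_five_prime_sites(LSU_SITES)
print(len(ssu_kept), len(lsu_kept), len(ssu_kept) + len(lsu_kept))  # prints: 14 16 30
```

Note that the outcome is sensitive to the threshold convention: treating a spacing of exactly 5 nt as "neighboring" would also drop LSU site 1098.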
3. BMC Evolutionary Biology 2003, 3 http://www.biomedcentral.com/1471-2148/3/7
Next, we used the "Sequence Logo" method developed by
Stephens and Schneider [16] and the expression of Hertz
and Stormo [17] to determine the information content in
the Euascomycetes rRNA introns and exon flanking se-
quence. The logo of a subset of 43 of the original 49 spli-
ceosomal introns for which we had complete 50 nt of
upstream and 50 nt of downstream exon sequence is
shown in Fig. 1. This analysis shows that many of the in-
formative sites encode purines (in particular Gs) and that
the region contains a total of 6.91 bits. In general, the in-
formation content is highest at the site of intron insertion
and the regions within a close proximity (about 10 nt),
and decreases as one moves away from this site, with the
exception of a significant U+G peak at -48 and C-richness
around +40 (Fig. 1). In comparison, the mean value
(100,000 iterations) for the total bits of information in a
100 nt random sequence data set was 5.68 bits. The 95%
quantile for this distribution was 6.47 bits indicating that
the Euascomycetes rRNA exons encode significant infor-
mation (p < 0.001). Logo analysis of the reduced set of 30
non-neighboring spliceosomal introns was consistent
with this analysis but showed a stronger signal at the pro-
to-splice site (A = 0.31 bits, G = 0.52 bits, G = 0.59 bits).
The finding of significant information in the flanking
exons suggests that some regulatory regions (i.e., exonic
splicing enhancers, ESEs [18,19]) may exist in these
sequences.
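The comparison with random sequence data described above can be sketched as a small Monte Carlo simulation. This is a minimal sketch under stated assumptions: per-site information is taken as the relative entropy sum of f·log2(f/p), which is one reading of the Hertz and Stormo expression [17] and omits any small-sample correction; the function names are hypothetical; the background frequencies are those given in the Methods; and only 200 iterations are run rather than 100,000. The resulting mean and quantile should therefore not be expected to match the 5.68 and 6.47 bits reported in the text.

```python
import math
import random

# Euascomycetes rRNA nucleotide composition as given in the Methods.
BACKGROUND = {"A": 0.26, "C": 0.22, "G": 0.27, "T": 0.25}


def column_information(column, background=BACKGROUND):
    """Relative entropy (bits) of one alignment column against the background."""
    n = len(column)
    bits = 0.0
    for base, p in background.items():
        f = column.count(base) / n
        if f > 0:
            bits += f * math.log2(f / p)
    return bits


def null_total_bits(n_seqs=43, length=100, n_iter=200, seed=1,
                    background=BACKGROUND):
    """Total bits over `length` columns for random data sets, sorted ascending."""
    rng = random.Random(seed)
    bases = list(background)
    weights = list(background.values())
    totals = []
    for _ in range(n_iter):
        total = 0.0
        for _ in range(length):
            column = rng.choices(bases, weights=weights, k=n_seqs)
            total += column_information(column, background)
        totals.append(total)
    return sorted(totals)


totals = null_total_bits()
mean_bits = sum(totals) / len(totals)
q95 = totals[min(len(totals) - 1, int(0.95 * len(totals)))]
observed = 6.91  # total bits reported for the 43 intron sites
p_value = sum(t >= observed for t in totals) / len(totals)
print(f"null mean = {mean_bits:.2f} bits, 95% quantile = {q95:.2f} bits, "
      f"fraction >= observed = {p_value:.4f}")
```

Increasing `n_iter` narrows the Monte Carlo error of the quantile at the cost of run time.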
Sliding Window Analysis of Euascomycetes Spliceosomal
Intron Insertion Sites
Intrigued by the finding of G-richness in the upstream
exon region flanking introns (see -7 to -17 in Fig. 1), we
determined the association of G-rich regions in 1434 fun-
gal SSU rRNAs and 880 fungal LSU rRNAs with all report-
ed spliceosomal introns in these genes. The G-frequencies
were calculated at each rRNA site and are plotted as the
green circles in Fig. 2. The SSU (1800 nt [GenBank
U53879]) and LSU (3554 nt [U53879]) rRNAs from S.
cerevisiae were used as the reference sequences for these
alignments. The raw G-frequencies were smoothed (blue
curve in Fig. 2) using the loess local regression method
[20] with smoothing windows of 50 nt or 100 nt prior to
analyzing the intron-G-frequency association. The
positions of rRNA spliceosomal introns are
shown as red lines in Fig. 2. From this analysis we can
observe that regions of intron insertion strongly associate
with high G-frequencies in both the SSU and LSU rRNA.
The association is stronger in the 50 nt (i.e., 25 nt exon se-
quence – intron insertion site – 25 nt exon sequence) win-
dow of weighted averages, suggesting that this window
size includes most of the exon signal. However, the asso-
ciation is still apparent in the 100 nt window, in particular
for the SSU rRNA.
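The smoothing step can be illustrated with a simplified loess-style smoother. This sketch is not the loess implementation of [20]: it fits a weighted straight line (tricube weights) within a fixed window around each site and omits the nearest-neighbor span and robustness iterations of full loess. The function names and the toy input are hypothetical.

```python
def tricube(u):
    """Tricube kernel: positive for |u| < 1 and zero elsewhere."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_smooth(values, window):
    """Smooth `values` with a weighted local linear fit over `window` sites."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        sw = swx = swy = swxx = swxy = 0.0
        for j in range(lo, hi):
            x = j - i                      # position relative to the target site
            w = tricube(x / (half + 1))    # weight falls off with distance
            sw += w
            swx += w * x
            swy += w * values[j]
            swxx += w * x * x
            swxy += w * x * values[j]
        denom = sw * swxx - swx * swx
        if denom > 0.0:
            # Intercept of the weighted least-squares line, i.e. the fit at x = 0.
            smoothed.append((swxx * swy - swx * swxy) / denom)
        else:
            smoothed.append(swy / sw)      # fall back to a weighted mean
    return smoothed


# Toy per-site G-frequencies (hypothetical values, not the fungal alignment data).
raw = [0.2 + 0.15 * ((i // 40) % 2) for i in range(400)]
fit50 = local_linear_smooth(raw, 50)
fit100 = local_linear_smooth(raw, 100)
print(round(min(fit50), 3), round(max(fit50), 3))
```

A wider window averages over more sites and flattens local peaks, which is consistent with the weaker intron association seen with the 100 nt window.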
Figure 1
Logo analysis of 50 nt upstream and downstream of insertion sites of 43 different spliceosomal rRNA introns. The information
content of the 2 Gs of the intron proto-splice site is shown as is a line at p = 0.05 (95% quantile) that is based on simulations
using random sequence data. This exon region contains a total of 6.91 bits of information.
[Figure 1 plot. y-axis: bits (ticks 0.00 to 0.40); x-axis: exon positions -50 to -1 and +1 to +50 flanking the intron. Labeled values: 0.45 and 0.47 bits at the proto-splice site; upstream = 3.53 bits, downstream = 3.38 bits, total = 6.91 bits; horizontal line at p = 0.05.]
Our analyses show that the average G-frequency at the 25
intron sites using the fitted curve in the SSU rRNA is 0.34,
whereas the average G-frequency at the 24 intron sites
using the fitted curve in the LSU rRNA is 0.32. To test the
significance of this result with the 25 intron sites and the
G-contents in the LSU rRNA, we randomly selected 25 sites
from the 3554 nt of rRNA and computed the average of
their G-frequencies. We repeated this process 10,000
times and plotted the distribution of these average G-fre-
quencies (results not shown). The observed average G-fre-
quency at the LSU intron sites was significantly greater
than that in the simulated data (p = 0.0268). Similarly, we
carried out the simulation-based test for the SSU rRNA in-
tron sites. In these 10,000 replications, no average from
the randomly generated sites was greater than 0.34. Thus,
the p-value is less than 0.0001, reinforcing the remarkable
association of SSU rRNA introns and G-rich regions ap-
parent in Fig. 2. Taken together, our results suggest that
Euascomycetes rRNA spliceosomal introns are fixed at the
G – G or AG – G proto-splice site that is found in G-rich
regions.
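The simulation-based test described above can be sketched as follows. The sketch assumes that the p-value is the fraction of random site sets whose mean G-frequency is at least as large as the observed mean; the function name `site_sampling_test` and the synthetic frequencies are hypothetical and stand in for the fitted G-frequency curves, which are not reproduced here.

```python
import random


def site_sampling_test(freqs, intron_sites, n_iter=10_000, seed=0):
    """Return (observed mean, fraction of random site sets with mean >= observed)."""
    rng = random.Random(seed)
    k = len(intron_sites)
    observed = sum(freqs[s] for s in intron_sites) / k
    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(range(len(freqs)), k)   # k distinct random sites
        if sum(freqs[s] for s in sample) / k >= observed:
            hits += 1
    return observed, hits / n_iter


# Synthetic smoothed G-frequencies for a 3554 nt gene (hypothetical values).
data_rng = random.Random(42)
freqs = [data_rng.uniform(0.15, 0.40) for _ in range(3554)]
# Hypothetical "intron sites": the first 25 positions with a high G-frequency.
intron_sites = [i for i, f in enumerate(freqs) if f > 0.35][:25]

observed, p_value = site_sampling_test(freqs, intron_sites)
print(f"observed mean = {observed:.3f}, p = {p_value:.4f}")
```

When no random set reaches the observed mean in 10,000 replications, the estimate is reported as p < 0.0001, as in the text.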
Figure 2
The distribution of SSU and LSU rRNA spliceosomal introns relative to the G-frequency in these genes. The raw G-frequencies
are shown in the green circles, the smoothed loess curves for 50 nt and 100 nt smoothing windows are shown with the blue
lines, and the positions of introns are shown with the vertical red lines.
[Figure 2 panels: SSU rRNA (50), SSU rRNA (100), LSU rRNA (50), and LSU rRNA (100); x-axis: rRNA site; y-axis: G-frequency.]
Intron Positions on rRNA Conservation Diagrams
To understand the association of introns with highly
conserved regions in the rRNAs, we mapped the intron
positions on SSU and LSU rRNA conservation diagrams of the
three phylogenetic domains of life and the two eukaryotic
organelles (3Dom2O) and the nuclear-encoded rRNA
genes in the three phylogenetic domains (3Dom). This
analysis shows a significant association of group I intron
sites with rRNA sites that are 98–100% conserved within
both 3Dom2O and 3Dom LSU rRNA analyses (see Table
1). Only in the 3Dom analysis for SSU rRNA was the
association weakly non-significant (p = 0.0577). The observed
association of highly conserved rRNA and group I intron
sites is, therefore, unlikely to have occurred by chance
alone. For rRNA spliceosomal introns, however, the
association of conserved rRNA and intron sites is less clear.
Within the 3Dom2O analysis of SSU rRNA, spliceosomal
intron positions vary significantly from the null model
but in the direction of fewer than expected introns at the
most highly conserved sites, whereas within the 3Dom
analysis of LSU rRNA no significant difference is found (p
= 0.0969). The 3Dom2O LSU rRNA and 3Dom SSU rRNA
analyses both show an enrichment of spliceosomal in-
trons at the highly conserved genic sites (primarily in sites
conserved between 90–97%). Taken together, our analy-
ses suggest that group I introns are fixed primarily in the
most highly conserved rRNA sites when analyzed in the
3Dom2O or 3Dom data sets, whereas spliceosomal in-
trons are not strongly associated with highly conserved
rRNA sites.
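The null model of random insertion behind Table 1 can be illustrated with the 3Dom2O SSU rRNA group I row. In this sketch the expected number of introns in a conservation class is the total intron count multiplied by the fraction of rRNA sites in that class, and the chi-square statistic is referred to a distribution with three degrees of freedom. The function names are hypothetical, and the closed-form tail probability is specific to three degrees of freedom. The expected counts reproduce the bracketed values in Table 1, and the p-value comes out close to the tabulated 0.0106.

```python
import math


def expected_counts(class_sites, n_introns):
    """Expected introns per conservation class under random insertion."""
    total_sites = sum(class_sites)
    return [n_introns * s / total_sites for s in class_sites]


def chi_square_statistic(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Table 1, 3Dom2O SSU rRNA: sites per class (98-100%, 90-97%, 80-89%, <80%)
class_sites = [178, 175, 116, 1073]
group_i_observed = [11, 5, 5, 21]

expected = expected_counts(class_sites, sum(group_i_observed))
statistic = chi_square_statistic(group_i_observed, expected)
p_value = chi2_sf_3df(statistic)
print([round(e, 2) for e in expected])  # prints: [4.85, 4.77, 3.16, 29.23]
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

Several expected counts in Table 1 are below 5, so the chi-square approximation should be read with some caution for the sparser rows.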
To address more directly the relationship between Euasco-
mycetes spliceosomal introns and rRNA conservation
patterns, we positioned these introns on a conservation
diagram generated from 1042 fungal SSU rRNA sequences
(see Fig. 3). This analysis showed that 19 of 24 fungal SSU
rRNA spliceosomal introns follow sites that are conserved
in more than 95% of the fungal sequences (1114 nt in this
class), one intron follows a site that is 90–95% conserved
(149 nt in this class), two introns follow sites that are
80–89% conserved (134 nt in this class), and two introns
follow sites that are <80% conserved (402 nt in this class). More
importantly, inspection of the 1800 nt alignment of SSU
rRNAs and 3554 nt of LSU rRNAs of all fungi, of fungi
containing spliceosomal introns, and of fungi lacking
spliceosomal introns shows that most of the introns are
inserted between nucleotides that are 99–100% conserved
(whether they encode G – G or not) in taxa containing in-
trons and sister groups lacking introns (Table 2). This
result provides strong support for the hypothesis that
Euascomycetes spliceosomal introns are fixed in a proto-
splice site that pre-dates intron insertion. Beyond this pat-
tern of conservation, the G-rich regions in the neighbor-
hood of introns are also often highly conserved among all
fungi (see Fig. 3). Most of these Gs are in sites that are
>95% conserved in all fungal SSU rRNAs, suggesting that
their existence also pre-dates intron insertion.
However, several exceptions to this general pattern merit
closer inspection. The upstream nucleotide at the SSU
rRNA 297 site (369 in the S. cerevisiae gene), for example,
occurs at a frequency of 63.9% U in taxa lacking introns
but at a frequency of 97.8% U in taxa containing introns.
On the surface, this suggests that the site may have
undergone selective pressure, post-intron insertion, towards a
high frequency of Us. Analysis of the SSU rRNA alignment
shows, however, that the 5 taxa containing the 297 intron
share a U at this site with virtually all other
intron-containing fungi that lack this particular insertion. This
suggests that the high U frequency in the intron-containing
fungi is a synapomorphy for the monophyletic
intron-containing Euascomycetes and is not an outcome of the
297 intron insertion. A similar result is found when the
proto-splice site in all taxa containing introns is compared
with that in taxa lacking any particular intron.
Table 1: Chi-Square Test of Association of Spliceosomal and Group I Introns with Conserved rRNA Sites
                   98–100%     90–97%      80–89%      <80%         Total   P-value
3Dom2O: SSU rRNA
  sites            178         175         116         1073         1542    -
  group I          11 [4.85]   5 [4.77]    5 [3.16]    21 [29.23]   42      0.0106*
  spliceosomal     0 [3.00]    3 [2.95]    8 [1.96]    15 [18.09]   26      <0.0000*
3Dom2O: LSU rRNA
  sites            150         203         168         2383         2904    -
  group I          10 [2.12]   4 [2.82]    4 [2.37]    23 [33.64]   41      <0.0000*
  spliceosomal     3 [1.29]    8 [1.75]    2 [1.45]    12 [20.51]   25      <0.0000*
3Dom: SSU rRNA
  sites            355         156         80          951          1542    -
  group I          17 [9.67]   3 [4.25]    1 [2.18]    21 [25.90]   42      0.0577
  spliceosomal     4 [5.99]    9 [2.63]    2 [1.35]    11 [16.04]   26      0.0003*
3Dom: LSU rRNA
  sites            595         349         283         1677         2904    -
  group I          17 [8.40]   5 [4.93]    1 [4.00]    18 [23.68]   41      0.0059*
  spliceosomal     10 [5.12]   3 [3.00]    1 [2.44]    11 [14.44]   25      0.0969
Column headings: Introns are positioned relative to SSU and LSU rRNA sites for positions with a nucleotide in more than 95% of the sequences
that are 1) 98–100%, 2) 90–97%, 3) 80–89%, and 4) either <80% conserved or positions that are present in <95% of the sequences in genes from:
3Dom2O, the three phylogenetic domains and two organelles; 3Dom, the three phylogenetic domains. Sites are the number of rRNA positions
followed by group I and spliceosomal introns in each conservation class and the number of observed and expected introns (in brackets [under a null
model of random insertion]) is shown for each gene. The P-values for each analysis are also shown. Significant probability values are marked with an
asterisk.
Figure 3
Distribution of Euascomycetes spliceosomal introns on a conservation diagram of fungal SSU rRNA overlaid on a secondary
structure model of the Saccharomyces cerevisiae SSU rRNA. Spliceosomal introns are shown in large text with arrows denoting
their positions. Positions with nucleotides in more than 95% of the 1042 sequences that were studied are shown as follows:
upper case, conserved at ≥ 95%, lower case, conserved at 90–94%, filled circle, conserved at 80–89%, and open circle,
conserved at < 80%. Other regions are denoted as arcs. The numbers at the arcs show the upper and lower number of
nucleotides that are found in these variable regions. The boxed regions are G-rich sequences upstream of intron insertion sites.
Boxed filled circles indicate that the most frequent nucleotide at this site was a G in our alignment of 1434 fungal rRNAs that
included both intron-containing and intron-less taxa.
Table 2: Frequencies of Fungal Nucleotides at Sites of Spliceosomal Intron Insertion
Intron Position   Frequency of 5'-nt (%)   Insertion Site   Frequency of 3'-nt (%)
Ec    Sc    All    + Int  - Int  5'-nt  3'-nt  All    + Int  - Int  # Int
265   336   99.8   100.0  99.8   G      G      95.0   85.7   95.3   4
297   369   65.3   97.8   63.9   U      A      100.0  100.0  99.5   5
298   370   100.0  100.0  99.5   A      G      100.0  100.0  100.0  2
299   371   100.0  100.0  100.0  G      G      99.8   100.0  99.8   12
300   372   99.8   100.0  99.8   G      G      99.7   100.0  99.6   1
330   402   99.1   100.0  99.1   C      G      99.5   100.0  99.4   15
331   403   99.5   100.0  99.4   G      G      99.7   100.0  99.7   8
332   404   99.7   100.0  99.7   G      C      99.6   100.0  99.6   1
333   405   99.6   100.0  99.6   C      U      91.5   100.0  91.1   1
337   409   76.7   87.0   76.3   C      A      99.8   100.0  99.8   1
390   461   99.3   97.5   99.4   G      G      93.1   100.0  92.9   2
393   464   99.6   100.0  99.6   A      G      99.6   100.0  99.5   10
400   471   99.6   100.0  99.6   A      U      97.6   95.0   97.7   1
514   561   99.5   100.0  99.4   G      G      99.5   100.0  99.4   1
674   885   99.7   100.0  99.7   G      U      99.8   100.0  99.8   4
882   1106  67.3   75.8   67.0   U      G      80.5   84.9   80.3   1
883   1107  80.5   84.9   80.3   G      G      99.4   100.0  99.4   6
939   1164  99.2   97.1   99.3   G      G      98.7   97.1   98.8   8
1057  1277  99.8   100.0  99.8   G      G      98.8   100.0  98.7   1
1071  1291  93.4   100.0  93.3   G      G      99.4   100.0  99.4   1
1083  1303  99.5   100.0  99.5   U      G      99.6   100.0  99.6   1
1226  1459  99.1   100.0  99.1   C      A      99.8   100.0  99.8   2
1229  1462  99.7   100.0  99.7   G      C      99.4   100.0  99.3   8
1514  1777  98.6   100.0  98.5   G      G      91.1   100.0  90.9   2
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
678   967   99.9   100.0  99.9   G      A      99.9   97.0   100.0  16
681   970   99.7   97.0   99.9   G      G      98.1   93.9   98.3   1
711   1000  99.6   97.0   99.7   G      A      98.2   81.8   99.0   3
775   1065  99.8   100.0  99.8   G      G      100.0  100.0  100.0  1
776   1066  100.0  100.0  100.0  G      G      100.0  100.0  100.0  5
777   1067  100.0  100.0  100.0  G      G      100.0  100.0  100.0  1
780   1070  100.0  100.0  100.0  G      A      100.0  100.0  100.0  1
783   1073  100.0  100.0  100.0  A      G      100.0  100.0  100.0  2
784   1074  100.0  100.0  100.0  G      A      100.0  100.0  100.0  3
786   1076  99.8   100.0  99.8   C      U      95.8   91.2   96.1   1
787   1077  95.8   91.2   96.1   U      A      98.7   91.2   99.1   1
824   1114  100.0  100.0  100.0  U      C      100.0  100.0  100.0  1
830   1120  99.8   100.0  99.8   A      G      99.8   100.0  99.8   1
858   1151  100.0  100.0  100.0  G      G      100.0  100.0  100.0  2
978   1306  99.3   95.7   99.6   G      G      100.0  100.0  100.0  3
1024  1351  98.6   100.0  98.5   A      G      99.3   100.0  99.3   1
1054  1387  97.6   100.0  97.4   G      G      100.0  100.0  100.0  4
1091  1424  100.0  100.0  100.0  G      U      99.3   100.0  99.2   1
1093  1426  100.0  100.0  100.0  G      U      100.0  100.0  100.0  1
1098  1431  100.0  100.0  100.0  A      A      99.3   100.0  99.2   1
1849  2367  100.0  100.0  100.0  U      G      100.0  100.0  100.0  1
1903  2404  97.3   100.0  97.1   G      G      100.0  100.0  100.0  1
1929  2430  97.3   100.0  97.1   G      G      97.3   100.0  97.1   1
2076  2576  100.0  100.0  100.0  G      A      100.0  100.0  100.0  1
2445  2972  100.0  100.0  100.0  G      G      100.0  100.0  100.0  1
Column headings: Intron Position, the sites of spliceosomal intron insertion in the SSU and LSU (below the broken line) rRNA genes. The
homologous intron sites in the Escherichia coli (Ec, GenBank #J01695) and Saccharomyces cerevisiae (Sc, GenBank #U53879) genes are shown. The 5' and 3'
nucleotides (5'-nt, 3'-nt) flanking the intron insertion sites (Insertion Site), the frequency of these nucleotides in the alignment of all fungal SSU and
LSU rRNAs (All, 1434 and 880 sequences, respectively), of fungi containing spliceosomal introns (+ Int, 73 and 40 sequences, respectively), and of
fungi lacking spliceosomal introns (- Int, 1361 and 840 sequences, respectively), and the number of taxa containing introns at each site (# Int) are
shown.
Intron Positions on the rRNA Primary Structure
The positions of spliceosomal, group I, group II, and ar-
chaeal introns were included on a line representing the
primary structures of E. coli SSU and LSU rRNA (Fig. 4A).
The intron distributions were then studied to determine if
they differ significantly from the null hypothesis of a "bro-
ken-stick" distribution [21,22]. This resource division
model, which has been used extensively to test hypotheses
about patterns of species abundance [e.g., [23]], specifies
a distribution that arises when a "stick" of unit length is
broken by n events scattered along it with a uniform
probability distribution. The events
break the stick into n + 1 intervals, which can then be
studied to determine if they depart from uniformity in the
probability density along the stick. Departure will tend to
make the longest intervals longer and the shortest
intervals shorter [24]. In our analyses, the rRNA genes were the
sticks and the intron insertion sites were the events. The
metric used to compare the null (i.e., broken-stick) and
observed distributions was the standard deviation (SD)
of the interval lengths about their mean; i.e., the lower the
SD, the more uniform the lengths of the intervals [e.g., [25]].
Computer simulations were used to determine the level of
significance at which the observed distributions could be
distinguished from those produced by the broken-stick
model.
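The broken-stick comparison can be sketched as a simulation. This is an illustration under stated assumptions rather than the program used for Fig. 4B: the function names are hypothetical, the SD is the population standard deviation of the n + 1 interval lengths, and the p-value is taken as the fraction of simulated sticks whose SD is at least as large as the observed SD (a one-sided test for clustering). The example positions are the 24 fungal SSU rRNA sites from Table 2 on the 1542 nt E. coli SSU rRNA.

```python
import random
import statistics


def interval_sd(sites, length):
    """Population SD of the n + 1 intervals created by n break points."""
    points = [0] + sorted(sites) + [length]
    intervals = [b - a for a, b in zip(points, points[1:])]
    return statistics.pstdev(intervals)


def broken_stick_test(sites, length, n_iter=10_000, seed=0):
    """Return (observed SD, fraction of uniform random sticks with SD >= observed)."""
    rng = random.Random(seed)
    observed = interval_sd(sites, length)
    hits = 0
    for _ in range(n_iter):
        simulated = [rng.uniform(0, length) for _ in sites]
        if interval_sd(simulated, length) >= observed:
            hits += 1
    return observed, hits / n_iter


# Fungal SSU rRNA spliceosomal intron sites (E. coli numbering, Table 2).
SSU_SITES = [265, 297, 298, 299, 300, 330, 331, 332, 333, 337, 390, 393,
             400, 514, 674, 882, 883, 939, 1057, 1071, 1083, 1226, 1229, 1514]

observed_sd, p_value = broken_stick_test(SSU_SITES, 1542, n_iter=2000)
print(f"observed SD = {observed_sd:.2f}, p = {p_value:.4f}")
```

A large observed SD relative to the simulated sticks indicates that a few long intervals and many short ones are present, that is, that the insertion sites are clustered.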
A cursory analysis of the data suggests that the intron dis-
tribution in both SSU and LSU rRNAs is significantly clus-
tered (in particular, the LSU rRNA) and the statistical
analysis bears this out. The observed standard deviations
for all the analyses (i.e., all the introns together or the spli-
ceosomal and group I introns individually) are signifi-
cantly different from the expectations of the broken stick
model. The departure from the null model is particularly
striking for the LSU rRNA, suggesting that the introns in
this gene are more strongly clustered than in the SSU
rRNA (see Fig. 4A,4B).
Discussion
In this paper, we have focused on spliceosomal introns in
the Euascomycetes fungi to address how introns spread in
rRNA (and perhaps in all) genes. Potentially, the rRNA
spliceosomal introns offer three major advantages over
pre-mRNA introns that are relevant to understanding in-
tron spread: 1) the rRNA spliceosomal introns have been
inserted recently within the Euascomycetes [11,12]. In
contrast, the sporadic distribution of pre-mRNA introns
in different eukaryotes, and the uncertainty about the
phylogenetic relationship of these lineages within the eu-
karyotic radiation often make it difficult to determine un-
ambiguously which spliceosomal introns are of early or
late origins [9]. 2) rRNAs have well-characterized second-
ary and tertiary structures [e.g., [26,27]]; therefore, if the
intron distribution reflects in some way RNA-folding
patterns, then one can detect this by mapping the intron
distribution on rRNA at the primary, secondary, and terti-
ary structure levels [28]. 3) rRNA genes do not encode pro-
teins; therefore, the Euascomycetes intron distribution
will not reflect constraints on sites of intron insertion due
to codon structure. In contrast, the role of intron phase
(i.e., between codons [phase 0] or within codons [phases
1,2]) and exon symmetry in explaining pre-mRNA intron
distribution remains a controversial and unresolved issue
in spliceosomal intron evolution [e.g., [29,30]].
The proto-splice site bounding rRNA introns
Our analysis of 100 nt of exon sequence flanking spliceo-
somal introns in Euascomycetes rRNA shows significant
support for a G – G or AG – G proto-splice site (Fig. 1).
The proto-splice site pre-dates intron insertion because it
is highly conserved in the Euascomycetes rRNAs in both
intron-containing and intron-less taxa (see Fig. 3, Table
2). This finding is not anomalous because analysis of exon
sequences surrounding the total set of introns in S. cerevi-
siae pre-mRNA genes shows a preference for AAAG at the
5' splice site [31]. The final G in this motif has been estab-
lished as significantly conserved in yeast [32]. The se-
quence at the proximal 5' exon region is required for
interactions with the spliceosomal small nuclear ribonu-
cleoprotein particle U1 [19]. Our data are, therefore, con-
sistent with present understanding of yeast pre-mRNA
splicing. Furthermore, taking at least 40% as the mini-
mum for a consensus nucleotide in the proto-splice site,
Long et al. [33] have shown that this region in six model
eukaryotes often encodes the AG – G or G – G motif. In
humans, for example, the nucleotides in the AG – G motif
are found in abundances of 61%, 81%, and 56%, respec-
tively. The finding of a similar motif in rRNA genes for
which there is neither a requirement to incorporate amino
acid phase distribution nor to invoke exon-shuffling pro-
vides support for the idea that a proto-splice site for intron
insertion not only exists in Euascomycetes rRNA but also
may exist in pre-mRNA genes. The introns appear to be in-
serted into some of the most conserved regions of Euasco-
mycetes SSU rRNA, as evident in the fungal conservation
diagram (Fig. 3) and the analysis of fungal nucleotide
frequencies at the 5' and 3' nt flanking introns (Table 2).
However, the spliceosomal introns do not map to the
most conserved positions in the 3Dom or 3Dom2O rRNA
datasets (Table 1).
Furthermore, exon sequences, outside of the proto-splice
site, may be required for splice site recognition by the spli-
ceosome [34–38]. Our rRNA analyses suggest that G-rich
regions in the neighborhood (often upstream) of the in-
tron insertion sites may be potential ESEs. The exon con-
text may, therefore, play a fundamental role in controlling
intron splicing and, thus, sites of intron fixation. This idea
has growing support in the literature [e.g., [19,38,39]].
Combined with this observation is the finding that rRNA
spliceosomal introns map primarily to regions in the in-
terface surface of the SSU and LSU ribosome [28]. These
sites presumably facilitate intron splicing during ribos-
ome biogenesis.
We find that in contrast to the spliceosomal introns in rR-
NA, group I intron insertion sites show a stronger positive
association with highly conserved rRNA regions (Fig. 3,
Table 2), including those that bind tRNA [28], and are
more clustered than are spliceosomal introns in the rRNA
primary structure (Fig. 4). This suggests that group I intron
fixation may be even more highly constrained by the exon
context than are spliceosomal introns. A possible explana-
tion for this observation is that group I introns are more
dependent on specific upstream and downstream exon
sequences to build the P1 and P10 domains [40] to facilitate
proper folding prior to excision [e.g., [41]]. This could
limit the number of rRNA sites at which group I introns
can be fixed in comparison to spliceosomal introns which
have less specific exon sequence requirements for splicing.
Figure 4
Analysis of rRNA intron distribution. A. The positions of introns mapped on the homologous sites in the primary structure of
E. coli SSU and LSU rRNA. Group I and group II (underlined) introns are shown above the lines, whereas spliceosomal and
archaeal (underlined) introns are shown below the lines. B. Results of the broken-stick analysis of rRNA intron distribution.
The results of the simulations are shown as are the observed standard deviations for all introns or group I and spliceosomal
introns individually for both SSU and LSU rRNA genes.
[Figure 4 plot. A: intron positions marked along lines representing SSU rDNA (0 to 1542) and LSU rDNA (0 to 2904). B: simulated distributions (x-axis: standard deviations; y-axis: probability of standard deviations) for 100 SSU rDNA introns, 104 LSU rDNA introns, 57 SSU rDNA group I introns, 65 LSU rDNA group I introns, 26 SSU rDNA spliceosomal introns, and 25 LSU rDNA spliceosomal introns, annotated with significance levels of p < 0.05, p < 0.01, and p < 0.001.]
Conclusions
Our findings provide concrete insights into rRNA intron
fixation and are more compatible with the view that both
the spliceosomal and group I intron distributions reflect
fundamental features of present-day genes and genomes
and that introns may not be relics of an ancient intron-
rich period of cells. An intriguing view on intron origin
was recently published using the tools of population ge-
netics. In this view, the richness of introns in multicellular
organisms may primarily reflect the smaller population
sizes of these taxa relative to protists, which generally con-
tain few introns. The large population sizes of unicellular
eukaryotes may prevent widespread intron spread due to
secondary mutations that lead to their loss from popula-
tions [42]. Interestingly, the lichenized Euascomycetes,
which are particularly rich in both spliceosomal and
group I introns in their nuclear rRNA, are typically ex-
tremely slow-growing taxa many of which have small
population sizes [e.g., [43]].
Methods
PCR Methods and the Intron Data
The spliceosomal introns described in Bhattacharya et al.
[12], plus 12 new positions that have become available in
GenBank, were used in this study, as well as 6 new sites
that we have found in the LSU rRNA genes of Buellia capi-
tis-regum, Buellia muriformis, Ionaspis lacustris, Physconia en-
teroxantha, and Rinodina tunicata. To allow direct
comparison between all rRNAs, the numbering of introns
reflects their relative positions in the E. coli coding
regions. DNA samples for Buellia spp., Rinodina, and
Physconia were generously provided by T. Friedl (Göttingen).
Tissue from Ionaspis was a gift from F. Lutzoni (Duke).
DNA was extracted from Ionaspis as in Bhattacharya et al.
(2000). PCR reactions were done with the following
primers: 1825-5'GTGATTTCTGCCCAGTGCTC3',
2252-5'TTTAACAGATGTGCCGCC3',
2252-5'GGCGGCACATCTGTTAAA3', and
2746-5' GATTCTGRCTTAGAGGCGTTC3'. The primer
names refer to their position relative to the LSU rRNA of
E. coli. PCR amplification products were cloned in the
pGEM-T (Promega) vector and sequenced over both
strands. Together, the fungal spliceosomal data set includ-
ed 49 different introns at the following sites (the species
from which they were isolated and GenBank accession
numbers, where available, are also shown): SSU rRNA –
265 (Arthroraphis citrinella, AF279375), 297 (Anaptychia
runcinata, AJ421692), 298 (Physconia perisidiosa,
AJ421689), 299 (Roccella canariensis, AF110342), 300
(Rhynchostoma minutum, AF242268), 330 (Stereocaulon
paschale, AF279412), 331 (Physconia perisidiosa,
AJ421689), 332 (Pyrenula cruenta, AF279406), 333 (Per-
tusaria amara, AF274104), 337 (Graphis scripta,
AF038878), 390 (Dermatocarpon americanum, AF279383),
393 (Hymenelia epulotica, AF279393), 400 (Halosarpheia
fibrosa, AF352078), 514 (Porpidia crustulata, L37735), 674
(Physconia detersa, AJ240495), 882 (Dimerella lutea,
AF279386), 883 (Diploschistes scruposus, AF279388), 939
(Dimerella lutea, AF279386), 1057 (Graphina poitiaei,
AF465459), 1071 (Rhynchostoma minutum, AF242268),
1083 (Rhamphoria delicatula, AF242267), 1226 (Rhynchos-
toma minutum, AF242269), 1229 (Physconia perisidiosa,
AJ421689), and 1514 (Phialophora americana, X65199);
LSU rRNA – 678 (Gyalecta jenensis, AF279391), 681 (Stictis
radiata, AF356663), 711 (Gyalecta jenensis, AF279391),
775 (Dibaeis baeomyces, AF279385), 776 (Capronia pilosel-
la, AF279378), 777 (Rinodina tunicata, AF457569), 780
(Pertusaria tejocotensis, AF279301), 784 (Melanochaeta sp.
8, AF279421), 786 (Pertusaria kalelae, AF279298), 787
(Dibaeis baeomyces, AF279385), 824 (Stictis radiata,
AF356663), 830 (Coenogonium leprieurii, AF465442), 858
(Trapeliopsis granulosa, AF279415), 978 (Gyalecta jenensis,
AF279391), 1024 (Ocellularia alborosella, AF465452),
1054 (Rinodina tunicata, AF457569), 1091 (Dimerella
lutea, AF279387), 1093 (Ocellularia alborosella,
AF465452), 1098 (Coenogonium leprieurii, AF465442),
1849 (Cordyceps prolifica, AB044640), 1903 (Physconia en-
teroxantha, AF457573), 1929 (Buellia capitis-regum,
AF457572), 2076 (Buellia muriformis, AF457571), and
2445 (Ionaspis lacustris, AF457570). We did not study the
742 and 1197 spliceosomal introns in the SSU rRNA gene
of the distantly related stramenopile, Cymatosira belgica
(X85387). This diatom is the sole known organism
outside of the fungi to contain rDNA spliceosomal in-
trons. In addition, the fungal 674, 1057, 1514 SSU rRNA
and 1093, 1098, and 1849 LSU rRNA introns were not in-
cluded in the information analysis because of missing
data or ambiguous sequences (see below). The SSU 674
site (Physconia detersa, AJ240495), for example, only in-
cluded 10 nt of the 5' and 3' region in the GenBank acces-
sion [11]. All fungal and diatom intron sites were,
however, mapped on the conservation diagrams to under-
stand their distribution (see below).
We have made, on the basis of detailed analysis of rRNA
flanking regions, a number of corrections in the positions
of the introns within the SSU rRNA (e.g., 1129 is now at
1229 and 1510 is now at 1514). Copies of the manuscript
figures and tables and additional materials related to this
work are available from the Gutell Laboratory's CRW Site
at http://www.rna.icmb.utexas.edu/ANALYSIS/FUNGINT/ [44].
This page includes detailed rRNA conservation
and intron position data (both the version used for
the manuscript and current values that are updated daily),
fungal nucleotide frequency values, and the SSU and LSU
rRNA sequence alignments used in Table 2.
Information Analysis of the rRNA Introns
An information analysis was done of the 50 nt upstream
and downstream of the different rRNA spliceosomal
intron sites to determine the total amount of exonic infor-
mation (in "bits") that is available to the spliceosome for
splicing. We used the web-based logo program of Gorodkin
et al. [45] (http://www.cbs.dtu.dk/~gorodkin/appl/slogo.html)
to derive the sequence logos, and the information
content of individual sites was calculated according to the
expression of Hertz and Stormo [17]. Type 2 logos were
drawn in which the height of the nucleotides in the se-
quence column represented their frequency in proportion
to their expected frequency. The expected nucleotide
probabilities were estimated from the observed nucle-
otide frequencies over all sites for 80 Euascomycetes rRNA
sequences (A = 26%, C = 22%, G = 27%, T = 25% [12]).
The nucleotides were turned upside-down when the ob-
served frequency was less than expected [45]. A total of 43
spliceosomal intron sites, for which 50 nt of both up-
stream and downstream exon sequence are available, were
included in this analysis.
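The per-site calculation can be illustrated with a minimal Python sketch. It assumes that the information content of a site takes the relative-entropy form, the sum over nucleotides of f log2(f/p), where f is the observed frequency and p is the expected frequency given above. It is not the logo program of Gorodkin et al.; the function names are hypothetical and no small-sample correction is applied.

```python
import math

# Expected nucleotide frequencies reported in the text for Euascomycetes rRNA.
BACKGROUND = {"A": 0.26, "C": 0.22, "G": 0.27, "T": 0.25}


def column_information(column, background=BACKGROUND):
    """Information (bits) of one alignment column relative to the background.

    Assumption: relative-entropy form, sum of f * log2(f / p); gaps and
    ambiguity codes are ignored and no small-sample correction is made.
    """
    counts = {nt: 0 for nt in background}
    for nt in column:
        if nt in counts:
            counts[nt] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    info = 0.0
    for nt, n in counts.items():
        if n:
            f = n / total
            info += f * math.log2(f / background[nt])
    return info


def alignment_information(seqs, background=BACKGROUND):
    """Per-column information for a list of equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    return [
        column_information([s[i] for s in seqs], background)
        for i in range(length)
    ]
```

Under this form, a column that matches the expected frequencies contributes 0 bits, and an invariant G contributes log2(1/0.27) bits. Summing the per-column values over the 100 flanking positions would give the total exonic information.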
To put the information content in perspective, we also did
simulations in which 43 random sequence data sets of
length 100 nt (for flanking exons) and 109 (total number
of introns analyzed) random data sets of length 29 nt (for
conserved intron regions) were generated at the nucle-
otide frequencies of Euascomycetes rRNA and the infor-
mation content of these was calculated. A total of 100,000
iterations were done with each data set to create null dis-
tributions of random information content. The observed
information values were then compared to the null distri-
butions to infer their probabilities.
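The simulation procedure can be sketched as follows. This is an illustrative Python outline, not the code used for the study: the function names are hypothetical, the information measure is assumed to be the summed relative entropy per column, and the empirical probability uses a (hits + 1)/(N + 1) estimator, which is a choice made here rather than one stated in the text.

```python
import math
import random

# Expected nucleotide frequencies reported in the text for Euascomycetes rRNA.
BACKGROUND = {"A": 0.26, "C": 0.22, "G": 0.27, "T": 0.25}


def total_information(seqs, background=BACKGROUND):
    """Sum over columns of the relative entropy (bits) against the background."""
    n = len(seqs)
    total = 0.0
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        for nt, q in background.items():
            f = column.count(nt) / n
            if f > 0:
                total += f * math.log2(f / q)
    return total


def null_distribution(n_seqs, length, iterations, rng, background=BACKGROUND):
    """Information content of random data sets drawn at background frequencies."""
    nts = list(background)
    weights = [background[nt] for nt in nts]
    values = []
    for _ in range(iterations):
        seqs = [
            "".join(rng.choices(nts, weights=weights, k=length))
            for _ in range(n_seqs)
        ]
        values.append(total_information(seqs, background))
    return values


def empirical_p(observed, null_values):
    """Fraction of null values at least as large as the observed value.

    Assumption: the +1 terms keep the estimate away from exactly zero.
    """
    hits = sum(1 for v in null_values if v >= observed)
    return (hits + 1) / (len(null_values) + 1)
```

For the flanking exons, a call such as `null_distribution(43, 100, 100000, random.Random(1))` would mirror the data set size and number of iterations described above, although a pure-Python run of that size would be slow.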
Analysis of G-Content in Euascomycetes SSU rRNAs
Because it is difficult to see the pattern of G-content along
the sequence based on the raw data, we fit a smooth curve
to the frequencies of G using the method of local regres-
sion (loess, [20]). This smooth curve captures the G-con-
tent pattern along the nucleotide sites. Loess is a
nonparametric curve fitting technique that fits the data in
a local fashion. That is, for the fit at site x, the fit is made
using the G-frequencies at the points in a neighbourhood
of x, weighted by their distance from x. A tricubic weighting
function (proportional to [1 - (distance/max distance)^3]^3)
is used for calculating the weights. For both
the LSU rRNA and SSU rRNA sequence alignment data sets,
we used a neighborhood of 50 nt (and 100 nt) in fitting
the loess curve. Thus, for the 50-nt neighborhood, the value of the curve at each site is
computed as a weighted average of the G-frequency at the
site itself, the G-frequencies at the 25 upstream sites, and
the G-frequencies at the 25 downstream sites.
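The weighting scheme can be illustrated with a short Python sketch. It implements only a tricube-weighted local average, as described in the preceding sentence; a full loess fit [20] would instead fit a weighted local polynomial at each site. The function names, the truncation of the window at the ends of the sequence, and the choice of a maximum distance one site beyond the window are assumptions made for this example.

```python
def tricube(distance, max_distance):
    """Tricube weight (1 - (d / d_max)^3)^3, and 0 at or beyond d_max."""
    u = abs(distance) / max_distance
    if u >= 1.0:
        return 0.0
    return (1.0 - u ** 3) ** 3


def smooth_g_frequency(freqs, half_window=25):
    """Tricube-weighted local average of per-site G-frequencies.

    Assumptions: the window is truncated at the sequence ends, and the
    maximum distance is half_window + 1 so that the outermost sites in
    the window keep a small nonzero weight.
    """
    n = len(freqs)
    max_distance = half_window + 1
    smoothed = []
    for x in range(n):
        lo = max(0, x - half_window)
        hi = min(n, x + half_window + 1)
        num = 0.0
        den = 0.0
        for j in range(lo, hi):
            w = tricube(j - x, max_distance)
            num += w * freqs[j]
            den += w
        smoothed.append(num / den)
    return smoothed
```

Setting `half_window=50` would correspond to the 100-nt neighborhood.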
Positions of Introns Relative to Conserved rRNA Regions
To assess the patterns of sequence conservation in exon se-
quences flanking all rRNA spliceosomal and group I in-
trons, we mapped intron positions on structure
conservation diagrams. Group I introns in different sub-
classes (e.g., IC1, IE [46,47]) which occupied the same
rRNA site were counted as separate intron insertions. This
accounted for our observation that certain rRNA sites
(e.g., SSU 788, 1199, LSU 1949, 2500 [see CRW Site for
details]) are "hot" spots for insertion with multiple, evo-
lutionarily divergent introns being fixed at the same site in
different species or in different genomes (i.e., nuclear vs.
organellar). The actual number of independent hits at
rRNA sites is, however, likely to be much greater than our
estimate but this can only be proven with rigorous phylo-
genetic analysis of group I introns at different insertion
sites to show that in some cases, introns in the same sub-
class at the same site in different species have a high prob-
ability of independent origin [e.g., [48,49]]. The first set of
conservation diagrams used in our analysis was based on
the comparison of 6389 and 922 different SSU and LSU
rRNA sequences, respectively, from the three phylogenetic
domains and the two organelles (3Dom2O) that were su-
perimposed on the secondary structures of the Escherichia
coli rRNAs. The second set of diagrams was a summary of
5591 and 585 different SSU and LSU rRNA sequences, re-
spectively, from the three phylogenetic domains (3Dom)
also mapped on the E. coli rRNAs. These diagrams are
available at the CRW Site. Multiway contingency table
analysis was done to determine whether sites that were
98–100%, 90–97%, 80–89%, and <80% conserved in the
diagrams were independent of intron insertion sites (the
null hypothesis). Intron sites were taken as the nucleotide
immediately preceding the intron insertion. We also cal-
culated nucleotide frequencies for each SSU and LSU
rRNA site using the S. cerevisiae genes for numbering.
Frequencies were calculated for three alignments: all available
fungal rRNAs (1434 SSU and 880 LSU sequences), only
fungi containing spliceosomal introns (73 SSU and
40 LSU sequences), and only fungi lacking spliceosomal
introns (1361 SSU and 840 LSU sequences). These frequencies
were used to determine the level of conservation
of nucleotides encoding the proto-splice site in
intron-containing and intron-less fungal species.
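The independence test can be illustrated, in simplified two-way form, with the following Python sketch. It computes a Pearson chi-square statistic for a table whose rows are, for example, sites with and without an intron insertion and whose columns are the four conservation classes. This is an assumption-laden simplification of the multiway analysis; the function name is hypothetical and any counts passed to it in an example are placeholders, not data from this study.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table.

    table: list of rows of observed counts, e.g. rows = intron site present
    or absent, columns = conservation classes (98-100%, 90-97%, 80-89%, <80%).
    Assumption: every row total and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of independence.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df
```

The statistic would then be compared with a chi-square distribution with the returned degrees of freedom (3 for a 2 x 4 table) to obtain a probability.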
rRNA Intron Distribution
The positions of all known spliceosomal, group I, group
II, and tRNA-like archaeal [50] introns were marked on
the primary structures of E. coli SSU and LSU rRNA. These
data, which also accounted for multiple group I intron
hits at the same rRNA site, were then studied to determine
whether they differ significantly from the null expectation
of a random distribution (i.e., "the broken stick distribu-
tion"). We used the program PowerNiche V1.0 (P. Drozd,
V. Novotny, unpublished data) to generate sticks of length
1542 nt (SSU rRNA) or 2904 nt (LSU rRNA) which were
randomly broken by n = 101 events for all introns (includ-
ing group II and archaeal), or n = 56 for only group I, or n
= 26 for only spliceosomal introns in SSU rRNA. For the
LSU rRNA, the stick was broken by n = 107 events for all
introns, or n = 68 for only group I, or n = 25 for only spli-
ceosomal introns. The paucity of rRNA group II introns (3
and 8 introns in the SSU and LSU rRNA, respectively) and
archaeal introns (14 and 6 introns in the SSU and LSU rR-
NA, respectively) did not allow their individual analysis.
A mean number of intervals and a SD were calculated for
each broken-stick. The SDs of 1000 simulations were
compared to the SD of the observed data to test whether
the observed pattern was likely to have been produced un-
der the assumptions of the broken-stick model.
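The logic of the broken-stick comparison can be sketched in Python as follows. This is not PowerNiche and makes no claim about its internals; the function names are hypothetical, breaks are assumed to fall independently and uniformly on nucleotide positions (so coincident breaks, which give zero-length intervals, are allowed), and the population standard deviation is used.

```python
import random
import statistics


def interval_sd(break_positions, length):
    """SD of the interval lengths obtained by cutting a stick at the breaks."""
    edges = [0] + sorted(break_positions) + [length]
    intervals = [b - a for a, b in zip(edges, edges[1:])]
    return statistics.pstdev(intervals)


def simulated_sds(length, n_breaks, simulations, rng):
    """Interval SDs for sticks broken at random (assumed uniform) positions."""
    sds = []
    for _ in range(simulations):
        breaks = [rng.randint(1, length) for _ in range(n_breaks)]
        sds.append(interval_sd(breaks, length))
    return sds


def broken_stick_test(observed_positions, length, simulations, rng):
    """Observed interval SD and the fraction of simulated SDs at least as large."""
    observed = interval_sd(observed_positions, length)
    sims = simulated_sds(length, len(observed_positions), simulations, rng)
    fraction = sum(1 for s in sims if s >= observed) / simulations
    return observed, fraction
```

For the SSU rRNA spliceosomal introns, for example, the call would use `length=1542`, a list of the 26 observed insertion positions, and `simulations=1000`. A small fraction indicates that the observed intervals are more uneven (that is, the introns are more clustered) than expected under the broken-stick model.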
Authors' Contributions
DS generated the new intron sequences. JH did the statis-
tical analyses of G-frequencies and information content.
JJC and RRG established and maintain the CRW database
and the rRNA-intron database, generated the rRNA G-fre-
quencies and the yeast conservation diagram, and pro-
duced the 3Dom and 3Dom2O rRNA conservation data.
DB conceived of the study, did the broken-stick analysis,
participated in the design and coordination of the other
analyses, and wrote the paper. All authors read, modified,
and approved the final manuscript.
Acknowledgements
D. Bhattacharya, J. Huang, and D. Simon acknowledge financial support
from the Iowa Biosciences Initiative and grants from the National Science
Foundation (MCB 01-10252, DEB 01-07754) awarded to D. Bhattacharya.
J. Cannone and R. Gutell acknowledge financial support from the National
Institutes of Health (GM 48207) and the National Science Foundation (MCB
01-10252) awarded to R. Gutell.
References
1. Burge CB, Tuschl T and Sharp PA Splicing precursors to mRNAs
by the spliceosomes In: The RNA World (Edited by: Gesteland RF, Cech
TR, Atkins JF) Cold Spring Harbor, Cold Spring Harbor Laboratory, New
York 1999, 525-560
2. Gilbert W The exon theory of genes Cold Spring Harb Symp Quant
Biol 1987, 52:901-905
3. Doolittle WF Genes in pieces: were they ever together? Nature
1978, 272:581-582
4. Darnell JE and Doolittle WF Speculations on the early course of
evolution Proc Natl Acad Sci USA 1986, 83:1271-1275
5. Roy SW, Fedorov A and Gilbert W The signal of ancient introns
is obscured by intron density and homolog number Proc Natl
Acad Sci USA 2002, 99:15513-15517
6. Cavalier-Smith T Intron phylogeny: A new hypothesis Trends
Genet 1991, 7:145-148
7. Palmer JD and Logsdon JM Jr The recent origins of introns Curr
Opin Genet Dev 1991, 1:470-477
8. Stoltzfus A, Spencer DF, Zuker M, Logsdon JM Jr and Doolittle WF
Testing the exon theory of genes: the evidence from protein
structure Science 1994, 265:202-207
9. Logsdon JM Jr The recent origins of spliceosomal introns
revisited Curr Opin Genet Dev 1998, 8:637-648
10. Patthy L Genome evolution and the evolution of exon-shuf-
fling-a review Gene 1999, 238:103-114
11. Cubero OF, Bridge PD and Crespo A Terminal-sequence conservation
identifies spliceosomal introns in ascomycete 18S
RNA genes Mol Biol Evol 2000, 17:751-756
12. Bhattacharya D, Lutzoni F, Reeb V, Simon D and Fernandez F Wide-
spread occurrence of spliceosomal introns in the rRNA
genes of ascomycetes Mol Biol Evol 2000, 17:1971-1984
13. Dibb NJ and Newman AJ Evidence that introns arose at proto-
splice sites EMBO J 1989, 8:2015-2021
14. Berget SM Exon recognition in vertebrate splicing J Biol Chem
1995, 270:2411-2414
15. McCullough AJ and Berget SM G triplets located throughout a
class of small vertebrate introns enforce intron borders and
regulate splice site selection Mol Cell Biol 1997, 17:4562-4571
16. Stephens RM and Schneider TD Features of spliceosome evolu-
tion and function inferred from an analysis of the informa-
tion at human splice sites J Mol Biol 1992, 228:1124-1136
17. Hertz GZ and Stormo GD Identification of consensus patterns
in unaligned DNA and protein sequences: a large-deviation
statistical basis for penalizing gaps In: Proceedings of the Third In-
ternational Conference on Bioinformatics and Genome Research (Edited by:
Lim HA, Cantor CR) Singapore, Scientific Publishing Co Ltd 1995, 201-216
18. Blencowe BJ Exonic splicing enhancers: mechanism of action,
diversity and role in human genetic diseases Trends Biochem Sci
2000, 25:106-110
19. Graveley BR Sorting out the complexity of SR protein
functions RNA 2000, 6:1197-1211
20. Cleveland WS Robust locally weighted regression and smooth-
ing scatterplots J Amer Statist Assoc 1979, 74:829-836
21. MacArthur RH On the relative abundance of bird species Proc
Natl Acad Sci USA 1957, 43:293-295
22. Pielou EC Mathematical Ecology Wiley 1977
23. MacArthur RH On the relative abundance of species Am Nat
1960, 94:25-36
24. Goss PJE and Lewontin RC Detecting heterogeneity of substitu-
tion along DNA and protein sequences Genetics 1996, 143:589-
602
25. Baumiller TK and Ausich WI The Broken-Stick model as a null
hypothesis for crinoid stalk taphonomy and as a guide to the
distribution of connective tissue in fossils Paleobiol 1992,
18:288-298
26. Gutell RR Comparative sequence analysis and the structure
of 16S and 23S rRNA In: Ribosomal RNA: Structure, Evolution, Process-
ing, and Function in Protein Biosynthesis (Edited by: Zimmermann RA, Dahl-
berg AE) Boca Raton, CRC Press 1996, 111-128
27. Yusupov MM, Yusupova GZ, Baucom A, Lieberman K, Earnest TN,
Cate JH and Noller HF Crystal structure of the ribosome at 5.5
Å resolution Science 2001, 292:883-896
28. Jackson SA, Cannone JJ, Lee JC, Gutell RR and Woodson SA Distri-
bution of rRNA introns in the three-dimensional structure of
the ribosome J Mol Biol 2002, 323:215-234
29. Gilbert W, de Souza SJ and Long M Origin of genes Proc Natl Acad
Sci USA 1997, 94:7698-7703
30. Wolf YI, Kondrashov FA and Koonin EV No footprints of primor-
dial introns in a eukaryotic genome Trends Genet 2000, 16:333-
334
31. Spingola M, Grate L, Haussler D and Ares M Jr Genome-wide bio-
informatic and molecular analysis of introns in Saccharomyces
cerevisiae RNA 1999, 5:221-234
32. Lopez PJ and Séraphin B Genomic-scale quantitative analysis of
yeast pre-mRNA splicing: implications for splice-site
recognition RNA 1999, 5:1135-1137
33. Long M, de Souza SJ, Rosenberg C and Gilbert W Relationship between
"proto-splice sites" and intron phases: evidence from
dicodon analysis Proc Natl Acad Sci USA 1998, 95:219-223
34. Green MR Pre-mRNA splicing Annu Rev Genet 1986, 20:671-708
35. Newman AJ and Norman C U5 snRNA interacts with exon se-
quences at 5' and 3' splice sites Cell 1992, 68:743-754
36. Ramchatesingh J, Zahler AM, Neugebauer KM, Roth MB and Cooper
TA A subset of SR proteins activates splicing of the cardiac
troponin T alternative exon by direct interactions with an
exonic enhancer Mol Cell Biol 1995, 15:4898-4907
37. Graveley BR, Hertel KJ and Maniatis T A systematic analysis of
the factors that determine the strength of pre-mRNA splic-
ing enhancers EMBO J 1998, 17:6747-6756
38. Lam BJ and Hertel KJ A general role for splicing enhancers in
exon definition RNA 2002, 8:1233-1241
39. Orban TI and Olah E Expression profiles of BRCA1 splice vari-
ants in asynchronous and in G1/S synchronized tumor cell
lines Biochem Biophys Res Commun 2001, 280:32-38
40. Cech TR, Damberger SH and Gutell RR Representation of the
secondary and tertiary structure of group I introns Nature
Struc Biol 1994, 1:273-280
41. Woodson SA and Emerick VL An alternative helix in the 26S
rRNA promotes excision and integration of the Tetrahymena
intervening sequence Mol Cell Biol 1993, 13:1137-1145
42. Lynch M Intron evolution as a population-genetic process Proc
Natl Acad Sci USA 2002, 99:6118-6123
43. Zoller S, Lutzoni F and Scheidegger C Genetic variation within
and among populations of the threatened lichen Lobaria pul-
monaria in Switzerland and implications for its conservation
Mol Ecol 1999, 8:2049-2059
44. Cannone JJ, Subramanian S, Schnare MN, Collett JR, D'Souza LM, Du
Y, Feng B, Lin N, Madabusi LV and Mueller KM The Comparative
RNA Web (CRW) Site: an online database of comparative
sequence and structure information for ribosomal, intron,
and other RNAs BMC Bioinformatics 2002, 3:2
45. Gorodkin J, Heyer LJ, Brunak S and Stormo GD Displaying the in-
formation contents of structural RNA alignments: the struc-
ture logos Comput Appl Biosci 1997, 13:583-586
46. Michel F and Westhof E Modeling of the three-dimensional ar-
chitecture of group I catalytic introns based on comparative
sequence analysis J Mol Biol 1990, 216:585-610
47. Suh SO, Jones KG and Blackwell M A group I intron in the nuclear
small subunit rRNA gene of Cryptendoxyla hypophloia, an as-
comycetous fungus J Mol Evol 1999, 48:493-500
48. Bhattacharya D and Oliveira M The SSU rDNA coding region of
a filose amoeba contains a group I intron lacking the univer-
sally conserved G at the 3'-terminus J Euk Microbiol 2000,
47:585-589
49. Bhattacharya D, Cannone JJ and Gutell RR Group I intron lateral
transfer between red and brown algal ribosomal RNA Curr
Genet 2001, 40:82-90
50. Kjems J and Garrett RA Ribosomal RNA introns in archaea and
evidence for RNA conformational changes associated with
splicing Proc Natl Acad Sci USA 1991, 88:439-443