This document summarizes the findings of a comparative genomic analysis of 627 bacteriophages (bacterial viruses) that infect Mycobacterium smegmatis. The analysis revealed dramatic variation in genome structures and relationships among the phages. The phages were grouped into 20 clusters and 8 singletons based on genome similarities. However, the degree of genetic diversity and connectivity to other phages varied greatly between clusters. Some clusters were discrete with few shared genes, while others shared many genes. This indicates the phage population spans a continuum of relationships rather than representing discrete populations. The mosaic genomic architectures of many phages complicate classification. Overall, the analysis revealed a highly diverse global phage population with unequal representation of
Phage adhere more strongly to mucus layers than to surrounding environments across diverse animal species. In vitro experiments show that phage adhere specifically to mucin glycoproteins in mucus via interactions between Ig-like domains on phage capsids and glycan residues on mucins. Pretreating mucus-producing cells with phage reduces subsequent bacterial attachment and infection, protecting the underlying epithelium. The presence of Ig-like protein domains in phages from many environments suggests a widespread symbiotic relationship between phages and metazoans, whereby phage adherence to mucus provides a non-host-derived antimicrobial defense of mucosal surfaces.
Surveying the small-spotted catshark (Scyliorhinus canicula) tumour necrosis ... - Fiona Bakke
This honours research project report summarizes an investigation of the tumour necrosis factor superfamily (TNFSF) genes in the small-spotted catshark (Scyliorhinus canicula). Bioinformatic analysis of the S. canicula transcriptome identified potential TNFSF orthologs. A phylogenetic tree including TNFSF sequences from various vertebrates provided insight into the evolutionary relationships between these genes. Several S. canicula TNFSF sequences were amplified and sequenced to confirm their presence. The analysis suggests S. canicula possesses orthologs of most mammalian TNFSF genes, though it lacks a few, and also expresses a TNFSF gene only found previously in cart
Pugacheva et al. COMPLETE GB_16.1_p.161_publ.online_08_14_2015 2 - Victor Lobanenkov
CTCF and BORIS are paralogous proteins that bind to DNA through their nearly identical zinc finger domains. The authors performed ChIP-seq experiments in three cancer cell lines to compare genomic binding patterns of CTCF and BORIS. They found that BORIS selectively occupies a subset (~29-38%) of CTCF binding sites that contain clustered CTCF motifs, termed 2xCTSes. In contrast, the majority of CTCF binding sites contain a single CTCF motif (1xCTSes) and are not occupied by BORIS. 2xCTSes are preferentially located at active promoters and enhancers in cancer cells, and are also enriched in regions retaining histones in sperm. The results suggest there are two
This study sequenced the genomes of 11 clinical Mycobacterium abscessus isolates from 8 US patients with pulmonary infections. Core genome analysis compared these isolates to 30 globally diverse strains to investigate population structure. Longitudinally sampled isolates showed very few genetic differences, suggesting homogeneous infection populations. Genome content variation between isolates was 0.3-8.3% compared to the reference strain, indicating plasticity.
Phylogenetic analyses of two gene sequences from Fusarium species support the monophyly of a clade comprising 20 species complexes and 9 monotypic lineages of Fusarium. This clade, termed the terminal Fusarium clade (TFC), is estimated to have originated in the middle Cretaceous period around 91 million years ago. Analysis of secondary metabolite genes in several Fusarium genome sequences showed that many mycotoxins and other compounds originated earlier in the evolution of the TFC. Dating of plant-associated species complexes suggests their evolution may have coincided with angiosperm diversification during the Miocene.
1) The study analyzed epigenetic variation in shoots from a 1,000-year-old clone of the seagrass Zostera marina in the Baltic Sea.
2) While all 34 shoots sampled along a 250 m transect were genetically identical based on microsatellite analysis, they showed epigenetic differences in cytosine methylation patterns.
3) Epigenetic variation between shoots was independent of their distance from shore and not correlated with geographic distance, suggesting epigenetic variation is not spatially structured within this clonal meadow.
This document summarizes information about feline coronavirus (FCoV), which causes feline infectious peritonitis (FIP). FCoV exists as two biotypes: feline enteric coronavirus (FECV), which causes a mild enteric infection, and feline infectious peritonitis virus (FIPV), which causes a lethal systemic infection. The document describes the viral structure and the nucleocapsid, membrane, spike, and envelope proteins. It also covers FCoV taxonomy, genetics, serotypes, and theories of how FECV may mutate into the virulent FIPV form.
This document describes a study that characterized the genetic diversity and sexual reproduction capabilities of the fish parasite Ichthyophthirius multifiliis. The researchers sequenced and analyzed three genetic markers (SSUrDNA, nad1_b, and cox-1) from nine I. multifiliis isolates collected in different states. They found that the mitochondrial markers effectively distinguished the isolates and divided them into at least two genetically distinct groups. Analysis of 14 somatic single nucleotide polymorphism sites also showed that none of the nine isolates shared the same composition, suggesting sexual reproduction occurs in the life cycle of I. multifiliis. Compared to related ciliate species, I. multifiliis was found to have lost around 30-38
This document discusses various methods for measuring genetic selection in genomes. It examines comparing rates of synonymous and non-synonymous mutations to identify regions under negative or positive selection. Another method looks for extremely conserved elements or rapidly evolving regions across species. Analyzing population variation data through measures like Tajima's D or GWAS can also reveal selective sweeps. Transposon-free and INDEL-free regions may also indicate genetic selection.
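As a rough illustration of one of the population-variation measures named above, the following Python sketch computes Tajima's D from a small set of aligned sequences. It is a minimal sketch, not the summarized document's own procedure: the function name `tajimas_d` and the toy alignments are hypothetical, and the code assumes gap-free sequences of equal length and at least four samples (the variance term is zero for fewer).

```python
from itertools import combinations
from math import nan, sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length, gap-free aligned sequences (illustrative)."""
    n = len(seqs)
    if n < 4 or len(set(map(len, seqs))) != 1:
        raise ValueError("need at least 4 aligned sequences of equal length")

    # S: number of segregating (polymorphic) alignment columns.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return nan  # D is undefined when there is no variation

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989), functions of the sample size only.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two diversity estimates, scaled by its std. dev.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

With a toy alignment in which every variant is a singleton, such as `["AAAA", "TAAA", "ATAA", "AATA"]`, the statistic is negative, the direction usually associated with a recent sweep or population expansion. An alignment split into two equally common haplotypes, such as `["AA", "AA", "TT", "TT"]`, gives a positive value.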
This document summarizes the challenges and progress of human gene therapy. It discusses three categories of gene therapy delivery (ex vivo, in situ, in vivo) and focuses on challenges of using retroviral vectors, including inefficient delivery, inability to transduce non-dividing cells, lack of long-term gene expression, and difficulties with large-scale manufacturing. While over 300 clinical trials have been approved, gene therapy efficiency remains low and no protocols have conclusively treated human disease. The document evaluates efforts to address these challenges through envelope protein engineering, hybrid vectors, and improved regulatory sequences.
Sox2 suppresses the invasiveness of breast cancer cells via a mechanism that ... - Enrique Moreno Gonzalez
Sox2, an embryonic stem cell marker, is aberrantly expressed in a subset of breast cancer (BC). While the aberrant expression of Sox2 has been shown to significantly correlate with a number of clinicopathologic parameters in BC, its biological significance in BC is incompletely understood.
1. The evolutionary relationships between malaria parasite species have been controversial due to past studies relying on visible traits rather than molecular data and issues like taxon bias.
2. Different genes are suitable for phylogenetic analysis, with some like rRNA being problematic due to paralogs. Studies using multiple genes from different genomic compartments provide better resolution.
3. The origin of P. falciparum, which causes the most virulent human malaria, has been debated, with evidence it may have recently switched hosts from gorillas rather than co-diverging with humans. Further sampling of ape malarias is needed to resolve this.
This document is a thesis by Jonas Danielson from Lund University in 2010 on the topic of plant Major Intrinsic Proteins (MIPs), also known as aquaporins. It provides an introduction to MIPs and their structure and function. It then focuses on plant MIPs, discussing the large family of MIP isoforms in plants, their evolution and diversity across plant species. It aims to expand knowledge of plant MIP diversity and the roles of different subfamilies and isoforms using both traditional molecular biology approaches and comparative genetics methods.
This document describes the development of a novel MLVA (multilocus variable number tandem repeat analysis) typing scheme for strains of the Ralstonia solanacearum species complex (RSSC) that belong to phylotype III and are found in Africa. The researchers first evaluated an existing 11-locus MLVA scheme (RS3-MLVA11) and found it was not fully suitable for studying the genetic structure of phylotype III populations. They then designed a new optimized 16-locus MLVA scheme (RS3-MLVA16) specific for phylotype III using tandem repeat loci identified from the genome of the reference strain CMR15. When tested on collections of phylotype III strains from various
Tryptophan scanning mutagenesis was used to identify sites of interaction between the transmembrane domains of connexin32 (Cx32), a gap junction protein. Tryptophan was substituted for residues in all four transmembrane domains of Cx32. Function was then assayed in Xenopus oocytes. Tryptophan substitution was poorly tolerated in all domains, especially TM1 and TM4, indicating tight packing. A region midway through the membrane appeared highly sensitive to substitution. Pore-facing regions were also highly sensitive, while lipid-facing regions were more tolerant. Sensitive sites mapped onto a Cx32 channel model, revealing interactions important for voltage gating and the pore. TM1 of
Human genetic diversity and origin of major human groups - Mayank Sagar
Humans are 99.9% genetically identical and yet we are all so different. Even monozygotic twins have infrequent genetic differences due to mutations occurring during development and gene copy-number variation.
This document provides a summary of research on brucellosis, a bacterial infection. It discusses the epidemiology and transmission of brucellosis, highlighting that it is one of the most common zoonotic diseases worldwide, with 500,000 cases annually. It then examines the pathogenic strategies that allow Brucella bacteria to evade the immune system, including modifications to its lipopolysaccharide that inhibit immune responses. Finally, it reviews the immunological response to Brucella, noting the importance of innate immune cells and Th1-type cytokines in both the initial response and long-term control of infection.
This document summarizes research on human genetic population structure and diversity. The key points are:
- 85% of human genetic variation exists within populations, 10% among continental groups, and 5% among populations within the same continent.
- Clustering analyses of genetic data yield inconsistent groupings depending on the traits or markers used, and populations form a continuous gradient without clear boundaries.
- The patterns of genetic diversity are consistent with an origin of modern humans in Africa followed by serial founder effects during dispersal, around 56,000 years ago.
This document outlines the schedule and requirements for a genomics course consisting of 9 sessions over March and May. Students are required to attend all sessions and give one 20-minute seminar and write one essay. Seminars will be 15% of the final grade and essays will also be 15%, with a final exam making up the remaining 70% of the grade. Topics for the seminars and essays will be assigned.
This document describes the development of a multiplex PCR assay targeting the cgcA gene, which encodes a diguanylate cyclase, to differentiate between species within the genus Cronobacter. Analysis of 12 Cronobacter genomes identified 7 conserved diguanylate cyclase-encoding genes, one of which, cgcA, showed species-specific divergence that matched known phylogenetic relationships between Cronobacter species. Primers were designed for this gene and tested in a multiplex PCR assay on 305 Cronobacter isolates representing 6 species. The assay correctly identified the species of all isolates tested and did not identify any of 20 non-Cronobacter species, demonstrating high specificity and sensitivity for rapid identification of Cronobacter.
1) The study found that while late-stage bone metastases from prostate cancer were surrounded by osteoclasts, early-stage micro-metastases were spatially unrelated to osteoclasts. This suggests osteoclast involvement may not be important for survival of cancer cells immediately after arriving in bone marrow.
2) Expression of the platelet-derived growth factor receptor alpha (PDGFRa) correlated with prostate cancer cells' ability to survive and progress during early skeletal dissemination, while cells with lower PDGFRa failed to survive.
3) Blocking PDGFRa on prostate cancer cells with a monoclonal antibody impaired the establishment of early bone metastases in an animal model, indicating PDGFRa promotes survival and progression of
1) Researchers sequenced the genome of "Coxiella-like endosymbiont of Amblyomma americanum" (CLEAA), a bacterium that lives within the lone star tick.
2) Analysis of the CLEAA genome revealed it contains pathways for biosynthesis of many vitamins and cofactors that are scarce in vertebrate blood. This suggests CLEAA plays a role in providing nutrients to the tick.
3) CLEAA is highly prevalent within the lone star tick and is closely related to Coxiella burnetii, the agent of Q fever, but does not appear to be directly derived from it. In contrast to C. burnetii, CLE
This document summarizes a study that characterized resistance to the Russian wheat aphid (RWA) in the wheat line KRWA9. The researchers found that resistance segregated in a monogenic dominant manner. They used bulk segregant analysis with simple sequence repeat (SSR) markers between parental lines and resistant/susceptible bulks. One SSR marker on chromosome 7DS, Xgwm111, was closely linked to resistance with an R² value of 85%. This marker provides opportunities for marker-assisted breeding to improve RWA resistance in wheat.
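A claim of monogenic dominant segregation is typically checked with a chi-square goodness-of-fit test against a 3:1 resistant-to-susceptible ratio. The Python sketch below shows that arithmetic under stated assumptions: it assumes an F2 population, which the summary does not specify, and the function names and plant counts are hypothetical rather than taken from the study.

```python
# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_0_05_DF1 = 3.841


def chi_square_3_to_1(resistant, susceptible):
    """Goodness-of-fit statistic against a 3:1 resistant:susceptible ratio."""
    total = resistant + susceptible
    if total <= 0:
        raise ValueError("need at least one scored plant")
    observed = (resistant, susceptible)
    expected = (0.75 * total, 0.25 * total)  # single dominant gene in an F2
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_monogenic_dominant(resistant, susceptible):
    """True when the counts do not reject the 3:1 hypothesis at alpha = 0.05."""
    return chi_square_3_to_1(resistant, susceptible) < CRITICAL_0_05_DF1
```

For example, hypothetical counts of 78 resistant and 22 susceptible plants give expected values of 75 and 25, so the statistic is 9/75 + 9/25 = 0.48, well below 3.841 and therefore consistent with a single dominant gene.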
Rhodophyta: A cornucopia of cryptic diversity - EukRef
This document summarizes research on the taxonomy and phylogeny of red algae (Rhodophyta). It finds that:
1) Molecular analysis has revealed cryptic diversity and non-monophyletic orders within the traditional morphological classification of red algae.
2) The phylum Rhodophyta is highly diverse, with over 6,000 described species classified into 7 classes that vary morphologically but share characteristics like lacking flagella.
3) Resolving the evolutionary relationships among some orders, like those in the lineage Nemaliophycidae, remains challenging despite molecular studies.
This document discusses the lack of commercial-level yield heterosis in wheat compared to other crops like maize and rice. It summarizes that wheat's allopolyploid nature results in fixed intergenomic heterosis between its genomes, behaving like a self-sustaining hybrid. Additionally, a long history of successful pureline breeding and lack of suitable parental lines have hindered realizing heterosis in wheat. While small-scale studies have found heterosis in wheat, economically viable hybrids at the commercial level have not been achieved. The document reviews various genetic explanations for heterosis and molecular evidence suggesting altered gene and protein expression may underlie the phenomenon.
This PhD report aims to develop diagnostic tools to distinguish between protozoan parasites that infect ruminants, including Toxoplasma gondii, Neospora caninum, and Sarcocystis species. The report will identify genus-specific antigens for each parasite and develop antibodies and PCR tests that can detect the parasites individually. Recombinant antigens will be produced and used to generate genus-specific antibodies for diagnosis via immunohistochemistry. Genus-specific DNA targets and primers will also be identified to enable PCR-based detection of the protozoan parasites in tissue samples. Both diagnostic methods will be supported by a tissue bank of samples infected with the parasites.
Unveiling Hidden Treasures of Indigenous Cattle In Zambia Using Microsatellites - MSIMUKO ELLISON
1. The study analyzed genetic diversity in three indigenous cattle breeds in Zambia (Angoni, Tonga, and Barotse) using 32 microsatellite markers.
2. Results showed moderate genetic diversity within breeds and low differentiation between breeds, indicating gene flow between populations.
3. Bayesian cluster analysis grouped the Tonga and Barotse breeds together, separating them from the Angoni breed, suggesting two genetic populations rather than three.
Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees... - Jonathan Eisen
Lang JM, Darling AE, Eisen JA (2013) Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees and Supermatrices. PLoS ONE 8(4): e62510. doi:10.1371/journal.pone.0062510
Understanding the origin and evolution of the eukaryotic cell and the full diversity of eukaryotes is relevant to many biological disciplines.
However, our current understanding of eukaryotic genomes is extremely biased, leading to a skewed view of eukaryotic biology.
We argue that a phylogeny-driven initiative to cover the full eukaryotic diversity is needed to overcome this bias.
◦ There is an important bias in eukaryotic knowledge, affecting cultures and genomes.
◦ Eukaryotic genomics is biased towards multicellular organisms and their parasites.
◦ A phylogeny-driven initiative is needed to overcome the eukaryotic genomic bias.
◦ We propose to sequence neglected cultures and increase culturing efforts.
◦ Single-cell genomics should be embraced as a tool to explore eukaryotic diversity.
Unveiling Hidden Treasures of Indigenous Cattle In Zambia Using MacrosatilltesMSIMUKO ELLISON
1. The study analyzed genetic diversity in three indigenous cattle breeds in Zambia (Angoni, Tonga, and Barotse) using 32 microsatellite markers.
2. Results showed moderate genetic diversity within breeds and low differentiation between breeds, indicating gene flow between populations.
3. Bayesian cluster analysis grouped the Tonga and Barotse breeds together, separating them from the Angoni breed, suggesting two genetic populations rather than three.
Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees...Jonathan Eisen
Lang JM, Darling AE, Eisen JA (2013) Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees and Supermatrices. PLoS ONE 8(4): e62510. doi:10.1371/journal.pone.0062510
Understanding the origin and evolution of the eukaryotic cell and the full diversity of eukaryotes is relevant to many biological disciplines.
However, our current understanding of eukaryotic genomes is extremely biased, leading to a skewed view of eukaryotic biology.
We argue that a phylogeny-driven initiative to cover the full eukaryotic diversity is needed to overcome this bias.
- There is an important bias in eukaryotic knowledge, affecting cultures and genomes.
- Eukaryotic genomics are biased towards multicellular organisms and their parasites.
- A phylogeny-driven initiative is needed to overcome the eukaryotic genomic bias.
- We propose to sequence neglected cultures and increase culturing efforts.
- Single-cell genomics should be embraced as a tool to explore eukaryotic diversity.
Sequence data are accumulating rapidly as sequencing technology becomes more accessible and its cost falls. The 'pangenome' concept could help in understanding these data efficiently and putting them to practical use.
1) The document describes research that used whole genome sequence data from 42 fungal species to construct phylogenetic trees of fungal relationships using both supertree and concatenated gene analysis methods.
2) The supertree and concatenated tree methods produced highly congruent results, with both supporting traditional classifications of fungal phyla, sub-phyla, and classes.
3) Within the Ascomycota, the trees resolved the relationships between the classes Leotiomycetes and Sordariomycetes, and identified two clades within the CTG clade of the Saccharomycotina that may correlate with sexual status.
Phylogenetic diversity—patterns of phylogenetic relatedness among organisms in ecological communities—provides important insights into the mechanisms underlying community assembly. Studies that measure phylogenetic diversity in microbial communities have primarily been limited to a single marker gene approach, using the small subunit of the rRNA gene (SSU-rRNA) to quantify phylogenetic relationships among microbial taxa. In this study, we present an approach for inferring phylogenetic relationships among microorganisms based on the random metagenomic sequencing of DNA fragments. To overcome challenges caused by the fragmentary nature of metagenomic data, we leveraged fully sequenced bacterial genomes as a scaffold to enable inference of phylogenetic relationships among metagenomic sequences from multiple phylogenetic marker gene families. The resulting metagenomic phylogeny can be used to quantify the phylogenetic diversity of microbial communities based on metagenomic data sets. We applied this method to understand patterns of microbial phylogenetic diversity and community assembly along an oceanic depth gradient, and compared our findings to previous studies of this gradient using SSU-rRNA gene and metagenomic analyses. Bacterial phylogenetic diversity was highest at intermediate depths beneath the ocean surface, whereas taxonomic diversity (diversity measured by binning sequences into taxonomically similar groups) showed no relationship with depth. Phylogenetic diversity estimates based on the SSU-rRNA gene and the multi-gene metagenomic phylogeny were broadly concordant, suggesting that our approach will be applicable to other metagenomic data sets for which corresponding SSU-rRNA gene sequences are unavailable. Our approach opens up the possibility of using metagenomic data to study microbial diversity in a phylogenetic context.
1) The study aimed to determine environmental sources of variation in reproductive lifespan using genetically identical fruit fly lines. 2) While the lines were genetically identical, substantial variation was found between individuals' reproductive lifespans. 3) The study compared differences between treated and untreated lines, infected and cured lines, and results from different experimental sections, but no single environmental factor consistently explained the observed variation.
Comparative genomics involves systematically comparing genome sequences from different organisms. It uses computer programs to identify homologous genomic regions and align sequences at the base-pair level. Comparing genomes at different phylogenetic distances can provide insights into gene structure/function, evolution, and characteristics unique to each organism. Key tools for comparative genomics include genome browsers, aligners, and databases that classify orthologous gene clusters conserved across species.
Speciation in plant pathogens can occur through geographical isolation, environmental factors, or hybridization. Geographical isolation leads to genetic divergence as populations accumulate differences separated by physical barriers. Environmental factors like host or temperature can also drive reproductive isolation over time. Rarely, hybridization between species can result in hybrids reproductively isolated from parents. Studying speciation in plant pathogens provides insights into their evolution and emergence, helping predict new pathogens. Combining classical and genomic approaches reveals how speciation occurs in fungi and oomycetes.
Comparative genome mapping involves comparing genetic maps between closely related species to study genome evolution and understand relationships at the genetic level. Genomes can be compared by looking at features like gene location and order, as well as sequence similarity. Many model systems have been used for comparative mapping, including plants like rice and maize, Arabidopsis and Brassica, tomato and potato. These studies have revealed things like conserved synteny between species, rates of rearrangement, and the effects of polyploidization. Comparative mapping is a useful tool for understanding genomes and their relationships across species.
The document discusses the topic of phylogenetics. It begins with definitions of key terms like phylogeny, phylogenetic tree, clade, and orthologous genes. It then provides examples of how phylogenetic methods are used in fields like epidemiology, conservation biology, and pharmaceutical research. The document also discusses choosing appropriate genetic sequences to use in phylogenetic analysis and introduces molecular clock models.
This document discusses comparative genomics and the evolution of genes using Drosophila species as a model. It summarizes a phylogenetic comparison of 12 Drosophila genomes, which revealed patterns of gene family expansion and contraction over time as well as structural changes and rearrangements, such as those in the Hox cluster. The multi-species comparisons provided strong evidence that gene models and functions are conserved despite sequence divergence across the Drosophila phylogeny.
The climbing vine kudzu, a member of the leguminous
pea family (Fabaceae), was introduced into the USA
from its native Asia in the 1800s. It was initially lauded
for efficacy in erosion control along highways and as a
high-quality grazing crop for livestock. P. montana var.
lobata has since become a truculent invasive, spreading
via vegetative runners and seed dispersal. Seven
million acres of the American southeast are now
plagued by this vine.
An expanded view of complex traits from polygenic to omnigenicBARRY STANLEY 2 fasd
A central goal of genetics is to understand the links between genetic variation and disease. Intuitively,
one might expect disease-causing variants to cluster into key pathways that drive disease
etiology. But for complex traits, association signals tend to be spread across most of the
genome—including near many genes without an obvious connection to disease. We propose
that gene regulatory networks are sufficiently interconnected such that all genes expressed in disease-
relevant cells are liable to affect the functions of core disease-related genes and that most
heritability can be explained by effects on genes outside core pathways. We refer to this hypothesis
as an ‘‘omnigenic’’ model.
Association mapping for improvement of agronomic traits in riceSopan Zuge
This document summarizes a seminar on association mapping in plants. It discusses how association mapping offers greater precision in locating quantitative trait loci (QTLs) than family-based linkage analysis by taking advantage of linkage disequilibrium across diverse populations. The key steps in association mapping are described, including population selection and structure analysis, high-throughput phenotyping and genotyping, measuring linkage disequilibrium, and association analysis to identify marker-trait links. Software for conducting association mapping and case studies in rice are also reviewed.
Ap Chapter 26 Evolutionary History Of Biological Diversitysmithbio
- Phylogeny investigates the evolutionary history and relationships between species through analysis of fossil, molecular, and genetic data. Systematists use phylogenetic trees to depict these relationships.
- A phylogenetic tree represents hypotheses about evolutionary divergence. Each branch point indicates where two lineages diverged from a common ancestor. Tracing shared derived characteristics helps systematists determine which groupings form monophyletic clades.
- Molecular systematics uses comparisons of DNA and other molecules to infer relatedness and reconstruct evolutionary history over hundreds of millions of years, helping to extend phylogenies beyond what can be learned from fossils alone. Molecular clocks and neutral theory aim to date evolutionary events but have limitations.
Chasing a Unicorn for Model Host-Microbiome-Systems Jonathan Eisen
This document summarizes a presentation given by Jonathan Eisen on his research into the rice microbiome. Some key points:
- Eisen studies how the rice plant and its genotype influence the microbial communities that colonize its roots (rhizosphere, rhizoplane, endosphere).
- In greenhouse experiments, rice genotype explained a significant amount of variation in root microbial communities. Certain microbes were enriched or depleted across root compartments.
- Field experiments also found the rice cultivation site and farming practices influenced root microbiome composition.
- Dynamics studies showed microbes rapidly colonize roots within 24 hours of transplantation, with shifts in community composition over time.
- Network analysis revealed microbial modules involved in methane cycling that varied across
phages manuscript HHMI (1)
Dramatic variation in phage genome structures revealed by whole genome comparisons
Welkin Pope (1), Charles Bowman (1), SEA-PHAGES (2), PHIRE (3), K-RITH MGC (4), Deborah Jacobs-Sera (1), Daniel A. Russell (1), Steven Cresawn (5), William R. Jacobs Jr. (6), Jeffrey G. Lawrence (1), Roger W. Hendrix (1), and Graham F. Hatfull (1)*
(1) Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260
(2) Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science
(3) Phage Hunters Integrating Research and Education
(4) KwaZulu-Natal Institute for TB and HIV research Mycobacterial Genetics Course
(5) Department of Biology, James Madison University, Harrisonburg, VA
(6) Department of Microbiology and Immunology, Albert Einstein College of Medicine, NY
*Corresponding Author
Bacteriophages are the dark matter of the biological universe [1], forming a vast, dynamic, old, and genetically diverse population [2]. Horizontal exchange generates pervasive genome mosaicism, with different genome segments having distinct evolutionary histories [3]. Phages of phylogenetically distant hosts typically share low nucleic acid sequence similarity, and few share genes with amino acid sequence similarity [2]. Phages of a single common host can also span considerable sequence diversity even though they are in direct genetic contact [1]. Comparative genomics of a large collection of phages isolated on Mycobacterium smegmatis provides insights into the size and diversity of groups of related phages and the extent to which the groups are discrete and genetically isolated from other phages. We show that both the diversity and genetic isolation of phage groups vary enormously. Some are discrete and share few genes with other phages, whereas others are genetically connected to many other phages. The phage population thus spans a continuum of relationships, but with phages of different types varying enormously in prevalence. The reticulate relationships resulting from pervasively mosaic architectures confound hierarchical taxonomic phage classification or application of simple numerical values to distinguish among phage genomic types.
Bacteriophages are the most abundant organisms in the biosphere, and the ~10^31 tailed phage particles participate in ~10^23 infections per second on a global scale, with the entire population turning over every few days [4]. Virion structures suggest the population is also extremely old [5], and thus the great genetic diversity of phages is not surprising [2]. Phages likely evolved with common ancestry and access to a large common gene pool [3], although rates of horizontal exchange are heterogeneous, being influenced by host range, varying phage migration rates across the microbial landscape, and lifestyle (temperate or virulent) [6]. Multiple processes determine this, including local host diversity and mutation rates, as well as resistance mechanisms such as receptor availability, restriction, CRISPRs, and abortive infection systems [6,7]. Constraints on gene acquisition may also be imposed by synteny, particularly among virion structural genes, and by size limits of DNA packaging [2,8].
Genomic comparison of phages infecting a common host provides insights into evolutionary mechanisms and the structure of their genetic diversity [9]. Relatively small numbers of phage genomes have been sequenced for hosts such as Escherichia coli, Salmonella, Staphylococcus, Pseudomonas, and Propionibacterium [10-13], revealing varying degrees of genetic diversity. Mycobacteriophages isolated from environmental samples using Mycobacterium smegmatis mc²155 as a host are architecturally mosaic [1] and span considerable diversity, but can be grouped into ‘clusters’ of related phages that share little or no nucleotide sequence similarity with other phages [1,14-18]. Some clusters are heterogeneous and can be readily divided into subclusters by their nucleotide similarities. Recent analysis of phages adsorbed to Synechococcus revealed 26 discrete ‘populations’, although they were obtained from a single sample and are predominantly morphologically myoviral (T4-like) [9]. However, these populations likely represent only a small portion of Synechococcus phages because the genomes of 17 fully sequenced phages infecting Synechococcus or closely related hosts fail to associate with these ‘populations’ [9]. These populations may thus reflect sampling bias of the single environment examined, and extensive genomic mosaicism found in phages of Synechococcus and other hosts [1,3,19] warrants caution in extrapolation of the concept of discrete phage populations in the absence of complete genome sequences.
The Howard Hughes Medical Institute (HHMI) Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) program has facilitated expansion of the number of sequenced mycobacteriophage genomes to 627 (Table S1) by engaging large numbers of undergraduates in phage discovery and genomics [20]. The size of this collection now provides sufficient resolution to offer insights into the diversity and genetic isolation of phage genome types. Here we address the question of whether the groups of related phages represent primarily discrete populations or genetically intermixed groups. Although the collection excludes viruses that don’t form plaques under laboratory conditions, the phages were isolated from widely dispersed geographical locations, including nine countries and 36 of the continental United States (Fig. S1), over a dozen or more years. All are dsDNA tailed phages (Caudovirales), and are morphologically siphoviral, except Cluster C myoviruses. Most have isometric heads except for singleton MooMoo and the Cluster I and O phages, which have prolate heads [21].
Using previously reported parameters [15], the 627 genomes were assembled into 20 clusters (A to T) and 8 singletons (with no close relatives) with large variations in cluster sizes (Table 1, Fig. S2); 11 clusters can be subdivided into 2 to 11 subclusters (Table 1). Clustered phages typically share genome architectures; for example, Cluster A phages are similar in size and transcriptional organization, and share an unusual immunity system [16,22]. A different set of clustering parameters would generate different profiles, but would not alter the core observation that there are large variations among the different phage types. Cluster designation is simple for some phage types because of extensive nucleotide similarity (e.g. Cluster C; Fig. S2), and if all clusters resembled Cluster C, our data would be congruent with the Synechococcus populations [9]. But many do not, revealing more complex relationships.
To compare mycobacteriophage gene contents we grouped related genes into phamilies using Phamerator [23], modified to use kclust [24]. The 69,633 genes assembled into 5,205 phams of which 1,613 (31%) are orphams [14] (single-gene phamilies), and the gene content relationships are represented as a network phylogeny in Fig. 1. In general, branch lengths provide strong support for cluster and subcluster designations (Table 1, Fig. S2); the proportions of orphams per genome provide additional support, which as expected is highest for singletons and single-genome subclusters (Fig. S3). Determination of the proportions of shared genes by pairwise comparisons reveals the complexity of the genetic relationships (Fig. 2), and three major features are apparent.
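The pairwise comparison described above can be illustrated with a minimal sketch. It assumes each genome is reduced to the set of pham identifiers it contains, and that the shared percentage for a pair is the mean of the two directional percentages; that averaging rule, the function names, and the toy data are illustrative assumptions, not the Phamerator implementation.

```python
from itertools import combinations


def shared_pham_percentage(phams_a, phams_b):
    """Mean of the two directional percentages of shared phams (assumed definition)."""
    if not phams_a or not phams_b:
        return 0.0
    shared = len(phams_a & phams_b)
    return 100.0 * (shared / len(phams_a) + shared / len(phams_b)) / 2


def pairwise_shared(genome_phams):
    """Shared-pham percentage for every unordered pair of genomes."""
    return {
        (a, b): shared_pham_percentage(genome_phams[a], genome_phams[b])
        for a, b in combinations(sorted(genome_phams), 2)
    }


def orphams(genome_phams):
    """Phams seen in exactly one genome (assumes at most one gene per pham per genome)."""
    counts = {}
    for phams in genome_phams.values():
        for pham in phams:
            counts[pham] = counts.get(pham, 0) + 1
    return {pham for pham, n in counts.items() if n == 1}


# Hypothetical toy data: genome name -> set of pham identifiers.
genome_phams = {
    "PhageA": {1, 2, 3, 4},
    "PhageB": {3, 4, 5},
    "PhageC": {6},
}

for pair, pct in pairwise_shared(genome_phams).items():
    print(pair, round(pct, 1))
print("orphams:", sorted(orphams(genome_phams)))
```

A matrix of these pairwise values, ordered by cluster and subcluster, is the kind of input a heat map such as Fig. 2 would be drawn from.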
First, the overall phage relationships closely mirror the cluster and subcluster designations derived by DNA similarities (Fig. S2). Secondly, the intra-cluster and intra-subcluster diversity varies enormously, and this is quantified as the Cluster Cohesion Index (CCI, average number of genes/genome divided by the total number of phamilies in the cluster; Table 1, Fig. 3). Thus in clusters such as Cluster A (CCI, 0.08), the total number of phamilies is vastly greater than the average number of genes per genome, indicating high diversity. The diversity of the A subclusters is also highly varied with CCI values ranging from 0.22 to 0.91 (Table S1). In contrast, Clusters G and O have low diversity (high CCI values) and closely related genomes (Table 1; Fig. 3).
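As a minimal sketch of the CCI definition given above, the function below takes a hypothetical mapping from genome name to the list of pham identifiers of its genes (one entry per gene). The data layout and names are assumptions made for illustration.

```python
def cluster_cohesion_index(cluster_genomes):
    """CCI = average number of genes per genome / total number of phams in the cluster.

    cluster_genomes: dict mapping genome name -> list of pham ids, one per gene.
    """
    if not cluster_genomes:
        raise ValueError("cluster must contain at least one genome")
    avg_genes = sum(len(genes) for genes in cluster_genomes.values()) / len(cluster_genomes)
    total_phams = len(set().union(*cluster_genomes.values()))
    return avg_genes / total_phams


# Hypothetical toy clusters.
identical = {"g1": [1, 2, 3], "g2": [1, 2, 3]}   # same phams in every genome
disjoint = {"g1": [1, 2], "g2": [3, 4]}          # no phams in common

print(cluster_cohesion_index(identical))  # cohesive cluster
print(cluster_cohesion_index(disjoint))   # diverse cluster
```

With identical genomes the index reaches its maximum of one, and it falls as the cluster's pham repertoire grows relative to the size of a typical genome.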
Thirdly, the degree to which clusters are genetically connected to other phages varies greatly, and is quantified as the Cluster Isolation Index (CII, the percentage of phamilies not present in genomes outside of the cluster; Table 1, Fig. 3). Some clusters such as Clusters A, B, C, and Q share relatively few genes (<25%) with other phages and have high CII values (Fig. 3). Other groups, such as Clusters I and P, share >60% of their genes with other phages (Table 1), reflecting the DNA relationships (Fig. S4). There are therefore no universally applicable values of either diversity or isolation for different phage groups, and the most striking picture emerging is one of great diversity with unequal representation of different types (Fig. 3). This is in marked contrast to the discrete populations reported for Synechococcus phages [9].
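The CII definition above can be sketched in the same way. The example assumes each genome is represented as a set of pham identifiers; the function name and toy data are hypothetical.

```python
def cluster_isolation_index(cluster_genomes, other_genomes):
    """CII = percentage of the cluster's phams that are absent from all genomes outside it.

    Both arguments: dict mapping genome name -> set of pham ids.
    """
    cluster_phams = set().union(*cluster_genomes.values())
    if not cluster_phams:
        raise ValueError("cluster contains no phams")
    outside_phams = set().union(*other_genomes.values())
    unique_to_cluster = cluster_phams - outside_phams
    return 100.0 * len(unique_to_cluster) / len(cluster_phams)


# Hypothetical toy data: pham 4 is also found outside the cluster.
cluster = {"g1": {1, 2, 3}, "g2": {2, 3, 4}}
others = {"x1": {4, 5}}

print(cluster_isolation_index(cluster, others))
```

A value near 100 indicates a group that shares almost nothing with the rest of the collection, and a low value indicates a group whose genes are widely distributed among other phages.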
These comparisons reveal additional complexities arising from highly mosaic genomes (Figs. S5-S8). For example, Dori is clearly related to Cluster B phages (Fig. 1) with which it shares 20-26% of its genes and limited DNA similarity (Fig. S5), but also has nucleotide similarity and shares genes with Cluster N and I2 phages, among others (Fig. S5, S7A), as reflected in its low CII (Table 1, Fig. 3). Likewise, the singleton MooMoo has segments of DNA similarity and shares ~20% of its genes with Cluster F phages (Fig. 1, S6, S7B), but also has similarity to Clusters N and I; it also has a low CII (Table 1, Fig. 3). It has low DNA similarity to Cluster O (Fig. S6), but shares several genes and has the same unusual prolate morphology (Fig. 1). Complex relationships are also seen in the singletons Gaia and Sparky (Fig. S8).
Bacteriophage taxonomic classification reflecting phylogeny presents substantial challenges because of genome mosaicism [25]. Classification by viral morphology is well established, but may not accurately report the genetic relationships, as observed for the prolate-headed MooMoo (Fig. 1). We also note that the mycobacteriophage myoviruses have a high CII and form a discrete group (Table 1) as for the Synechococcus phages [9], perhaps reflecting a virulent lifestyle that constrains productive gene exchange; host range mutability may also differ in phages with different morphotypes, limiting access to the gene pool. Although grouping phages into clusters and subclusters provides analytical advantages because of the wide range in prevalence of the different types (Table 1), it is not suitable as a broadly applicable hierarchical taxonomic system. Reticulate taxonomies more accurately reflect the phylogenetic complexities [25,26].
Given the sampling ranges of these phages, it seems unlikely that the population profile reported here is specific for M. smegmatis mc²155 phages and we predict that related profiles will be found for phages isolated from similar environments using different hosts. However, phage types occurring rarely in M. smegmatis may be abundant in phylogenetically proximal hosts, and we predict that phage populations at large, regardless of host, represent a continuum of complex reticulate relationships. Finally, we predict that the overall diversity of the phage population is in large part a consequence of narrow but mutable viral host ranges, which promotes local genetic isolation and constrains access to the common gene pool.
METHODS
In addition to extant GenBank sequence information, mycobacteriophages were isolated, sequenced, and annotated in the Phage Hunters Integrating Research and Education (PHIRE) or Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) programs. All genome sequences are publicly available at phagesDB.org or in GenBank. Nucleotide comparisons used BlastN or Gepard [27]. To create database Mykobacteriophage_627, phamilies were constructed by first clustering to an equivalent of 70% amino acid sequence identity and a 25% size threshold, followed by multiple sequence alignment using kAlign [28]. Consensus sequences were extracted using hhmake and hhconsensus [29], and passed through a second iteration of kClust, clustering proteins above a threshold e-value of 10^-4. CCI values were calculated as the average number of genes/genome divided by the total number of phams in that cluster. Thus if all genomes in a cluster are identical (and if phamilies occur only once in a genome), CCI would be one; the CCI for two sets of five randomly chosen genomes is ~0.02. CII is the percentage of phams present within a cluster that are not present in other mycobacteriophage genomes. Students, faculty and their contributions to authorship are listed in Table S3.
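The two index definitions can be written compactly, followed by a check against the Cluster A row of Table 1. The symbols are introduced here for illustration only and are not the authors' notation: \bar{g} is the average number of genes per genome in the cluster, P the total number of phams in the cluster, and P_unique the number of those phams absent from all genomes outside the cluster.

```latex
\mathrm{CCI} = \frac{\bar{g}}{P},
\qquad
\mathrm{CII} = 100 \times \frac{P_{\mathrm{unique}}}{P}

% Cluster A (Table 1): \bar{g} = 90, P = 1085
\mathrm{CCI}_{A} = \frac{90}{1085} \approx 0.083
```

This agrees with the tabulated Cluster Cohesion value of 0.08 for Cluster A. For a cluster of identical genomes in which every pham occurs once, \bar{g} = P and the CCI is one, as stated in the text.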
ACKNOWLEDGEMENTS
This work was supported in part by the Howard Hughes Medical Institute SEA-PHAGES
program, by the Howard Hughes Medical Institute through its Professorship grant to GFH, and
by NIH grant GM51975 to GFH.
Author Contributions
Authors and contributions are listed in Table S3.
References
1 Pedulla, M. L. et al. Origins of highly mosaic mycobacteriophage genomes. Cell 113, 171-
182 (2003).
2 Hatfull, G. F. & Hendrix, R. W. Bacteriophages and their Genomes. Current Opinion in
Virology 1, 298-303 (2011).
3 Hendrix, R. W., Smith, M. C., Burns, R. N., Ford, M. E. & Hatfull, G. F. Evolutionary
relationships among diverse bacteriophages and prophages: all the world's a phage. Proc
Natl Acad Sci U S A 96, 2192-2197 (1999).
4 Suttle, C. A. Marine viruses--major players in the global ecosystem. Nat Rev Microbiol 5,
801-812 (2007).
5 Krupovic, M. & Bamford, D. H. Order to the viral universe. J Virol 84, 12476-12479,
doi:10.1128/JVI.01489-10 (2010).
6 Jacobs-Sera, D. et al. On the nature of mycobacteriophage diversity and host preference.
Virology 434, 187-201, doi:10.1016/j.virol.2012.09.026 (2012).
7 Buckling, A. & Brockhurst, M. Bacteria-virus coevolution. Adv Exp Med Biol 751, 347-370,
doi:10.1007/978-1-4614-3567-9_16 (2012).
8 Juhala, R. J. et al. Genomic sequences of bacteriophages HK97 and HK022: pervasive
genetic mosaicism in the lambdoid bacteriophages. J Mol Biol 299, 27-51,
doi:10.1006/jmbi.2000.3729 (2000).
9 Deng, L. et al. Viral tagging reveals discrete populations in Synechococcus viral genome
sequence space. Nature 513, 242-245, doi:10.1038/nature13459 (2014).
10 Kwan, T., Liu, J., DuBow, M., Gros, P. & Pelletier, J. The complete genomes and
proteomes of 27 Staphylococcus aureus bacteriophages. Proc Natl Acad Sci U S A 102,
5174-5179 (2005).
11 Kwan, T., Liu, J., Dubow, M., Gros, P. & Pelletier, J. Comparative genomic analysis of 18
Pseudomonas aeruginosa bacteriophages. J Bacteriol 188, 1184-1187 (2006).
12 Kropinski, A. M., Sulakvelidze, A., Konczy, P. & Poppe, C. Salmonella phages and
prophages--genomics and practical aspects. Methods Mol Biol 394, 133-175 (2007).
13 Marinelli, L. J. et al. Propionibacterium acnes bacteriophages display limited genetic
diversity and broad killing activity against bacterial skin isolates. MBio 3,
doi:10.1128/mBio.00279-12 (2012).
14 Hatfull, G. F. et al. Comparative genomic analysis of 60 Mycobacteriophage genomes:
genome clustering, gene acquisition, and gene size. J Mol Biol 397, 119-143,
doi:10.1016/j.jmb.2010.01.011 (2010).
15 Hatfull, G. F. et al. Exploring the mycobacteriophage metaproteome: phage genomics as an
educational platform. PLoS Genet 2, e92 (2006).
16 Pope, W. H. et al. Expanding the Diversity of Mycobacteriophages: Insights into Genome
Architecture and Evolution. PLoS ONE 6, e16329 (2011).
17 Hatfull, G. F. et al. Complete genome sequences of 63 mycobacteriophages. Genome
announcements 1, doi:10.1128/genomeA.00847-13 (2013).
18 Hatfull, G. F. et al. Complete genome sequences of 138 mycobacteriophages. J Virol 86,
2382-2384, doi:10.1128/JVI.06870-11 (2012).
19 Hendrix, R. W., Hatfull, G. F. & Smith, M. C. Bacteriophages with tails: chasing their origins
and evolution. Res Microbiol 154, 253-257 (2003).
20 Jordan, T. C. et al. A broadly implementable research course in phage discovery and
genomics for first-year undergraduate students. MBio 5, e01051-01013,
doi:10.1128/mBio.01051-13 (2014).
21 Hatfull, G. F. The secret lives of mycobacteriophages. Adv Virus Res 82, 179-288,
doi:10.1016/B978-0-12-394621-8.00015-7 (2012).
22 Brown, K. L., Sarkis, G. J., Wadsworth, C. & Hatfull, G. F. Transcriptional silencing by the
mycobacteriophage L5 repressor. Embo J 16, 5914-5921, doi:10.1093/emboj/16.19.5914
(1997).
23 Cresawn, S. G. et al. Phamerator: a bioinformatic tool for comparative bacteriophage
genomics. BMC Bioinformatics 12, 395, doi:10.1186/1471-2105-12-395 (2011).
24 Hauser, M., Mayer, C. E. & Soding, J. kClust: fast and sensitive clustering of large protein
sequence databases. BMC Bioinformatics 14, 248, doi:10.1186/1471-2105-14-248 (2013).
25 Lawrence, J. G., Hatfull, G. F. & Hendrix, R. W. Imbroglios of viral taxonomy: genetic
exchange and failings of phenetic approaches. J Bacteriol 184, 4891-4905 (2002).
26 Lima-Mendez, G., Toussaint, A. & Leplae, R. Analysis of the phage sequence space: the
benefit of structured information. Virology 365, 241-249 (2007).
27 Krumsiek, J., Arnold, R. & Rattei, T. Gepard: a rapid and sensitive tool for creating dotplots
on genome scale. Bioinformatics 23, 1026-1028 (2007).
28 Lassmann, T. & Sonnhammer, E. L. Kalign--an accurate and fast multiple sequence
alignment algorithm. BMC Bioinformatics 6, 298, doi:10.1186/1471-2105-6-298 (2005).
29 Remmert, M., Biegert, A., Hauser, A. & Soding, J. HHblits: lightning-fast iterative protein
sequence searching by HMM-HMM alignment. Nat Methods 9, 173-175,
doi:10.1038/nmeth.1818 (2012).
30 Huson, D. H. & Bryant, D. Application of phylogenetic networks in evolutionary studies. Mol
Biol Evol 23, 254-267, doi:10.1093/molbev/msj030 (2006).
Figure Legends
Figure 1. Network phylogeny of 627 mycobacteriophages based on gene content.
Genomes of 627 mycobacteriophages were compared according to shared gene content using
the Phamerator (ref. 23) database mykobacteriophage_627, and displayed using SplitsTree
(ref. 30). Colored circles indicate groupings of phages, labeled according to their cluster
designations generated by nucleotide sequence comparison (Fig. S2); singleton genomes with
no close relatives are labeled but not circled. Micrographs show morphotypes of the singleton
MooMoo, the Cluster F phage Mozy, and the Cluster O phage Corndog. With the exception of
DS6A, all of the phages infect M. smegmatis mc²155.
Figure 2. Heat map representation of shared gene content among 627
mycobacteriophages. The percentages of pairwise shared genes were determined using a
database (mykobacteriophage_627) generated by Phamerator (ref. 23) and populated with 627
completely sequenced phage genomes. The 69,574 genes were assembled into 5,205 phamilies
(phams) of related sequences using kClust, and the average percentages of shared phams were
calculated. Genomes are ordered on both axes according to their cluster and subcluster
designations, determined by nucleotide sequence similarities (Fig. S2). The values are colored
as indicated.
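The pairwise calculation behind the heat map can be illustrated with a minimal Python sketch. It assumes each genome is reduced to the set of phams its genes belong to, and that the "average percentage" for a pair is the mean of the two directional percentages; the legend does not spell out either detail, so both are assumptions. The function names and the toy genomes are hypothetical and are not part of Phamerator.

```python
from itertools import combinations


def shared_pham_percentage(phams_a, phams_b):
    """Mean of the two directional percentages of phams shared by two genomes."""
    a, b = set(phams_a), set(phams_b)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    # Percentage of A's phams found in B, and of B's phams found in A, averaged.
    return 100.0 * (shared / len(a) + shared / len(b)) / 2


def pairwise_shared_content(genomes):
    """Return {name: {name: percentage}} for every pair of genomes."""
    names = list(genomes)
    matrix = {name: {name: 100.0} for name in names}
    for x, y in combinations(names, 2):
        matrix[x][y] = matrix[y][x] = shared_pham_percentage(genomes[x], genomes[y])
    return matrix


# Toy data (hypothetical): genomes described by the pham IDs of their genes.
toy_genomes = {
    "PhageX": {1, 2, 3, 4},
    "PhageY": {3, 4, 5},
    "PhageZ": {9},
}
toy_matrix = pairwise_shared_content(toy_genomes)
print(round(toy_matrix["PhageX"]["PhageY"], 1))  # 58.3
```

Ordering the rows and columns of such a matrix by cluster and subcluster would give the block structure a heat map of this kind displays.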
Figure 3. Relationships between Cluster Cohesion and Cluster Isolation Indexes of
Mycobacteriophage groups. Mycobacteriophage clusters and singletons are plotted
according to their Cluster Isolation Index and Cluster Cohesion Index. Groups are colored
according to the numbers of phages in that group; scale is shown above. There is enormous
variation in both cluster isolation and cluster diversity among the different groups.
Table 1. Diversity and genetic isolation of mycobacteriophage genome clusters
Cluster  # Subclusters  # Genomes  Avg # genes (1)  Avg length (bp)  Total phams (2)  Total genes  CCI (3)  CII (4)
A        11             232        90               51514            1085             20880        0.08     80.2
B        5              109        100.4            68653            421              10944        0.24     81.0
C        2              45         231              155504           486              10395        0.48     84.6
D        2              10         89.3             64965            147              893          0.61     71.4
E        1              35         141.9            75526            236              4967         0.60     59.3
F        3              66         105.3            57416            658              6950         0.16     55.8
G        1              14         61.5             41845            72               861          0.85     55.6
H        2              5          98.4             69469            207              492          0.48     67.6
I        2              4          78               49954            147              312          0.53     23.8
J        1              16         239.8            110332           530              3776         0.45     58.5
K        5              32         95.7             59720            411              3069         0.23     73.5
L        3              13         127.9            75177            246              1663         0.52     72.4
M        2              3          141              81636            201              423          0.70     69.2
N        1              7          69.1             42888            152              484          0.45     40.8
O        1              5          124.2            70651            151              621          0.82     64.2
P        2              9          78.8             47668            159              709          0.50     34.0
Q        1              5          85.2             53755            90               426          0.95     73.3
R        1              4          101.5            71348            117              406          0.87     71.8
S        1              2          109              65172            117              218          0.93     70.9
T        1              3          66.7             42833            83               200          0.80     62.7
Dori     1              1          94               64613            94               94           1.00     35.8
DS6A     1              1          97               60588            96               97           1.01     58.3
Gaia     1              1          194              90460            193              194          1.01     58.0
MooMoo   1              1          98               55178            98               98           1.00     31.6
Muddy    1              1          71               48228            70               71           1.01     71.4
Patience 1              1          109              70506            109              109          1.00     57.8
Sparky   1              1          93               63334            93               93           1.00     48.4
Wildcat  1              1          148              78296            148              148          1.00     69.6
(1) Average number of protein-coding genes per genome.
(2) Total phams is the total number of phamilies (groups of homologous mycobacteriophage genes) in that cluster.
(3) The Cluster Cohesion Index (CCI) is the average number of genes per genome divided by the total number of phamilies (phams) in
that cluster. For singleton phages (bottom eight rows) the number of phams equals the number of genes (i.e. the CCI is one), except
where a pham is represented by two or more genes in the same genome.
(4) The Cluster Isolation Index (CII) is the percentage of phams that are present only in that cluster and not in other mycobacteriophages.
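The two indexes defined in the table footnotes can be written out as a short Python sketch. The function names are hypothetical, and the set-based input to the isolation index is an assumption about how pham membership would be represented; only the arithmetic follows the footnote definitions. The two cohesion examples use the Cluster G and DS6A rows of Table 1.

```python
def cluster_cohesion_index(avg_genes_per_genome, total_phams):
    """CCI: average number of genes per genome divided by total phams in the cluster."""
    if total_phams <= 0:
        raise ValueError("total_phams must be positive")
    return avg_genes_per_genome / total_phams


def cluster_isolation_index(cluster_phams, phams_elsewhere):
    """CII: percentage of the cluster's phams that occur in no other phage."""
    cluster_phams = set(cluster_phams)
    if not cluster_phams:
        raise ValueError("cluster has no phams")
    unique = cluster_phams - set(phams_elsewhere)
    return 100.0 * len(unique) / len(cluster_phams)


# Values taken from Table 1 (average genes per genome, total phams).
print(round(cluster_cohesion_index(61.5, 72), 2))  # Cluster G: 0.85
print(round(cluster_cohesion_index(97, 96), 2))    # DS6A: 1.01

# Toy pham IDs (hypothetical): 3 of the cluster's 5 phams are found nowhere else.
print(cluster_isolation_index({1, 2, 3, 4, 5}, {4, 5, 6}))  # 60.0
```

The DS6A value exceeds one because 97 genes fall into 96 phams, which is the case the footnote describes where one pham is represented by two genes in the same genome.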
Figure 3. [Scatter plot. Vertical axis: Cluster Isolation Index, 20 to 90, running from Less
Isolated to More Isolated. Horizontal axis: Cluster Cohesion Index, 0 to 1.0, annotated Less
Diverse and More Diverse. Points: Clusters A to T and the singletons Wildcat, Muddy, MooMoo,
Dori, Sparky, Gaia, DS6A and Patience. Color scale by number of phages in the group: >200,
100-200, 50-100, 10-50, 5-10, 2-5, Singleton.]
SUPPLEMENTARY DATA
Supplementary Tables
Table S1. Phages used in this study and their cluster designation
Table S2. Genometrics and Cluster Cohesion Index of mycobacteriophages.
Supplementary Figures
Figure S1. Geographical distribution of sequenced mycobacteriophages. (A) Locations of
sequenced mycobacteriophages across the globe. (B) Locations of sequenced
mycobacteriophages across the United States. Data from www.phagesDB.org.
Figure S2. Nucleotide sequence comparison of 627 mycobacteriophages displayed as a
dotplot. Complete genome sequences of 627 mycobacteriophages were concatenated into a
single file, compared with itself using Gepard (ref. 1), and displayed as a dotplot. The order of
the genomes is as listed in Table S1. Nucleotide similarity is a primary component in assembling
phages into Clusters, which typically requires evident DNA similarity spanning more than 50% of
the genome lengths.
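As an illustration of the 50% span guideline, the following Python sketch computes the fraction of each genome covered by regions of similarity and checks whether both exceed a threshold. It is only a sketch: the legend says the criterion "typically" applies and similarity is judged from dotplots, so the interval input, the function names, and the strict numeric cutoff are all assumptions introduced here.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a genome covered by the union of half-open (start, end) intervals."""
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        # Clip to the part of this interval not already counted.
        start = max(start, current_end)
        end = min(end, genome_length)
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


def spans_majority(intervals_a, length_a, intervals_b, length_b, threshold=0.5):
    """True if the similar regions span more than `threshold` of both genomes."""
    return (covered_fraction(intervals_a, length_a) > threshold
            and covered_fraction(intervals_b, length_b) > threshold)


# Toy coordinates (hypothetical): overlapping hits are merged before counting.
print(covered_fraction([(0, 30000), (20000, 40000)], 50000))  # 0.8
print(spans_majority([(0, 30000)], 50000, [(0, 20000)], 60000))  # False
```

In the second call the first genome is 60% covered but the second only about 33%, so the pair would not meet the guideline under these assumptions.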
Figure S3. Proportions of orphams in mycobacteriophage genomes. The proportions of
genes that are orphams (i.e. single-gene phamilies with no homologues within the
mycobacteriophage dataset) are shown for each phage. The order of the phages is as shown in
Table S1. All of the singleton genomes have >30% orphams, and most of the other genomes
with relatively high proportions of orphams are the single-genome subclusters (see Table S2)
including Hawkeye (D2), Myrna (C2), Squirty (F3), Barnyard (H2), Che9c (I2), Whirlwind (L3),
Rey (M2), and Purky (P2). Three phages shown in red type are not singletons or single-genome
subclusters but have a relatively high proportion of orphams. Predator and Menkokysei
are members of the small and diverse clusters (5 or fewer genomes) H and T, respectively;
KayaCho is a member of Subcluster B4 but has a sufficiently high proportion of orphams to
arguably warrant formation of a new subcluster, B6.
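The orpham proportion described in this legend can be sketched in a few lines of Python. The sketch assumes each genome is given as a list with one pham ID per gene, and it treats a pham as an orpham when it has exactly one member gene in the whole dataset; the function name and the toy data are hypothetical.

```python
from collections import Counter


def orpham_proportions(genome_genes):
    """Map each genome to the fraction of its genes that belong to single-gene phams."""
    # Number of genes assigned to each pham across the entire dataset.
    pham_sizes = Counter(pham for genes in genome_genes.values() for pham in genes)
    proportions = {}
    for name, genes in genome_genes.items():
        if not genes:
            proportions[name] = 0.0
            continue
        orphams = sum(1 for pham in genes if pham_sizes[pham] == 1)
        proportions[name] = orphams / len(genes)
    return proportions


# Toy data (hypothetical): one pham ID per gene in each genome.
toy = {
    "PhageX": ["p1", "p2", "p3", "p4"],
    "PhageY": ["p1", "p2", "p5"],
}
result = orpham_proportions(toy)
print(result["PhageX"])  # 0.5
```

Because pham sizes are counted over the whole dataset, a gene duplicated within a single genome is not counted as an orpham here, which matches the single-gene definition given in the legend.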
Figure S4. Dotplot of phages in Clusters I, N, P and the singleton Sparky. The dotplot was
generated from a concatenated file of genome sequences using Gepard (ref. 1). The complexity
of the genome relationships is illustrated by the Cluster I phages, which share varying degrees
of similarity to phages in Clusters N and P, as well as the singleton Sparky. Because inclusion of
a phage in a cluster typically requires sharing a span of similarity over half of the genome
lengths, these phages are not assembled into a single larger cluster.
Figure S5. Dotplot of Carcharodon, Che9c, Kheth and Dori. The dotplot of concatenated
genome sequences illustrates the ambiguity of whether the singleton Dori warrants inclusion in
Cluster B. Dori shares DNA sequence similarity with its closest relative Kheth (Subcluster B2),
but the similarity does not span 50% of the genome lengths. Dori also shares DNA sequence
similarity with Che9c (Subcluster I2) and Carcharodon (Cluster N).
Figure S6. Dotplot of Corndog, Brujita, SG4, Yoshi, and MooMoo. The dotplot of
concatenated genome sequences illustrates the complex relationships between the singleton
MooMoo and other phages. MooMoo shares DNA sequence similarity with SG4 (Subcluster F1)
and Yoshi (Subcluster F2), but also with Brujita (Subcluster I1). MooMoo has barely detectable
DNA sequence similarity with Corndog (Cluster O), but has a similar prolate virion morphology.
Figure S7. Shared gene content between Dori, MooMoo, and other mycobacteriophages.
A. Average percentages of genes shared between Dori and other mycobacteriophages. B.
Average percentages of genes shared between MooMoo and other mycobacteriophages.
Genomes on the x axis are listed in the same order as in Table S1 and the cluster designations
are indicated.
Figure S8. Shared gene content between Gaia, Sparky, and other mycobacteriophages.
A. Average percentages of genes shared between Gaia and other mycobacteriophages. B.
Average percentages of genes shared between Sparky and other mycobacteriophages.
Genomes on the x axis are listed in the same order as in Table S1 and the cluster designations
are indicated.
References
1 Krumsiek, J., Arnold, R. & Rattei, T. Gepard: a rapid and sensitive tool for creating dotplots
on genome scale. Bioinformatics 23, 1026-1028 (2007).
Table S1. Phages used in this study and their cluster designation
Phage Name Cluster
Abrogate A1
Aeneas A1
Alsfro A1
Anglerfish A1
Arcanine A1
BPBiebs31 A1
BeesKnees A1
Bethlehem A1
BillKnuckles A1
Bob3 A1
Bruns A1
Bxb1 A1
ConceptII A1
Corvo A1
DD5 A1
Doom A1
Dreamboat A1
Dynamix A1
Edtherson A1
Euphoria A1
Fascinus A1
Forsytheast A1
Fushigi A1
GageAP A1
Hope4ever A1
Ichabod A1
JC27 A1
Jasper A1
KBG A1
KSSJEB A1
Kugel A1
Kykar A1
Lamina13 A1
Lesedi A1
Lockley A1
MPlant7149 A1
Magnito A1
Manatee A1
Marcell A1
McGuire A1
MetalQZJ A1
MrGordo A1
Museum A1
Papez A1
Pari A1
PattyP A1
Pepe A1
Perseus A1
Petp2012 A1
PhrostyMug A1
Pinto A1
RidgeCB A1
Ringer A1
Rufus A1
Ruotula A1
Rutherferd A1
Sarfire A1
Scowl A1
SkiPole A1
Solon A1
Switzer A1
Target A1
Thor A1
Treddle A1
Tripl3t A1
Trouble A1
Turj99 A1
U2 A1
Violet A1
Wheeler A1
Zephyr A1
Zeuska A1
ADZZY A2
Bugsy A2
Changeling A2
Che12 A2
ChipMunk A2
D29 A2
EagleEye A2
Echild A2
Equemioh13 A2
EvilGenius A2
Heffalump A2
IronMan A2
Jerm A2
Jsquared A2
L5 A2
Larenn A2
Loser A2
Odin A2
Piro94 A2
Power A2
Pukovnik A2
RedRock A2
SemperFi A2
Serenity A2
SweetiePie A2
Trixie A2
Turbido A2
Whabigail7 A2
Aglet A3
Bxz2 A3
DaHudson A3
EpicPhail A3
Farber A3
GingkoMaracino A3
Grum1 A3
Hercules11 A3
JHC117 A3
Jobu08 A3
Lilith A3
Mainiac A3
MarQuardt A3
Marie A3
Methuselah A3
Microwolf A3
Misomonster A3
Ollie A3
P28Green A3
Phoxy A3
PotatoSplit A3
PurpleHaze A3
Sabia A3
Spike509 A3
Taurus A3
Tiffany A3
Vix A3
Zetzy A3
BabyRay A31
HelDan A31
Norbert A31
Phantastic A31
Pocahontas A31
Popcicle A31
QuinnKiro A31
Rockstar A31
Veracruz A31
Abdiel A4
Achebe A4
Arturo A4
Backyardigan A4
BellusTerra A4
Broseidon A4
Bruiser A4
BubbleTrouble A4
Burger A4
Caelakin A4
Camperdownii A4
Clarenza A4
Dhanush A4
Eagle A4
Eris A4
Flux A4
Funston A4
Gadost A4
HamSlice A4
Holli A4
ICleared A4
KFPoly A4
Kampy A4
Kratark A4
LHTSCC A4
Lemur A4
LittleGuy A4
Maverick A4
Medusa A4
MeeZee A4
Melvin A4
Millski A4
Morpher26 A4
Mundrea A4
Nyxis A4
Obama12 A4
Peaches A4
Phighter1804 A4
Pipcraft A4
Sabertooth A4
Shaka A4
TinaFeyge A4
TiroTheta9 A4
TygerBlood A4
Wander A4
Wile A4
Airmid A5
Aragog A5
Archetta A5
Benedict A5
Chadwick A5
Cuco A5
ElTiger69 A5
ForGetIt A5
George A5
LittleCherry A5
Naca A5
Phlorence A5
Swirley A5
Theia A5
Tiger A5
UnionJack A5
Blue7 A6
DaVinci A6
EricB A6
Gladiator A6
Hammer A6
Jeffabunny A6
JewelBug A6
Kazan A6
McFly A6
SuperAwesome A6
VohminGhazi A6
HINdeR A7
Sheen A7
Timshel A7
Astro A8
Expelliarmus A8
Saintus A8
Smeadley A8
Alma A9
Catalina A9
Myxus A9
PackMan A9
Goose A10
KittenMittens A10
Rebeuca A10
RhynO A10
Severus A10
Trike A10
Twister A10
Bachome A11
Et2Brutus A11
Fibonacci A11
Mulciber A11
Adjutor D1
BigMama D1
Butterscotch D1
Gumball D1
Nova D1
PBI1 D1
PLot D1
SirHarley D1
Troll4 D1
Hawkeye D2
244 E
ABCat E
Bask21 E
Cactus E
Cjw1 E
Contagion E
Czyszczon1 E
DrDrey E
Dumbo E
Dusk E
Elph10 E
Eureka E
Goku E
Henry E
Hopey E
Kostya E
Lilac E
MadamMonkfish E
Murphy E
NelitzaMV E
NoSleep E
Pharsalus E
Phaux E
Phrux E
Porky E
Pumpkin E
Rakim E
RiverMonster E
Simpliphy E
SirDuracell E
Stark E
TeardropMSU E
Toto E
Tuco E
Ukulele E
Ardmore F1
Batiatus F1
Bipolar F1
Bobi F1
Boomer F1
Brocalys F1
Bubbles123 F1
BuzzLyseyear F1
Cabrinians F1
CaptainTrips F1
Cerasum F1
Che8 F1
DLane F1
Daenerys F1
Dante F1
DeadP F1
Dorothy F1
DotProduct F1
Drago F1
Empress F1
Estave1 F1
Fruitloop F1
GUmbie F1
Girr F1
Hades F1
Hamulus F1
Hegedechwinu F1
Ibhubesi F1
Inventum F1
Job42 F1
Krakatau F1
Llama F1
Llij F1
Mantra F1
MilleniumForce F1
Minnie F1
MisterCuddles F1
Mozy F1
Mutaforma13 F1
Ogopogo F1
Ovechkin F1
PMC F1
Pacc40 F1
Pippy F1
Ramsey F1
RockyHorror F1
Ruby F1
SG4 F1
Saal F1
Shauna1 F1
ShiLan F1
SiSi F1
Spartacus F1
Spoonbill F1
SuperGrey F1
Taj F1
Tweety F1
Velveteen F1
Wee F1
dirtMcgirt F1
Avani F2
Che9d F2
Jabbawokkie F2
Yoshi F2
Zapner F2
Squirty F3
Angel G
Annihilator G
Avrafan G
BPs G
BQuat G
BruceB G
Cherrybomb426 G
Frosty24 G
Gomashi G
Halo G
Hope G
Liefie G
Phreak G
Zombie G
Damien H1
Konstantine H1