I, Shikha Popali, and my colleague Harshpal Singh Wahi present "RECENT DEVELOPMENT IN DRUG DESIGN AND DISCOVERY". A detailed account of protein structure is given.
Clinical Data Management Plan_Katalyst HLSKatalyst HLS
Introduction to Data Management Plan in Clinical Data Management in Clinical Trials of Pharmaceuticals, Bio-Pharmaceuticals, Medical Devices, Cosmeceuticals and Foods.
This presentation introduces SAS software, a statistical software suite developed by SAS Institute for data management. SAS was developed at North Carolina State University from 1966 until 1976, when SAS Institute was incorporated. SAS was further developed in the 1980s and 1990s with the addition of new statistical procedures, additional components and the introduction of JMP. A point-and-click interface was added in version 9 in 2004. A social media analytics product was added in 2010.
• Portion explained:
• Components of SAS Software
• Origins of SAS Software
• Development of SAS Software
• Recent History of SAS Software
• Software products of SAS Software
• Adoption of SAS Software
• Application of SAS Software
Bioinformatics involves the analysis of biological information using computers and statistical techniques.
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
A sequence alignment is made between a known sequence and an unknown sequence, or between two unknown sequences. The known sequence is called the reference sequence. The unknown sequence is called the query sequence.
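To make the idea of aligning a reference against a query concrete, here is a minimal sketch of a global pairwise alignment by dynamic programming (the Needleman-Wunsch approach). The function name `global_align` and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions, not taken from the presentation.

```python
def global_align(ref, query, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (score, aligned_ref, aligned_query)."""
    n, m = len(ref), len(query)

    def sub(i, j):
        # Substitution score for ref[i-1] against query[j-1].
        return match if ref[i - 1] == query[j - 1] else mismatch

    # score[i][j] = best score for aligning ref[:i] with query[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in the query
                score[i][j - 1] + gap,            # gap in the reference
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            a.append(ref[i - 1]); b.append(query[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(ref[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(query[j - 1]); j -= 1
    return score[n][m], "".join(reversed(a)), "".join(reversed(b))


if __name__ == "__main__":
    s, top, bottom = global_align("ACGT", "AGT")
    print(s)
    print(top)
    print(bottom)
```

The gap character in the output marks where one sequence has no counterpart in the other; real tools use substitution matrices and affine gap penalties rather than these toy scores.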
BLAST stands for Basic Local Alignment Search Tool. It addresses a fundamental problem in bioinformatics research. BLAST tool is used to compare a query sequence with a library or database of sequences.
In bioinformatics, BLAST is an algorithm and program for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences.
BLAST was developed in 1990 on the basis of the stochastic model of Samuel Karlin and Stephen Altschul. They proposed “a method for estimating similarities between the known DNA sequence of one organism with that of another”.
A BLAST search enables a researcher to compare a subject protein or nucleotide sequence (called a query sequence) with a library or database of sequences and identify database sequences that resemble the query sequence above a certain threshold.
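The search described above can be illustrated with a heavily simplified seed-and-extend sketch. This is not the BLAST implementation: it only indexes exact words of length `k`, extends each seed while residues match exactly, and reports database sequences whose longest exact match reaches `threshold`. The names `build_index`, `extend_exact`, `search`, the toy database and the parameter values are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(db, k):
    """Map every word of length k to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in db.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def extend_exact(query, subject, qpos, spos, k):
    """Extend a seed in both directions while residues match; return the match length."""
    left = 0
    while (qpos - left - 1 >= 0 and spos - left - 1 >= 0
           and query[qpos - left - 1] == subject[spos - left - 1]):
        left += 1
    right = 0
    while (qpos + k + right < len(query) and spos + k + right < len(subject)
           and query[qpos + k + right] == subject[spos + k + right]):
        right += 1
    return k + left + right


def search(query, db, k=4, threshold=6):
    """Return (name, score) for database sequences scoring at or above the threshold."""
    index = build_index(db, k)
    best = {}
    for qpos in range(len(query) - k + 1):
        for name, spos in index.get(query[qpos:qpos + k], []):
            score = extend_exact(query, db[name], qpos, spos, k)
            if score > best.get(name, 0):
                best[name] = score
    hits = [(name, s) for name, s in best.items() if s >= threshold]
    return sorted(hits, key=lambda h: (-h[1], h[0]))


if __name__ == "__main__":
    toy_db = {  # hypothetical database sequences
        "seqA": "TTGACCGTAAGG",
        "seqB": "CCCCGTAACCCC",
        "seqC": "AAAAAAAAAAAA",
    }
    print(search("GACCGTAAG", toy_db))
```

Raising `threshold` makes the search stricter; BLAST itself scores extensions with substitution matrices and reports statistical significance rather than raw match length.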
Gut microflora and their role in susceptibility of lepidopteran pests to baci...Prema Latha
Topics covered: types of insect-microbe interactions, microbial diversity in insects, the role of gut microflora in susceptibility to Bacillus thuringiensis (Bt), the mode of action of Bt, and case studies supporting these topics.
Rapid Impact Assessment of Climatic and Physio-graphic Changes on Flagship G...Arvinder Singh
‘NATIONAL CONFERENCE ON MAN AND ENVIRONMENT’October 15 – 16, 2012
Organized by
Department of Zoology and Environmental Sciences, Punjabi University, Patiala (Pb.) – 147 002, India
Is microbial ecology driven by roaming genes?beiko
Microbial ecology often makes assumptions about the relationship between phylogeny and function, but these assumptions can be invalidated by lateral gene transfer. We need to take a broader view of relationships between genes and genomes in order to make better sense out of microbes.
Bactriophage history and their uses in environment Jayan Eranga
This presentation describes what bacteriophages are, their use as indicator organisms, and their importance in wastewater treatment systems. It also describes their replication cycles and historic milestones.
Comparative study on screening methods of polyhydroxybutyrate (PHB) producing...inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Crimson publishers-5-MethylcytosineDNA Methylation Patterns among Gut Predomi...CrimsonpublishersMedical
5-MethylcytosineDNA Methylation Patterns among Gut Predominate Commensal Escherichia coli and Lactobacilli from the Balbas and Mazekh Domestic Sheep Breeds by Pepoyan AZ* in Research in Medical &Engineering Sciences
Identification of the positively selected genes governing host-pathogen arm r...Atai Rabby
Bacterial evolution is due to the adaptive nature of the core bacterial genomes that plays critical role in diversification, fitness and adaptation of the species to different environment and host. Since Vibrio cholerae represents an appropriate model organism for studying the interplay of environment and host driven factors shaping the microbial genome structure and function, the current study aims to identify genes that are under these strong forces in V. cholerae. Here, we employed a comparative genomics approach to identify genes that are under positive selection in ten strains of Vibrio sp. including four pathogenic V. cholerae strains. From the available genome sequence data, a total of 422 orthologous genes were identified by reciprocal BLAST best-hit method, recombination breakpoint frequency analysis and tree comparison method. These 422 genes, representing the core genome of Vibrio sp., constituted the dataset to be analyzed for evolutionary selections. The analysis of natural selection, based on Maximum Likelihood method on synonymous and non-synonymous substitution rate, confirms the hypothesis that the bacterial core genomes are mostly under purifying selection with a few positively selected regions. However, our finding also reveals that positively selected sites in the Vibrio genome occur in a wide range of different genes encompassing diverse functional pathways including cell surface proteins (e.g. outer membrane-specific lipoprotein transporter/assembly proteins etc.), cell motility proteins (e.g. flagellar motor switch proteins, flagellar hook and assembly proteins), nutrient acquisition (e.g. amino acid, carbohydrate and phosphate ABC transporters), DNA repair and transcription related proteins. Interestingly, these positively selected gene products are directly involved with host-pathogen interactions and fitness in gastrointestinal environment. 
Therefore, the collective evidence of these positively selected genes spanning several pathways raises the possibility of their involvement in evolutionary arms races with other bacteria, phages, and/or the host immune system. This finding points to natural selection as the factor responsible for the diversification of the genus Vibrio.
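The reciprocal BLAST best-hit step mentioned in this abstract can be sketched as follows. This is only an illustration of the general idea, not the authors' pipeline: two genes are called putative orthologs when each is the other's top-scoring hit. The function name, the input layout (query gene mapped to a dictionary of hit gene and score) and the gene identifiers are hypothetical.

```python
def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best-scoring hit.

    a_vs_b maps each gene of genome A to {gene of genome B: score};
    b_vs_a is the same for searches run in the opposite direction.
    """
    best_ab = {q: max(hits, key=hits.get) for q, hits in a_vs_b.items() if hits}
    best_ba = {q: max(hits, key=hits.get) for q, hits in b_vs_a.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores from searching genome A against genome B and back.
    a_vs_b = {
        "A_gene1": {"B_gene1": 950.0, "B_gene2": 120.0},
        "A_gene2": {"B_gene2": 400.0},
        "A_gene3": {"B_gene1": 300.0},
    }
    b_vs_a = {
        "B_gene1": {"A_gene1": 940.0, "A_gene3": 310.0},
        "B_gene2": {"A_gene2": 390.0, "A_gene1": 110.0},
    }
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here `A_gene3` is dropped because its best hit, `B_gene1`, points back to `A_gene1` instead; this one-sided case is what the reciprocal check is meant to filter out.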
Respiration of e. coli in the mouse intestineAndrew Fabich
Mammals are aerobes that harbor an intestinal ecosystem dominated by large numbers of anaerobic microorganisms. However, the role of oxygen in the intestinal ecosystem is largely unexplored. We used systematic mutational analysis to determine the role of respiratory metabolism in the streptomycin-treated mouse model of intestinal colonization. Here we provide evidence that aerobic respiration is required for commensal and pathogenic Escherichia coli to colonize mice. Our results showed that mutants lacking ATP synthase, which is required for all respiratory energy-conserving metabolism, were eliminated by competition with respiratory-competent wild-type strains. Mutants lacking the high-affinity cytochrome bd oxidase, which is used when oxygen tensions are low, also failed to colonize. However, the low-affinity cytochrome bo(3) oxidase, which is used when oxygen tension is high, was found not to be necessary for colonization. Mutants lacking either nitrate reductase or fumarate reductase also had major colonization defects. The results showed that the entire E. coli population was dependent on both microaerobic and anaerobic respiration, consistent with the hypothesis that the E. coli niche is alternately microaerobic and anaerobic, rather than static. The results indicate that success of the facultative anaerobes in the intestine depends on their respiratory flexibility. Despite competition for relatively scarce carbon sources, the energy efficiency provided by respiration may contribute to the widespread distribution (i.e., success) of E. coli strains as commensal inhabitants of the mammalian intestine.
Innovations in Sequencing & Bioinformatics
Talk for
Healthy Central Valley Together Research Workshop
Jonathan A. Eisen University of California, Davis
January 31, 2024 linktr.ee/jonathaneisen
Thoughts on UC Davis' COVID Current ActionsJonathan Eisen
Slides I used for a presentation to Chancellor May's leadership council about the current state of UC Davis' response to COVID and how it could be improved
This PDF is about schizophrenia.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9 µm) and novel JWST images with 14 filters spanning 0.8−5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3−31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5 − 15. These objects show compact half-light radii of R1/2 ∼ 50 − 200 pc, stellar masses of M⋆ ∼ 10⁷−10⁸ M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr⁻¹. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Richard's aventures in two entangled wonderlandsRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Introduction:
RNA interference (RNAi), or post-transcriptional gene silencing (PTGS), is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
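The base-pairing property listed above can be sketched as a search for sites on an mRNA that are complementary to a small RNA. This is a minimal illustration that assumes Watson-Crick pairing only (no G-U wobble) and a simple mismatch count; the function names and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_target_sites(small_rna, mrna, max_mismatches=0):
    """Return (start, mismatches) for mRNA windows that can pair with small_rna."""
    site = reverse_complement(small_rna)  # sequence on the mRNA that pairs perfectly
    n = len(site)
    hits = []
    for start in range(len(mrna) - n + 1):
        window = mrna[start:start + n]
        mismatches = sum(1 for x, y in zip(site, window) if x != y)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    small = "AUGGCUAGCUAGGCUAACGUA"            # toy 21-nt small RNA
    target = reverse_complement(small)
    mrna = "GGGAAA" + target + "CCCUUU"        # toy mRNA carrying one perfect site
    print(find_target_sites(small, mrna))      # perfect pairing only
    print(find_target_sites(small, mrna, 2))   # tolerate up to two mismatches
```

Setting `max_mismatches` above zero loosely mimics imperfect pairing, while zero requires perfect complementarity.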
Discovery of siRNA?
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the development of the worm C. elegans.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
Types of RNAi (non-coding RNA)
miRNA
Length 23-25 nt
Trans-acting
Binds target mRNA with mismatches
Translation inhibition
siRNA
Length 21 nt
Cis-acting
Binds target mRNA with perfect complementarity
piRNA (Piwi-interacting RNA)
Length 25 to 36 nt
Expressed in germ cells
Regulates transposon activity
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
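The three steps above can be mimicked with a toy string model. It is a sketch under strong simplifying assumptions: Dicer is modelled as cutting one strand of the trigger into consecutive 21-nt pieces, the strand kept by RISC is taken to be the antisense of each piece, and an mRNA counts as silenced when any retained strand pairs perfectly with it. All function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(sense_strand, size=21):
    """Step 1: cut the long double-stranded trigger into consecutive short pieces."""
    return [sense_strand[i:i + size]
            for i in range(0, len(sense_strand) - size + 1, size)]


def load_risc(fragments):
    """Step 2: keep one strand per piece (here the antisense), discard the other."""
    return [reverse_complement(fragment) for fragment in fragments]


def is_silenced(mrna, guides):
    """Step 3: the mRNA is destroyed if any retained strand pairs perfectly with it."""
    return any(reverse_complement(guide) in mrna for guide in guides)


if __name__ == "__main__":
    trigger = "AUGGCUAGCUAGGCUAACGUA" + "CCGAUUAGGCAUCGAUUGCAA"  # toy 42-nt trigger
    guides = load_risc(dice(trigger))
    homologous = "GGG" + trigger + "AAA"
    unrelated = "C" * 60
    print(is_silenced(homologous, guides))
    print(is_silenced(unrelated, guides))
```

In cells, strand selection and target recognition are more involved than exact string matching; the model only shows how the order of the steps leads to sequence-specific silencing.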
THE RISC COMPLEX:
RISC is a large (>500 kDa) RNA-multiprotein binding complex which triggers mRNA degradation in response to siRNA
Unwinding of double-stranded siRNA by ATP-independent helicase
The active components of RISC are Ago proteins (endonucleases), which cleave the target mRNA.
DICER: endonuclease (RNase Family III)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN :
1. PAZ (PIWI/Argonaute/Zwille): recognition of target mRNA
2. PIWI (P-element induced wimpy testis): breaks the phosphodiester bond of mRNA (RNase H activity).
MiRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Phylogenomics and the Diversity and Diversification of Microbes
1. Phylogenomics and the Diversity
and Diversification of Microbes
October 14, 2022
Talk at UC Merced
Quantitative and Systems Biology Colloquium
Jonathan A. Eisen
University of California, Davis
@phylogenomics
http://phylogenomics.me
6. Eisen Lab “Topics”
Phylogenomic
Methods
& Tools
Microbial
Phylogenomics
&
Evolvability
Phylogenomic
Resources
&
Reference Data
Communication
&
Participation
In Microbiology
& Science
Research
Projects
A Brief Tour
11. Symbiosis Under Stress
When organisms are placed under selective
pressure or stress where novelty would be
beneficial, can we predict which pathway
they will use?
What leads to interactions / symbioses
being a potential solution?
Can we manipulate interactions and/or force
new ones upon systems?
Extrinsic
Novelty
12. HMS Type 1: Nutrient Acquisition
Host
Microbiome Nutrients
E2
Extrinsic
13. HMS Type 1: Xylem Feeders
Glassy Winged Sharpshooter
Gut
Endosymbionts
Trying to
Live on
Xylem Fluid
Nancy Moran
Dongying Wu
E2
Extrinsic
Wu D, Daugherty SC, Van Aken SE, Pai GH, Watkins KL, Khouri H, et al. (2006) Metabolic Complementarity and Genomics of the Dual Bacterial Symbiosis of Sharpshooters. PLoS Biol 4(6): e188. https://doi.org/10.1371/journal.pbio.0040188
14. HMS Type 1: Nitrogen Acquisition
Oloton
Corn
Mucilage
Microbiome
Low
N
Van Deynze A, Zamora P, Delaux PM, Heitmann C, Jayaraman D, Rajasekar S, Graham D, Maeda J, Gibson D, Schwartz KD, Berry AM, Bhatnagar S, Jospin G, Darling A, Jeannotte R, Lopez J, Weimer BC, Eisen JA, Shapiro HY, Ané JM, Bennett AB. 2018. Nitrogen fixation in a landrace of maize is supported by a mucilage-associated diazotrophic microbiota. PLoS Biology 16(8):e2006352. doi: 10.1371/journal.pbio.2006352. PMID: 30086128. PMCID: PMC6080747.
E2
Extrinsic
15. Marine
Invertebrates
HMS Type 1: Chemosymbioses
Endosymbionts Carbon
Colleen
Cavanaugh
E2
Extrinsic
Eisen JA, et al. 1992. Phylogenetic relationships of chemoautotrophic bacterial symbionts of Solemya velum Say (Mollusca: Bivalvia) determined by 16S rRNA gene sequence analysis. Journal of Bacteriology 174: 3416-3421. PMID: 1577710. PMCID: PMC206016.
Newton ILG, et al. 2007. The Calyptogena magnifica chemoautotrophic symbiont genome. Science 315: 998-1000.
Dmytrenko O, et al. 2014. The genome of the intracellular bacterium of the coastal bivalve, Solemya velum: a blueprint for thriving in and out of symbiosis. BMC Genomics 15: 924.
Roeselers G, et al. 2010. Complete genome sequence of Candidatus Ruthia magnifica.
16. HMS Type 1: Nutrients and Odor
Host
Microbiome Nutrients
Yamaguchi MS, Ganz HH, Cho AW, Zaw TH, Jospin G, McCartney MM, et al. (2019) Bacteria isolated from Bengal cat (Felis catus × Prionailurus bengalensis) anal sac secretions produce volatile compounds potentially associated with animal signaling. PLoS ONE 14(9): e0216846. https://doi.org/10.1371/journal.pone.0216846
Connie Rojas
17. Chemical communication
Many animals communicate via odors and chemical
substances (pheromones; volatile organic compounds) to:
deter predators mark territories advertise fertility
and emit identifying information
Wood 1990; Jordan et al. 2007; Kucklich et al. 2019
Modification of Slide by Connie Rojas
18. Anal glands produce semi-viscous and odorous
secretions in mammals
Modification of Slide by Connie Rojas
19. Anal glands produce semi-viscous and odorous
secretions in mammals
fermentative bacteria within
anal glands can produce
odorous metabolites involved
in host chemical signaling
Studied in badgers, hyenas,
and meerkats, but severely
understudied in other
animals
Rosell et al. 1998, Theis et al. 2013, Roberts et al. 2014, Leclaire et al. 2014
Modification of Slide by Connie Rojas
20. Limited understanding in domestic cats (Felis
catus)
Yamaguchi et al. (2019) examined the microbiome and
volatile organic compounds found in the anal gland
secretions of a Bengal cat
Modification of Slide by Connie Rojas
21. Limited understanding in domestic cats (Felis
catus)
Yamaguchi et al. (2019) examined the microbiome and
volatile organic compounds found in the anal gland
secretions of a Bengal cat
Bacteria isolated from the anal gland produced the same
volatiles found in anal gland secretions
Modification of Slide by Connie Rojas
22. Expanding this research to include more
cat individuals
Metagenomics
(microbiome)
Culturing
Swabbed the anal glands of 23 domestic cats
Metabolomics
(volatiles)
Stanley Marks, Hira Lesea,
and Cristina Davis
Modification of Slide by Connie Rojas
26. Seven bacterial species were recovered as MAGs and
as cultured isolates
MAG
Corynebacterium pyruviciproducens
Corynebacterium frankenforstense
Bacteroides fragilis
Escherichia coli
Lactobacillus johnsonii
Pediococcus acidilactici
Proteus mirabilis
Streptococcus canis
Corynebacterium spp.
possess type I fatty acid
synthases
Lactobacillus plantarum can
make volatile phenols, esters,
and ketones from
fermentation of gram sprouts
Proteus mirabilis produce
odorants in carcasses that
attract blowflies
Streptococcus sp. found in
the anal gland secretions
of red foxes and dogs, and
in the human axillae
Modification of Slide by Connie Rojas
27. In the process of characterizing the volatile
compounds found in anal sac secretions
Modification of Slide by Connie Rojas
28. HMS Type 2: Pathogens
Host
Microbiome Pathogen
E2
Extrinsic
29. HMS Type 2: Flu & Ducks
Ducks
Gut
Microbiome
Flu
Walter
Boyce
Holly
Ganz
Sarah
Hird
Ladan
Doroud
Alana
Firl
Hird SM, Ganz H, Eisen JA, Boyce WM. 2018. The cloacal microbiome of five wild duck species varies by species and influenza A virus infection status. mSphere 3:e00382-18. https://doi.org/10.1128/mSphere.00382-18
Ganz, H.H., Doroud, L., Firl, A.J., Hird, S.M., Eisen, J.A. and Boyce, W.M., 2017. Community-level differences in the microbiome of healthy wild mallards and those infected by influenza A viruses. mSystems, 2(1) .e00188-16.
E2
Extrinsic
30. HMS Type 2: Koalas & Chlamydia
Koala
Gut
Microbiome
Chlamydia
&
Antibiotics
Katherine
Dahlhausen
E2
Extrinsic
Dahlhausen KE, Doroud L, Firl AJ, Polkinghorne A, Eisen JA. 2018. Characterization of shifts of koala (Phascolarctos cinereus) intestinal microbial communities associated with antibiotic treatment. PeerJ 6:e4452 https://doi.org/10.7717/peerj.4452
Dahlhausen KE, Jospin G, Coil DA, Eisen JA, Wilkins LGE. 2020. Isolation and sequence-based characterization of a koala symbiont: Lonepinella koalarum. PeerJ 8:e10177 https://doi.org/10.7717/peerj.10177
32. Sonia Ghose’s Research
Characterizing the impact of restoration
on the Rana sierrae skin microbiome
across restoration histories and sites
Sonia L. Ghose
Collaborators:
• Vance Vredenburg
(SFSU)
• Roland Knapp (SNARL)
• Jessie Bushell (SF Zoo)
Funding:
• NIH Animal Models of
Infectious Diseases T32
• Alfred P. Sloan Foundation
• Center for Population
Biology
Modification of Slide by Sonia Ghose
33. Rana sierrae
Study system: Rana sierrae
• The Sierra Nevada yellow-legged frog
• Endangered species
• Highly susceptible to Bd
• Few populations remain
• Persisting with Bd
• Restoration efforts underway
• Role of skin microbiome?
IUCNredlist.org
Sonia Ghose’s Research
Modification of Slide by Sonia Ghose
34. One genus dominates samples but unstudied
Sonia Ghose Modification of Slide by Sonia Ghose
35. Variovorax
Pseudorhodoferax
Curvibacter, Rhodoferax
Xylophilus, Curvibacter, Acidovorax, Delftia, Comam
Hydrogenophaga
Symbiobacter
Roseateles, Paucibacter, Pelomonas, Mitsuaria
Frog01
Rubrivivax, Azohydromonas, Aquincola, Vitreoscilla, JOSH
Rhizobacter, Ideonella
Methylibium, Aquabacterium
AAP99
Janthinobacterium
Duganella
Massilia
Massilia
AVCC01
Herbaspirillum
Undibacterium
Profftella
Polynucleobacter, Burkholderia
BOG-994
Caldimonas
Thiomonas
Fusobacterium
Brachymonas, Comamonas
= MAG
representative
= Clade contains
known violacein
producer
Black text = Present in 16S
data
Grey text = Not present in
16S data
Frog 01 Mag
Family
Burkholderiaceae
Close relatives of
Frog 01 known to
make anti-fungal
pigment violacein
Modification of Slide by Sonia Ghose
36. Discovery and Taxonomy of Pigmented Bacteria
Marina E. De León
Marina De León
Modification of Slide by Marina De León
38. HMS Type 3: Rice Microbiome
Rice
Root
Microbiome Domestication
E2
Extrinsic
Sundar Lab
Srijak
Bhatnagar
Edwards J, Johnson C, Santos-Medellin C, Lurie E, Podishetty NK, Bhatnagar S, Eisen JA, Sundaresan V. 2015. Structure, variation, and assembly of the root-associated microbiomes of rice. Proceedings of the National Academy of Sciences USA 112(8): E911-20.
39. HMS Type 3: Panamanian Isthmus
1000s of Species
Microbiome
Rise of
Wilkins
Bill
Wcislo
Matt
Leray
E2
Extrinsic
https://istmobiome.rbind.io
https://istmobiome.net
· This work is funded by a grant from the Gordon and Betty Moore Foundation doi:10.37807/GBMF5603
Jarrod
Scott
David
Coil
40. Seagrass
Zostera marina
Microbiome Returning to
The Sea
HMS Type 3: Seagrass Land to Sea
Jenna
Lang
Jessica
Green
Jay
Stachowicz
David
Coil
E2
Extrinsic
https://seagrassmicrobiome.org
41. PLENTY OF FUNGI IN THE SEA:
INSIGHTS FROM PROFILING THE
SEAGRASS MYCOBIOME
CASSIE ETTINGER, PHD
@CASETTRON
NSF OCE POSTDOC
STAJICH LAB, UC RIVERSIDE
Modification of Slide by Cassie Ettinger
42. FOCUS ON ONE SEAGRASS
SPECIES: ZOSTERA MARINA
◆ Focus on one seagrass
species: Zostera marina (ZM)
or eelgrass
◆ Most abundant seagrass
species in the Northern
hemisphere
https://www.iucnredlist.org/
Fig 1, Fonseca & Uhrin Mar Fisheries Rev (2009)
Leaves
Rhizome
Roots
Modification of Slide by Cassie Ettinger
44. NOT MUCH IS KNOWN ABOUT
MARINE FUNGI
◆ Very few cultured isolates
exist
◆ Thought to include
members of the “early
diverging lineages” or
“dark matter” fungi
◆ Likely involved in
symbioses with many
marine organisms
◆ Harder to study than
bacteria / archaea
Microbial isolates associated with Zostera marina
Ettinger & Eisen, PLoS One (2020)
Modification of Slide by Cassie Ettinger
45. FUNGI WERE IMPORTANT FOR PLANT
TRANSITION TO LAND BUT MARINE
ENVIRONMENT HAS DIFFERENT SELECTION
PRESSURES
POSSIBLE LOSS OF MYCORRHIZAL FUNGI?
POSSIBLE GAIN OF NOVEL ASSOCIATIONS
WITH MARINE FUNGI??
Modification of Slide by Cassie Ettinger
46. GOALS FOR PROFILING THE
SEAGRASS MYCOBIOME
1) To characterize the taxonomic diversity of fungi associated
with the seagrass, ZM, from Bodega Bay, CA
2) To isolate and identify a diverse culture collection of fungi
associated with ZM from Bodega Bay, CA
3) To survey the taxonomic diversity of fungi associated with the
seagrass, ZM, globally
Modification of Slide by Cassie Ettinger
48. THE MYCOBIOME OF ZM FROM BODEGA BAY, CA: FUNGAL
TAXA VARY ACROSS DIFFERENT PARTS OF THE PLANT
Ettinger & Eisen, Frontiers in Microbiology (2019)
Leaves
Roots
Rhizome
Glomerellales (Colletotrichum) =
Possible dark septate endophytes
Modification of Slide by Cassie Ettinger
49. DARK SEPTATE ENDOPHYTES (DSE)
◆ Morphological, not phylogenetic group
◆ Largely uncharacterized
◆ Can transfer nitrogen and receive carbon
from land plants
◆ Can increase overall land plant nutrient
content and growth
◆ But negative, neutral and positive effects
have all been observed in land plants
◆ DSE have been previously observed in
the Mediterranean seagrass, Posidonia
oceanica
Fig 3E, 3F, Vohník et al Mycorrhiza (2015)
Modification of Slide by Cassie Ettinger
50. Mr Bayes + RAxML phylogeny
of ITS2 & partial 28S rRNA gene
Ettinger & Eisen, Frontiers in Microbiology (2019)
MOST ABUNDANT ASV (SV8) CLUSTERS WITHIN NOVEL
CLADE SW-I IN LOBULOMYCETALES (CHYTRIDIOMYCOTA)
ASV = amplicon sequence variant
Modification of Slide by Cassie Ettinger
51. COULD THESE CHYTRIDS BE SEAGRASS
PARASITES OR MUTUALISTIC SYMBIONTS?
◆ Unclassified chytrids previously seen
associated with Thalassia testudinum
◆ Found only on/in living leaf tissue
◆ Were unable to culture it without the
host plant
◆ Hypothesized it was potential
mutualistic symbiont or weak parasite
◆ Thought might be ubiquitous, but
could easily be misidentified as the
seagrass pathogen Labyrinthula
Newell & Fell Botanica Marina (1980)
Modification of Slide by Cassie Ettinger
52. GOALS FOR PROFILING THE
SEAGRASS MYCOBIOME
1) Culture-independent profiling reveals putative associations
with dark septate endophytes and marine chytrids
2) To isolate and identify a diverse culture collection of fungi
associated with ZM from Bodega Bay, CA
3) To survey the taxonomic diversity of fungi associated with the
seagrass, ZM, globally
Modification of Slide by Cassie Ettinger
53. ISOLATIONS PERFORMED BY
UNDERGRADUATES
◆ Cultured fungi associated with:
◆ ZM leaves, ZM roots, ZM rhizomes, sediment,
seawater from Bodega Bay, CA
◆ Epiphytes and endophytes
◆ Sanger identifications using ITS/28S
Neil Brahmbahatt
Katie Somers
Kate Jones & Tess McDaniel
Modification of Slide by Cassie Ettinger
54. 108 FUNGI, 40 BACTERIA & 2
OOMYCETES
◆ Fungi
◆ Mainly Ascomycota in the
Eurotiomycetes,
Dothideomycetes &
Sordariomycetes
◆ No chytrids
◆ Bacteria
◆ Majority were ubiquitous marine
lineages (e.g. Vibrio,
Pseudoalteromonas)
◆ Actinomycetes isolates
◆ Phyllobacterium sp.
◆ Oomycota
◆ Halophytophthora sp.
Ettinger & Eisen, PLoS One (2020)
Modification of Slide by Cassie Ettinger
55. GOALS FOR PROFILING THE
SEAGRASS MYCOBIOME
1) Culture-independent profiling reveals putative associations
with dark septate endophytes and marine chytrids
2) Culture-based surveys capture ZM generalist and specialist
fungi and enable comparative genomics
3) To survey the taxonomic diversity of fungi associated with the
seagrass, ZM, globally
Modification of Slide by Cassie Ettinger
56. Leaves (n = 12), roots (n = 12),
sediment (n = 12) taken from a shallow
ZM bed at each site
Sediment Roots
Leaves
GLOBAL SAMPLING EFFORT
16 SITES ACROSS NORTHERN HEMISPHERE
Site map courtesy of J. Stachowicz
Ettinger et al, AEM (2021)
Modification of Slide by Cassie Ettinger
57. ACKNOWLEDGEMENTS:
The Eisen & Stajich labs
Three fun-gals & one fun-guy: Kate Jones,
Tess McDaniel, Katie Somers & Neil Brahmbahatt
Collaborators: Jay Stachowicz, Sofie Voerman,
Susan Williams, Jessica Abbott, Jeanine Olsen, ZEN,
Marina LaForgia, Victoria Watson-Zink, Dante Torio & more
Want to know more?
email: cassande@ucr.edu
website: casett.github.io
twitter: @casettron
https://xkcd.com/
Modification of Slide by Cassie Ettinger
58. Microbiome Returning to
The Sea
HMS Type 3: Seagrass Land to Sea
Jenna
Lang
Jessica
Green
Jay
Stachowicz
David
Zostera marina
Part 2
Develop Zm into a
model system
60. Catherine Collier. IAN Image Library. https://ian.umces.edu/imagelibrary/
Jay
Stachowicz
Jonathan
Eisen
Laura
Vann
Jeanine
Olsen
Thorsten B.H.
Reusch
Resequencing of Zostera marina Across the
Northern Hemisphere
Modification of Slide by Gina Chaput
61. Catherine Collier. IAN Image Library. https://ian.umces.edu/imagelibrary/
• 10.4% reads not mapped to
Z. marina
• 421 MAGs constructed
• 121 MAGs of high quality
Assembling Non Zostera reads in the data
Modification of Slide by Gina Chaput
63. Catherine Collier. IAN Image Library. https://ian.umces.edu/imagelibrary/
• 10.4% reads not mapped to
Z. marina
• 421 MAGs constructed
• 121 MAGs of high quality
                      Df  Sum of Squares       R2        F  Pr(>F)
Latitude               1           51144  0.03172  11.2517  1e-04 ***
Water Body             1           48178  0.02988  10.5994  1e-04 ***
Latitude:Water Body    1           44813  0.02779   9.8591  1e-04 ***
Residual             323         1468163  0.91060
Total                326         1612299  1.0000
Assembling Non Zostera reads in the data
Modification of Slide by Gina Chaput
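The R2 and F columns in the table on this slide can be recomputed from the Df and Sum of Squares columns. The following is a minimal Python sketch, assuming the usual definitions R2 = SS_term / SS_total and pseudo-F = (SS_term / Df_term) / (SS_residual / Df_residual). The function name `variance_partition` is illustrative and does not come from the slides, and the software that produced the table is not stated.

```python
def variance_partition(terms, residual, total_ss):
    """Return {term: (R2, pseudo_F)} from sums of squares and degrees of freedom.

    terms:    dict mapping term name -> (df, sum_of_squares)
    residual: (df, sum_of_squares) from the Residual row
    total_ss: sum of squares from the Total row
    """
    res_df, res_ss = residual
    mean_sq_residual = res_ss / res_df
    out = {}
    for name, (df, ss) in terms.items():
        r2 = ss / total_ss                       # share of total variation
        pseudo_f = (ss / df) / mean_sq_residual  # term mean square / residual mean square
        out[name] = (r2, pseudo_f)
    return out


# Values copied from the table on this slide.
terms = {
    "Latitude": (1, 51144),
    "Water Body": (1, 48178),
    "Latitude:Water Body": (1, 44813),
}
result = variance_partition(terms, residual=(323, 1468163), total_ss=1612299)
for name, (r2, f) in result.items():
    print(f"{name}: R2={r2:.5f}, F={f:.3f}")
```

The recomputed values agree with the printed table up to rounding of the sums of squares. The small R2 values alongside very small p-values show that latitude and water body are statistically detectable but leave most of the variation (about 91%) in the residual.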
64. Phylogenetic Diversity of High Quality MAGs
Chaput et al. Unpublished
(31)
                      Df  Sum of Squares       R2       F  Pr(>F)
Latitude               1           41270  0.06962  26.666  1e-04 ***
Water Body             1           29210  0.04927  18.874  1e-04 ***
Latitude:Water Body    1           22437  0.03785  14.498  1e-04 ***
Residual             323          499890  0.84326
Total                326          592807  1.0000
Modification of Slide by
Gina Chaput
68. Observations
in the Environment
From Field to the Lab: Designing Microbiome
Experiments
Application of Microbes (Example: Seagrass Restoration)
Plant-Microbe
Interactions:
Host Response
Method Development
Modification of Slide by Gina Chaput
69. Assembly Rules of Plant-Microbe Interactions: Z. marina
seedlings
Host Filtering Effect
Priority Effect of Seed Microbiome
Priming Effect of Sediment Microbiome
Seven Stages of Seedling
Development
Xu et al. 2016 (DOI: 10.7717/peerj.2697)
EcoFAB 2.0
Modification of Slide by Gina Chaput
77. Monitoring wastewater to inform
COVID-19 public health response
Heather N. Bischel
Assistant Professor, Department of Civil & Environmental Engineering
hbischel@ucdavis.edu
78. Major Work Last Three Years
• COVID
• COVID
• COVID
• COVID
• COVID
• COVID
• COVID
HELP NEEDED
80. Covid-19
outbreak?
• Phylogenetic analysis of
clinical sequence data
clusters related Covid-19
infections
Main finding:
• Infection clusters are not
seen for students living off
campus
• Infection clusters are seen
for students in on-campus
residential housing [RH &
TG]
Modification of Slide by Mo Kaze
81. Covid-19 outbreak:
local school
• Phylogenetic analysis of
clinical sequence data
identified local school
outbreak
Main finding:
• High sequence similarity
indicated strong support
for Covid-19 transmission
between students,
teachers, and parents
Modification of Slide by Mo Kaze
88. STAP
An Automated Phylogenetic Tree-Based Small Subunit
rRNA Taxonomy and Alignment Pipeline (STAP)
Dongying Wu1*, Amber Hartman1,6, Naomi Ward4,5, Jonathan A. Eisen1,2,3
1 UC Davis Genome Center, University of California Davis, Davis, California, United States of America, 2 Section of Evolution and Ecology, College of Biological Sciences,
University of California Davis, Davis, California, United States of America, 3 Department of Medical Microbiology and Immunology, School of Medicine, University of
California Davis, Davis, California, United States of America, 4 Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, United States of America,
5 Center of Marine Biotechnology, Baltimore, Maryland, United States of America, 6 The Johns Hopkins University, Department of Biology, Baltimore, Maryland, United
States of America
Abstract
Comparative analysis of small-subunit ribosomal RNA (ss-rRNA) gene sequences forms the basis for much of what we know
about the phylogenetic diversity of both cultured and uncultured microorganisms. As sequencing costs continue to decline
and throughput increases, sequences of ss-rRNA genes are being obtained at an ever-increasing rate. This increasing flow of
data has opened many new windows into microbial diversity and evolution, and at the same time has created significant
methodological challenges. Those processes which commonly require time-consuming human intervention, such as the
preparation of multiple sequence alignments, simply cannot keep up with the flood of incoming data. Fully automated
methods of analysis are needed. Notably, existing automated methods avoid one or more steps that, though
computationally costly or difficult, we consider to be important. In particular, we regard both the building of multiple
sequence alignments and the performance of high quality phylogenetic analysis to be necessary. We describe here our fully-
automated ss-rRNA taxonomy and alignment pipeline (STAP). It generates both high-quality multiple sequence alignments
and phylogenetic trees, and thus can be used for multiple purposes including phylogenetically-based taxonomic
assignments and analysis of species diversity in environmental samples. The pipeline combines publicly-available packages
(PHYML, BLASTN and CLUSTALW) with our automatic alignment, masking, and tree-parsing programs. Most importantly,
this automated process yields results comparable to those achievable by manual analysis, yet offers speed and capacity that
are unattainable by manual efforts.
Citation: Wu D, Hartman A, Ward N, Eisen JA (2008) An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP). PLoS
ONE 3(7): e2566. doi:10.1371/journal.pone.0002566
multiple alignment and phylogeny was deemed unfeasible.
However, this we believe can compromise the value of the results.
For example, the delineation of OTUs has also been automated
via tools that do not make use of alignments or phylogenetic trees
(e.g., Greengenes). This is usually done by carrying out pairwise
comparisons of sequences and then clustering sequences that
have better than some cutoff threshold of similarity with each
other. This approach can be powerful (and reasonably efficient)
but it too has limitations. In particular, since multiple sequence
alignments are not used, one cannot carry out standard
phylogenetic analyses. In addition, without multiple sequence
alignments one might end up comparing and contrasting different
regions of a sequence depending on what it is paired with.
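The alignment-free OTU delineation described above (pairwise comparison, then clustering at a similarity cutoff) can be sketched in a few lines of Python. This is a toy illustration only: the identity measure (fraction of matching positions between equal-length strings) and the greedy, seed-based clustering are assumptions chosen for brevity, and they are not the method of any tool named in the text.

```python
def identity(a, b):
    """Fraction of matching positions; this toy version needs equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity needs equal-length sequences")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otus(seqs, cutoff=0.97):
    """Put each sequence in the first OTU whose seed it matches at >= cutoff."""
    otus = []  # list of (seed_sequence, member_list)
    for s in seqs:
        for seed, members in otus:
            if identity(seed, s) >= cutoff:
                members.append(s)
                break
        else:
            # No existing seed is similar enough, so this sequence seeds a new OTU.
            otus.append((s, [s]))
    return [members for _, members in otus]


reads = [
    "ACGTACGTAC",
    "ACGTACGTAT",  # one mismatch against the first read
    "TTTTGGGGCC",
]
clusters = greedy_otus(reads, cutoff=0.9)
print(len(clusters))
```

The sketch sidesteps the limitation raised in the text by assuming that all sequences already cover the same region. With real reads of different lengths, each pairwise comparison may cover a different stretch of the gene, which is the problem a shared multiple sequence alignment avoids.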
The limitations of avoiding multiple sequence alignments and
phylogenetic analysis are readily apparent in tools to classify
sequences. For example, the Ribosomal Database Project’s
Classifier program [29] focuses on composition characteristics of
each sequence (e.g., oligonucleotide frequency) and assigns
taxonomy based upon clustering genes by their composition.
Though this is fast and completely automatable, it can be misled in
cases where distantly related sequences have converged on similar
composition, something known to be a major problem in ss-rRNA
sequences [30]. Other taxonomy assignment systems focus
classification tools it does have some limitations. For example,
the generation of new alignments for each sequence is
computationally costly and does not take advantage of available
curated alignments that make use of ss-RNA secondary structure
to guide the primary sequence alignment. Perhaps most
importantly, however, the tool is not fully automated. In
addition, it does not generate multiple sequence alignments for all
sequences in a dataset which would be necessary for doing many
analyses.
Automated methods for analyzing rRNA sequences are also
available at the web sites for multiple rRNA centric databases,
such as Greengenes and the Ribosomal Database Project (RDPII).
Though these and other web sites offer diverse powerful tools, they
do have some limitations. For example, not all provide multiple
sequence alignments as output and few use phylogenetic
approaches for taxonomy assignments or other analyses. More
importantly, all provide only web-based interfaces and their
integrated software, (e.g., alignment and taxonomy assignment),
cannot be locally installed by the user. Therefore, the user cannot
take advantage of the speed and computing power of parallel
processing such as is available on linux clusters, or locally alter and
potentially tailor these programs to their individual computing
needs (Table 1).
Table 1. Comparison of STAP’s computational abilities relative to existing commonly-used ss-RNA analysis tools.
                                         STAP          ARB      Greengenes  RDP
Installed where?                         Locally       Locally  Web only    Web only
User interface                           Command line  GUI      Web portal  Web portal
Parallel processing                      YES           NO       NO          NO
Manual curation for taxonomy assignment  NO            YES      NO          NO
Manual curation for alignment            NO            YES      NO*         NO
Open source                              YES**         NO       NO          NO
Processing speed                         Fast          Slow     Medium      Medium
It is important to note that STAP is the only software that runs on the command line and can take advantage of parallel processing on linux clusters and, further, is
more amenable to downstream code manipulation.
* Note: Greengenes alignment output is compatible with upload into ARB and downstream manual alignment.
** The STAP program itself is open source; the programs it depends on are freely available but not open source.
doi:10.1371/journal.pone.0002566.t001
ss-rRNA Taxonomy Pipeline
Figure 1. A flow chart of the STAP pipeline.
doi:10.1371/journal.pone.0002566.g001
STAP database, and the query sequence is aligned to them using
the CLUSTALW profile alignment algorithm [40] as described
above for domain assignment. By adapting the profile alignment
algorithm, the alignments from the STAP database remain intact,
while gaps are inserted and nucleotides are trimmed for the query
sequence according to the profile defined by the previous
alignments from the databases. Thus the accuracy and quality of
the alignment generated at this step depends heavily on the quality
of the Bacterial/Archaeal ss-rRNA alignments from the
Greengenes project or the Eukaryotic ss-rRNA alignments from
the RDPII project.
Phylogenetic analysis using multiple sequence alignments rests on
the assumption that the residues (nucleotides or amino acids) at the
same position in every sequence in the alignment are homologous.
Thus, columns in the alignment for which ‘‘positional homology’’
cannot be robustly determined must be excluded from subsequent
analyses. This process of evaluating homology and eliminating
questionable columns, known as masking, typically requires time-consuming,
skillful, human intervention. We designed an automated
masking method for ss-rRNA alignments, thus eliminating this
bottleneck in high-throughput processing.
First, an alignment score is calculated for each aligned column
by a method similar to that used in the CLUSTALX package [42].
Specifically, an R-dimensional sequence space representing all the
possible nucleotide character states is defined. Then for each
aligned column, the nucleotide populating that column in each of
the aligned sequences is assigned a score in each of the R
dimensions (Sr) according to the IUB matrix [42]. The consensus
‘‘nucleotide’’ for each column (X) also has R dimensions, with the
Figure 2. Domain assignment. In Step 1, STAP assigns a domain to
each query sequence based on its position in a maximum likelihood
tree of representative ss-rRNA sequences. Because the tree illustrated
89. WATERS
Hartman et al. BMC Bioinformatics 2010, 11:317
http://www.biomedcentral.com/1471-2105/11/317
Open Access
SOFTWARE
Software
Introducing W.A.T.E.R.S.: a Workflow for the
Alignment, Taxonomy, and Ecology of Ribosomal
Sequences
Amber L Hartman†1,3, Sean Riddle†2, Timothy McPhillips2, Bertram Ludäscher2 and Jonathan A Eisen*1
Abstract
Background: For more than two decades microbiologists have used a highly conserved microbial gene as a
phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA gene, also known as 16S rRNA, is
encoded by ribosomal DNA, 16S rDNA, and has provided a powerful comparative tool to microbial ecologists. Over
time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive
collections of sequence data that are paired with dozens of corresponding collection variables. As the complexity of
data and tool sets have grown, the need for flexible automation and maintenance of the core processes of 16S rDNA
sequence analysis has increased correspondingly.
Results: We present WATERS, an integrated approach for 16S rDNA analysis that bundles a suite of publicly available
16S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera
removal, OTU determination, taxonomy assignment, phylogenetic tree construction as well as a host of ecological
analysis and visualization tools. WATERS employs a flexible, collection-oriented 'workflow' approach using the
open-source Kepler system as a platform.
Conclusions: By packaging available software tools into a single automated workflow, WATERS simplifies 16S rDNA
analyses, especially for those without specialized bioinformatics, programming expertise. In addition, WATERS, like
some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time dedicated to carrying
out tedious informatics steps and to focus their attention instead on the biological interpretation of the results. One
advantage of WATERS over other comprehensive tools is that the use of the Kepler workflow system facilitates result
interpretation and reproducibility via a data provenance sub-system. Furthermore, new "actors" can be added to the
workflow as desired and we see WATERS as an initial seed for a sizeable and growing repository of interoperable, easy-
to-combine tools for asking increasingly complex microbial ecology questions.
Background
Microbial communities and how they are surveyed
Microbial communities abound in nature and are crucial
for the success and diversity of ecosystems. There is no
end in sight to the number of biological questions that
can be asked about microbial diversity on earth. From
animal and human guts to open ocean surfaces and deep
sea hydrothermal vents, to anaerobic mud swamps or
boiling thermal pools, to the tops of the rainforest canopy
and the frozen Antarctic tundra, the composition of
microbial communities is a source of natural history,
intellectual curiosity, and reservoir of environmental
health [1]. Microbial communities are also mediators of
insight into global warming processes [2,3], agricultural
success [4], pathogenicity [5,6], and even human obesity
[7,8].
In the mid-1980s, researchers began to sequence ribosomal
RNAs from environmental samples in order to
characterize the types of microbes present in those samples
(e.g., [9,10]). This general approach was revolutionized
by the invention of the polymerase chain reaction
(PCR), which made it relatively easy to clone and then
* Correspondence: jaeisen@ucdavis.edu
1 Department of Medical Microbiology and Immunology and the Department
of Evolution and Ecology, Genome Center, University of California Davis, One
Shields Avenue, Davis, CA, 95616, USA
† Contributed equally
Full list of author information is available at the end of the article
bosomal RNA) in partic-
osomal RNA (ss-rRNA).
e amount of previously
[1,11-13]. Researchers
t rRNA gene not only
it can be PCR amplified,
e and highly conserved
ersally distributed among
ful for inferring phyloge-
e then, "cultivation-inde-
ught a revolution to the
ng scientists to study a
[Figure 1 (WATERS workflow), box labels: Align; Check chimeras; Cluster; Build Tree; Assign Taxonomy; Tree w/ Taxonomy; Diversity statistics & graphs; Unifrac files; Cytoscape network; OTU table]
Motivations
As outlined above, successfully processing microbial
sequence collections is far from trivial. Each step is complex
and usually requires significant bioinformatics
expertise and time investment prior to the biological
interpretation. In order to both increase efficiency and
ensure that all best-practice tools are easily usable, we
sought to create an "all-inclusive" method for performing
all of these bioinformatics steps together in one package.
To this end, we have built an automated, user-friendly,
workflow-based system called WATERS: a Workflow for
the Alignment, Taxonomy, and Ecology of Ribosomal
Sequences (Fig. 1). In addition to being automated and
simple to use, because WATERS is executed in the Kepler
scientific workflow system (Fig. 2) it also has the advantage
that it keeps track of the data lineage and provenance
of data products [23,24].
Automation
The primary motivation in building WATERS was to
minimize the technical, bioinformatics challenges that
arise when performing DNA sequence clustering, phylogenetic
tree, and statistical analyses by automating the 16S
rDNA analysis workflow. We also hoped to exploit
additional features that workflow-based approaches
entail, such as optimized execution and data lineage
tracking and browsing [23,25-27]. In the earlier days of 16S
rDNA analysis, simply knowing which microbes were
present and whether they were biologically novel was a
noteworthy achievement. It was reasonable and expected,
therefore, to invest a large amount of time and effort to
get to that list of microbes. But now that current efforts
are significantly more advanced and often require comparison
of dozens of factors and variables with datasets of
thousands of sequences, it is not practically feasible to
process these large collections "by hand", and hugely inefficient
if instead automated methods can be successfully
employed.
Broadening the user base
A second motivation and perspective is that by minimizing
the technical difficulty of 16S rDNA analysis through
the use of WATERS, we aim to make the analysis of these
datasets more widely available and allow individuals with
Figure 2 Screenshot of WATERS in Kepler software. Key features: the library of actors un-collapsed and displayed on the left-hand side, the input
and output paths where the user declares the location of their input files and desired location for the results files. Each green box is an individual Kepler
actor that performs a single action on the data stream. The connectors (black arrows) direct and hook up the actors in a defined sequence. Double-
clicking on any actor or connector allows it to be manipulated and re-arranged.
default is 97% and 99%), and they are also generated for
every metadata variable comparison that the user
includes.
Data pruning
To assist in troubleshooting and quality con
WATERS returns to the user three fasta files of seque
Figure 3 Biologically similar results automatically produced by WATERS on published colonic microbiota samples. (A) Rarefaction curves
similar to curves shown in Eckburg et al. Fig. 2; 70-72, indicate patient numbers, i.e., 3 different individuals. (B) Weighted Unifrac analysis based on phylogenetic
tree and OTU data produced by WATERS very similar to Eckburg et al. Fig. 3B. (C) Neighbor-joining phylogenetic tree (Quicktree) representing
the sequences analyzed by WATERS, which is clearly similar to Fig. S1 in Eckburg et al.
90. alignment used to build the profile, resulting in a multiple
sequence alignment of full-length reference sequences and
metagenomic reads. The final step of the alignment process is a
quality control filter that 1) ensures that only homologous SSU-
rRNA sequences from the appropriate phylogenetic domain are
included in the final alignment, and 2) masks highly gapped
alignment columns (see Text S1).
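The second part of the quality control filter, masking highly gapped alignment columns, can be illustrated with a short Python sketch. The 50% gap threshold, the gap characters, and the function name `mask_gapped_columns` are assumptions made for this illustration. The actual criterion is in the paper's Text S1, which is not reproduced on the slide.

```python
def mask_gapped_columns(alignment, max_gap_fraction=0.5, gap_chars="-."):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment: list of equal-length aligned sequences (strings).
    Returns the aligned sequences with the masked columns removed.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    keep = [
        col for col in range(length)
        if sum(row[col] in gap_chars for row in alignment) / n <= max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


aln = [
    "AC-GT-",
    "AC-GTA",
    "A--GT-",
    "ACTGT-",
]
masked = mask_gapped_columns(aln, max_gap_fraction=0.5)
print(masked)
```

In this toy alignment the third and sixth columns are 75% gaps and are dropped, while the second column (25% gaps) is kept. Short metagenomic reads cover only part of the gene, so a real filter has to balance removing poorly supported columns against discarding the positions that the reads do cover.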
We use this high quality alignment of metagenomic reads and
references sequences to construct a fully-resolved, phylogenetic
tree and hence determine the evolutionary relationships between
the reads. Reference sequences are included in this stage of the
analysis to guide the phylogenetic assignment of the relatively
short metagenomic reads. While the software can be easily
extended to incorporate a number of different phylogenetic tools
capable of analyzing metagenomic data (e.g., RAxML [27],
pplacer [28], etc.), PhylOTU currently employs FastTree as a
default method due to its relatively high speed-to-performance
PD versus PID clustering, 2) to explore overlap between PhylOTU
clusters and recognized taxonomic designations, and 3) to quantify
the accuracy of PhylOTU clusters from shotgun reads relative to
those obtained from full-length sequences.
PhylOTU Clusters Recapitulate PID Clusters
We sought to identify how PD-based clustering compares to
commonly employed PID-based clustering methods by applying
the two methods to the same set of sequences. Both PID-based
clustering and PhylOTU may be used to identify OTUs from
overlapping sequences. Therefore we applied both methods to a
dataset of 508 full-length bacterial SSU-rRNA sequences (reference
sequences; see above) obtained from the Ribosomal Database
Project (RDP) [25]. Recent work has demonstrated that PID is
more accurately calculated from pairwise alignments than multiple
sequence alignments [32–33], so we used ESPRIT, which
Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized
workflow of PhylOTU. See Results section for details.
doi:10.1371/journal.pcbi.1001061.g001
Finding Metagenomic OTUs
Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer
JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput
Procedure Quantifies Microbial Community
Diversity and Resolves Novel Taxa from Metagenomic Data.
PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
OTUs via Phylogeny (PhylOTU)
Tom
Sharpton
Katie
Pollard
Jessica
Green
Finding Metagenomic OTUs
91. rRNA Copy # vs. Phylogeny
Steven
Kembel
Jessica
Green
Martin
Wu
Kembel SW, Wu M, Eisen JA, Green JL (2012)
Incorporating 16S Gene Copy Number
Information Improves Estimates of Microbial
Diversity and Abundance. PLoS Comput Biol
8(10): e1002743. doi:10.1371/journal.pcbi.1002743
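The basic idea behind the paper title above can be sketched as follows: a taxon with many 16S gene copies per genome contributes proportionally more reads, so dividing read counts by copy number before normalizing gives a less biased estimate of relative abundance. This Python sketch uses hypothetical counts and copy numbers. How copy numbers are estimated for each taxon is the substance of the cited paper and is not reproduced here.

```python
def copy_number_corrected_abundance(read_counts, copy_numbers):
    """Divide 16S read counts by per-taxon copy number, then renormalize to sum to 1."""
    corrected = {taxon: read_counts[taxon] / copy_numbers[taxon] for taxon in read_counts}
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical read counts and 16S copy numbers, for illustration only.
reads = {"taxon_A": 600, "taxon_B": 400}
copies = {"taxon_A": 6, "taxon_B": 1}
abundance = copy_number_corrected_abundance(reads, copies)
print(abundance)
```

In this made-up example taxon_A supplies 60% of the reads, but after correction it accounts for only 20% of the estimated cells, because each of its genomes carries six copies of the gene.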
95. PD from Metagenomes
typically used as a qualitative measure because duplicate sequences
are usually removed from the tree. However, the
test may be used in a semiquantitative manner if all clones,
even those with identical or near-identical sequences, are included
in the tree (13).
Here we describe a quantitative version of UniFrac that we
call “weighted UniFrac.” We show that weighted UniFrac behaves
similarly to the FST test in situations where both a
FIG. 1. Calculation of the unweighted and the weighted UniFrac
measures. Squares and circles represent sequences from two different
environments. (a) In unweighted UniFrac, the distance between the
circle and square communities is calculated as the fraction of the
branch length that has descendants from either the square or the circle
environment (black) but not both (gray). (b) In weighted UniFrac,
branch lengths are weighted by the relative abundance of sequences in
the square and circle communities; square sequences are weighted
twice as much as circle sequences because there are twice as many total
circle sequences in the data set. The width of branches is proportional
to the degree to which each branch is weighted in the calculations, and
gray branches have no weight. Branches 1 and 2 have heavy weights
since the descendants are biased toward the square and circles, respectively.
Branch 3 contributes no value since it has an equal contribution
from circle and square sequences after normalization.
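The two measures described in the figure caption can be written out as a small Python sketch. Each branch is represented directly as a tuple of (length, descendants from the square community, descendants from the circle community), which is a simplification assumed here to avoid parsing a tree. The weighted version shown is the raw, unnormalized sum of branch length times the absolute difference in relative abundance. The branch values are made up to mirror the caption: twice as many circles as squares, with one branch shared in equal proportion.

```python
def unweighted_unifrac(branches):
    """Fraction of observed branch length leading to only one of the two communities.

    branches: list of (length, count_square, count_circle) tuples.
    """
    unique = sum(length for length, sq, ci in branches if (sq > 0) != (ci > 0))
    observed = sum(length for length, sq, ci in branches if sq > 0 or ci > 0)
    return unique / observed


def weighted_unifrac(branches, total_square, total_circle):
    """Raw weighted UniFrac: sum of length * |square fraction - circle fraction|."""
    return sum(
        length * abs(sq / total_square - ci / total_circle)
        for length, sq, ci in branches
    )


# Toy data: 2 square sequences and 4 circle sequences in total.
branches = [
    (1.0, 1, 0),  # leads only to squares
    (1.0, 0, 2),  # leads only to circles
    (0.5, 1, 2),  # shared; equal proportions after normalization (1/2 vs 2/4)
]
print(unweighted_unifrac(branches))
print(weighted_unifrac(branches, total_square=2, total_circle=4))
```

The shared third branch counts toward the observed length in the unweighted measure but adds nothing to the weighted one, matching the caption's point that a branch with equal normalized contributions from both communities carries no weight.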
Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of
Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
Jessica
Green
Steven
Kembel
Katie
Pollard
105. We need to know how organisms are
related to each other
Tools: Whole Genome Phylogeny
106. HMS Type 1: Xylem Feeders
Glassy Winged Sharpshooter
Gut
Endosymbionts
Trying to
Live on
Xylem Fluid
Nancy Moran
Dongying Wu
E2
Extrinsic
107. WGT: Higher Evolutionary Rates in Endosymbionts
Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
Higher
Evolutionary
Rates in
Endosymbionts
108. Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
MutS MutL
+ +
+ +
+ +
+ +
_ _
_ _
Variation in Evolution Rates Correlated with Repair Gene Presence
Highest Rates
In Those Missing
Mismatch Repair
Genes
109. Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
MutS MutL
+ +
+ +
+ +
+ +
_ _
_ _
Variation in Evolution Rates Correlated with Repair Gene Presence
Important Use of
Whole Genome Trees
110. Whole Genome Trees: Many Possible Methods
Lang JM, Darling AE, Eisen JA (2013) Phylogeny of
Bacterial and Archaeal Genomes Using Conserved
Genes: Supertrees and Supermatrices. PLoS ONE
8(4): e62510. doi:10.1371/journal.pone.0062510
Jenna Lang
112. Automated WGT: Phylosift
Input Sequences
rRNA workflow
protein workflow
profile HMMs used to align
candidates to reference alignment
Taxonomic
Summaries
parallel option
hmmalign
multiple alignment
LAST
fast candidate search
pplacer
phylogenetic placement
LAST
fast candidate search
LAST
fast candidate search
search input against references
hmmalign
multiple alignment
hmmalign
multiple alignment
Infernal
multiple alignment
LAST
fast candidate search
<600 bp
>600 bp
Sample Analysis &
Comparison
Krona plots,
Number of reads placed
for each marker gene
Edge PCA,
Tree visualization,
Bayes factor tests
each
input
sequence
scanned
against
both
workflows
Aaron
Darling
Erik
Matsen
Holly
Bik
Guillaume
Jospin
Darling AE, Jospin G, Lowe E,
Matsen FA IV, Bik HM, Eisen JA.
(2014) PhyloSift: phylogenetic
analysis of genomes and
metagenomes. PeerJ 2:e243
http://dx.doi.org/10.7717/peerj.243
Erik
Lowe
113. Normalizing Across Genes Tree OTU
Wu, D., Doroud, L., Eisen, JA 2013. arXiv. TreeOTU: Operational Taxonomic Unit Classification Based on Phylogenetic
Dongying Wu
140. Microbiomania vs. Germophobia
Germophobia: All Microbes Are Bad; Use Antimicrobials in Everything; Avoid all Microbes; Avoid Animals Too; Swab Stories
Microbiomania: All Microbes Are Good; Use Probiotics in Everything; Embraces all Microbes; Lick Subway Poles; Fecal Transplants Will Save World
141. Microbiomania vs. Germophobia
Underselling: All Microbes Are Bad; Use Antimicrobials in Everything; Avoid all Microbes; Avoid Animals Too; Swab Stories
Overselling: All Microbes Are Good; Use Probiotics in Everything; Embraces all Microbes; Lick Subway Poles; Fecal Transplants Will Save World
142. Overselling 1: Correlations
Correlation ≠ causation
Lesson: Some microbiome correlations with health states are
due to microbiomes playing a causal role in health state. But
most are not due to causal connections.
146. Overselling 3: Presence vs. Importance
Lesson: Even when microbes are actually present somewhere,
this does not mean they are important
147. Overselling 4: Non pathogen ≠ probiotic
https://phylogenomics.blogspot.com/2013/12/cvs-marketing-probiotics-for-everyone.html?spref=tw
Lesson: Some probiotics really work, but you can’t just throw a
nonpathogenic microbe at something and call it a probiotic
148. Probiotics That Kill …
https://phylogenomics.blogspot.com/2012/07/quick-post-story-about-ucdavis.html
149. Overselling 5: Personalized ≠ Health
Lesson: Most claims of personalized microbiome health and
diet plans are bogus
150. Overselling 6: Some Microbes Are Bad
Lesson: The hygiene hypothesis is important, but imbibing all the
microbes in the world is not a good plan
151. Other Overselling Issues
• Big number systems lead to spurious
associations
• Massive complexity
• Just because fecal transplants work for C. diff
does not mean they should work for
everything
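The first bullet above (big number systems lead to spurious associations) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "microbial features" and the "health outcome" are independent random noise, and the sample size, feature counts, and the 0.5 correlation cutoff are arbitrary illustrative choices, not values from the talk.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def count_chance_hits(n_samples, n_features, threshold, seed=0):
    """Count random features whose |r| with a random outcome reaches threshold.

    Features and outcome are drawn independently, so every hit is spurious.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(feature, outcome)) >= threshold:
            hits += 1
    return hits


# Same 20 simulated samples, but very different numbers of features tested.
few = count_chance_hits(n_samples=20, n_features=10, threshold=0.5)
many = count_chance_hits(n_samples=20, n_features=5000, threshold=0.5)
print("hits among 10 random features:", few)
print("hits among 5000 random features:", many)
```

With thousands of features and only a few dozen samples, some features pass the cutoff purely by chance, which is why large screens need multiple-testing correction or independent validation.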
152. Underselling 1: Kill Everything
Lesson: We have gone completely bonkers with overuse of
sterilization and antimicrobials
153. Underselling 2: Swab Stories
Lesson: Germophobia leads to crazy behaviors and great
underselling of the possible benefits of microbes
154. Other Underselling Issues
• Being related to a pathogen does not mean
being pathogenic
• Microbes with subtle effects have been
ignored in most systems (i.e., if they are not
pathogens or obligate mutualists)
• Microbiomes ignored in many experimental
studies of plants and animals
• Microbes ignored in most conservation
studies
162. Microbiomania vs. Germophobia
Underselling: All Microbes Are Bad; Use Antimicrobials in Everything; Avoid all Microbes; Avoid Animals Too; Swab Stories
Overselling: All Microbes Are Good; Use Probiotics in Everything; Embraces all Microbes; Lick Subway Poles; Fecal Transplants Will Save World