You can ignore all the other bad -omic words you hear (conservome?!).
Regardless of methodology, focus on:
• Species assemblages and taxonomic diversity
• Community patterns over space and time: cosmopolitan or regionally restricted?
• Community changes as a result of natural/human disturbance
Marker genes across all domains (bacteria, archaea, eukaryotes and viruses): rRNA genes, protein-coding orthologs, lineage-specific gene families
----- Meeting Notes (5/22/12 10:42) -----
Marker genes are used to make higher-level taxon assignments.
Lineage-specific gene families narrow down assignments to lower taxonomic levels.
Head-tail patterns may help us to delimit species and to separate rare taxa (which will show head-tail patterns) from errors (no apparent pattern).
----- Meeting Notes (5/22/12 10:42) -----
pplacer and EPA are great tools developed in the last few years.
I see name matching as covering not just species names, but also matching between synonymous NCBI taxon IDs.
rRNA data especially needs to be interpreted in a phylogenetic context.
Phylogenetic placement allows:
1) More robust taxon assignments
2) Identification of divergent/undersampled lineages (that aren't apparent via BLAST searches)
What's the ecology/function of these divergent lineages?
Phylotastic! Metagenomics Use Cases
Holly Bik, UC Davis
-Omic Dictionary
• Marker gene studies – amplification of a conserved homologous gene (18S, 16S rRNA) from environmental samples
• Metagenomics – shotgun sequencing of random genomic fragments from environmental DNA
Explicitly Phylogenetic Approaches
[Figure labels: Aligned environmental sequences; Guide Tree; Evolutionary Placement of short reads]
Tree Reconciliation in PhyloSift
[Figure labels: Environmental Sequences; Named Taxa]
Pruning Subtrees from Megatrees
• User inputs a list of reference sequences with NCBI Taxon IDs
 – Pulls down tree topology
• Unclassified sequences in a reference phylogeny could be “named” with the most appropriate higher level taxon
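The pruning step on this slide can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the `Node` class, the `prune` and `to_newick` helpers, and the small `megatree` are all hypothetical, and a real megatree service would work on a much larger topology in its own format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node of a toy reference tree (hypothetical structure)."""
    name: str
    taxid: Optional[int] = None              # NCBI Taxon ID, if known
    children: list[Node] = field(default_factory=list)


def prune(node: Node, keep: set[int]) -> Optional[Node]:
    """Return a copy of the tree holding only leaves whose taxid is in `keep`."""
    if not node.children:
        return Node(node.name, node.taxid) if node.taxid in keep else None
    pruned = (prune(child, keep) for child in node.children)
    kept = [child for child in pruned if child is not None]
    if not kept:
        return None                          # nothing requested below this node
    if len(kept) == 1:
        return kept[0]                       # collapse single-child internal nodes
    return Node(node.name, node.taxid, kept)


def to_newick(node: Node) -> str:
    """Serialize the topology (no branch lengths) as a Newick string."""
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.name


# Toy "megatree"; the topology is illustrative, not a published phylogeny.
megatree = Node("Bacteria", children=[
    Node("Enterobacteriaceae", children=[
        Node("Escherichia_coli", 562),
        Node("Shigella", 620),
        Node("Salmonella", 590),
    ]),
    Node("Bacillus_subtilis", 1423),
])

subtree = prune(megatree, {562, 620})
print(to_newick(subtree) + ";")
```

Collapsing single-child internal nodes keeps the returned topology minimal; a real implementation might prefer to keep those nodes so that higher-level taxon names stay available for labeling unclassified sequences.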
Name Matching and TNRS
• Different taxonomic synonyms have different NCBI taxon IDs
 – Shigella: 620 and E. coli: 562
 – Species/genus boundaries still debated
• TNRS would provide a “matrix” for standardizing IDs
 – E.g. E. coli/Shigella supergroup: 12345
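One way to picture the standardizing "matrix" is as a lookup table from NCBI taxon IDs to a shared group ID. The sketch below assumes that shape; `STANDARD_ID`, `standardize`, and `same_taxon` are hypothetical names, and 12345 is the slide's made-up example ID, not a real assignment.

```python
# Hypothetical lookup table: NCBI Taxon ID -> standardized group ID.
# 620 (Shigella) and 562 (E. coli) are taken from the slide; 12345 is the
# slide's example ID for an E. coli/Shigella supergroup.
STANDARD_ID = {
    620: 12345,
    562: 12345,
}


def standardize(taxid: int) -> int:
    """Map a taxon ID to its standardized ID, or return it unchanged if unmapped."""
    return STANDARD_ID.get(taxid, taxid)


def same_taxon(a: int, b: int) -> bool:
    """Treat two taxon IDs as matching if they standardize to the same ID."""
    return standardize(a) == standardize(b)


print(same_taxon(620, 562))
```

Falling back to the original ID for unmapped taxa means the lookup never loses information; only IDs that are explicitly listed as synonyms get merged.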
Integrating Comparative Data
• Metadata is a standard part of any well-constructed metagenomics study
 – Depth (marine samples)
 – Aquatic/Terrestrial
 – Temperature
 – pH
 – Dissolved Oxygen
Integrating Comparative Data
• Metadata also includes information about the sequences themselves
 – Abundance information
 – Distribution across sample sites
Branch thickness can be incorporated into XML tree files and visualized within Archaeopteryx
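As a rough sketch of how abundance could be written into an XML tree file as branch thickness, the Python below builds a small phyloXML-style document. It assumes that each `clade` element may carry a `width` child, as in the phyloXML schema, and the log scaling, taxon names, and read counts are invented for illustration. Check the element names against the phyloXML specification and the Archaeopteryx version in use before relying on this.

```python
import math
import xml.etree.ElementTree as ET

PHYLOXML_NS = "http://www.phyloxml.org"


def clade(name, abundance=None, children=()):
    """Build a <clade> element; abundance (read count >= 1) sets the branch width."""
    element = ET.Element("clade")
    ET.SubElement(element, "name").text = name
    if abundance is not None:
        # Assumed scaling: log10 keeps very abundant taxa from dominating the plot.
        width = 1 + math.log10(abundance)
        ET.SubElement(element, "width").text = f"{width:.2f}"
    element.extend(children)                 # child clades go after name/width
    return element


# Illustrative read counts, not real data.
root_clade = clade("root", children=[
    clade("taxon_A", abundance=1000),
    clade("taxon_B", abundance=10),
])

doc = ET.Element("phyloxml", {"xmlns": PHYLOXML_NS})
phylogeny = ET.SubElement(doc, "phylogeny", {"rooted": "true"})
phylogeny.append(root_clade)

xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Distribution across sample sites could be encoded in a similar way, for example as one width or color per site-specific copy of the tree, though that design choice is not something the slide specifies.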
Mashup with Online Data
• Pull down NCBI metadata for a given reference sequence accession
 – Habitat metadata
 – Ecological associations, e.g. symbionts
 – Genome availability
 – Related publications
 – Pictures, etc. would be awesome
Exploring Trees
Ecologically, what are these reference taxa doing??
Pertinent info for biological interpretations of DNA data!!