Phylotastic!
Metagenomics Use Cases
     Holly Bik, UC Davis
-Omic Dictionary

• Marker gene studies – amplification of a
  conserved homologous gene (18S, 16S rRNA)
  from environmental samples

• Metagenomics – shotgun sequencing of
  random genomic fragments from
  environmental DNA
Biodiversity?
Phylogeography?
Environmental Impacts?
[Workflow diagram]
  Diverse marine community
  → Extract environmental DNA (EASY)
  → Amplify rRNA (EASY)
  → High-throughput sequencing (EASY)
  → Community analysis (VERY difficult!)
http://phylosift.wordpress.com
Explicitly Phylogenetic Approaches
[Diagram labels: Guide Tree; Aligned environmental sequences; Evolutionary Placement of short reads]
Tree Reconciliation in PhyloSift
[Diagram labels: Environmental Sequences; Named Taxa]
Pruning Subtrees from Megatrees
• User inputs a list of reference sequences with
  NCBI Taxon IDs → pulls down tree topology

• Unclassified sequences in a reference
  phylogeny could be “named” with the most
  appropriate higher level taxon
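
A minimal sketch of the pruning step, assuming the megatree is already in memory as nested tuples with leaves labeled by NCBI Taxon ID. The tree shape, the function names `prune` and `to_newick`, and the leaf labels other than 562 and 620 are illustrative assumptions, not part of any Phylotastic service.

```python
from typing import Optional, Set, Tuple, Union

# A tree is either a leaf label (str) or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def prune(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Return the subtree induced by the leaves in `keep`, or None if none remain."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse an internal node left with a single child
    return tuple(kept)


def to_newick(tree: Tree) -> str:
    """Serialize the topology only (no branch lengths) as a Newick string."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Toy megatree: 562 and 620 are the IDs from the slides; ref_A..ref_C are placeholders.
megatree = ((("562", "620"), "ref_A"), ("ref_B", "ref_C"))
pruned = prune(megatree, {"562", "620", "ref_C"})
print(to_newick(pruned) + ";")  # ((562,620),ref_C);
```

Collapsing single-child nodes keeps the pruned topology strictly branching, which is usually what a downstream placement tool expects from a guide tree.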
Name Matching and TNRS
• Different taxonomic synonyms have different
  NCBI taxon IDs
  – Shigella: 620 and E. coli: 562
  – Species/genus boundaries still debated


• TNRS would provide a “matrix” for
  standardizing IDs
  – E.g. E. coli/Shigella supergroup: 12345 (placeholder ID)
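
The "matrix" idea can be sketched as a simple lookup table, assuming a TNRS that maps synonymous NCBI taxon IDs onto one standardized group ID. The IDs 620 and 562 come from the slide, and 12345 is the slide's own placeholder; the table name and the `standardize` function are hypothetical.

```python
# Hypothetical synonym table: NCBI taxon ID -> standardized supergroup ID.
# 620 (Shigella) and 562 (E. coli) are from the slide; 12345 is a placeholder.
SUPERGROUPS = {
    "620": "12345",
    "562": "12345",
}


def standardize(taxon_id: str) -> str:
    """Return the supergroup ID for a taxon ID, or the ID itself if unmapped."""
    return SUPERGROUPS.get(taxon_id, taxon_id)


for tid in ("620", "562", "9606"):
    print(tid, "->", standardize(tid))
```

Leaving unmapped IDs unchanged means the lookup can be applied to a whole reference set without first checking which taxa have known synonyms.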
Integrating Comparative Data
• Metadata is a standard part of any well-
  constructed metagenomics study

  – Depth (marine samples)
  – Aquatic/Terrestrial
  – Temperature
  – pH
  – Dissolved Oxygen
Integrating Comparative Data
• Metadata also includes information about the
  sequences themselves

  – Abundance information
  – Distribution across sample sites


  → Branch thickness can be incorporated into XML
    tree files and visualized within Archaeopteryx
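
One way this might look, assuming the XML tree files are phyloXML and that abundance is encoded in each clade's `<width>` element. The linear scaling, the OTU names, and the read counts are invented for illustration, and how Archaeopteryx renders the widths may depend on its version and display settings.

```python
import xml.etree.ElementTree as ET


def branch_width(count, max_count, min_w=1.0, max_w=10.0):
    """Scale a read count linearly into a branch width between min_w and max_w."""
    return round(min_w + (max_w - min_w) * count / max_count, 2)


def build_phyloxml(abundance):
    """Build a star-shaped phyloXML tree with one leaf per taxon."""
    root = ET.Element("phyloxml", {"xmlns": "http://www.phyloxml.org"})
    phylogeny = ET.SubElement(root, "phylogeny", {"rooted": "true"})
    top = ET.SubElement(phylogeny, "clade")
    max_count = max(abundance.values())
    for name, count in abundance.items():
        leaf = ET.SubElement(top, "clade")
        ET.SubElement(leaf, "name").text = name
        # Abundance is carried by the width of the branch leading to the leaf.
        ET.SubElement(leaf, "width").text = str(branch_width(count, max_count))
    return ET.tostring(root, encoding="unicode")


# Invented counts for three placeholder OTUs.
xml_text = build_phyloxml({"OTU_1": 500, "OTU_2": 50, "OTU_3": 5})
print(xml_text)
```

A log scale may be a better choice than the linear one used here when abundances span several orders of magnitude, since rare taxa otherwise all collapse to the minimum width.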
Mashup with Online Data
• Pull down NCBI metadata for a given reference
  sequence accession

  – Habitat metadata
  – Ecological associations, e.g. symbionts
  – Genome availability
  – Related publications
  – Pictures, etc. would be awesome
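
As a starting point for the "pull down" step, the sketch below builds an NCBI E-utilities `efetch` request URL for a nucleotide accession. It only constructs the URL and makes no network call; the accession is a placeholder, the function name is hypothetical, and which fields of the returned record actually carry habitat or ecological metadata (for example the `isolation_source` qualifier) varies from record to record.

```python
from urllib.parse import urlencode

EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_record_url(accession: str, db: str = "nuccore") -> str:
    """Build an efetch URL requesting the GenBank flat file for an accession."""
    params = {"db": db, "id": accession, "rettype": "gb", "retmode": "text"}
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


# Placeholder accession; substitute a real reference sequence accession.
url = genbank_record_url("ACCESSION_PLACEHOLDER")
print(url)
```

The returned flat file would still need parsing to extract source qualifiers and linked publications before that information could be attached to tips in a tree viewer.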
Exploring Trees
                  Ecologically, what are these
                  reference taxa doing??
Pertinent info for biological
interpretations of DNA data!!


Editor's Notes

  • #3 You can ignore all the other bad -Omic words you hear – conservome?!
  • #4 Regardless of methodology, focus on:
    – Species assemblages and taxonomic diversity
    – Community patterns over space and time – cosmopolitan or regionally restricted?
    – Community changes as a result of natural/human disturbance
  • #6 Marker genes across all domains – bacteria, archaea, eukaryotes & viruses: rRNA genes, protein-coding orthologs, lineage-specific gene families
    ----- Meeting Notes (5/22/12 10:42) -----
    Marker genes to make higher-level taxon assignments. Lineage-specific gene families to narrow down assignments to lower taxonomic levels.
  • #7 Head-tail patterns may help us to delimit species and separate out rare taxa (which will have head-tail patterns) from errors (no apparent pattern)
    ----- Meeting Notes (5/22/12 10:42) -----
    pplacer and EPA are great tools developed in the last few years.
  • #11 I see name matching as not just matching species names, but also matching between NCBI taxon ID synonyms
  • #15 rRNA data especially needs to be interpreted in a phylogenetic context. Phylo placement allows:
    1) More robust taxon assignments
    2) Identifying divergent/undersampled lineages (that aren't apparent via BLAST searches)
    What's the ecology/function of these divergent lineages?