Caporaso sloan qiime_workshop_slides_18_oct2012
Published in: Education
Transcript of "Caporaso sloan qiime_workshop_slides_18_oct2012"

  1. QIIME Workshop. Get started by opening http://bit.ly/mbe-qiime2012 and read up at www.qiime.org. Greg Caporaso, gregcaporaso@gmail.com
  2. www.qiime.org
      • Extract DNA and amplify marker gene with barcoded primers
      • Pool amplicons and sequence
      • Assign reads to samples
      • Assign millions of sequences from thousands of samples to OTUs
      • Compute UniFrac distances and compare samples
      (Figure: example reads such as >GCACCTGAGGACAGGCATGAGGAA… shown with reference sequences RefSeq 1 through RefSeq 10.)
  3. >5000 samples in analysis pipeline (http://www.earthmicrobiome.org/)
      • Stream and lake water
      • Marine water, sediment and reef
      • Soil (forest, farm, peatland, tundra, …)
      • Air
      • Coalbed
      • Arctic ice core
      • Insect-associated
      • Human-associated (gut, mouth, skin)
  4. >5000 samples analyzed to date
  5. Alpha diversity by environment type
  6. Where do we look for new diversity?* (* As determined by no hit to Greengenes database.)
  7. QIIME workflow overview (www.QIIME.org)
      • Inputs: sequencing output (454, Illumina, Sanger) as fastq, fasta, qual, or sff/trace files; metadata as a mapping file
      • Pre-processing: e.g., remove primer(s), demultiplex, quality filter
      • Denoise 454 data: PyroNoise, Denoiser
      • Pick OTUs and representative sequences
        – Reference based: BLAST, UCLUST, USEARCH
        – De novo: e.g., UCLUST, CD-HIT, MOTHUR, USEARCH
      • Assign taxonomy: e.g., BLAST, RDP Classifier
      • Align sequences: e.g., PyNAST, INFERNAL, MUSCLE, MAFFT
      • Build phylogenetic tree: e.g., FastTree, RAxML, ClearCut
      • Build OTU table: i.e., sample by observation matrix
      • Outputs of the upstream steps: OTU (or other sample by observation) table; phylogenetic tree (evolutionary relationship between OTUs)
      • α-diversity and rarefaction: e.g., Phylogenetic Diversity, Chao1, Observed Species
      • β-diversity and rarefaction: e.g., weighted and unweighted UniFrac, Bray-Curtis, Jaccard
      • Interactive visualizations: e.g., PCoA plots, distance histograms, taxonomy charts, rarefaction plots, network visualization, jackknifed hierarchical clustering
      • Database submission (in development)
      • Legend: currently supported for marker-gene data only (i.e., upstream step); currently supported for general sample by observation data (i.e., downstream step); required step or input; optional step or input
  8. http://analytics.google.com
  9. Running QIIME
      • Native installation on OS X or Linux (laptops through 16,416-core compute cluster*)
      • Ubuntu Linux Virtual Box
      • Amazon Web Services (EC2)
      * http://ncar.janus.rc.colorado.edu/
  10. IPython notebook
  11. Moving Pictures of the Human Microbiome
      • Two subjects sampled daily, one for six months, one for 18 months
      • Four body sites: tongue, palm of left hand, palm of right hand, and gut (via fecal swabs)
  12. Moving Pictures of the Human Microbiome
      • Investigate the relative temporal variability of body sites.
      • Is there a temporal core microbiome?
      • Technical points: do we observe the same conclusions on 454 and Illumina data?
  13. Moving Pictures of the Human Microbiome: QIIME tutorial
      • A small subset of the full data set to facilitate short run time: ~0.1% of the full sequence collection.
      • Sequenced across six Illumina GAIIx lanes, with a subset of the samples also sequenced on 454.
      • The online tutorial contains details on all of the steps: go back and read that text.
  14. Key QIIME files
      • Mapping file: per-sample metadata, user-defined
      • Input sequence file
      • OTU table: sample x OTU matrix, central to downstream analyses [now in biom format]
      • Parameters file: defines analyses, for use with the ‘workflow’ scripts (optional)
  15. Mapping file
  16. Mapping file: always run check_id_map.py. (Figure legend: "= required field".)
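As a rough illustration of the kind of check the mapping-file slides refer to, here is a minimal Python sketch that inspects the header of a tab-separated mapping file. It is not `check_id_map.py` and does not reproduce its checks. The required column names (`#SampleID`, `BarcodeSequence`, `LinkerPrimerSequence` first, `Description` last) are an assumption based on QIIME 1 conventions and are not listed in the slide text; the function name and the example rows are hypothetical.

```python
import csv
import io

# Assumption: QIIME 1 mapping-file conventions, not stated on the slides.
REQUIRED_FIRST = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]
REQUIRED_LAST = "Description"


def check_mapping_header(text):
    """Return a list of problems found in a tab-separated mapping file."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader, [])
    problems = []
    if header[:3] != REQUIRED_FIRST:
        problems.append("first three columns should be " + ", ".join(REQUIRED_FIRST))
    if not header or header[-1] != REQUIRED_LAST:
        problems.append("last column should be " + REQUIRED_LAST)
    # Remaining non-comment rows are samples; their IDs must be unique.
    sample_ids = [row[0] for row in reader if row and not row[0].startswith("#")]
    if len(sample_ids) != len(set(sample_ids)):
        problems.append("sample IDs are not unique")
    return problems


# Hypothetical two-sample mapping file with one user-defined column (BodySite).
example = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tBodySite\tDescription\n"
    "S1\tAGCACGAGCCTA\tCATGCTGCCTCCCGTAGGAGT\tgut\tsubject 1 gut\n"
    "S2\tAACTCGTCGATG\tCATGCTGCCTCCCGTAGGAGT\ttongue\tsubject 1 tongue\n"
)
print(check_mapping_header(example))
```

An empty list means the sketch found nothing to complain about; the real script should still be run on any actual mapping file.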
  17. Sequences file
  18. >[sampleID_seqID] description. Barcodes have been removed!!
  19. >[sampleID_seqID] description. Barcodes have been removed!!
  20. Sequences file: can be user-provided, or generated by split_libraries.py
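The `>[sampleID_seqID] description` header convention from the sequences-file slides can be illustrated with a short Python sketch that counts sequences per sample. The helper name and the example records are hypothetical, and splitting on the last underscore is an assumption about how the sample ID is separated from the sequence ID.

```python
from collections import Counter


def seqs_per_sample(fasta_lines):
    """Count sequences per sample from '>sampleID_seqID description' headers."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            label = line[1:].split()[0]          # drop '>' and the description
            sample_id = label.rsplit("_", 1)[0]  # assumption: split on last '_'
            counts[sample_id] += 1
    return counts


# Hypothetical demultiplexed reads (barcodes already removed).
example = [
    ">S1_0 read_a",
    "GCACCTGAGGACAGGCATGAGGAA",
    ">S1_1 read_b",
    "GCACCTGAGGACAGGGGAGGAGGA",
    ">S2_2 read_c",
    "TCACATGAACCTAGGCAGGACGAA",
]
print(dict(seqs_per_sample(example)))
```

Per-sample counts like these are what make uneven sampling depth visible before any downstream analysis.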
  21. OTU table (classic format): sample x OTU matrix
  22. OTU table (classic format): sample x OTU matrix. OTU identifiers
  23. OTU table (classic format): sample x OTU matrix. Sample identifiers
  24. OTU table (classic format): sample x OTU matrix. Optional per-OTU taxonomic information
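The three parts of a classic OTU table named on these slides (OTU identifiers, sample identifiers, optional per-OTU taxonomy) can be pulled apart with a small parser. This is a sketch only: it assumes a tab-separated layout with a `#OTU ID` header row and an optional trailing `Consensus Lineage` column, which is how I recall the classic format and is not spelled out in the slide text. The example table is made up.

```python
def parse_classic_otu_table(text):
    """Split a classic (tab-separated) OTU table into its three parts."""
    sample_ids, counts, taxonomy = [], {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#OTU ID"):
            # Header row: sample identifiers, optionally followed by taxonomy.
            has_taxonomy = fields[-1] == "Consensus Lineage"  # assumed column name
            sample_ids = fields[1:-1] if has_taxonomy else fields[1:]
        elif line.startswith("#"):
            continue  # other comment lines
        else:
            otu_id = fields[0]
            values = fields[1:1 + len(sample_ids)]
            counts[otu_id] = [int(float(v)) for v in values]
            if len(fields) > 1 + len(sample_ids):
                taxonomy[otu_id] = fields[-1]
    return sample_ids, counts, taxonomy


# Hypothetical two-OTU, two-sample table.
example = (
    "# Example classic OTU table (comment line)\n"
    "#OTU ID\tS1\tS2\tConsensus Lineage\n"
    "0\t5\t0\tk__Bacteria; p__Firmicutes\n"
    "1\t2\t7\tk__Bacteria; p__Bacteroidetes\n"
)
samples, table, lineages = parse_classic_otu_table(example)
print(samples, table, lineages)
```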
  25. OTU tables are now in biological observation matrix (.biom) format (QIIME 1.4.0-dev and later). Google: “biom format”, or see http://biom-format.org. See convert_biom.py for translating between classic and biom OTU tables.
  26. sample x observation contingency matrix: Samples x OTUs, observation counts
  27. sample x observation contingency matrix: Samples x Taxa, observation counts
  28. sample x observation contingency matrix: Metagenomes x Functions, observation counts
  29. sample x observation contingency matrix
      • Samples x OTUs: marker gene (e.g., 16S) surveys
      • Genomes x Ortholog groups: comparative genomics
      • Samples x Taxa: marker gene (e.g., 16S) surveys
      • Metagenomes x Functions: metagenomics
      • Metatranscriptomics
      • Samples x Metabolites: metabolomics
      • ...
  30. The Biological Observation Matrix (BIOM) Format, or: How I Learned To Stop Worrying and Love the Ome-ome. JSON-based format for representing arbitrary sample x observation contingency tables with optional metadata. McDonald et al., GigaScience (2012). http://www.biom-format.org
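To make "JSON-based sample x observation contingency table" concrete, here is a minimal Python sketch that builds a BIOM-like dictionary from per-OTU counts and serializes it. The field names follow my recollection of the BIOM 1.0 layout and have not been validated against the specification at http://biom-format.org; the function name, the `generated_by` string, and the date are placeholders. For real data, use `convert_biom.py` or the biom-format tools rather than this sketch.

```python
import json


def to_biom_like(sample_ids, otu_counts):
    """Build a BIOM-like sparse table: rows are observations, columns are samples."""
    otu_ids = sorted(otu_counts)
    # Sparse representation: [row index, column index, count] for non-zero cells.
    data = [
        [r, c, n]
        for r, otu in enumerate(otu_ids)
        for c, n in enumerate(otu_counts[otu])
        if n != 0
    ]
    return {
        "id": None,
        "format": "Biological Observation Matrix 1.0.0",  # assumed format string
        "format_url": "http://biom-format.org",
        "type": "OTU table",
        "generated_by": "illustrative sketch",            # placeholder
        "date": "2012-10-18T00:00:00",                    # placeholder
        "matrix_type": "sparse",
        "matrix_element_type": "int",
        "shape": [len(otu_ids), len(sample_ids)],
        "rows": [{"id": o, "metadata": None} for o in otu_ids],
        "columns": [{"id": s, "metadata": None} for s in sample_ids],
        "data": data,
    }


# Hypothetical two-OTU, two-sample table.
table = to_biom_like(["S1", "S2"], {"0": [5, 0], "1": [2, 7]})
print(json.dumps(table, indent=2))
```

The same structure serves any of the matrix types listed on the slides (taxa, functions, metabolites) by changing what the rows represent.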
  31. Comparative genomic (B) and metagenome analysis (C) with QIIME
  32. Working with OTU tables
      • single_rarefaction.py: even sampling (very important if you have different numbers of seqs/sample!)
      • filter_otus_from_otu_table.py
      • filter_samples_from_otu_table.py
      • per_library_stats.py
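The idea behind even sampling can be sketched in a few lines of Python: draw the same number of sequences, without replacement, from each sample's counts, and drop samples that are too shallow. This illustrates the concept only and is not the implementation used by `single_rarefaction.py`; the function name, the `seed` parameter, and the example counts are hypothetical.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to `depth` sequences without replacement.

    Returns None when the sample has fewer than `depth` sequences.
    """
    if sum(counts) < depth:
        return None
    # Expand counts into one entry per sequence, labelled by OTU index.
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied = [0] * len(counts)
    for otu in picked:
        rarefied[otu] += 1
    return rarefied


# Hypothetical samples with very different sequencing depths.
deep = [500, 300, 200]
shallow = [3, 1, 0]
print(rarefy(deep, 100))
print(rarefy(shallow, 100))
```

After this step every retained sample has exactly `depth` sequences, so differences between samples are less likely to reflect sequencing effort alone.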
  33. OTU picking: terminology
  34. OTU picking
      • De novo
        – Reads are clustered based on similarity to one another.
      • Reference-based
        – Closed reference: any reads which don’t hit a reference sequence are discarded
        – Open reference: any reads which don’t hit a reference sequence are clustered de novo
  35. De novo OTU picking
      • Pros
        – All reads are clustered
      • Cons
        – Not parallelizable
        – OTUs may be defined by erroneous reads
  36. Closed-reference OTU picking
      • Pros
        – Built-in quality filter
        – Easily parallelizable
        – OTUs are defined by high-quality, trusted sequences
      • Cons
        – Reads that don’t hit reference dataset are excluded, so you can never observe new OTUs
  37. Percentage of reads that do not hit the reference collection, by environment type.
  38. Open-reference OTU picking
      • Pros
        – All reads are clustered
        – Partially parallelizable
      • Cons
        – Only partially parallelizable
        – Mix of high quality sequences defining OTUs (i.e., the database sequences) and possible low quality sequences defining OTUs (i.e., the sequencing reads)
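The three OTU picking strategies described on the preceding slides differ only in what happens to a read that does not hit an existing cluster. The toy Python sketch below makes that difference explicit. It is a deliberate simplification: similarity is a position-by-position match fraction rather than an alignment, clustering is a single greedy pass, and open-reference failures become new centroids immediately instead of being clustered de novo in a separate step. None of this reflects how UCLUST, USEARCH, or QIIME implement OTU picking, and all names and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions (toy similarity, no alignment)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def pick_otus(reads, reference=None, open_reference=False, threshold=0.97):
    """Greedy OTU picking.

    reference=None                     -> de novo
    reference given, open_reference=False -> closed reference
    reference given, open_reference=True  -> open reference
    """
    reference = dict(reference or {})
    centroids = dict(reference)
    otus, discarded = {}, []
    for read_id, seq in reads.items():
        hit = next(
            (cid for cid, cseq in centroids.items()
             if identity(seq, cseq) >= threshold),
            None,
        )
        if hit is None:
            if reference and not open_reference:
                discarded.append(read_id)  # closed reference: drop non-hits
                continue
            # De novo or open reference: the read seeds a new OTU.
            hit = "denovo%d" % (len(centroids) - len(reference))
            centroids[hit] = seq
        otus.setdefault(hit, []).append(read_id)
    return otus, discarded


# Hypothetical 10-base reads and a one-sequence reference.
reference = {"ref1": "AAAAAAAAAA"}
reads = {
    "r1": "AAAAAAAAAA",
    "r2": "AAAAAAAAAT",
    "r3": "CCCCCCCCCC",
    "r4": "CCCCCCCCCG",
}
print(pick_otus(reads, threshold=0.9))                                  # de novo
print(pick_otus(reads, reference, threshold=0.9))                       # closed
print(pick_otus(reads, reference, open_reference=True, threshold=0.9))  # open
```

In the closed-reference call the two reads unlike `ref1` are discarded, which is the "you can never observe new OTUs" trade-off; in the open-reference call they form an OTU whose centroid is a sequencing read rather than a trusted database sequence.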
  39. Considerations in analysis
  40. Variation in sampling depth is an important consideration. Human skin, colored by individual, at 500 sequences/sample. Image/analysis credit: Justin Kuczynski. Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
  41. Variation in sampling depth is an important consideration. Human skin, colored by sampling depth, at either 50 or 500 sequences/sample. Image/analysis credit: Justin Kuczynski. Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
  42. Variation in sampling depth is an important consideration. Human skin, colored by sampling depth, at either 50 (blue) or 500 (red) sequences/sample. Image/analysis credit: Justin Kuczynski. Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
  43. How deep is deep enough? It depends on the question…
      – Differences between community types: not many sequences.
      – Rare biosphere: more (but be careful about sequencing noise!)
  44. How deep is deep enough? (Figure: PCoA plots at 100 sequences/sample, 10 sequences/sample, and 1 sequence/sample, with axes PC1, PC2, and PC3. Axis labels in extraction order: PC2 8.4%, 11%, 17%; PC1 24%, 13%, 8.6%; PC3 9.7%, 8.1%, 6.2%.) Direct sequencing of the human microbiome readily reveals community differences. J Kuczynski et al. Genome Biology (2011).
  45. Figure 1: panels (A), (B), (C); panel labels 10, 100, 1.
  46. Can we get accurate taxonomic assignment from short reads?
  47. Extra slides
  48. Elizabeth K. Costello, et al. Science 2009. Bacterial Community Variation in Human Body Habitats Across Space and Time.
  49. This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Feel free to use or modify these slides, but please credit me by placing the following attribution information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.