Xin Zhou - Saturday Closing Plenary
    Xin Zhou - Saturday Closing Plenary: Presentation Transcript

    • Taxon diversity analysis for bulk insect samples using Illumina Hi-seq platform
      Xin ZHOU, Shanlin LIU, Yiyuan LI, Qing YANG, and Xu SU
      Department of Science and Technology, Environmental Genomics Research Group, BGI, China
      Adelaide, Australia, 3 December 2011
    • Problem / Solutions?
      Opt.1: ......zzzzZZZZZ
      Opt.2: morph sorting → indiv. ID → … → Opt.1
      Opt.3: morph sorting → indiv. barcoding → … → Opt.1
      Opt.4: grinding up → NGS → CLUSTERING/BLAST → DIVERSITY!
      Zhou et al. 2011, 4th International Barcode of Life Conference
    • Environmental barcoding of bulk insects
      Sample                         Marker                   Platform
      aquatic insects                mini-barcode (130 bp)    454
      bat diet (insects)             COI fragment, 157 bp     454
      Malaise trap (insects)         COI fragment, ~400 bp    454
      The Malaise trap study is "Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring", Yu D.W. et al., in review.
    • Major NGS platforms applicable in environmental barcoding
      NGS platform                              Read length     Data/run (GB)   Run time   Requirement of library construction
      454 platform (GS FLX Titanium XL+)        ~400 bp         0.7             23 hr.     Yes
      Illumina platform (Hi-Seq 2000)           150 bp PE reads 600             14 d.      Yes
      Illumina platform (Mi-Seq)                150 bp PE reads 2               27 hr.     Yes
      Ion Torrent                               200 bp          ~1              3.5 hr.    No
      Illumina Hi-Seq:
      • higher throughput
      • less $ / bp
      • increasing read length
      • variety of bioinformatics tools available from genomic pipelines
    • Sequencing capacity at BGI
      • 28 Illumina GAIIx
      • 137 Illumina Hi-Seq2000
      • 25 Life Tech SOLiD 4
      • 16 ABI 3730XL
      • 110 MegaBACEs
      • 2 Illumina iScan
      • 1 Roche 454
      • 1 Ion Torrent
      • 1 Illumina Mi-Seq
      Data production:
      • 100 Gb / day (2009)
      • >5 Tb / day (end of 2010)
      • >1500X human genome / day
    • What I am NOT going to talk about:
      • Primer optimization
      • Systematic comparisons of NGS platforms
      • Quantitative diversity analysis
      What I AM going to talk about:
      • Can Illumina NGS be used in diversity analysis?
    • Can Illumina NGS be used in diversity analysis?
      • Sequencing error rate
      • Read length
    • Sequencing error rate
      • No indel issue in homopolymers
      • Sequencing quality keeps increasing: recent improvement in sequencing quality using Illumina's V3 chemistry (even at 100 bp, only about 10% of the base calls have an error rate >1%)
      • Rare nucleotide errors can be easily corrected by:
        • increasing sequencing depth
        • paired-end (PE) sequencing
        • setting stringent matching criteria in the overlapping fragment by allowing only >99% identity
      [Diagram: two 150 bp reads, insert size 250 nt] PE sequencing enables forming sequence contigs
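The overlap-based contig formation described on this slide can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the function names, the `min_overlap` default, and the longest-overlap-first search are assumptions. Only the read geometry (two 150 bp reads over a 250 nt insert, so a 50 nt overlap) and the ">99% identity" criterion come from the slide.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(fwd, rev, min_overlap=20, min_identity=0.99):
    """Merge a forward read with its reverse mate into one contig.

    The mate is reverse-complemented, then overlaps are tried from longest
    to shortest. An overlap is accepted only if its identity is strictly
    greater than min_identity (hypothetical default mirroring the slide's
    ">99% identity"). Returns the contig, or None if no overlap qualifies.
    """
    rev_rc = reverse_complement(rev)
    longest = min(len(fwd), len(rev_rc))
    for olen in range(longest, min_overlap - 1, -1):
        tail = fwd[-olen:]
        head = rev_rc[:olen]
        matches = sum(a == b for a, b in zip(tail, head))
        if matches / olen > min_identity:
            return fwd + rev_rc[olen:]
    return None
```

With a 50 nt overlap, a strict ">99%" threshold tolerates no mismatch at all (49/50 is 98%), so a pair with a single disagreeing base in the overlap would be discarded rather than merged.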
    • Read length
      [Diagram: two 150 bp reads, insert size 250 nt]
      • Read length keeps increasing
      • 150PE enables contig read of 250 bp
      • Shotgun reads can be further assembled into longer fragments ("shotgun" assembly strategy used in genome sequencing projects)
      • Option of scaffold assembly
    • Illumina environmental barcoding
      Illumina e-barcoding
      • PCR based
        • Lib1 (658bp, 150PE): full length COI barcode, PE sequencing
        • Lib2 (200bp, 150PE): COI amplicons, shotgun PE sequencing
        • Result: full length COI
      • PCR free
        • Mitochondrial shotgun PE sequencing
        • Result: full length COI without PCR bias
    • Approach #1: PCR-based
      Samples: XSBN, Mock (provided by Yu et al.)
      Sample information             Mock            XSBN
      # Specimens                    23              292
      # Haplotypes (2%)              12              230
      PCR primers                    LepF1/LepR1     Customized
      Sequence length                658 bp          700 bp
      Soup protocol (both samples): DNA extracted individually and mixed for PCR
      Sequencing library details (both samples): full length (658bp) + shotgun library (~200bp)
      Sequencing protocol (both samples): 150PE
    • Approach #1: PCR-based
      Pre-analysis data filtering, Lib 1
                                       Mock          XSBN
      Raw data                         1.67G         4.04G
      Filtering adapter                1.60G         1.28G
      High quality (Q20)               0.35G         0.50G
      # Reads (primer removed)         1,081,997     1,150,477
      # Unique reads (abundance > 1)   36,618        45,444
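The last row of this table, unique reads with abundance > 1, corresponds to a dereplication step followed by dropping singletons. A minimal sketch is given below; the function name and the in-memory list of reads are assumptions for illustration, and the talk does not say which tool performed this step.

```python
from collections import Counter


def unique_reads(reads, min_abundance=2):
    """Collapse identical reads and keep those seen at least min_abundance times.

    min_abundance=2 mirrors the slide's "abundance > 1" criterion.
    Returns a dict mapping each retained sequence to its read count.
    """
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_abundance}
```

Dropping singletons removes many reads that carry a one-off sequencing error, at the cost of also discarding genuinely rare haplotypes.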
    • OTU filtering workflow
               Unique reads        OTU cluster   Alignment   Remove    Compared to
               (abundance > 1)     (98%)                     chimera   reads of Lib 2
      Mock     36,618              784           490         119       44
      XSBN     45,444              4,189         3,887       403       399
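The "OTU cluster (98%)" column can be illustrated with a greedy, abundance-sorted clustering. This is a simplified sketch under stated assumptions, not the method used in the talk: sequences are assumed to be pre-aligned and of equal length so that identity is a plain position-by-position comparison, and the function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(abundances, threshold=0.98):
    """Cluster sequences into OTUs at the given identity threshold.

    abundances maps sequence -> read count. Sequences are visited from most
    to least abundant; each joins the first existing centroid it matches at
    >= threshold, otherwise it founds a new cluster.
    Returns a list of (centroid, members) tuples.
    """
    clusters = []
    ordered = sorted(abundances, key=lambda s: (-abundances[s], s))
    for seq in ordered:
        for centroid, members in clusters:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters
```

Visiting abundant sequences first means that low-abundance error variants tend to be absorbed into the cluster of the true haplotype they derive from.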
    • Results
      Sanger Reference vs. NGS OTUs (Blast at 100% identity)
      Sample (primers)              Sanger only   Shared   NGS only
      Mock (LepF1/R1)               4             8        36
      XSBN (Customized primers)     32            198      197
    • Mock: Sanger Reference vs. NGS OTUs
      • 4 in Sanger reference only (false negatives): not found in raw data (likely due to primer failure)
      • 8 shared
      • 36 in NGS OTUs only ("false positives"?): 31 can be found in our total sample, from which our mock samples were assembled; 5 are likely to be PCR errors
    • XSBN: Sanger Reference vs. NGS OTUs
      • 32 in Sanger reference only: 17 not found in raw data (primer failure); 15 were lost in data filtering
      • 198 shared
      • 197 in NGS OTUs only: cross-sample contamination?
      [Chart: Mean + SE, group1 vs. group2]
    • Sanger Reference vs. NGS OTUs
                                                     Sanger only   Shared   NGS only
      Before                                         32            198      197
      After removing sequences with abundance <10    49            181      84
      • Significantly fewer false positives after removal of sequences with abundance <10
      • Slight drop in true positives
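The trade-off on this slide can be expressed as precision and recall. The sketch below assumes that the three counts on the slide are, in order, OTUs found only in the Sanger reference, OTUs shared by both, and OTUs found only in the NGS data; that reading is an interpretation of the Venn-style figure, and the function name is hypothetical.

```python
def precision_recall(reference_only, shared, ngs_only):
    """Treat shared OTUs as true positives, NGS-only as false positives,
    and reference-only as false negatives."""
    precision = shared / (shared + ngs_only)
    recall = shared / (shared + reference_only)
    return precision, recall


# Counts as read from the slide (interpretation stated above).
before = precision_recall(32, 198, 197)
after = precision_recall(49, 181, 84)
```

Under that reading, precision rises from about 50% (198/395) to about 68% (181/265), while recall falls from about 86% (198/230) to about 79% (181/230), which matches the slide's "fewer false positives, slight drop in true positives".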
    • Approach #1: PCR-based
      What's next?
      • Obtaining full-length barcodes via assembly of shotgun reads (new program in development: "SOAPbarcode")
      • New algorithm to filter out false positive OTUs
    • Approach #2: PCR-free method
      Workflow components:
      • Individual barcoding
      • Total MT isolation & DNA extraction
      • Shotgun sequencing
      • Reference based method
      • Reference independent method
    • Building reference library: individual barcoding
      1. 89 individuals;
      2. 84 reference barcodes;
      3. 39 OTUs (2%);
      Taxon group      # OTUs
      Lepidoptera      25
      Diptera          7
      Hemiptera        4
      Hymenoptera      2
      Psocoptera       1
      Total            39
    • Total MT isolation & DNA extraction
      Sample mixture → Total MT isolation → MT DNA extraction
    • Shotgun sequencing
      Insert size: 200bp; Read length: 100bp PE
      Percentage of base pairs
      Q20 (sequencing error rate < 1%)      96.2%
      Q30 (sequencing error rate < 0.1%)    92.9%
      GC content                            38.0%
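The Q20 and Q30 thresholds on this slide follow from the standard Phred definition, Q = -10 · log10(P), where P is the probability that a base call is wrong. A small sketch of the conversion in both directions (function names are illustrative):

```python
import math


def phred_to_error(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score implied by an error probability."""
    return -10 * math.log10(p)
```

Q20 therefore corresponds to an error probability of 1 in 100 and Q30 to 1 in 1,000, which is where the slide's 1% and 0.1% figures come from.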
    • Pre-analysis
      Data filtering:
      1. Adaptor contamination removal;
      2. Quality control: in each read, only allowing <10bp with seq. error rate >1%
      Raw data                       2.45G
      After filtering                2.20G
      Ratio of high quality reads    89.91%
    • Approach #2: PCR-free method
      Method 1: Reference based
      Blast reads to reference barcodes; confident identification is made only when:
      1. Best BLAST hit >98% identity;
      2. Reference coverage > 90%;
      [Diagram: Reference 1, coverage 100%, correct mapping; Reference 2, coverage 30%, incorrect mapping]
      Taxon groups     # OTUs
      Lepidoptera      20
      Diptera          2
      Hemiptera        3
      Psocoptera       1
      Total            26
      Not found        13
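The two acceptance criteria on this slide can be sketched as a filter over read-to-reference hits. Everything about the data layout here is an assumption made for illustration: the `Hit` record, its field names, 1-based inclusive coordinates, and the premise that each read's best BLAST hit has already been selected upstream. Only the ">98% identity" and ">90% reference coverage" thresholds come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One read's best hit against a reference barcode (hypothetical record)."""
    read_id: str
    ref_id: str
    identity: float   # percent identity of the alignment
    ref_start: int    # 1-based, inclusive
    ref_end: int      # 1-based, inclusive


def covered_fraction(intervals, ref_len):
    """Fraction of a reference covered by the union of the given intervals."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / ref_len


def detected_references(hits, ref_lengths, min_identity=98.0, min_coverage=0.90):
    """Return the reference IDs that pass both slide criteria."""
    per_ref = {}
    for hit in hits:
        if hit.identity > min_identity:
            per_ref.setdefault(hit.ref_id, []).append((hit.ref_start, hit.ref_end))
    return {
        ref for ref, intervals in per_ref.items()
        if covered_fraction(intervals, ref_lengths[ref]) > min_coverage
    }
```

The coverage criterion is what separates the two cases in the slide's diagram: reads from a related species may match part of a reference at high identity, but they are unlikely to tile more than 90% of it.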
    • Potential sources of failure in detecting taxa
      Taxon specific, or bio-mass (size & number)
    • Failures in taxon detection
      Taxon bias?
      Taxon groups     # Total OTUs   # OTUs missing (undetected)
      Lepidoptera      25             5
      Diptera          7              5
      Hymenoptera      2              2
      Hemiptera        4              1
      Psocoptera       1              0
      Total            39             13
    • Failures in taxon detection
      OR bio-mass (body size, # individuals)?
      Readily detected: average length > 5mm
      Missing: average length < 5mm
    • Approach #2: PCR-free method
      Method 2: Reference independent
      (Will we be able to identify diversity without reference MT genomes for the targeted species?)
      Workflow:
      1. Assembly of COI gene using genome assembly program (SOAPdenovo);
      2. Annotation using ~240 MT genomes downloaded from Genbank;
    • PCR-free reference-independent: results
      • 23/31 falling in standard COI barcode region (mostly >600 bp);
      • 1 of 23 is not present in our reference barcodes (Insecta; Lepidoptera; Pyralidae);
      • Multiple genes obtained simultaneously;
      • 1 nearly complete mitochondrial genome (~15k bp);
      • 3 fragments >6000 bp;
    • Reference independent
      • 23/31 falling in standard COI barcode region (mostly >600 bp);
      • 1 of 23 was not present in our reference barcodes (Insecta; Lepidoptera; Pyralidae);
      Number of individuals we collected: 89 individuals (5 individuals failed in Sanger sequencing)
      Barcode references: 39 OTUs (84 individuals)
      Reference-based: 26 OTUs
      Reference-independent: 23 OTUs
      3 OTUs were not detected by the reference-independent method because:
      (1) sequencing depth is too low (<10X) to allow for reliable assembly
      (2) relatively small body-size
    • PCR-free method
      Multiple MT genes obtained simultaneously
      Gene     Number
      ATP6     29
      ATP8     4
      COX1     31
      COX2     33
      COX3     31
      CYTB     31
      ND1      35
      ND2      34
      ND3      24
      ND4      30
      ND4L     16
      ND5      30
      ND6      24
    • PCR-free method
      • 1 nearly complete mitochondrial genome (~15k bp);
      • 3 fragments longer than 6k bp;
      [Figure label: Barcode region]
    • Approach #2: PCR-free method
      What's next?
      Currently:
      • MT DNA 5-10% after isolation;
      • Non-target DNA affects MT assembly (e.g., bacterial & genomic DNA);
      • Taxonomic/biomass bias
      Potential solutions:
      1. Wet-lab protocol optimization
         • Pre-sorting insects by body-size
         • Alternative MT isolation methods
      2. Increase sequencing depth
    • Conclusions Illumina Hi-Seq delivers compatible performance as other NGS platforms in analyzing bulk insect samples, with potential advantages in achieving higher sensitivity at lower cost; Deep sequencing capacity enables a novel PCR- free approach, which may eventually solve biases caused by DNA amplification; It shares issues with other NGS platforms (non- quantitative, inflation of OTUs, etc.) Methodology optimization is much needed in many details of the pipeline; Collaborative and synergistic efforts made by the community would greatly advance the progress. Zhou et al. 2011, 4th International Barcode of Life Conference
    • Acknowledgements
      Funder:
      Collaborators:
      • Douglas W. Yu, Kunming Institute of Zoology, Chinese Academy of Sciences
      • Mehrdad Hajibabaei, Shadi Shokralla, University of Guelph
      • Owain Edwards, CSIRO Ecosystem Sciences
      LU Jianliang, WU Qiong, AN Sainan, ZHOU Yizhuang, ZHAO Jing
    • Thanks for your attention!