The Next, Next Generation of Sequencing - From Semiconductor to Single Molecule

    Presentation Transcript

    • The Next, Next Generation of Sequencing - From Semiconductor to Single Molecule
        Justin H. Johnson, Director of Bioinformatics, EdgeBio, Washington DC, USA
    • Agenda
        • Who We Are
        • NGS at 30K
        • The Challenges
            – Even Before We Get to the Platforms
            – When We Get to the Platforms
    • Who We Are
    • Life Tech Service Provider
    • Contract Research Division
        • Five SOLiD4 sequencing platforms
        • One Life Technologies 5500XL
        • Two Ion Torrent PGMs
        • Bioinformatics consulting on Illumina, 454, and PacBio
        • Automation through Caliper Sciclone & Biomek FX
        • Commercial partnerships with companies such as CLCBio, DNANexus and Genologics
        • MD/PhD & Masters Level Scientists and Bioinformaticians
        • IT Infrastructure of >100 CPUs and >100TB storage
    • Edge BioServ Scientific Advisory Board
        • Elaine Mardis, Ph.D. – Co-Director, Genome Sequencing Center, Washington University School of Medicine
        • Steven Salzberg, Ph.D. – Director, Center for Bioinformatics and Computational Biology, University of Maryland
        • Sam Levy, Ph.D. – Director of Genome Sciences, Scripps Translational Science Institute, Scripps Genomic Medicine
        • Gabor Marth, Ph.D. – Professor of Bioinformatics, Boston College
        • Michael Zody, M.S. – Chief Technologist, Broad Institute
        • Elliott Margulies, Ph.D. – Investigator, Genome Informatics Section, National Human Genome Research Institute, National Institutes of Health
        • Ken Dewar, Ph.D. – Assistant Professor, McGill University and Genome Quebec
    • NGS @ 30K Feet
    • Machines and Vendors
    • Obligatory NGS Exponential Growth Slide (Nature Biotechnology, Volume 26, Number 10, October 2008)
    • Ultra High Throughput + Lower Cost = Broader Applications
        • RNA-Seq / Whole Transcriptome
            – mRNA Expression & Discovery
            – Alternative Splicing
            – Allele-Specific Expression
            – microRNA Expression & Discovery
        • Epigenome
            – Transcriptionally Active Sites
            – Protein-DNA Interactions
            – Methylation Analysis
        • Genome
            – De Novo
            – Resequencing / Mutation Discovery & Profiling
            – Exome Sequencing
            – Copy Number Variation
            – Ancient DNA
        • Metagenome
            – Microbial Diversity
            – Heterogeneous Samples
    • Challenges
    • Challenges: Technical Expertise
    • Experimental Design Considerations
        • Sequencing Platform in Use
        • Choice of Library Construction
        • Depth of coverage
        • Re$ources
        • Number of Replicates
        • Number of Samples and Control
        • Etc…
    • Challenges: Flexibility w/ Standards
    • Flexibility with Standards and Scale
        • Then (CE) – The Norm
            – 10 Machines, 30 – 360 Days, 1 Project
        • Now (Illumina/SOLiD/454) – Scale
            – 1 machine, 14 Days, 30 Projects
        • Now (Ion Torrent) – Flexibility
            – 1 machine, 1 Day, 1 Project
        • Standardization of analysis (Details Later)
    • Challenges: Sample Preparation
    • Sample Sourcing for RNA Projects
        – Blood: Large quantities of sample available, but with limited utility in transcriptome analysis
        – Tissue: Needle biopsy most common, but sample quantity very low
        – Surgical section: Larger quantities available, but limited utility; need laser capture microdissection to provide useful results, sample quantity very low
        – FFPE Slides: Very useful in clinical research, but amount and quality of sample are low
    • Unamplified vs Amplified
        • Prostate Cancer Cell Line (VCaP) from CPDR
            – Well characterized
            – Differential Expression upon the addition of androgens
            – Compared transcriptome from a single pool of RNA
                • Unamplified, ribosomally depleted (Ribominus™)
                • Amplified, no ribosomal depletion required
                • Two Pipelines for analysis
    • Amplification Gives Different Results
        • Gene Expression in Unstimulated Cells: 14,075
    • Spearman’s Correlation from 2 Pipelines
        Pipeline A
                                   Unamplified   Unamplified   Amplified   Amplified
                        Androgen   +             -             +           -
        Unamplified     +          …             0.930         0.904       0.892
        Unamplified     -          …             …             0.896       0.900
        Amplified       +          …             …             …           0.928
        Amplified       -          …             …             …           …
        Pipeline B
                                   Unamplified   Unamplified   Amplified   Amplified
                        Androgen   +             -             +           -
        Unamplified     +          …             0.853         0.757       0.701
        Unamplified     -          …             …             0.720       0.712
        Amplified       +          …             …             …           0.848
        Amplified       -          …             …             …           …
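For readers who want to compute a correlation table like the one above, here is a minimal sketch of Spearman's rank correlation in plain Python. The function names are illustrative assumptions; this is not the code used by either pipeline in the talk.

```python
def rank(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))
```

Called on two per-gene expression vectors (for example, unamplified and amplified samples measured over the same genes), `spearman` returns a value between -1 and 1. Because it works on ranks, it is insensitive to any monotonic rescaling that amplification might introduce.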
    • Challenges: Sample Analysis
    • Exome Seq Ultimately About Variants
        • Coverage
        • Project Design
            – Cohorts
            – Cancer
        • Algorithms a Solved Problem?
            – Single open source pipelines
            – Single commercial pipelines
            – Proprietary internal algorithms
            – A mixture?
    • Ultimately Comes to Variation
        • Coverage
        • Project Design
            – Cohorts
            – Cancer
        • Algorithms a Solved Problem?
            – Single open source pipelines
            – Single commercial pipelines
            – Proprietary internal algorithms
            – A mixture?
    • Digging Deep with an Exome
        Ng PC, Levy S, Huang J, Stockwell TB, Walenz BP, Li K, Axelrod N, Busam DA, Strausberg RL, Venter JC. Genetic variation in an individual human exome. PLoS Genet. 2008 Aug 15;4(8):e1000160.
    • Venter Genome - Algorithms
        • PLoS Genetics 2008, vol 4, issue 8, e1000160
        • ~21K SNPs in exons (29MB Targeted)
        • 36,206 expected SNPs for 50MB Kit
        % Difference, Homozygous
        Software   TP     TN    FP     FN     Sensitivity   Pos. pred. val.
        B          1%     0%    -39%   -1%    1%            4%
        A          31%    0%    88%    -41%   31%           -6%
        C          -32%   0%    -49%   42%    -32%          2%
        % Difference, Heterozygous
        Software   TP     TN    FP     FN     Sensitivity   Pos. pred. val.
        B          0%     0%    16%    0%     0%            -9%
        A          -15%   0%    -44%   21%    -15%          16%
        C          15%    0%    28%    -20%   15%           -7%
    • 3 Tools and Associated SNP Counts
        • Software A – 45,511
        • Software B – 29,814
        • Software C – 40,964
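    • Software B v. Software A (Venn diagram, not to scale)
        • B: 29,814 total; 8,564 called by B only
        • A: 45,511 total; 24,261 called by A only
        • Intersection: 21,250
        • Union: 54,075
    • Software B v. Software C (Venn diagram)
        • B: 29,814 total; 6,358 called by B only
        • C: 40,964 total; 17,508 called by C only
        • Intersection: 23,456
        • Union: 47,322
    • Software A v. Software C (Venn diagram)
        • A: 45,511 total; 14,738 called by A only
        • C: 40,964 total; 10,191 called by C only
        • Intersection: 30,773
        • Union: 55,702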
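    • Software A v. Software B v. Software C (three-way Venn diagram)
        • Totals: B 29,814; A 45,511; C 40,964
        • One tool only: B 4,750; A 13,130; C 6,377
        • Exactly two tools: B and A 1,608; B and C 3,814; A and C 11,131
        • Intersection (all three): 19,642
        • Union: 60,452
        • Voting Scheme (2/3): 36,195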
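The union, intersection, and 2-of-3 voting totals on this slide follow directly from the seven Venn regions. The sketch below recomputes them. The region sizes come from the slide; the variable and function names are illustrative assumptions.

```python
# Region sizes from the three-way Venn diagram. Each key names the set of
# tools (A, B, C) that called the SNPs in that region and no others.
regions = {
    frozenset("B"): 4750,
    frozenset("A"): 13130,
    frozenset("C"): 6377,
    frozenset("AB"): 1608,
    frozenset("BC"): 3814,
    frozenset("AC"): 11131,
    frozenset("ABC"): 19642,
}


def total_for(tool):
    """All SNPs called by one tool, summed over every region containing it."""
    return sum(n for tools, n in regions.items() if tool in tools)


def called_by_at_least(k):
    """SNPs called by at least k of the three tools."""
    return sum(n for tools, n in regions.items() if len(tools) >= k)


union = called_by_at_least(1)         # 60452
voting_2_of_3 = called_by_at_least(2)  # 36195
intersection = called_by_at_least(3)   # 19642
```

A 2-of-3 vote keeps 36,195 calls: more than the 19,642 that all three tools agree on, but well short of the 60,452 in the union.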
    • Challenges: Platforms
    • The Weigh-In…
                            Yield/Day            Read Error Rates    Read Lengths
        Illumina MiSeq      2.0 Gb               1.3% (V4)           150
        Ion Torrent PGM     0.5 Gb (316 Chip)    1.5% (316 Chip)     120 - 240
        PacBio RS           3.0 Gb               2-15%               430-2900
        • Illumina and PacBio numbers from Vendor Sequencing
        • Ion Torrent from EdgeBio Sequencing
    • Illumina MiSeq: Mid-Range Length, Accurate Reads, Large Throughput
        • All Resequencing
        • All De novo Applications
        • Transcriptome
        • Methylation
    • Ion Torrent PGM: Long, Mostly Accurate Reads in 2.5 Hours
        • Microbial & Viral Resequencing
        • Microbial & Viral De novo Applications
        • Eukaryotic Amplicon Sequencing
        • Metagenomics
            – WGS
            – 16S Surveys
    • PacBio RS: Ultra Long, Less Accurate Reads & Rapid Sequence
        • Microbial & Viral De novo Applications
        • Structural Variation / Haplotyping
    • Ion Torrent PGM
        Name                Total # Reads   Mean Read Length   Longest Read   Total # (Mbp)   Q20 Mb   A20 Mean Read Length
        HG19-01             2,660,176       139                203            369.91          124.00   74
        HG19-02             2,321,405       121                202            281.43          116.43   75
        HG19-03             2,471,922       134                203            331.54          124.17   77
        Microbe (37% GC)    2,869,789       122                202            350.23          160.48   82
        Microbe (30% GC)    2,866,851       122                202            350.16          141.31   81
    • Ion Torrent PGM
        Name            Total # Reads   # Aligned / Assembled Reads   % Aligned / Assembled Reads   # Contigs   N50 Contig   Largest Contig   Percent of Aligned Genome Covered   Consensus Accuracy (AQ40)
        DH10B Mapping   1,384,863       1,334,138                     96.34%                        90          107,749      326,368          99.51%                              99.97%
        DH10B Denovo    1,384,863       1,335,604                     96.44%                        216         42,499       146,899          99.53%                              1.73%
        On Similar Illumina Data Set
        • Normalizing for coverage and removing Paired Ends
        • N50 of 94,926 and Largest Contig of 236,274
        • Removing normalization improved numbers
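The N50 figures above follow the usual definition: the contig length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal sketch, assuming only a list of contig lengths as input (the function name is an illustrative choice, not taken from any assembler):

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0
```

Because N50 depends on the whole length distribution, two assemblies with similar total size can differ sharply in N50 when one is split into many more contigs, as in the mapping (90 contigs) versus de novo (216 contigs) rows above.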
    • Why the Difference? Quality?
    • Quality? Q-Q plots of the DH10B Ion Torrent 316 chip data, expected vs empirical quality, before recalibration (left) and after recalibration (right).
    • Quality: Q-Q plots of DH10B MiSeq data, expected vs empirical quality, before recalibration (left) and after recalibration (right).
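The points on an expected-versus-empirical quality plot can be derived as in the sketch below. It assumes the standard Phred scale, Q = -10·log10(p), and estimates the empirical error rate for each reported quality value as mismatches divided by bases observed at that value. The function names and the input layout are illustrative assumptions, and no recalibration tool's smoothing or covariates are modelled.

```python
import math


def phred(p_error):
    """Convert an error probability to a Phred-scaled quality."""
    if not 0 < p_error <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p_error)


def empirical_quality(mismatches, bases):
    """Phred quality implied by the observed mismatch rate."""
    if bases <= 0:
        raise ValueError("need at least one observed base")
    if mismatches == 0:
        return float("inf")  # no errors observed at this reported quality
    return phred(mismatches / bases)


def qq_points(counts):
    """Map {reported_q: (mismatches, bases)} to sorted (reported, empirical) pairs."""
    return [(q, empirical_quality(m, b)) for q, (m, b) in sorted(counts.items())]
```

Well-calibrated data puts every point on the diagonal, where reported and empirical quality are equal. Points below the diagonal indicate base qualities that overstate the true accuracy.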
    • Empirical Quality
    • Empirical Quality - Long Reads
    • Then Why?
        • De Bruijn graphs are adversely affected by the more frequent INDELs characteristic of Ion Torrent
        • Higher average quality reads are less abundant in Ion Torrent
    • Does this matter in Resequencing?
        • Depends on the tools used!
            – If you understand the error profile, you can correct for it…
        • Ran simulated DH10B mutation experiment
            1. Make mutated E. coli reference (fakemut)
            2. Align data to mutated reference (clc, tmap, or other mappers)
            3. Calculate per base coverage on the BAM file (genomeCoverageBed)
            4. Run samtools/mpileup/vcffilter (or CLC SNP/INDEL calling) to call variants
                – Run various settings to compare variant calling
            5. Calculate false positives, true positives, and false negatives
            6. Calculate number of variants missed due to low coverage
            7. Calculate PPV and corrected sensitivity
            8. Graph PPV and corrected sensitivity
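Steps 5 to 7 reduce to a few ratios. The sketch below assumes that "corrected sensitivity" means sensitivity recomputed after excluding the false negatives attributed to low coverage in step 6. That reading is an assumption, the slides do not spell out the formula, and the function names are hypothetical.

```python
def ppv(true_pos, false_pos):
    """Positive predictive value: fraction of called variants that are real."""
    called = true_pos + false_pos
    return true_pos / called if called else 0.0


def sensitivity(true_pos, false_neg):
    """Fraction of real variants that were called."""
    real = true_pos + false_neg
    return true_pos / real if real else 0.0


def corrected_sensitivity(true_pos, false_neg, missed_low_coverage):
    """Sensitivity over callable sites only.

    Assumption: variants missed because coverage was too low are removed
    from the false-negative count before computing sensitivity.
    """
    if not 0 <= missed_low_coverage <= false_neg:
        raise ValueError("low-coverage misses must be a subset of false negatives")
    return sensitivity(true_pos, false_neg - missed_low_coverage)
```

With made-up counts of 90 true positives, 10 false positives, and 30 false negatives of which 20 sit in low-coverage regions, PPV is 0.90, raw sensitivity is 0.75, and corrected sensitivity is 0.90. Separating the two sensitivities shows whether misses come from the caller or simply from insufficient depth.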
    • Resequencing
        • Ion claims substitution issues with MiSeq [1]
        • Illumina claims INDEL issues with Ion [2]
        1. http://www.iontorrent.com/lib/images/PDFs/co23743_pgm_app_note.pdf
        2. http://www.illumina.com/Documents/products/appnotes/appnote_miseq_ecoli.pdf
    • Resequencing
        Pipeline                                        Variants Identified   Specificity   Sensitivity   PPV
        Ion/TMAP/SamTools (Mod)                         460                   100%          76.957%       97.676%
        Ion/TMAP/SamTools (Default)                     459                   99.895%       91.939%       6.014% (~6500 False Negatives)
        MiSeq/Eland/SamTools (Default – SNPs ONLY)      220                   99.99996%     95.91%        99.06%
        MiSeq/CLC/SamTools (Default)                    459                   95.464%       99.998%       83.871% (~65 False Negatives)
        • MiSeq SubSampled on DH10B (TMAP/Samtools): 9 total variants identified; 8 SNPs and 1 INDEL
        • Ion SubSampled on DH10B (TMAP/Samtools): 16 total variants identified; 0 SNPs and 16 INDELs
    • Ion Data: PPV and Sensitivity of Samtools Analyses
        [Chart. Y axis: 0% to 100%. Series: Total PPV, SNPs PPV, INDELs PPV, Total Corrected Sensitivity, SNPs Corrected Sensitivity, INDELs Corrected Sensitivity. X axis (Variant Calling): Default Samtools; Q4, h100, o20, e27, m1, H1; Q14, h75, o20, e21, m4, H2; Q7, h50, o10, e17, m4, H1; Q14, h50, o10, e17, m4, H1; Q14, h50, o10, e17, m4, H2]
    • MiSeq Data: PPV and Sensitivity of Samtools Analyses
        [Chart. Y axis: 0% to 100%. Series: Total PPV, SNPs PPV, INDELs PPV, Total Corrected Sensitivity, SNPs Corrected Sensitivity, INDELs Corrected Sensitivity. X axis: MiSeq CLC with Default Samtools; MiSeq CLC Map with Variant Analysis; MiSeq TMAP Map with Variant Analysis]
    • Resequencing Conclusion
        Using appropriate aligners and variant callers, we show both platforms have high accuracy, each with strengths and weaknesses…
    • What About PacBio?
        • We have less experience with PacBio
        • We (EdgeBio) think PacBio may have a niche, but given the large initial investment, we are waiting.
        • Many conferences and posters, but the only results seen are for de novo sequencing and finishing (Broad).
        • Will be here all week and would love to hear why you love it.
    • Take This Home
        • There are many challenges before we even get to picking a platform
            – Technical Expertise
            – Standards in Prep and Analysis
        • With Great NGS Power Comes Great Responsibility
    • Acknowledgements
        • CPDR (Center for Prostate Disease Research) Collaboration
            – Shyh-Han Tan, Ph.D.
        • EdgeBio Sequencing: Joy Adigun, Elyse Nagle, Jennifer Sheffield, Rossio Kersey, Ryan Mease
        • EdgeBio IFX: John Seed, Anjana Varadarajan, David Jenkins, Phil Dagosto, Quang Tri Nguyen
    • Questions
        Twitter: @Bioinfo
        jjohnson@edgebio.com