Contract Research Division
• Five SOLiD4 sequencing platforms
• One Life Technologies 5500XL
• Two Ion Torrent PGMs
• Bioinformatics consulting on Illumina, 454, and PacBio
• Automation through Caliper Sciclone & Biomek FX
• Commercial partnerships with companies such as CLC bio, DNAnexus and GenoLogics
• MD/PhD and Masters-level scientists and bioinformaticians
• IT infrastructure of >100 CPUs and >100 TB storage
Edge BioServ Scientific Advisory Board
Elaine Mardis, Ph.D., Co-Director, Genome Sequencing Center, Washington University School of Medicine
Steven Salzberg, Ph.D., Director, Center for Bioinformatics and Computational Biology, University of Maryland
Sam Levy, Ph.D., Director of Genome Sciences, Scripps Translational Science Institute, Scripps Genomic Medicine
Gabor Marth, Ph.D., Professor of Bioinformatics, Boston College
Michael Zody, M.S., Chief Technologist, Broad Institute
Elliott Margulies, Ph.D., Investigator, Genome Informatics Section, National Human Genome Research Institute, National Institutes of Health
Ken Dewar, Ph.D., Assistant Professor, McGill University and Genome Quebec
Sample Sourcing for RNA Projects
– Blood: large quantities of sample available, but limited utility in transcriptome analysis
– Tissue: needle biopsy most common, but sample quantity very low
– Surgical section: larger quantities available, but limited utility; laser capture microdissection is needed for useful results, and the resulting sample quantity is very low
– FFPE slides: very useful in clinical research, but sample quantity and quality are low
Unamplified vs Amplified
• Prostate cancer cell line (VCaP) from CPDR
  – Well characterized
  – Differential expression upon addition of androgens
  – Compared transcriptomes from a single pool of RNA
    • Unamplified, ribosomally depleted (RiboMinus™)
    • Amplified, no ribosomal depletion required
    • Two pipelines for analysis
Amplification Gives Different Results
• Gene Expression in Unstimulated Cells
[Figure not reproduced; recoverable value: 14,075]
Exome Seq: Ultimately About Variants
• Coverage
• Project design
  – Cohorts
  – Cancer
• Are algorithms a solved problem?
  – Single open-source pipelines
  – Single commercial pipelines
  – Proprietary internal algorithms
  – A mixture?
Digging Deep with an Exome
Genetic variation in an individual human exome.
Ng PC, Levy S, Huang J, Stockwell TB, Walenz BP, Li K, Axelrod N, Busam DA, Strausberg RL, Venter JC. PLoS Genet. 2008 Aug 15;4(8):e1000160.
Then Why?
• De Bruijn graphs are adversely affected by the more frequent INDEL errors characteristic of Ion Torrent reads
• High-average-quality reads are less abundant in Ion Torrent data
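To make the de Bruijn point concrete, here is a toy sketch (not an assembler) showing that a single homopolymer INDEL in a read introduces k-mers that never occur in the true sequence; in a de Bruijn graph these become spurious nodes branching off the true path. The sequence, read errors, and k = 5 are hypothetical. Substitutions create spurious k-mers too, so the sketch illustrates why a higher error rate, such as a higher INDEL rate, inflates the graph, not that INDELs are uniquely harmful.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct k-mers of seq: the nodes a simple de Bruijn graph would add."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def spurious_kmers(true_seq: str, read: str, k: int) -> Set[str]:
    """k-mers present in the read but absent from the true sequence."""
    return kmers(read, k) - kmers(true_seq, k)


if __name__ == "__main__":
    k = 5
    true_seq = "GATTACAGCTTGCACGTA"                  # hypothetical true sequence
    insertion = true_seq[:9] + "T" + true_seq[9:]   # TT homopolymer read as TTT
    deletion = true_seq[:9] + true_seq[10:]         # one T of the TT run dropped
    for label, read in [("error-free", true_seq),
                        ("insertion", insertion),
                        ("deletion", deletion)]:
        print(label, sorted(spurious_kmers(true_seq, read, k)))
```

Running it prints no spurious k-mers for the error-free read and three for each one-base INDEL (GCTTT, CTTTG, TTTGC for the insertion; AGCTG, GCTGC, CTGCA for the deletion). Every additional error in the data adds more such nodes that the assembler must later prune.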
Does This Matter in Resequencing?
• Depends on the tools used!
  – If you understand the error profile, you can correct for it…
• Ran a simulated DH10B mutation experiment:
  1. Make a mutated E. coli reference (fakemut)
  2. Align data to the mutated reference (CLC, TMAP, or other mappers)
  3. Calculate per-base coverage on the BAM file (genomeCoverageBed)
  4. Run samtools/mpileup/vcffilter (or CLC SNP/INDEL calling) to call variants; run various settings to compare variant calling
  5. Calculate false positives, true positives, and false negatives
  6. Calculate the number of variants missed due to low coverage
  7. Calculate PPV and corrected sensitivity
  8. Graph PPV and corrected sensitivity
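Steps 3 and 5 to 7 could be scored roughly as follows. This is a minimal sketch, not the pipeline actually used: it assumes per-base depth in the three-column `chrom, position, depth` layout that `genomeCoverageBed -d` writes (1-based positions), and it assumes "corrected sensitivity" means sensitivity after discarding planted variants that were missed only because their site fell below a depth threshold. The `Variant` type, `min_depth=10`, and both function names are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position on the (mutated) reference
    ref: str
    alt: str


def parse_per_base_depth(lines: Iterable[str]) -> Dict[Tuple[str, int], int]:
    """Parse 'chrom<TAB>pos<TAB>depth' lines, one per reference position."""
    depth: Dict[Tuple[str, int], int] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, pos, cov = line.rstrip("\n").split("\t")[:3]
        depth[(chrom, int(pos))] = int(cov)
    return depth


def score_calls(truth: Iterable[Variant], calls: Iterable[Variant],
                depth: Dict[Tuple[str, int], int], min_depth: int = 10) -> dict:
    """Compare variant calls with the variants planted in the reference.

    Matching is exact on (chrom, pos, ref, alt); real INDEL calls usually
    need left-alignment/normalization first, which is not done here.
    """
    truth_set, call_set = set(truth), set(calls)
    tp = truth_set & call_set
    fp = call_set - truth_set
    fn = truth_set - call_set
    # Step 6: false negatives at sites whose depth is below min_depth
    fn_low_cov = {v for v in fn if depth.get((v.chrom, v.pos), 0) < min_depth}
    n_tp, n_fp, n_fn, n_low = len(tp), len(fp), len(fn), len(fn_low_cov)
    ppv = n_tp / (n_tp + n_fp) if n_tp + n_fp else 0.0
    sensitivity = n_tp / (n_tp + n_fn) if n_tp + n_fn else 0.0
    # Assumed "corrected" sensitivity: ignore misses caused by low coverage
    callable_truth = n_tp + n_fn - n_low
    corrected = n_tp / callable_truth if callable_truth else 0.0
    return {"tp": n_tp, "fp": n_fp, "fn": n_fn, "fn_low_coverage": n_low,
            "ppv": ppv, "sensitivity": sensitivity,
            "corrected_sensitivity": corrected}
```

For example, with three planted variants, one correct call, one false call, and one of the two misses sitting at a depth-3 site, PPV is 0.5, raw sensitivity is 1/3, and corrected sensitivity is 0.5. Separating coverage-driven misses from caller-driven misses is what lets two platforms with different depth profiles be compared on the variant caller alone.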
Resequencing
• Ion claims substitution issues with MiSeq [1]
• Illumina claims INDEL issues with Ion [2]
1. http://www.iontorrent.com/lib/images/PDFs/co23743_pgm_app_note.pdf
2. http://www.illumina.com/Documents/products/appnotes/appnote_miseq_ecoli.pdf
Resequencing
Pipeline (settings) | Variants identified | Specificity | Sensitivity | PPV
Ion/TMAP/SamTools (Mod) | 460 | 100% | 76.957% | 97.676%
Ion/TMAP/SamTools (Default) | 459 | 99.895% | 91.939% | 6.014% (~6500 false negatives)
MiSeq/Eland/SamTools (Default, SNPs only) | 220 | 99.99996% | 95.91% | 99.06%
MiSeq/CLC/SamTools (Default) | 459 | 95.464% | 99.998% | 83.871% (~65 false negatives)
Subsampled on DH10B (TMAP/Samtools):
• MiSeq: 9 total variants identified (8 SNPs and 1 INDEL)
• Ion: 16 total variants identified (0 SNPs and 16 INDELs)
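Specificity and PPV can diverge sharply in resequencing, because almost every reference position is a true negative. The sketch below is a worked example under one stated assumption: each reference position counts as one site that is either variant or not. The counts are hypothetical and are not the values in the table above.

```python
def site_metrics(tp: int, fp: int, fn: int, n_sites: int) -> dict:
    """Specificity, sensitivity and PPV, treating every reference
    position as one site (an assumption for illustration)."""
    tn = n_sites - tp - fp - fn  # non-variant sites correctly left uncalled
    return {
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts on a ~4.7 Mb bacterial genome
    m = site_metrics(tp=450, fp=7000, fn=10, n_sites=4_700_000)
    for name, value in m.items():
        print(f"{name}: {value:.3%}")
```

With these numbers, 7,000 false calls barely move specificity (about 99.85%) while PPV falls to about 6%, which is why PPV is the more informative column when comparing pipelines on a small genome.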
MiSeq Data
[Figure: PPV and Sensitivity of Samtools Analyses of MiSeq Data. Series: total, SNP, and INDEL PPV; total, SNP, and INDEL corrected sensitivity (y-axis 0% to 100%). Pipelines: MiSeq CLC with default Samtools; MiSeq CLC map with variant analysis; MiSeq TMAP map with variant analysis.]
Resequencing Conclusion
Using appropriate aligners and variant callers, we show that both platforms have high accuracy, each with its own strengths and weaknesses…
What About PacBio?
• We have less experience with PacBio
• We (EdgeBio) think PacBio may have a niche, but given the large initial investment, we are waiting
• Across many conferences and posters, the only results we have seen are for de novo sequencing and finishing (Broad)
• We will be here all week and would love to hear why you love it
Take This Home
• There are many challenges before we even get to picking a platform
  – Technical expertise
  – Standards in prep and analysis
With Great NGS Power Comes Great Responsibility
Acknowledgements
• CPDR (Center for Prostate Disease Research) Collaboration
  – Shyh-Han Tan, Ph.D.
EdgeBio Sequencing and EdgeBio IFX teams: Joy Adigun, John Seed, Elyse Nagle, Anjana Varadarajan, Jennifer Sheffield, David Jenkins, Rossio Kersey, Phil Dagosto, Ryan Mease, Quang Tri Nguyen