140127 abrf interlaboratory study proposal


Published in: Technology
  • ENCyclopedia Of DNA Elements: The National Human Genome Research Institute (NHGRI) launched a public research consortium named ENCODE, the Encyclopedia Of DNA Elements, in September 2003, to carry out a project to identify all functional elements in the human genome sequence. The project started with two components: a pilot phase and a technology development phase.

    1. 1. The ABRF Next Generation Sequencing Study: Multi-Platform and Cross-Methodological Reproducibility of RNA and DNA Profiling
       Genome in a Bottle Consortium Workshop, January 2014
       Don A. Baldwin, Ph.D., CSO, Pathonomics LLC
    2. 2. ABRF is an international organization of over 700 scientists from shared research resource core facilities and biotechnology laboratories. Members represent over 250 core labs in academic and research institutions, government, and industry.
       • “Yellow pages” and “MarketPlace” databases of members at www.ABRF.org
       • Electronic discussion group facilitates sharing of technical advice and core facility networking.
       • The Journal of Biomolecular Techniques covers genomics, proteomics, imaging, and other biotechnologies, and core facility operational management.
    3. 3. www.abrf.org
    4. 4. The ABRF Next Generation Sequencing (NGS) Study:
       • Produce reference data sets to establish baseline performance
       • Promote the use of standard samples
       • Provide public access to data for self-evaluation, performance monitoring and methods development
       Phase I: RNA-Seq and degraded RNA-Seq (2011-2013)
       Phase II: DNA-Seq and hard-to-sequence regions (2014-2016)
       Phase III: Clinical genetics sequencing panels
       Phase IV: Asteroid and Martian surface sequencing
       Phase I
       1. Cross Platform: HiSeq/MiSeq, 454, PGM, Proton, PacBio
       2. Cross Protocol: ribo-depletion, stranded, degraded
       3. Cross Site: 3 sites for each platform, replicates at each site
    5. 5. SeQC, ABRF, ENCODE, others
       • Provide reference data resources
       • Best practices for
         – gene quantification,
         – isoform characterization,
         – dynamic range comparisons,
         – managing inter-site and intra-site variation,
         – analysis pipelines, and
         – cross-platform testing of transcriptome hypotheses
       • To address some other aspects of RNA-seq, including
         – variant detection,
         – allele-specific expression,
         – RNA editing, and gene fusions.
       • And more …
    6. 6. Phase I Study Design
    7. 7. Sequence mismatches with hg19
       • Q10 – Q60, most variation at read starts and ends
       • Higher alignment rates with platform-specific algorithms vs. STAR
       • Higher single-base mismatch and indel rates with platform-specific algorithms vs. STAR
    8. 8. Gene body coverage
       [Figure: coverage from 5’ to 3’ by platform (454, ILMN, PAC, PGM, PRO); RNA preparations shown: polyA, rRNA-depleted, total]
    9. 9. Variation and correlation between laboratory sites
       [Figure panels: Inter-site CV; Inter-site R²]
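The two metrics named on the slide above can be sketched in a few lines. This is only an illustration under stated assumptions, not the study's analysis pipeline: the site names and expression values are invented, CV is taken as sample standard deviation divided by the mean for each gene across sites, and R² as the squared Pearson correlation between two sites.

```python
from statistics import mean, stdev

def inter_site_cv(values):
    """Coefficient of variation (sample SD / mean) of one gene across sites."""
    m = mean(values)
    return stdev(values) / m if m else float("nan")

def r_squared(x, y):
    """Squared Pearson correlation between two sites' expression vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical expression values (e.g. RPKM) for four genes at three sites.
site_values = {
    "site1": [10.0, 200.0, 35.0, 4.0],
    "site2": [12.0, 190.0, 30.0, 5.0],
    "site3": [11.0, 210.0, 40.0, 3.0],
}

# Regroup so each tuple holds one gene's values across all sites.
per_gene = list(zip(*site_values.values()))
cvs = [inter_site_cv(gene) for gene in per_gene]
r2 = r_squared(site_values["site1"], site_values["site2"])
```

A low per-gene CV and an R² near 1 would both indicate good agreement between sites; real data would normally be log-transformed and filtered for low-abundance genes first.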
    10. 10. Transcript splice junction detection
        • Long reads provide efficient junction detection
        • Most junctions are detected by three or more platforms
    11. 11. Detection of Differentially Expressed Genes, sample A vs. B
        • DEGs detected by three or more methods: 61.4%
        • DEGs detected by two methods: 16.6%
        • Unique DEGs: 454 1.5%, POLYA 6.2%, RIBO(-) 3.9%, PRO 6.7%, PGM 3.8%; total unique 22.0%
        [Venn diagram sets: POLYA (11,820), RIBO (11,294), 454 (7579), PGM (12,572), PRO (13,797), Total (18,002). Sets containing more than 1000 genes are indicated in red; 100-999 in yellow.]
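The overlap categories on the slide above (DEGs called by three or more methods, by two, or by only one) can be tallied as in the sketch below. The gene names and set contents are placeholders invented for illustration, not study results, and `overlap_summary` is a hypothetical helper name.

```python
from collections import Counter

def overlap_summary(deg_sets):
    """Percent of all DEGs called by >=3 methods, exactly 2, or only 1."""
    # Count, for every gene, how many methods called it differentially expressed.
    counts = Counter(gene for genes in deg_sets.values() for gene in set(genes))
    total = len(counts)
    three_plus = sum(1 for n in counts.values() if n >= 3)
    two = sum(1 for n in counts.values() if n == 2)
    unique = sum(1 for n in counts.values() if n == 1)
    return {
        "total": total,
        "three_or_more_pct": 100.0 * three_plus / total,
        "two_pct": 100.0 * two / total,
        "unique_pct": 100.0 * unique / total,
    }

# Toy gene lists keyed by method; contents are placeholders.
toy = {
    "POLYA": {"g1", "g2", "g3", "g4"},
    "RIBO": {"g1", "g2", "g5"},
    "PGM": {"g1", "g3", "g6"},
    "PRO": {"g1", "g2", "g7"},
    "454": {"g1", "g8"},
}
summary = overlap_summary(toy)
```

The three percentages always sum to 100, which matches the slide's 61.4% + 16.6% + 22.0%.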
    12. 12. Transcript abundance measurements using polyA-enrichment or rRNA-depletion library preparation methods
        [Figure: number of genes (y-axis) vs. RPKM difference, polyA – ribo0 (x-axis); regions labeled Ribo0>PolyA and PolyA>Ribo0]
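For readers unfamiliar with the x-axis quantity, the sketch below computes RPKM (reads per kilobase of transcript per million mapped reads) for one gene in each library type and takes the difference in the direction the slide uses (polyA minus ribo0). The read counts, gene length, and library sizes are invented for illustration; the study's actual quantification settings are not given in the slides.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical counts for the same gene in a polyA and a ribo-depleted library.
polya = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=20_000_000)
ribo0 = rpkm(read_count=300, gene_length_bp=2000, total_mapped_reads=30_000_000)

# Positive values fall on the PolyA>Ribo0 side, negative on Ribo0>PolyA.
difference = polya - ribo0
```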
    13. 13. PolyA vs. ribo-depletion for detection of differential gene expression
        Correlation with RT-qPCR
    14. 14. Correlations of measured transcript abundances for high-quality vs. degraded total RNA
        • rRNA-depleted
        • Illumina HiSeq
        [Figure: correlation coefficients (y-axis, 0.7 to 1.0) by samples compared (x-axis; labels A, dA, B, dB)]
    15. 15. Surrogate Variable Analysis to remove cross-platform and cross-site variation
        SVA: Leek JT, Storey JD. PLoS Genet. 2007
        [Figure labels: PGM, Illumina; FDA SeQC, ABRF NGS Study]
    16. 16. The ABRF NGS Study, Phase I
        • 26 primary scientists
        • 34 contributing scientists
        • 21 research institutions
        • 4.3 billion reads
        • 447 billion nucleotides
        Funded by:
        • Vendor donations of sample preparation and sequencing reagents
        • Participating laboratories
        • ABRF
        Manuscript in review: 6 figures, 2 tables, 37 supplementary figures, 7 supplementary tables
    17. 17. The ABRF NGS Study, Phase II
        • DNA sequencing topics were brainstormed and prioritized by the study consortium
        • Samples were chosen based on the August 2013 Genome in a Bottle Workshop
    18. 18. Phase II DNA sequencing aims
        Reference data sets
        • Intra- and inter-lab replication to model the range of performance expected under normal service laboratory conditions
        Reference samples
        • Easily accessible for self-evaluation by comparison to the reference data
        • Standardized, stably reproduced, suitable for methods development
        Immediate utility
        • Performance metrics and data applicable to methods used now or in the near future by core sequencing facilities
    19. 19. Phase II projects in no particular order, with project scope and sequencing coverage to be prioritized by interest and funding:
        Performance using different platforms and technical protocols
        • NIST GiaB designated human genomic DNA
        • Measure sequencing accuracy and coverage
        Performance using damaged DNA and chimeric cell populations
        • DNA from formalin-fixed, paraffin-embedded cell mixtures
        • Measure sequencing accuracy, coverage, and limits of detection for somatic mutations
        Performance on small genomes over a range of GC content
        • NIST GiaB (with FDA) designated bacterial genomic DNA
        • Measure sequencing accuracy and coverage
    20. 20. Phase II samples
        Sample ID: DNA source (Project)
        • A: Ashkenazim Jew, maternal (Project 1)
        • B: Ashkenazim paternal (Project 1)
        • C: Ashkenazim child (Project 1)
        • M: pool of mutant Horizon Dx lines #1, #3 plus Acrometrix lines #2, #4; lines 1-2 at 48% each, lines 3-4 at 2% each by cell count
        • M1: 50% C, 50% M cells in FFPE (each target’s copy number = 24% or 1%) (Project 2)
        • M2: 80% C, 20% M cells in FFPE (targets = 9.6% or 0.4%) (Project 2)
        • M3: 90% C, 10% M cells in FFPE (targets = 4.8% or 0.2%) (Project 2)
        • M4: 95% C, 5% M cells in FFPE (targets = 2.4% or 0.1%) (Project 2)
        • M5: 99% C, 1% M cells in FFPE (targets = 0.48% or 0.02%) (Project 2)
        • M6: 99.5% C, 0.5% M cells in FFPE (targets = 0.24% or 0.01%) (Project 2)
        • M7: 99.9% C, 0.1% M cells in FFPE (targets = 0.048% or 0.002%) (Project 2)
        • M8: 99.99% C, 0.01% M cells in FFPE (targets = 0.0048% or 0.0002%) (Project 2)
        • Sta: Staphylococcus aureus (Project 3)
        • Sae: Salmonella enterica (Project 3)
        • Psa: Pseudomonas aeruginosa (Project 3)
        • Cls: Clostridium sporogenes (Project 3)
        • P: pooled metagenomic sample with all four bacterial genomes (Project 3)
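The target percentages quoted for M1-M8 follow from simple multiplication: the fraction of pool M cells in a mixture times the fraction of each mutant line within pool M (48% or 2%). The sketch below reproduces that arithmetic; the variable and function names are made up for illustration, and it assumes the slide's figures are plain products of cell-count fractions with no correction for zygosity or ploidy.

```python
# Fraction of each mutant line within pool M, as given on the slide:
# lines 1-2 at 48% each, lines 3-4 at 2% each (by cell count).
LINE_FRACTION_HIGH = 0.48
LINE_FRACTION_LOW = 0.02

# Fraction of pool M cells in each FFPE mixture, from the slide.
M_FRACTION = {
    "M1": 0.50, "M2": 0.20, "M3": 0.10, "M4": 0.05,
    "M5": 0.01, "M6": 0.005, "M7": 0.001, "M8": 0.0001,
}

def expected_targets(m_fraction):
    """Return (high, low) expected target percentages for one mixture."""
    return (100 * m_fraction * LINE_FRACTION_HIGH,
            100 * m_fraction * LINE_FRACTION_LOW)

expected = {name: expected_targets(f) for name, f in M_FRACTION.items()}
for name, (high, low) in expected.items():
    print(f"{name}: {high:g}% or {low:g}%")
```

The lowest mixture, M8, works out to 0.0048% and 0.0002%, which sets the depth of sequencing a lab would need to have any chance of observing those targets.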
    21. 21. Small genomes project: sizes and GC content
        Species: reference strain; genome (bp); avg % GC; distributor
        • Staphylococcus aureus: NRS77 (NCTC 8325); 2.8x10^6; 33; NARSA #NRS77
        • Salmonella enterica subsp. enterica serovar Typhimurium: LT2; 4.9x10^6; 52; ATCC #700720
        • Pseudomonas aeruginosa: PAO1; 6.7x10^6; 67; ATCC #47085
        • Clostridium sporogenes: Metchnikoff; 4.1x10^6; 28; ATCC #15579
    22. 22. Platforms and library methods
        Platforms:
        • Illumina HiSeq 2000
        • Illumina HiSeq 2500
        • Illumina 2500 RapidTrack
        • Illumina MiSeq
        • Illumina Moleculo
        • Life Technologies Proton
        • Life Technologies PGM
        • Pacific Biosciences
        • New platforms? (Illm X10, NextSeq 500; Qiagen GeneReader, Oxford MinION…)
        Platform sample assignments, in slide order (row alignment not recoverable from the transcript):
        • Project 1 and Project 2 samples: A, B, C; C; M1-M8; C; C for longread scaffold; A, B, C; A, B, C; M1-M8; C for longread scaffold
        • Project 3 samples: Sta, Sae, Psa, Cls, P (five platforms); ? ? ?
        Library protocols (Project 1 sample; Project 2 sample; Project 3 samples):
        • Nextera on HiSeq: C; M1; Sta, Sae, Psa, Cls
        • NuGEN on HiSeq: C; M1; Sta, Sae, Psa, Cls
        • New England Biolabs on HiSeq: C; M1; Sta, Sae, Psa, Cls
        • Sigma WGA on Proton: C; M1; Sta, Sae, Psa, Cls
        • NuGEN WGA on Proton: C; M1; Sta, Sae, Psa, Cls
        • Qiagen WGA on Proton: C; M1; Sta, Sae, Psa, Cls
    23. 23. An ABRF – GiaB collaboration
        NIST
        • Extract high-quality genomic DNA from cultured cells for A, B, C, Sta, Sae, Psa and Cls
        • Prepare equimolar blend of bacterial DNA for pool P
        • Procure somatic mutation cell lines, create pools M1-M8 titrated by cell counts
        • Extract genomic DNA from FFPE blocks of cell suspensions
        • Distribute aliquots of DNA reference stocks to participating study labs
        ABRF
        • Assemble platform groups with at least 3 labs per instrument or method
        • Each platform group will determine a consensus protocol for library preparation and sequencing
        • Sequence one library per sample per site (intra-lab replicates encouraged)
        • Collect and annotate data in a central repository
        • Analyze sequencing performance
    24. 24. The ABRF NGS Study leadership group in alphabetical order, with level of participation and devotion to be prioritized by alcoholic intake:
        Name; email; contact regarding
        • Baldwin, Don; donabaldwin65@yahoo.com; study design
        • Grills, George; gsg34@cornell.edu; vendor and partner relations
        • Mason, Chris; chm2042@med.cornell.edu; data analysis
        • Nicolet, Charlie; cnicolet@usc.edu; sequencing methods
        • Tighe, Scott; scott.tighe@uvm.edu; logistics