Aug2013 tumor normal whole genome sequencing

Published in: Technology

  1. Project plan for generating a somatic data truth set for NGS cancer assay validation: COLO-829 and fusion spike-in materials. Stephanie J.K. Pond, 8/15/13
  2. Introduction
     • There is a need for development and widespread adoption of standards to facilitate tool development and assay validation for next-gen sequencing in cancer applications.
       – Cancer standards are needed for somatic calls for SNVs, indels, structural variants, copy number variation, and RNA fusion detection.
     • There is limited publicly available data that can act as a “gold standard” dataset.
     • We embarked on a multi-lab collaboration to generate a set of somatic calls that can be used as a truth dataset for validations and evaluating assay performance.
       – In this initial work, we are excluding FFPE samples.
  3. COLO-829, COLO-829BL Cell Lines
     • The cell lines have been previously sequenced, and somatic calls from the DNA were published.
       – Pleasance et al. Nature 2010, 463(7278): 191-196.
       – Found variants in the major categories (SNVs, indels, CNVs, SVs) that need to be investigated for cancer applications.
       – Substitutions, insertions, and deletions were confirmed by capillary sequencing.
       – Structural variants were confirmed by PCR across the breakpoint and capillary sequencing.
       – Confirmation was performed in both cell lines to distinguish somatic from germline variants.
     • We want to expand on this dataset.
     Cell lines:
       Name: COLO 829 · ATCC No.: CRL-1974 · Tissue source: skin · Cancer type: melanoma; malignant
       Name: COLO 829BL · ATCC No.: CRL-1980 · Tissue source: B lymphoblast
     [Figure: Circos plot from the COSMIC database]
  4. Whole Genome Sequencing of COLO-829 and COLO-829BL
     • Whole genome sequencing data for COLO-829 and COLO-829BL at a depth of 90x are being generated to build a set of consensus calls:
       – TGen: HiSeq 2500; multiple variant callers; cell passage A
       – TGen: samples sent to Complete Genomics for sequencing, to incorporate an orthogonal technology
       – Illumina: HiSeq 2500; cell passage B
     • The consensus of the datasets will establish a set of somatic calls that can be used as a gold standard in analytical validations.
       – This will expand the set in the literature.
       – A second set of lower-confidence somatic calls (present in 2 of 3 datasets) may also be identified.
     [Figure: consensus calls across the TGen (Complete Genomics), TGen (HiSeq), and ILMN (HiSeq) datasets]
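The consensus scheme on this slide (calls shared by all three datasets, plus a lower-confidence tier supported by 2 of 3) can be sketched as simple set operations. This is a minimal illustration only, not the project's pipeline: the function name `consensus_calls`, the variant tuple layout, and the toy coordinates are all assumptions, and real call sets would first need normalization (for example, left-aligned indels) before exact matching is meaningful.

```python
from collections import Counter

def consensus_calls(call_sets, min_support):
    """Return the variants reported by at least `min_support` call sets.

    `call_sets` maps a dataset label to a set of variant keys. Here a key is
    a hypothetical (chrom, pos, ref, alt) tuple.
    """
    counts = Counter()
    for calls in call_sets.values():
        counts.update(set(calls))  # each dataset votes once per variant
    return {variant for variant, n in counts.items() if n >= min_support}

# Toy, made-up somatic calls; none of these are real COLO-829 variants.
call_sets = {
    "TGen (HiSeq)": {
        ("chr1", 100, "C", "T"),
        ("chr2", 200, "G", "A"),
        ("chr3", 300, "A", "G"),
    },
    "TGen (Complete Genomics)": {
        ("chr1", 100, "C", "T"),
        ("chr2", 200, "G", "A"),
    },
    "ILMN (HiSeq)": {
        ("chr1", 100, "C", "T"),
        ("chr3", 300, "A", "G"),
        ("chr4", 400, "T", "C"),
    },
}

# Tier 1: present in all three datasets.
high_confidence = consensus_calls(call_sets, min_support=3)
# Tier 2: present in exactly two of the three datasets.
lower_confidence = consensus_calls(call_sets, min_support=2) - high_confidence

print(sorted(high_confidence))
print(sorted(lower_confidence))
```

Calls seen in only one dataset (the `chr4` entry here) fall into neither tier.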
  5. Synthetic Oligo Spike-In mRNA Transcripts
     [Figure: construct layout, in order: T7, AscI, GeneA, GeneB, NotI, T3(rc)]
     ID, genes, transcript length (excluding poly A+):
       TFG01: EWS-ATF1, 1150
       TFG02: TMPRSS2-ETV1, 1282
       TFG03: EWS-FLI1, 1483
       TFG04: NTRK3-ETV6, 1954
       TFG05: CD74-ROS1, 2164
       TFG06: HOOK3-RET, 2383
       TFG07: EML4-ALK, 3442
       TFG08: AKAP9-BRAF, 4531
       TFG09: BCR-ABL, N/A*
       TFG10: BRD4-NUT, 3969
     *IDT could not synthesize TFG09 due to significant secondary structure.
     • Nine fusion gene sequences of clinically relevant gene fusions were pulled from GenBank and synthesized as DNA plasmids by IDT.
     • In vitro transcription from the purified plasmid, followed by poly-A tailing, resulted in mRNA transcripts of known sequence.
     • These constructs can be used as spike-in control materials in mRNA protocols to assess the ability to detect fusion genes, a critical mutation type in cancer.
  6. Preliminary tests of the synthetic oligos appear promising
     • A pool of fusion spikes was added to COLO-829 total RNA at different concentrations.
     • The data show a linear response at higher concentrations and poor detection below a threshold value.
     • One spike (TMPRSS2-ETV1) is not detected, even at the highest concentrations, although it is present at very high read counts.
       – The hypothesis is that the fusion is near the 5’ end of the transcript and that the breakpoint position is affecting fusion calling (remains to be tested).
       – This highlights the need for standard materials in this area.
     [Figure: supporting evidence strength (log10 read counts) versus fusion spike RNA concentration (log10 nmoles, -14 to -6) for TopHat-Fusion, ChimeraScan, and SnowShoes]
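One way to quantify the "linear response above a threshold" described on this slide is to fit a straight line in log10-log10 space to the spike concentrations at which a caller reports supporting reads. The sketch below uses invented numbers purely to show the mechanics; the values, the names `fit_line` and `titration`, and the rule "zero supporting reads means not detected" are all assumptions and do not reproduce the slide's data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical titration for one spike and one caller:
# (log10 nmoles of spike RNA, supporting read count). Not real data.
titration = [(-12, 0), (-11, 0), (-10, 10), (-9, 100), (-8, 1000)]

# Keep only the concentrations where the fusion was detected at all.
detected = [(conc, math.log10(reads)) for conc, reads in titration if reads > 0]

# Lowest concentration with any supporting reads (an empirical threshold).
lowest_detected = min(conc for conc, _ in detected)

# Slope near 1 in log-log space means read support scales proportionally
# with the amount of spike added.
slope, intercept = fit_line([c for c, _ in detected], [r for _, r in detected])

print(f"lowest detected concentration: 10^{lowest_detected} nmoles")
print(f"log-log slope: {slope:.2f}, intercept: {intercept:.2f}")
```

A spike such as TMPRSS2-ETV1 that is never called would yield an empty `detected` list, so a real analysis would need to handle that case before fitting.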
  7. Analytical Validation at TGen
     • Whole-genome: TGen (HiSeq 2500); TGen (Complete Genomics); ILMN (HiSeq 2500)
     • Exomes (SNVs): N:T mixtures of 0:100%, 50:50, 75:25, 90:10, 95:5, 99:1, and 100:0, with 3 replicates each
     • WGS large insert (structural variants): N:T mixtures of 0:100%, 50:50, 75:25, 90:10, 95:5, 99:1, and 100:0, with 3 replicates each
     • RNA (differential expression, fusions): Tumor, Tumor ERCC 1, Tumor ERCC 2, Normal, Norm ERCC 1, Norm ERCC 2, and fusion spikes, with 3 replicates each
     • Arrays (copy number, expression): Agilent, Illumina, Affymetrix
     • 50+ flow cells; 6 TB of sequencing data; equivalent to ~600 exomes (TCGA Phase 1)
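The normal:tumor mixture series on this slide determines how faint a somatic variant becomes at each dilution. As a rough worked example, the expected variant allele fraction (VAF) can be estimated from the tumor fraction and copy numbers. This is a simplified sketch under loud assumptions: mixing is by cell (genome) fraction, the variant is clonal, and both genomes are diploid at the locus by default. COLO-829 has copy number changes, so real loci will deviate. The function name and parameters are illustrative, not from the slides.

```python
import math

# N:T mixture ratios listed on the slide, as (normal %, tumor %).
MIXTURES = [(0, 100), (50, 50), (75, 25), (90, 10), (95, 5), (99, 1), (100, 0)]

def expected_vaf(normal_pct, tumor_pct, alt_copies=1,
                 tumor_copy_number=2, normal_copy_number=2):
    """Expected VAF of a clonal somatic variant in a normal:tumor mixture.

    Assumes (hypothetically) that the percentages are cell fractions and that
    the variant sits on `alt_copies` of `tumor_copy_number` tumor alleles.
    """
    tumor_fraction = tumor_pct / (normal_pct + tumor_pct)
    normal_fraction = 1.0 - tumor_fraction
    alt_alleles = tumor_fraction * alt_copies
    total_alleles = (tumor_fraction * tumor_copy_number
                     + normal_fraction * normal_copy_number)
    return alt_alleles / total_alleles

vafs = {mix: expected_vaf(*mix) for mix in MIXTURES}

for (normal_pct, tumor_pct), vaf in vafs.items():
    print(f"{normal_pct}:{tumor_pct} N:T -> expected VAF {vaf:.4f}")

# Sanity checks for the diploid, heterozygous case: VAF = tumor fraction / 2.
assert math.isclose(vafs[(0, 100)], 0.5)
assert math.isclose(vafs[(90, 10)], 0.05)
```

Under these assumptions a heterozygous variant in the 99:1 mixture sits at a VAF of about 0.5%, which gives a sense of why the deeper dilutions stress a caller's sensitivity.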
  8. Summary
     • TGen and ILMN have begun a cross-site effort to generate a “gold standard” somatic dataset for a pair of cancer cell lines (COLO-829 & COLO-829BL) as well as a set of synthetic mRNA fusion transcripts.
     • Data generation is scheduled to be completed this month, with analysis thereafter.
     • We intend to make the data publicly available.
     • Are these appropriate reference materials?
       – Cell lines: stability; consent
       – Fusion materials: preliminary data is encouraging. Additional experiments are ongoing.
     • We welcome feedback and discussion.
  9. Acknowledgements
     • Illumina: Han-Yu Chuang, Nancy Kim, Timothy McDaniel, Valerie Montel, Jimmy Perrott
     • TGen: Stephanie Buchholtz, John Carpten, David Craig, Winnie Liang, W. Amol Tembe, Tracey White
  10. Appendix