Project plan for generating a somatic data truth set for NGS cancer assay validation: COLO-829 and fusion spike-ins
Stephanie J.K. Pond
There is a need for development and widespread adoption
of standards to facilitate tool development and assay
validation for next-gen sequencing in cancer applications.
– Cancer standards are needed for somatic calls of SNVs, indels, structural variants, copy number variation, and RNA fusions
There is limited publicly available data that can act as a
“gold standard” dataset.
We embarked on a multi-lab collaboration to generate a set of somatic calls that can serve as a truth dataset for validations and for evaluating assay performance.
– In this initial work, we are excluding FFPE samples
Cell lines have been previously sequenced, and somatic calls from the DNA were published:
– Pleasance et al. Nature 2010, 463(7278):
– Found variants in all the major categories (SNVs, indels, CNVs, SVs) that need to be investigated for cancer applications
– Substitutions, insertions, and deletions were confirmed by capillary sequencing
– Structural variants were confirmed by PCR across the breakpoint and capillary sequencing
– Confirmation was performed in both cell lines to distinguish somatic from germline variants.
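Checking both cell lines is what separates somatic from germline calls: a variant seen in the tumor line (COLO-829) but absent from the matched normal (COLO-829BL) is treated as somatic. The sketch below shows only that set logic; real somatic callers also model allele fraction, coverage, and sequencing error. The `Variant` type, the function name, and the example calls are hypothetical and are not COLO-829 data.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A small variant keyed by position and alleles (hypothetical representation)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_somatic_germline(
    tumor: set[Variant], normal: set[Variant]
) -> tuple[set[Variant], set[Variant]]:
    """Tumor calls also found in the matched normal are germline; the rest are somatic."""
    germline = tumor & normal
    somatic = tumor - normal
    return somatic, germline


# Illustrative, made-up calls (not from the published COLO-829 catalogue).
tumor_calls = {Variant("chr1", 1000, "C", "T"), Variant("chr2", 5000, "G", "A")}
normal_calls = {Variant("chr2", 5000, "G", "A")}
somatic, germline = split_somatic_germline(tumor_calls, normal_calls)
print(sorted(somatic))  # [Variant(chrom='chr1', pos=1000, ref='C', alt='T')]
```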
We want to expand on this dataset.
COLO-829, COLO-829BL Cell Lines
Name         Cancer Type             Tissue Source   ATCC No.
COLO 829     Malignant melanoma      Skin            CRL-1974
COLO 829BL   None (matched normal)   B lymphoblast   CRL-1980
[Figure: Circos plot from the COSMIC database]
Whole-genome sequencing data for COLO-829 and COLO-829BL are being generated at 90x depth to build a set of consensus calls:
– Multiple variant callers
– Cell passage A
– Samples sent to Complete Genomics for sequencing, to incorporate an orthogonal technology
– Cell passage B
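As a rough sense of scale for the 90x target, mean depth C relates to read count N, read length L, and genome size G by C = N × L / G. The sketch below applies that formula for one sample. The genome size (~3.1 Gb) and 2 x 100 bp read length are assumptions for illustration, not parameters from this plan, and the estimate ignores duplicate and unmapped reads.

```python
import math


def reads_for_depth(depth: float, genome_size_bp: float, read_length_bp: int) -> int:
    """Reads N needed so that N * L / G reaches the target mean depth."""
    return math.ceil(depth * genome_size_bp / read_length_bp)


# Assumptions (not from the project plan): ~3.1 Gb genome, 100 bp reads.
GENOME_SIZE_BP = 3.1e9
READ_LENGTH_BP = 100

n_reads = reads_for_depth(90, GENOME_SIZE_BP, READ_LENGTH_BP)
print(n_reads)       # 2790000000 reads per sample
print(n_reads // 2)  # 1395000000 read pairs if sequenced as 2 x 100 bp
```

The same number is needed for each of COLO-829 and COLO-829BL, so the pair roughly doubles the sequencing requirement under these assumptions.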
The consensus of the datasets will establish a set of somatic calls that can be used as a gold standard in analytical validations:
– This will expand the set of calls reported in the literature
– A second, lower-confidence set of somatic calls (supported by 2 of 3 datasets) may also be identified
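The two confidence tiers can be sketched as a support count per call: calls found in all three datasets form the high-confidence set, and calls found in exactly two form the lower-confidence set. The call-key format, the function name, and the example call sets are hypothetical; in practice, indels and structural variants would need allele normalization or breakpoint-tolerant matching before calls from different pipelines can be compared this simply.

```python
from collections import Counter


def consensus_tiers(datasets: list[set[str]]) -> tuple[set[str], set[str]]:
    """Split calls by how many datasets support them.

    Returns (high, lower): calls seen in all n datasets, and calls seen
    in exactly n - 1 of them (2 of 3 when there are three datasets).
    """
    n = len(datasets)
    # Each call is counted at most once per dataset.
    support = Counter(call for calls in datasets for call in set(calls))
    high = {call for call, k in support.items() if k == n}
    lower = {call for call, k in support.items() if k == n - 1}
    return high, lower


# Hypothetical call sets keyed as "chrom:pos:ref>alt".
ds1 = {"chr1:1000:C>T", "chr2:5000:G>A", "chr3:42:A>G"}
ds2 = {"chr1:1000:C>T", "chr2:5000:G>A"}
ds3 = {"chr1:1000:C>T", "chr3:42:A>G", "chr4:7:T>C"}
high, lower = consensus_tiers([ds1, ds2, ds3])
print(sorted(high))   # ['chr1:1000:C>T']
print(sorted(lower))  # ['chr2:5000:G>A', 'chr3:42:A>G']
```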
[Figure: Whole-genome sequencing of COLO-829 and COLO-829BL]
Synthetic Oligo Spike-In mRNA Transcripts
Construct layout (5' to 3'): T7 promoter | AscI | GeneA | GeneB | NotI | T3 promoter (reverse complement)
ID      Fusion          Length (excluding poly(A) tail)
TFG01   EWS-ATF1        1150
TFG02   TMPRSS2-ETV1    1282
TFG03   EWS-FLI1        1483
TFG04   NTRK3-ETV6      1954
TFG05   CD74-ROS1       2164
TFG06   HOOK3-RET       2383
TFG07   EML4-ALK        3442
TFG08   AKAP9-BRAF      4531
TFG09   BCR-ABL         N/A*
TFG10   BRD4-NUT        3969
*IDT could not synthesize TFG09 due to significant secondary structure
• Nine sequences of clinically relevant gene fusions were retrieved from GenBank and synthesized as DNA plasmids by IDT.
• In vitro transcription of the purified plasmid, followed by poly(A) tailing, yielded mRNA transcripts of known sequence.
• These constructs can be used as spike-in control materials in mRNA sequencing protocols to assess the ability to detect gene fusions, a critical mutation type in cancer.
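Because every spike has a known sequence, its truth record for a fusion caller is the partner pair plus the exact junction. The sketch below models one spike as a 5' partner (GeneA) joined to a 3' partner (GeneB), with a simple poly(A) tailing step. The class, field names, toy sequences, and tail length are hypothetical. The model also ignores any vector-derived bases (such as restriction-site remnants) that may flank the insert, and it assumes the table's "Length (excluding poly(A) tail)" corresponds to the fused GeneA + GeneB sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionSpike:
    """Hypothetical truth record for one synthetic fusion transcript."""
    spike_id: str
    five_prime_seq: str   # GeneA portion (5' partner)
    three_prime_seq: str  # GeneB portion (3' partner)

    @property
    def sequence(self) -> str:
        """Fused sequence, excluding the poly(A) tail."""
        return self.five_prime_seq + self.three_prime_seq

    @property
    def breakpoint(self) -> int:
        """0-based index of the first 3'-partner base in the fused sequence."""
        return len(self.five_prime_seq)

    def junction(self, flank: int = 10) -> str:
        """Sequence spanning the breakpoint, up to `flank` bases on each side."""
        bp = self.breakpoint
        return self.sequence[max(0, bp - flank): bp + flank]


def add_poly_a(seq: str, tail_length: int = 30) -> str:
    """Model poly(A) tailing; the default tail length is an arbitrary assumption."""
    return seq + "A" * tail_length


# Toy sequences for illustration only (not GenBank sequences).
spike = FusionSpike("toy", "ACGTACGTAC", "GGGCCCTTTA")
print(spike.breakpoint)   # 10
print(spike.junction(4))  # GTACGGGC
```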
A pool of fusion spikes was added to COLO-829 total RNA at different concentrations.
The data show a linear response at higher concentrations and poor detection below a threshold concentration.
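One way to quantify a "linear response above a threshold" is a least-squares fit of log10(read count) against log10(concentration), using only points that clear a detection cutoff; a slope near 1 means reads scale in proportion to input. This is a sketch under assumptions: the response metric (supporting read count), the `min_count` cutoff, and the illustrative inputs are hypothetical, not the measurements behind this poster's figure.

```python
import math


def fit_log_log(
    conc_log10: list[float], read_counts: list[float], min_count: float = 1.0
) -> tuple[float, float]:
    """Least-squares fit of log10(reads) vs. log10(concentration).

    Points with fewer than `min_count` reads are treated as below detection
    and excluded. Returns (slope, intercept).
    """
    pts = [(x, math.log10(c)) for x, c in zip(conc_log10, read_counts) if c >= min_count]
    if len(pts) < 2:
        raise ValueError("need at least two detected points to fit a line")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative inputs only: perfectly proportional above -10, undetected below.
xs = list(range(-14, -5))
counts = [10 ** (x + 16) if x >= -10 else 0 for x in xs]
slope, intercept = fit_log_log(xs, counts)
print(slope, intercept)
```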
One spike (TMPRSS2-ETV1) is not detected as a fusion, even at the highest concentrations, although its reads are present at very high counts
– We hypothesize that the fusion breakpoint lies near the 5' end of the transcript and that this breakpoint position affects fusion calling (remains to be tested)
– This highlights the need for standard materials in this area
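One possible mechanism behind the 5'-end hypothesis, offered here as an untested illustration, is geometric. Many fusion callers require a minimum anchor of aligned sequence on each side of the junction, so a breakpoint close to the transcript end leaves few read positions that can span it with enough anchor. The sketch counts those start positions for a single read length. The transcript length, read length, anchor length, and breakpoint positions are hypothetical, and the actual TMPRSS2-ETV1 breakpoint is not used.

```python
def spanning_start_positions(
    transcript_len: int, breakpoint: int, read_len: int, min_anchor: int
) -> int:
    """Count 0-based read start positions whose read covers the junction
    with at least `min_anchor` bases on each side.

    `breakpoint` is the index of the first 3'-partner base. A read starting
    at s covers [s, s + read_len), so it needs s <= breakpoint - min_anchor
    and s + read_len >= breakpoint + min_anchor, and it must fit in the transcript.
    """
    lo = max(0, breakpoint + min_anchor - read_len)
    hi = min(transcript_len - read_len, breakpoint - min_anchor)
    return max(0, hi - lo + 1)


# Hypothetical 1000 nt transcript, 100 nt reads, 20 nt minimum anchor.
for bp in (500, 30, 15):
    print(bp, spanning_start_positions(1000, bp, 100, 20))
# 500 61
# 30 11
# 15 0
```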
Preliminary tests of the synthetic oligos appear promising
[Figure: fusion spike detection vs. fusion spike RNA concentration (log10 nmoles), x-axis from -14 to -6]
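The concentration axis is in log10 nmoles. Converting it to molecule counts (nmol × 10^-9 × Avogadro's number) gives a quick sanity check on how many spike molecules are present at each point. The sketch assumes the axis value is the total amount of each spike added to the sample; that reading of the axis is an assumption.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def nmol_to_copies(log10_nmol: float) -> float:
    """Convert a log10(nanomoles) axis value to a number of molecules."""
    return 10 ** log10_nmol * 1e-9 * AVOGADRO


for x in (-14, -10, -6):
    print(x, f"{nmol_to_copies(x):.3g}")
# -14 6.02
# -10 6.02e+04
# -6 6.02e+08
```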
TGen and ILMN have begun a cross-site effort to generate a “gold standard”
somatic dataset for a pair of cancer cell lines (COLO-829 & COLO-829BL) as
well as a set of synthetic mRNA fusion transcripts.
Data generation is scheduled to be completed this month, with analysis to follow.
We intend to make the data publicly available.
Are these appropriate reference materials?
– Cell lines:
– Fusion materials:
Preliminary data are encouraging. Additional experiments are ongoing.
We welcome feedback and discussion.
– Han-Yu Chuang
– Nancy Kim
– Timothy McDaniel
– Valerie Montel
– Jimmy Perrott
– Stephanie Buchholtz
– John Carpten
– David Craig
– Winnie Liang
– W. Amol Tembe
– Tracey White