Early detection is critical to curing cancer: when the tumor burden is small and localized, tumors can be surgically removed. This paper describes a new strategy for early cancer detection that aims to screen for multiple cancer types in the general population using a blood-based test that measures both protein biomarkers and circulating DNA.
6. cfDNA vs ctDNA
cfDNA (cell-free DNA)
• Non-encapsulated DNA fragments of 100-300 bp
• Half-life (t1/2): ~2 h for ctDNA and ~1 h for fetal-derived cfDNA
• Source: dead or dying cells (necrosis/apoptosis)
• Used in noninvasive prenatal diagnostics and cancer assessment
• Concentration in blood varies and increases with the size of the fetus/tumor
ctDNA (circulating tumor DNA)
• Tumor specificity: cancer-specific mutations.
• As a biomarker: real-time, non-invasive, covers multiple lesions, potentially cheaper than biopsies
• Mutant DNA is often at low concentration in a sea of wild-type DNA, especially in early-stage cancer. E.g., early-stage cancers often have < 1 mutant template/ml of plasma -> below the detection limit (0.1%)
• Mutation information alone is not enough to predict the tissue of origin -> challenge for follow-up tests
7. Study cohort
Table S11. Cancer patients evaluated in this study by tumor type and stage.

Tumor Type    AJCC Stage    Patients (n)    Proportion of cases (%)
Breast        I             32              15
Breast        II            114             55
Breast        III           63              30
Breast        I-III         209             --
Colorectum    I             77              20
Colorectum    II            191             49
Colorectum    III           120             31
Colorectum    I-III         388             --
Esophagus     I             5               11
Esophagus     II            29              64
Esophagus     III           11              24
Esophagus     I-III         45              --
Liver         I             5               11
Liver         II            19              43
Liver         III           20              45
Liver         I-III         44              --
Lung          I             46              44
Lung          II            27              26
Lung          III           31              30
Lung          I-III         104             --
Ovary         I             9               17
Ovary         II            4               7
Ovary         III           41              76
Ovary         I-III         54              --
Pancreas      I             4               4
Pancreas      II            83              89
Pancreas      III           6               6
Pancreas      I-III         93              --
Stomach       I             21              31
Stomach       II            30              44
Stomach       III           17              25
Stomach       I-III         68              --

• 1005 patients
• 8 types of cancer: stage I (20%), stage II (49%) and stage III (31%)
• Patients with neoadjuvant chemotherapy or metastasis were excluded
• Median age = 64 (range 22-93)
Control group:
• 812 “healthy” controls
• Median age = 55 (range 17-88)
• Criteria: no known history of cancer, high-grade dysplasia,
autoimmune or chronic kidney disease.
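As a quick consistency check, the cohort summary can be recomputed from the per-stage counts in Table S11. The sketch below is illustrative only; the name `stage_counts` and its layout are choices made here, not something taken from the study.

```python
# Per-stage patient counts (stage I, II, III) copied from Table S11.
stage_counts = {
    "Breast":     (32, 114, 63),
    "Colorectum": (77, 191, 120),
    "Esophagus":  (5, 29, 11),
    "Liver":      (5, 19, 20),
    "Lung":       (46, 27, 31),
    "Ovary":      (9, 4, 41),
    "Pancreas":   (4, 83, 6),
    "Stomach":    (21, 30, 17),
}

# Total number of patients across all tumor types and stages.
total_patients = sum(sum(counts) for counts in stage_counts.values())

# Sum each stage across tumor types, then convert to a rounded percentage.
stage_totals = [sum(counts[i] for counts in stage_counts.values()) for i in range(3)]
stage_percent = [round(100 * n / total_patients) for n in stage_totals]

print(total_patients)  # 1005
print(stage_percent)   # [20, 49, 31]
```

The totals agree with the summary bullets: 1005 patients, with stage I, II and III making up about 20%, 49% and 31% of the cohort.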
11. Mutation detection and analysis
Mutation detection
• Reads were matched to reference sequences using custom scripts:
• https://github.com/InSilicoSolutions/SafeSeqS
• Reads from a common template molecule were grouped based on their UID
• Artefactual mutations were removed by requiring a mutation to be present in > 90% of the reads in each UID family
• Redundant reads from optical duplication were removed by requiring reads to be at least 5000 pixels apart when located on the same tile.
• Mutations had to meet one of two criteria to be considered: (1) present in the COSMIC database or (2) predicted to be inactivating in a tumor suppressor gene.
• Synonymous mutations (except those at exon ends) and intronic mutations (except those at splice sites) were excluded.
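The UID-family rule above can be illustrated with a minimal sketch. This is not the SafeSeqS pipeline code: the function name `call_supermutants` and the `(uid, has_mutation)` input format are hypothetical and chosen only to show the "> 90% of reads in a UID family" criterion.

```python
from collections import defaultdict

def call_supermutants(reads, min_fraction=0.90):
    """Count UID families whose reads carry the mutation in > min_fraction of cases.

    reads: iterable of (uid, has_mutation) pairs (hypothetical input format).
    Returns (number of supermutant families, total number of UID families).
    """
    families = defaultdict(list)
    for uid, has_mutation in reads:
        families[uid].append(has_mutation)

    supermutants = sum(
        1 for flags in families.values()
        if sum(flags) / len(flags) > min_fraction
    )
    return supermutants, len(families)

# Toy data: family A is fully mutant, B is exactly 90% mutant, C is wild type.
reads = (
    [("A", True)] * 10
    + [("B", True)] * 9 + [("B", False)]
    + [("C", False)] * 5
)
n_super, n_uids = call_supermutants(reads)
```

With the strict "> 90%" threshold, only family A counts as a supermutant; family B sits exactly at 90% and is treated as a likely artefact.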
Mutation analysis
• Mutant allele frequency (MAF) = mutant fraction per well.
• MAF in a sample = sum of supermutants in the 6 wells / total number of UIDs in the 6 wells
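The two MAF definitions can be sketched as follows, using the per-well (supermutants, UIDs) counts from the KRAS example given later in these notes. The function names are illustrative, not from the paper.

```python
def well_maf(supermutants, uids):
    """MAF of a single well: supermutants divided by UIDs in that well."""
    return supermutants / uids if uids else 0.0

def sample_maf(wells):
    """MAF of a sample: pooled supermutants over pooled UIDs across all wells."""
    total_super = sum(s for s, _ in wells)
    total_uids = sum(u for _, u in wells)
    return total_super / total_uids if total_uids else 0.0

# (supermutants, UIDs) per well, from the KRAS example in these notes.
wells = [(161, 3755), (78, 2198), (99, 2966), (84, 2013), (177, 3694), (117, 3427)]

per_well = [round(well_maf(s, u), 3) for s, u in wells]
pooled = sample_maf(wells)
```

The per-well values reproduce the six MAFs quoted in the KRAS example (0.043, 0.035, 0.033, 0.042, 0.048, 0.034), and the pooled sample MAF is 716 / 18053, or roughly 4%.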
12. Bioplex-200
• xMAP technology multiplexes up to 100 different analytes per sample
• 100 color-coded magnetic bead sets are created by combining 2 fluorescent dyes at distinct concentration ratios.
Houser, B. (2012). Bio-Rad’s Bio-Plex® suspension array system, xMAP technology overview.
Archives of Physiology and Biochemistry,
Figure labels: magnetic bead; charge-coupled device (CCD) technology
13. Approach
• CancerSEEK approach: Combined gene + protein biomarkers
• Features
1. Gene: 61-amplicon panel covering 16 genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, GNAS
2. Protein: literature search for proteins that detect at least 1 of the 8 cancer types with > 10% sensitivity at 99% specificity: list of 41 proteins (39 could be reproducibly evaluated) -> test narrowed down to 8 proteins
15. CancerSEEK algorithm-1
1. Mutant allele frequency (MAF) normalization:
• MAF = # supermutants / # UIDs in the same well
• Normalized using the observed MAFs (for each mutation) in a reference set composed of the normal controls in the training set + 256 WBC samples from healthy individuals.
• Mutations with < 100 UIDs: MAF set to zero
• Average MAF = ave_i for each mutation i = 1, …, n
• 25th percentile of the distribution of ave_i -> ave_ref
• Normalized MAF = MAF * (ave_ref / ave_i)
2. Reference distributions and p-values:
• UID counts were split into 10 intervals (<1000, 1000-2000, …, >9000)
• Within the matching UID interval, the normalized MAF was compared to 2 reference distributions, (normal controls + 256 healthy WBC) and cancer patients in the training set, using 10-fold cross-validation -> pN and pC values.
“The classification of a sample's ctDNA status was obtained from a statistical test comparing
the normalized mutation frequencies of the sample of interest to the distributions of the
normalized mutation frequencies of, respectively, normal and cancer samples in the training
set.”
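The normalization and UID-interval steps above can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the percentile interpolation method is an assumption (the notes do not say which one was used), the sketch assumes every ave_i is greater than zero, and the handling of interval boundaries (e.g. exactly 1000 UIDs) is a guess.

```python
from statistics import quantiles

def reference_average(ave_by_mutation):
    """Return ave_ref, the 25th percentile of the per-mutation average MAFs.

    Assumption: 'inclusive' quartile interpolation; needs at least 2 mutations.
    """
    values = sorted(ave_by_mutation.values())
    return quantiles(values, n=4, method="inclusive")[0]

def normalized_maf(supermutants, uids, ave_i, ave_ref, min_uids=100):
    """Normalized MAF = MAF * (ave_ref / ave_i); zero below min_uids UIDs."""
    if uids < min_uids:
        return 0.0
    return (supermutants / uids) * (ave_ref / ave_i)

def uid_interval(uids):
    """Index 0..9 of the UID interval: <1000, 1000-2000, ..., >9000.

    Assumption: a count on a boundary falls into the higher interval.
    """
    return min(uids // 1000, 9)

# Hypothetical per-mutation averages, for illustration only.
ave_by_mutation = {"mut_a": 1e-4, "mut_b": 2e-4, "mut_c": 3e-4,
                   "mut_d": 4e-4, "mut_e": 5e-4}
ave_ref = reference_average(ave_by_mutation)

# A well with 10 supermutants in 1000 UIDs for the hypothetical mutation mut_d.
example = normalized_maf(10, 1000, ave_by_mutation["mut_d"], ave_ref)
```

Scaling by ave_ref / ave_i shrinks the MAF of mutations that are frequently seen at background level in normal samples, so that different mutations can be compared on a common scale.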
16. CancerSEEK algorithm-2
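3. Log ratios and omega scores
• The ratio pC/pN was calculated for each mutation in each of the 6 wells; the minimum and maximum of the 6 wells were omitted.
• Omega score = sum over the remaining wells of Wi × log(pC/pN), where Wi = #UIDs in well i / total #UIDs for the mutation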
Example for a KRAS mutation:
  • The numbers of supermutants and UIDs in each of the six wells were (161, 3755), (78, 2198), (99, 2966), (84, 2013), (177, 3694), (117, 3427), respectively.
  • 6 MAFs: (0.043, 0.035, 0.033, 0.042, 0.048, 0.034), or (0.0057, 0.0047, 0.0044, 0.0056, 0.0064, 0.0045) after normalization.
  • pN = (1.06E-06, 5.70E-06, 1.02E-05, 1.03E-06, 3.09E-07, 8.83E-06)
  • pC = (0.100, 0.124, 0.128, 0.114, 0.094, 0.112)
  • pC / pN = (94243, 21716, 12510, 110752, 305090, 12680).
  • Eliminate the minimum (12510) and maximum (305090) ratios
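The KRAS example can be carried through to a single score with the sketch below. It rests on assumptions that the notes do not state explicitly: the omega score is taken to be the UID-weighted sum of the log ratios, the logarithm is assumed to be natural, and the weights are assumed to be normalized over the four wells that remain after dropping the minimum and maximum. The function name `omega_score` is hypothetical.

```python
import math

def omega_score(ratios, uids):
    """UID-weighted sum of log p-value ratios, ignoring the min and max ratio.

    Assumptions: natural log; weights = UIDs in well / total UIDs of kept wells.
    """
    wells = sorted(zip(ratios, uids))   # order wells by their ratio
    kept = wells[1:-1]                  # drop the smallest and largest ratio
    total_uids = sum(u for _, u in kept)
    return sum((u / total_uids) * math.log(r) for r, u in kept)

# The six per-well ratios and UID counts listed in the KRAS example.
ratios = [94243, 21716, 12510, 110752, 305090, 12680]
uids = [3755, 2198, 2966, 2013, 3694, 3427]

omega = omega_score(ratios, uids)
print(round(omega, 1))  # about 10.6 under these assumptions
```

Dropping the extreme wells makes the score less sensitive to a single outlying well, and the UID weighting gives more influence to wells with more template molecules.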
19. CancerSEEK algorithm-3
4. Protein normalization and transformation:
• Values below the lower limit of detection are set to m
• Values above the upper limit of detection are set to M
• Further transformation: if a protein concentration is below the 95th percentile of the normal samples in the training set, it is set to 0; otherwise the original value is kept
5. Logistic regression:
• Inputs: omega score + 8 protein concentrations (CA-125, CA19-9, CEA, HGF, MPO, OPN, PRL, TIMP-1)
• Selection of 8 of the 39 proteins:
1/ Eliminate any protein with a higher median value in normal samples than in cancer samples: 39 -> 26 left
2/ Forward selection: each protein was dropped in turn and the decrease in test accuracy was checked -> importance of each protein
3/ Perform 10 rounds of 10-fold cross-validation
6. Tissue localization:
• Random forest to predict cancer type using omega score + 8 proteins + 31 other proteins + gender.
• Classification calls were obtained in an average round of 10-fold CV.
• Concordance between mutations in plasma and tumor was considered only when omega > 3 and the primary tumor contained any mutation with MAF > 5%
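The protein normalization and transformation step (step 4) can be sketched as a small function. The function and parameter names are hypothetical, and the notes do not define m and M beyond their role as replacement values, so they are passed in as parameters here.

```python
def transform_protein(value, lower_lod, upper_lod, m, M, normal_p95):
    """Clamp a protein concentration at the detection limits, then threshold it.

    lower_lod / upper_lod: lower and upper limits of detection.
    m / M: replacement values for out-of-range measurements (as in the notes).
    normal_p95: 95th percentile of this protein in normal training samples.
    """
    if value < lower_lod:
        value = m
    elif value > upper_lod:
        value = M
    # Concentrations within the normal range carry no signal and are zeroed.
    return 0.0 if value < normal_p95 else value

# Hypothetical numbers, for illustration only.
limits = dict(lower_lod=1.0, upper_lod=500.0, m=1.0, M=500.0, normal_p95=40.0)
low = transform_protein(0.2, **limits)       # below detection, then zeroed
typical = transform_protein(25.0, **limits)  # within the normal range
elevated = transform_protein(120.0, **limits)
saturated = transform_protein(900.0, **limits)
```

Zeroing values below the 95th percentile of normal samples means that only clearly elevated protein levels contribute to the downstream classifier.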
28. Conclusion
• CancerSEEK = multi-analyte blood test that can detect the presence of 8 common solid tumor types (~60% of estimated cancer deaths in the US) by combining 8 protein biomarkers with genetic biomarkers (61 amplicons of 16 genes)
• Estimated cost: < $500
This study lays a foundation for a single multi-analyte blood test that combines other blood biomarkers (metabolites, mRNA, miRNA and methylated DNA) to detect cancer early enough for intervention.