Investigation of Clonal Hematopoiesis in Whole-Exome Sequencing of ClinSeq Individuals
Dahlia Shvets, James C. Mullikin, Leslie Biesecker, and Nancy F. Hansen
Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, NHGRI
National Human Genome Research Institute

Abstract
Cancer is thought to arise from the gradual accumulation of specific genetic mutations, sometimes years before clinical symptoms appear. Early mutations can cause clonal expansion of mutated stem or progenitor cells. This expansion in turn increases the likelihood that cooperating mutations will occur in cells already harboring initiating mutations. Clonal hematopoiesis, the clonal expansion of hematopoietic stem cells, may signal the onset of many hematologic cancers. Previous work has shown that clonal hematopoiesis occurs with higher incidence in the elderly and is a risk factor for later hematopoietic cancers.

Two recent studies published in the New England Journal of Medicine, by Genovese et al. and Jaiswal et al., performed large-scale analyses searching for recurrent somatic mutations in whole-exome sequencing of DNA isolated from blood. These studies concluded three things: clonal hematopoiesis with somatic mutations can be detected by DNA sequencing, its prevalence increases with age, and it is associated with an increased risk of hematologic cancer. We aim to replicate their work using whole-exome sequences from 1,001 individuals in the ClinSeq cohort. We examine blood-derived DNA sequence data from ClinSeq individuals to identify the genes, and the mutations within them, that may drive clonal expansion.

Prior to this work, the ClinSeq data had already been aligned to the GRCh37 human reference sequence with NovoAlign. We used the program LoFreq to call low-frequency variants from these alignments. Following Genovese et al., we attempted to remove unreliable data by excluding several kinds of genomic region: low complexity, excess coverage, segmental duplications, and known large insertions. We also excluded sites failing Hardy-Weinberg equilibrium tests. To separate germline from somatic variants, we then applied binomial tests of the null hypothesis that the true allelic fraction is 50%, with false discovery rate correction using the Benjamini-Hochberg method. In addition, we removed any variants with an allele fraction below 10% or occurring more than three times in the cohort.
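The germline/somatic separation described above can be sketched in Python. This is a minimal illustration, not the pipeline used for this poster. The `Variant` record, the function names, and the default thresholds (FDR 0.05, allele fraction of at least 10%, at most three carrier calls per site) are assumptions chosen to mirror the text. Recurrence is assumed to be counted over the input calls, one call per carrier sample, before any other filter is applied.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one variant call in one sample."""
    sample: str
    site: str        # illustrative key, e.g. "chr2:25457242:C>T"
    alt_reads: int
    depth: int       # assumed > 0


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes no likelier than k."""
    def log_pmf(i: int) -> float:
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log1p(-p))
    observed = log_pmf(k)
    # Small tolerance so outcomes tied with k (mirror images at p = 0.5) are counted.
    total = sum(math.exp(lp) for lp in map(log_pmf, range(n + 1)) if lp <= observed + 1e-7)
    return min(1.0, total)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [1.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def putative_somatic(variants: list[Variant], fdr: float = 0.05,
                     min_af: float = 0.10, max_carriers: int = 3) -> list[Variant]:
    """Keep calls whose allele fraction departs from 50% after FDR correction,
    whose allele fraction is at least min_af, and whose site is not recurrent."""
    qvals = benjamini_hochberg([binom_two_sided_p(v.alt_reads, v.depth) for v in variants])
    carriers = Counter(v.site for v in variants)
    return [v for v, q in zip(variants, qvals)
            if q < fdr
            and v.alt_reads / v.depth >= min_af
            and carriers[v.site] <= max_carriers]
```

The poster states the recurrence cutoff both as "more than three times" and as "3 or more times". For that reason, `max_carriers` is a parameter here rather than a fixed value.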

While searching for drivers of clonal hematopoiesis, we uncovered an underlying issue related to sequencing capture kits. The mutation profile of our somatic calls did not match the profile expected from Genovese et al. Our results contained a large number of A to C (and T to G) base changes that, on visual examination, displayed extremely high strand bias. This pointed to the need to filter the data further by strand bias. After extensive filtering, our final lists of putative driver genes in the ClinSeq whole-exome data included almost all of the candidate driver genes noted by Genovese et al. and Jaiswal et al. However, we also observed numerous other genes, not previously noted as candidate drivers, with even larger numbers of somatic mutations.
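Strand bias can be quantified by asking whether the alternate allele is split between forward and reverse reads in the same proportion as the reference allele. The sketch below is minimal and rests on two assumptions. First, each variant has four strand counts: reference forward, reference reverse, alternate forward, and alternate reverse, in the order of the samtools/bcftools DP4 field. Second, the strand bias score is a Phred-scaled two-sided Fisher's exact test. LoFreq's own strand bias value may be computed differently. The cutoff of 15 follows the text, and the function names are hypothetical.

```python
import math


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)

    def log_p(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return _log_comb(col1, x) + _log_comb(n - col1, row1 - x) - _log_comb(n, row1)

    observed = log_p(a)
    total = sum(math.exp(lp) for lp in map(log_p, range(lo, hi + 1)) if lp <= observed + 1e-7)
    return min(1.0, total)


def phred_strand_bias(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Phred-scaled strand bias: -10 * log10(Fisher p) on reference vs. alternate strand counts."""
    p = fisher_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return max(0.0, -10.0 * math.log10(max(p, 1e-300)))  # guard against underflow to 0


def passes_strand_bias(counts: tuple[int, int, int, int], max_sb: float = 15.0) -> bool:
    """True if the variant survives the filter (strand bias not above max_sb)."""
    return phred_strand_bias(*counts) <= max_sb
```

With balanced strands, such as `(50, 50, 10, 10)`, the score is essentially 0. An alternate allele seen only on forward reads, such as `(50, 50, 20, 0)`, scores well above 15 and is removed.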

1. Workflow

The pipeline for this project follows the steps taken by Genovese et al. to identify putative somatic mutations:
- binomial testing of the null hypothesis that the allelic fraction is 50%;
- Benjamini-Hochberg correction for multiple testing;
- removal of variants that appear more than twice in the cohort or have an allelic fraction below 10%.

2. ClinSeq Cohort

The ClinSeq cohort of 1,001 individuals includes males and females primarily between the ages of 45 and 65, a somewhat narrower range than that of the Genovese cohort. The cohort also offers the opportunity to obtain follow-up samples from individuals who carry somatic mutations in genes previously known to cause clonal expansion.

3. Mutation Profiles
Figure S5 from Supplementary Data of Genovese et al., “Clonal Hematopoiesis and Blood-Cancer Risk Inferred from Blood DNA Sequence”, New England Journal of Medicine, 26 Nov. 2014; 371:2477-87.
Specific tissues are known to exhibit characteristic mutation profiles. Our original mutation profile, for both driver genes and all genes, was most similar to the inclusive somatic profile of Genovese et al.'s Wave1. This was unexpected for our data, because Wave1 was excluded from the Genovese analysis due to sequencing error. We investigated the source of the error in our data and then filtered by strand bias, retaining variants with strand bias below 15. After this filtering, the final mutation profile matches what is expected for somatic mutations in blood-derived DNA.
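A mutation profile summarizes the relative frequency of each class of single-base substitution. The sketch below is minimal and assumes calls are given as (reference, alternate) base pairs. Each change is folded onto its pyrimidine-reference strand, so A>C and T>G fall into the same class, matching how the poster groups them. The helper names are hypothetical. This six-class scheme is a common convention, not necessarily the exact scheme behind Genovese et al. Figure S5.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Fold a single-base change onto a pyrimidine reference, e.g. A>C -> T>G."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: describe the change on the opposite strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_profile(changes: list[tuple[str, str]]) -> dict[str, float]:
    """Fraction of calls in each substitution class (C>A, C>G, C>T, T>A, T>C, T>G)."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in changes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: counts[cls] / total for cls in sorted(counts)}
```

Comparing the profile before and after strand bias filtering would show how much of the T>G class is removed.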

4. Capture Kit Analysis

The skewed mutation profile led us to investigate the effects of the different capture kits used to sequence the ClinSeq cohort. Three capture kit types were used across the 1,001 samples: ICGC and Index, Exon, and TruSeq V1 and V2. For each capture kit and each base change, we plotted the distribution of strand bias values across the cohort. One of the capture kits was clearly error-prone, because many of its A to C and T to G base changes appeared on only one strand. For the other capture kits, and for the other base changes, most strand bias values were extremely low. Based on this finding, we removed all variants with a strand bias higher than 15 from the final analysis. The plots exclude strand bias values higher than 50.

5. Total Somatic Mutations

Most samples carry very few somatic mutations. The histogram of somatic mutations per sample excludes 6 samples with more than 50 somatic mutations. All final somatic mutations were called at a false discovery rate cutoff of 0.05 and a strand bias cutoff of less than 15. The accompanying bar chart shows the genes with the greatest number of samples carrying at least one mutation in that gene, together with the driver genes reported by Genovese et al. In total, 4,600 genes carried at least one somatic mutation. The remaining five of the 14 putative drivers reported by Genovese et al. had no somatic mutations in the ClinSeq samples.
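This section reports two summaries: the number of mutations per sample, and the number of samples with at least one mutation in each gene. Both can be computed from a list of final calls. The sketch below is minimal and assumes each call is a hypothetical (sample, gene) pair. The outlier cutoff of 50 mutations mirrors the histogram described above.

```python
from collections import Counter, defaultdict


def mutations_per_sample(calls: list[tuple[str, str]]) -> Counter:
    """Number of final somatic calls in each sample."""
    return Counter(sample for sample, _gene in calls)


def split_outliers(per_sample: Counter, max_mutations: int = 50) -> tuple[list[int], list[str]]:
    """Per-sample counts to plot, plus the samples excluded for exceeding max_mutations."""
    kept = [n for n in per_sample.values() if n <= max_mutations]
    excluded = sorted(s for s, n in per_sample.items() if n > max_mutations)
    return kept, excluded


def samples_per_gene(calls: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Genes ranked by how many samples carry at least one somatic call in them."""
    carriers: defaultdict[str, set[str]] = defaultdict(set)
    for sample, gene in calls:
        carriers[gene].add(sample)  # a set, so repeat hits in one sample count once
    return sorted(((gene, len(s)) for gene, s in carriers.items()), key=lambda t: (-t[1], t[0]))
```

Counting carriers rather than raw calls keeps a single hypermutated sample from inflating a gene's rank.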

6. Mutations in Driver Genes

Next-generation sequencing read data showing a somatic DNMT3A p.R882H mutation. The site was covered by 136 reads, 27.9% of which displayed the mutant allele. DNMT3A p.R882H mutations are found frequently in acute myeloid leukemia (AML) and are associated with shorter overall survival [Ley et al., NEJM, "DNMT3A Mutations in Acute Myeloid Leukemia", 2010].

Read data showing a somatic frameshift insertion in the TET2 gene. The insertion, which causes a frameshift at p.C262, was covered by 63 reads, 30.2% of which displayed the altered allele. Truncating mutations in TET2 have been found in roughly 15% of patients with a variety of malignant myeloid disorders [Delhommeau et al., NEJM, "Mutation in TET2 in Myeloid Cancers", 2009].

Future Directions

Future work will check that genes with large numbers of somatic mutations are not affected by copy number variation. We will also search known driver regions for additional low-level variants that our LoFreq analysis may have missed. In addition, we will follow up with ClinSeq individuals who carry somatic mutations in the putative driver genes, obtain new DNA samples where possible, and analyze them for any newly acquired mutations.
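The read counts quoted for the DNMT3A and TET2 mutations in Section 6 can be checked against the roughly 50% allele fraction expected for a heterozygous germline variant. In this minimal sketch, the alternate read counts are back-calculated from the reported depths and percentages, so they are rounded estimates rather than values given on the poster. The helper names are hypothetical, and the test mirrors the binomial test described in the Workflow section.

```python
import math


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes no likelier than k."""
    def log_pmf(i: int) -> float:
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log1p(-p))
    observed = log_pmf(k)
    return min(1.0, sum(math.exp(lp) for lp in map(log_pmf, range(n + 1))
                        if lp <= observed + 1e-7))


def implied_alt_reads(depth: int, vaf_percent: float) -> int:
    """Back-calculate the alternate read count from depth and a rounded percentage."""
    return round(depth * vaf_percent / 100)


# Depths and allele fractions as reported in Section 6.
REPORTED = [("DNMT3A p.R882H", 136, 27.9), ("TET2 frameshift at p.C262", 63, 30.2)]

for name, depth, vaf in REPORTED:
    alt = implied_alt_reads(depth, vaf)
    p = binom_two_sided_p(alt, depth)
    print(f"{name}: {alt}/{depth} reads ({alt / depth:.1%}), two-sided binomial p = {p:.2g}")
```

This gives about 38 of 136 reads for DNMT3A and 19 of 63 reads for TET2. Both fractions fall significantly below 50% (p < 0.01 for each), which is consistent with these calls passing the germline/somatic test.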

[Figure: Workflow diagram. Boxes: aligned ClinSeq BAM files; LoFreq variant calling; error-prone region filtering; sample annotation; binomial hypothesis testing; false discovery rate correction; allele fraction filtering; filter mutations observed 3 or more times in cohort; putative clonal hematopoiesis drivers.]
[Figure: ClinSeq Cohort Age & Gender. Number of patients by age (axis 40 to 70), by gender (male, female).]
[Figure: Total Somatic Mutations per ClinSeq Sample. Number of samples by number of somatic mutations (axis 0 to 50).]
[Figure: Top Genes with Somatic Mutations. Total gene counts (axis 0 to 25) for TTN, SYNE1, DNMT3A, LYST, MUC16 and, after an axis break, TET2, ATM, ASXL1, CBL, JAK2, TP53, SF3B1, MYD88.]
[Figure: Strand bias distributions (count vs. strand bias, truncated at 50) for the ICGC and Index, Exon, and TruSeq V1 and V2 capture kits. Top row: A > C and T > G base changes. Bottom row: C > T and G > A base changes.]
