Verifying the role of activation-induced
deaminase in chronic lymphocytic leukemia
Charlotte Broadbent
Ward Melville High School
380 Old Town Road
East Setauket, NY 11733
Abstract
Chronic lymphocytic leukemia (CLL) is a relatively common disease amongst aging
adults (Cheson, 2001). To successfully design a drug to combat this disease, more background on
its mechanisms is vital. One aspect of CLL that remains unclear is the role of the enzyme
activation-induced deaminase (AID). AID is normally involved in somatic hypermutation, a
mechanism B cells use to diversify their variable regions to combat pathogens. AID has
already been associated with CLL, but the sequencing method used, known as pyrosequencing, is
not accurate enough to detect all of the variable region clones when analyzing specific sequences
such as AID hotspots (Ansorge, 2009). The goal of this research was to verify the role of AID
using a statistical method known as bootstrapping, or resampling, so these inaccuracies would
not affect the total distribution of variable region clones. Two parameters were measured to test
for AID activity – mutations in the variable region compared to the constant region, and
mutations at AID hotspots compared to total mutations. Both tests gave statistically significant
results in a majority of samples, supporting the role of AID in chronic lymphocytic leukemia.
Introduction
B-cell chronic lymphocytic leukemia (B-CLL) is the most prevalent type of leukemia
among aging Caucasians (Cheson, 2001), and is a disease which is currently incurable. Patients
with B-CLL follow one of two clinical outcomes: an indolent outcome, in which the median
survival of patients is greater than 25 years, and an aggressive outcome, in which those
afflicted decline relatively rapidly, with a median survival of less than 8 years (Chiorazzi et
al., 2005). The ineffectiveness of treatment in prolonging lifespan (Rai et al., 2000) has led many
physicians to delay aggressive treatment until the nature of the disease is evident. However, the
two clinical outcomes have been associated with a biological difference – those patients whose
immunoglobulin variable-region heavy chain (IgVH) is relatively free of mutations follow the
more aggressive outcome, while patients whose IgVH region contains considerable mutations
follow the more indolent outcome (Fais et al., 1998).
One of the primary defenses of the immune system is the diverse repertoire of B cells that
identify pathogens in the body. Unique antigen receptors on the surface of the B cells bind to
foreign antigens, after which the B cell divides and produces millions of antibodies to destroy the
pathogen. These antigen receptors are unique due to the variation in the amino acid sequence of
the variable (V) region, which contains the antigen-binding site; the rest of the receptor chain
is the constant (C) region. The
distribution of V regions in immunoglobulin genes will always have a dominant clone, known as
the consensus sequence. This clone accounts for approximately 95% of the V region clones.
However, for this process to work, the unique variable region on each antigen receptor must have
an efficient method of diversifying.
Somatic hypermutation is the process which diversifies B cells so they can recognize
threats to the immune system. The enzyme AID (activation-induced deaminase) induces point
mutations, usually at locations in the variable region of the DNA strand known as WRC hotspots,
where W represents A or T, R represents G or A, and C is always cytosine, or at the reverse
complement, GYW hotspots, where G is always guanine and Y represents a pyrimidine (C or T). AID
tends to avoid SYC coldspots, where S represents G or C. Somatic hypermutation is responsible for many of the
mutations in immunoglobulin genes – this research seeks to verify, using statistical methods,
whether the mutations in cells of patients with the indolent form of B-CLL are in fact caused by
AID activity, since it is already known that AID is associated with CLL (Patten et al., 2012).
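
The motif definitions above can be made concrete with a short sketch. The paper's analysis was done in R; the Python below is only an illustration, and the function names (motif_positions, hotspot_target_positions) are hypothetical. It assumes that the base of interest is the C in WRC and the G in GYW, which the text implies but does not state as the unit of counting.

```python
import re

# IUPAC ambiguity codes used in the text: W = A/T, R = A/G, Y = C/T, S = C/G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]", "S": "[CG]"}


def motif_positions(sequence, motif):
    """Return 0-based start positions of every match, including overlapping ones."""
    pattern = "".join(IUPAC[base] for base in motif)
    # A zero-width lookahead reports matches that overlap each other.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", sequence)]


def hotspot_target_positions(sequence):
    """Positions of the C in each WRC match and the G in each GYW match (assumed targets)."""
    c_sites = [start + 2 for start in motif_positions(sequence, "WRC")]
    g_sites = motif_positions(sequence, "GYW")
    return sorted(set(c_sites + g_sites))


example = "AGCTACGGCA"
wrc_starts = motif_positions(example, "WRC")   # AGC at 0, TAC at 3
gyw_starts = motif_positions(example, "GYW")   # GCT at 1, GCA at 7
targets = hotspot_target_positions(example)
```

The same motif_positions helper can be called with "SYC" to locate coldspots.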
The DNA of leukemic and other cells has been sequenced since 1977 primarily via the
chain termination sequencing method, also known as the Sanger method (Sanger et al., 1977).
This method has several advantages: it can sequence strands up to five hundred base pairs in
length with fairly high accuracy. However, the Sanger method also has limitations: there are few
opportunities for parallel sequencing, making sequencing of large quantities of DNA very
difficult. Also, sample preparation can take up to four hours (Fakruddin et al., 2012).
These limitations are addressed by a relatively new method known as pyrosequencing.
Pyrosequencing has many distinct advantages over the Sanger method. Even though it can only
process strands around four hundred base pairs in length, it can sequence many strands in
parallel, making it more efficient for much larger quantities of DNA. It eliminates the need to
label the primers, which are the synthetically prepared starting points for the DNA polymerase.
Pyrosequencing is also easy to automate, the cost is lower because of the smaller strands, and
preparation time is as little as fifteen minutes (Fakruddin et al., 2012). In this project, each
sequence was a different B cell's variable region. Pyrosequencing was therefore the logical
choice, due to the immense quantity of sequences that were each relatively short in length.
However, the pyrosequencing method is much more error prone than the Sanger method,
resulting in many inaccuracies throughout the sequences (Ansorge, 2009). Another problem with
this technique is that the number of sequences changes during the course of the experiment. After
AID is stimulated and the results are analyzed, there is often a discrepancy between how many
sequences were originally counted and how many remain. This is attributed to the death of the
cells containing the DNA during the course of the experiment. There is therefore uncertainty in
the exact number and type of variable region clones after AID stimulation.
These problems can be solved by applying a statistical technique known as resampling, or
bootstrapping. The term “bootstrap” was first used by Efron in 1979 (Efron, 1979). In particular,
he described the nonparametric bootstrap, or resampling with replacement of size n from the
original sample taken of size n (Geyer, 2006). Therefore, when a sample is resampled thousands
of times, a more representative distribution of possible samples can be obtained regardless of the
shape of the distribution. In this case, this technique allows for a more representative distribution
of immunoglobulin mutations, allowing confirmation of AID activity in leukemic cells. Such a
confirmation would be instrumental in aiding drug design for CLL.
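
As a minimal illustration of the nonparametric bootstrap described above, the following Python sketch resamples a sample of size n with replacement, n values at a time, and collects the mean of each resample. It is not the code used in this project (which was written in R); the function names, the choice of the mean as the statistic, and the percentile interval are assumptions made for the example.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Draw n_resamples resamples of size len(sample), with replacement, and return their means."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.mean(resample))
    return means


def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval taken from the sorted bootstrap values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


means = bootstrap_means([1, 2, 3, 4, 5], n_resamples=200)
low, high = percentile_interval(means)
```

Because each resample is drawn from the observed values only, no assumption about the shape of the underlying distribution is needed.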
Methods
Sequence data were obtained from collaborators at Northshore LIJ. DNA from patients
with chronic lymphocytic leukemia was sequenced using Roche 454 GS FLX pyrosequencing
technology, both before and after AID stimulation. Here stimulation means that different
experiments were performed to create an AID expression environment. The sequencing was
performed for different amounts of time for different patients, lasting up to ninety-seven days.
Each day, the DNA was sequenced once from the 5'-end of the chain of bases and once from the
3'-end. The first 20 nucleotides from both ends were removed to eliminate the primer. The data
from the 5'-end sequences were considered to be the variable region of the immunoglobulin
gene, while the first 103 nucleotides from the 3'-end sequence were considered to be the constant
region of the immunoglobulin heavy chains.
All analysis was performed using the statistical program R. In addition, the R packages
Biostrings, ape, lattice, and ShortRead were used.
The data was first resampled using the bootstrap technique. A function was created to
identify the unique sequences of variable regions in each patient. The output of this function,
readUniqueSequences, was four different parameters: the unique sequences (vUniqueSeqs), the
associated number of times each of those sequences appeared (vCounts), the consensus sequence
(consensusSequence), and the length of the consensus sequence (blockWidth). These values were
then used in the bootstrap function, getBootstrapCount. The R sample function was used, taking
a sample of size 5000 from the values 1 through the length of vUniqueSeqs, with replacement,
using vCounts as the probability distribution, so that the output was a set of 5000 integers
ranging from 1 to the length of vUniqueSeqs. The unique values were then extracted using the R
function unique, so that repeats were eliminated. Finally, the numbers that remained
corresponded with one of the vUniqueSeqs, so the final variable calculated,
bootstrapUniqueSequences, was a list of at most 5000 unique sequences that was representative
of the original set of sequences.
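
The resampling step can be sketched as follows. This is a Python analogue written for illustration, not the project's R function getBootstrapCount; the name get_bootstrap_unique_sequences and the seed parameter are hypothetical. random.choices plays the role of R's sample with replace = TRUE and prob = vCounts: the weights are relative, so raw counts can be used directly.

```python
import random


def get_bootstrap_unique_sequences(unique_seqs, counts, sample_size=5000, seed=None):
    """Resample sequence indices in proportion to their counts, then keep each index once."""
    rng = random.Random(seed)
    # Weighted sampling with replacement over the indices 0..len(unique_seqs)-1.
    indices = rng.choices(range(len(unique_seqs)), weights=counts, k=sample_size)
    # dict.fromkeys drops repeats while keeping first-seen order.
    distinct = list(dict.fromkeys(indices))
    return [unique_seqs[i] for i in distinct]


v_unique_seqs = ["AAAA", "AAAT", "AATT"]   # toy data, not from the study
v_counts = [950, 49, 1]
bootstrap_unique = get_bootstrap_unique_sequences(v_unique_seqs, v_counts, seed=1)
```

Rare clones may be absent from a given resample, which is the intended effect: a clone that appears only through a sequencing error has a small weight and is unlikely to be drawn in every resample.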
To confirm AID activity in patients with CLL, the number of mutations in the variable region
(IGHV) was compared to that in the constant region (IGHC), since if the mutations were caused by
somatic hypermutation, there would be few or no mutations in the constant region
compared to the variable region. Only mutations occurring at C:G were considered, since only
these would be indicative of AID activity. A function to determine the number of GC mutations
compared to the consensus sequence was made, compareMutationsGCsites. The sequence was
split into individual characters using the R function strsplit and assigned to the variable seq_split.
To ensure the sequence and the consensus sequence were the same length, whichever had the
smaller length was taken using the R function min. The variable GCsitesIndex was created,
which contained only those values in the consensus sequence which were G or C. Then, to
compare the two sequences, a for loop was created for i in 1:length(GCsitesIndex). For every
number, 1 through the length of GCsitesIndex, the corresponding element was assigned to the
variable Pos. The variable consensusNt was created to represent the values of Pos in the
consensus sequence. Meanwhile, the variable seqNt was created to represent the values of Pos in
seq_split. An if statement was then used to test whether consensusNt and seqNt differed; each
mismatch added one to the output, mutationNum. This value therefore gives the
total number of GC mutations in one sequence compared to the consensus sequence.
Another function, countMutationGCsites, was created to count the number of GC
mutations for all of the sequences in vUniqueSeqs, instead of just one. A for loop was created for
k in 1:length(vUniqueSeqs) which calculated the function compareMutationsGCsites for every
sequence in vUniqueSeqs and compiled all of the mutationNum values for each sequence into
one vector.
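
The two functions described above can be sketched in Python as follows. This is an illustrative translation of the prose, not the original R code; the snake_case names are stand-ins for compareMutationsGCsites and countMutationGCsites, and the sketch assumes the sequences are already aligned position by position with the consensus.

```python
def compare_mutations_gc_sites(sequence, consensus):
    """Count positions where the consensus has G or C and the sequence differs from it."""
    # Compare only over the shared length, as the text does with min().
    width = min(len(sequence), len(consensus))
    gc_sites = [i for i in range(width) if consensus[i] in "GC"]
    return sum(1 for pos in gc_sites if sequence[pos] != consensus[pos])


def count_mutations_gc_sites(unique_seqs, consensus):
    """Apply compare_mutations_gc_sites to every sequence and collect the counts."""
    return [compare_mutations_gc_sites(seq, consensus) for seq in unique_seqs]


consensus = "GATC"
counts = count_mutations_gc_sites(["GATC", "AATT", "GTTC"], consensus)
# "AATT" differs at both G/C sites; "GTTC" differs only at an A site, which is ignored.
```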
The data from the variable region and constant region were then compared using this
function after having been resampled. The values were then tested for statistical significance
using a two sample t test.
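
For reference, the test statistic behind a two sample t test can be computed as in the sketch below. The text does not say which variant was used; this example assumes Welch's unequal-variance form, which is what R's t.test computes by default. The function name welch_t is hypothetical, the data are made up, and the p-value (which requires the t distribution) is left out.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


variable_counts = [3, 4, 5, 6, 7]   # toy per-sequence mutation counts
constant_counts = [0, 1, 0, 1, 0]
t_stat, dof = welch_t(variable_counts, constant_counts)
```

A large positive t indicates more mutations per sequence in the first group than in the second.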
Another way in which AID activity was measured was by counting the mutations
occurring at WRC/GYW hotspots compared to all of the mutations, since it is known that AID
preferentially targets such sites. To test the hypothesis whether mutations in the variable region
were caused by somatic hypermutation, it was verified that the mutations occurring at
WRC/GYW hotspots were significant compared to all other mutations. A function
counttotalMutations was created to count all of the mutations in the resampled sequences. The R
function consensusMatrix was used to compare the consensus sequence with the unique
bootstrapped sequences, and all of the mutations were counted. Then, to count the mutations
occurring at hotspots, a function countHotSpotPosition was created to locate the hotspots in the
consensus sequence. The R function matchPattern was used to locate positions on the sequence
that matched the criteria for WRC/GYW hotspots. Then, another function,
countHotSpotMutations, was created to count all of the hotspot mutations that occurred in the
bootstrapped sequences relative to the consensus sequence. The R function consensusMatrix was
used to compare the hotspot positions of the consensus sequence (obtained from the function
countHotSpotPosition) with the bootstrapped sequences. The output of the function was the total
number of hotspot mutations within the resampled sequences relative to the consensus sequence.
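
The counting logic can be illustrated with a small Python sketch. The project used the Biostrings functions consensusMatrix and matchPattern in R; the version below swaps in a direct position-by-position comparison, which is a simpler technique and not the original implementation. The function names are hypothetical, and the hotspot positions are passed in as a precomputed set of 0-based indices.

```python
def count_total_mutations(sequences, consensus):
    """Count every position, over all sequences, that differs from the consensus."""
    total = 0
    for seq in sequences:
        width = min(len(seq), len(consensus))
        total += sum(1 for i in range(width) if seq[i] != consensus[i])
    return total


def count_hotspot_mutations(sequences, consensus, hotspot_positions):
    """Count differences from the consensus restricted to the given hotspot positions."""
    total = 0
    for seq in sequences:
        width = min(len(seq), len(consensus))
        total += sum(
            1 for i in hotspot_positions if i < width and seq[i] != consensus[i]
        )
    return total


consensus = "AGCTACGGCA"
hotspots = {1, 2, 5, 7}                     # assumed to be located beforehand
resampled = ["AGTTACGGCA", "AGCTACGGCT", "AACTAGGGCA"]  # toy sequences
total_mutations = count_total_mutations(resampled, consensus)
hotspot_mutations = count_hotspot_mutations(resampled, consensus, hotspots)
```

Passing a set of coldspot positions instead gives the corresponding coldspot count.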
The ratio of the resulting values, the total number of mutations and the total number of
hotspot mutations, was then calculated. This ratio was tested for significance against the null
hypothesis of one using a one sample t test.
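
A one sample t statistic against a null value of one can be computed as in the sketch below. This is a hedged illustration in Python rather than the R code used here: the function name one_sample_t is hypothetical, the ratios are invented, and the text does not state how the ratio was scaled so that its expected value is one, so the values are simply taken as given.

```python
import math
import statistics


def one_sample_t(values, null_value=1.0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (mean - null_value) / standard_error, n - 1


ratios = [1.2, 1.4, 1.1, 1.5, 1.3]   # toy ratios, one per resample
t_stat, dof = one_sample_t(ratios)
```

A positive t supports a ratio above one (targeting); for the coldspot test the same statistic would be examined for negative values.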
Finally, a third test was performed to compare the number of mutations at SYC coldspots
to all of the mutations in the variable region. The same procedure was used for this test as the
previous one, except the pattern used in matchPattern was the coldspot sequence rather than the
hotspot sequence. Also, instead of testing whether the ratio was significantly higher than the
null value of one, the coldspot data set was tested to see whether it was significantly lower.
Results
Figure 1 shows the ratio between variable region mutations and constant region
mutations, so if there was no AID activity, the ratio between the two would be one. If AID did
cause mutations in the variable region, the ratio would be higher than one. The test to determine
whether there were more mutations in the variable region compared to the constant region
produced many statistically significant results. Assuming a significance level of .05, eight of the
eleven samples of DNA rejected the null hypothesis that there was no difference in number of
mutations between the variable region and constant region.
Figure 2 shows the ratio between hotspot mutations and total mutations in the variable
region, so if there was no AID activity, the ratio between the two would be one. If AID was
active in the variable region, the ratio would be higher than one. The test to determine the
prevalence of mutations at AID hotspots versus all mutations also produced many statistically
significant results. Assuming a significance level of .05, six of the eleven samples of DNA
rejected the null hypothesis that there was no targeting of WRC/GYW hotspots in the
immunoglobulin genes.
Figure 3 shows the ratio between coldspot mutations and total mutations in the
variable region, so if there was no AID activity, the ratio between the two would be one. If AID
was active in the variable region, the ratio would be lower than one, since AID tends to avoid
such coldspots. The test to determine the prevalence of mutations at AID coldspots versus all
mutations did not show as many statistically significant results as did the other two tests.
Assuming a significance level of .05, only four of the eleven samples of DNA rejected the null
hypothesis that there was no avoidance of SYC coldspots in the immunoglobulin genes.
Figure 1. Histogram of mutations occurring at variable region over constant region. The red line
is the expected value 1. The more area on the right of the red line, the more likely the mutation
is caused by somatic hypermutation.
Figure 2. Histogram of AID hotspot mutation over random mutation. Shows the ratio between
mutation occurring at AID hotspot(WRC/GYW) and total mutation for each new clone. The red
vertical line is the expected value 1, since if there is no bias for AID targeting, the expectation
should be 1. The more area on the right of the red line, the more likely the mutation is targeted
by AID.
Figure 3. Histogram of AID coldspot mutation over random mutation. Shows the ratio between
mutations occurring at AID coldspots and total mutations for each new clone. The red vertical
line is the expected value 1, since if there is no bias for AID targeting, the expectation should be
1. The more area on the left of the red line, the more likely it is that AID avoids these coldspots.
Discussion
The results of this study confirmed the activity of AID in patients with chronic
lymphocytic leukemia, linking the mutations that occur in the variable regions of these patients
with somatic hypermutation. Of the three major parameters tested, two returned promising
results and the third still returned significant results for several samples. Both the
comparison of mutations in the variable region versus constant region and the comparison of
mutations occurring at AID hotspots versus all mutations in the variable region showed evidence
of AID activity. Although the test for lack of mutations at AID coldspots versus all mutations in
the variable region did not show results that were as promising as the other tests, several
examples did show significance and a few more were only just above the significance threshold.
These findings are consistent with studies that confirm the presence of AID in IGHV
mutated and unmutated CLL cells. Patten et al. (2012) showed that both mutated and unmutated CLL
cells were able to produce AID mRNA and protein, confirming the presence of AID in these cells. It
was therefore expected that AID would perform some function in the variable region of these
cells, although no distinction between mutated and unmutated cells was made.
These findings are pivotal in advancing our knowledge of CLL. Being able to confirm the
activity of AID in activated CLL cells enhances our understanding of the disease, which is
necessary if any progress is to be made with curing CLL. In addition, these findings help to
explain the differences between the two types of CLL – the indolent form and aggressive form.
Since the aggressive form has few mutations in the variable region, our results suggest the
aggressive form is perhaps due to some lack of function in the somatic hypermutation process,
which prevents the B-cells from mutating and providing some form of defense for the immune
system.
One aspect of our findings that was not what we had expected was the low significance in
the results of the test for lack of AID activity at AID coldspots. Although several of the results
were significant, fewer than half were, and several had p-values higher than 0.5. There are several
possible explanations for this result. The first, and most obvious, would be experimental error.
However, since the results of the other two tests did not show any similar lack of significance,
this explanation, although possible, cannot solely explain this occurrence. Another possible
explanation for the results of the AID coldspot test would be that AID does not actually avoid
coldspots as much as it targets hotspots. In other words, even though AID does avoid these
specific DNA sequences, they cannot serve as thorough an indicator of AID activity as do AID
hotspots. Although there are no indications in the literature to suggest this possibility, it could
explain the results.
Although this study was important in furthering our knowledge of AID's role in CLL, the
possibilities of future projects are abundant. One such project could seek to further examine the
difference between mutated and unmutated CLL. It is now apparent that AID is active in mutated
CLL cells. However, the reason why this occurs is still ambiguous. If we were to know why AID
functions in the indolent form of CLL and doesn't in the aggressive form of CLL, we could try to
somehow change some aspect of the aggressive form to become the indolent form, if not treat the
disease entirely. Since patients with the indolent form often die with the disease and not from it,
this could save countless lives.
Another study could follow up on why the AID coldspot test did not return results as
significant as expected. More of the same test could be performed to confirm the results; if the results
are consistent, a study could be designed to compare the relative targeting of hotspots versus
coldspots. A statistical analysis of hotspot/coldspot mutation frequency could reveal whether or
not there is a difference between the two. This information would advance our knowledge of
AID and how it targets the variable regions in immunoglobulin genes.
The goal of our research was to verify the role of AID in patients with CLL. Based on the
results of a study by Patten et al. (2012), which confirmed the presence of AID mRNA and protein in mutated
and unmutated CLL cells, we believed AID would be proven to cause the mutations in the
mutated CLL cells. Our results confirmed this, as many of the samples showed significant
results. The tests for mutations in the variable region versus mutations in the constant region and
AID hotspot mutations versus all mutations were promising, and even though the test for
coldspot mutation versus total mutation did not show results as significant as expected, a
few samples did show significance. This slight discrepancy should be investigated further. This
information will aid future investigations that seek to design treatment for CLL.
References
Ansorge, Wilhelm J. (2009). Next-generation DNA Sequencing Techniques. New Biotechnology,
25 (4), n. pag.
Cheson, B.D. (2001). The chronic lymphocytic leukemias. The Annals of Oncology, 13 (12),
1957 – 1957-a.
Chiorazzi, Nicholas, Katerina Hatzi, and Emilia Albesiano. (2005). B-Cell Chronic Lymphocytic
Leukemia, a Clonal Disease of B Lymphocytes with Receptors That Vary in Specificity
for (Auto)antigens. New York Academy of Sciences, 1062, 1-12.
Efron, B. (1979). Bootstrap methods: another look at the jackknife. Annals of Statistics, 7, 1-26.
Fais, F. et al. (1998). Chronic lymphocytic leukemia B cells express restricted sets of mutated
and unmutated antigen receptors. The Journal of Clinical Investigation, 102, 1515 –
1525.
Fakruddin, M.D. et al. (2012). Pyrosequencing - Principles and Applications. International
Journal of Life Science and Pharma Research, 2 (2), n. pag.
Geyer, Charles J. (2006). 5601 Notes: The Subsampling Bootstrap.
Patten, P.E. et al. (2012). IGHV-unmutated and IGHV-mutated chronic lymphocytic leukemia
cells produce activation-induced deaminase protein with a full range of biologic
functions. Blood. 120 (24), 4802–4811.
Rai, K.R. et al. (2000). Fludarabine compared with chlorambucil as primary therapy for chronic
lymphocytic leukemia. The New England Journal of Medicine, 343, 1750 – 1757.
Sanger F., Nicklen S. and Coulson A.R. (1977). DNA sequencing with chain-terminating
inhibitors. Proceedings of the National Academy of Sciences of the United States of
America, 74, 5463- 5467.
Ong et al._Translational utility of next-generation sequencing_2013_GenomicsOng et al._Translational utility of next-generation sequencing_2013_Genomics
Ong et al._Translational utility of next-generation sequencing_2013_GenomicsFrank Ong, MD, CPI
 

Similar to Verifying the role of AID in Chronic Lymphocytic Leukemia (20)

Reference for long range pcr based ngs applications
Reference for long range pcr based ngs applicationsReference for long range pcr based ngs applications
Reference for long range pcr based ngs applications
 
Developing a framework for for detection of low frequency somatic genetic alt...
Developing a framework for for detection of low frequency somatic genetic alt...Developing a framework for for detection of low frequency somatic genetic alt...
Developing a framework for for detection of low frequency somatic genetic alt...
 
APPLICATION OF NEXT GENERATION SEQUENCING (NGS) IN CANCER TREATMENT
APPLICATION OF  NEXT GENERATION SEQUENCING (NGS)  IN CANCER TREATMENTAPPLICATION OF  NEXT GENERATION SEQUENCING (NGS)  IN CANCER TREATMENT
APPLICATION OF NEXT GENERATION SEQUENCING (NGS) IN CANCER TREATMENT
 
From reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingFrom reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene finding
 
Next Generation Sequencing
Next Generation SequencingNext Generation Sequencing
Next Generation Sequencing
 
antiviral coursework
antiviral courseworkantiviral coursework
antiviral coursework
 
AJP_12-0313_Araten_et_al_Word_Version
AJP_12-0313_Araten_et_al_Word_VersionAJP_12-0313_Araten_et_al_Word_Version
AJP_12-0313_Araten_et_al_Word_Version
 
JoB spike in manuscript 2014
JoB spike in manuscript 2014JoB spike in manuscript 2014
JoB spike in manuscript 2014
 
Sequencing the circulating and infiltrating T-cell repertoire on the Ion S5TM
Sequencing the circulating and infiltrating T-cell repertoire on the Ion S5TMSequencing the circulating and infiltrating T-cell repertoire on the Ion S5TM
Sequencing the circulating and infiltrating T-cell repertoire on the Ion S5TM
 
Tumor Mutational Load assessment of FFPE samples using an NGS based assay
Tumor Mutational Load assessment of FFPE samples using an NGS based assayTumor Mutational Load assessment of FFPE samples using an NGS based assay
Tumor Mutational Load assessment of FFPE samples using an NGS based assay
 
A New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The ClinicA New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The Clinic
 
Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...
 Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art... Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...
Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...
 
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
 
Poster_NHGRI_DahliaShvets.ver2
Poster_NHGRI_DahliaShvets.ver2Poster_NHGRI_DahliaShvets.ver2
Poster_NHGRI_DahliaShvets.ver2
 
Systems biology for Medicine' is 'Experimental methods and the big datasets
Systems biology for Medicine' is 'Experimental methods and the big datasetsSystems biology for Medicine' is 'Experimental methods and the big datasets
Systems biology for Medicine' is 'Experimental methods and the big datasets
 
CNV and aneuploidy detection by Ion semiconductor sequencing
CNV and aneuploidy detection by Ion semiconductor sequencingCNV and aneuploidy detection by Ion semiconductor sequencing
CNV and aneuploidy detection by Ion semiconductor sequencing
 
Bioinformatics-driven discovery of EGFR mutant Lung Cancer
Bioinformatics-driven discovery of EGFR mutant Lung CancerBioinformatics-driven discovery of EGFR mutant Lung Cancer
Bioinformatics-driven discovery of EGFR mutant Lung Cancer
 
Clinical investigational studies for validation of a next-generation sequenci...
Clinical investigational studies for validation of a next-generation sequenci...Clinical investigational studies for validation of a next-generation sequenci...
Clinical investigational studies for validation of a next-generation sequenci...
 
Aug2015 analysis team 10 mason epigentics
Aug2015 analysis team 10 mason epigenticsAug2015 analysis team 10 mason epigentics
Aug2015 analysis team 10 mason epigentics
 
Ong et al._Translational utility of next-generation sequencing_2013_Genomics
Ong et al._Translational utility of next-generation sequencing_2013_GenomicsOng et al._Translational utility of next-generation sequencing_2013_Genomics
Ong et al._Translational utility of next-generation sequencing_2013_Genomics
 

Verifying the role of AID in Chronic Lymphocytic Leukemia

Verifying the role of activation-induced deaminase in chronic lymphocytic leukemia
Charlotte Broadbent
Ward Melville High School
380 Old Town Road
East Setauket, NY 11733
Abstract

Chronic lymphocytic leukemia (CLL) is a relatively common disease amongst aging adults (Cheson, 2001). To successfully design a drug to combat this disease, more background on its mechanisms is vital. One aspect of CLL that remains unclear is the role of the enzyme activation-induced deaminase (AID). AID is normally involved in somatic hypermutation, a mechanism B-cells use to differentiate their variable region to combat pathogens. AID has already been associated with CLL, but the sequencing method used, known as pyrosequencing, is not accurate enough to detect all of the variable region clones when analyzing specific sequences such as AID hotspots (Ansorge, 2009). The goal of this research was to verify the role of AID using a statistical method known as bootstrapping, or resampling, so these inaccuracies would not affect the total distribution of variable region clones. Two parameters were measured to test for AID activity: mutations in the variable region compared to the constant region, and mutations at AID hotspots compared to total mutations. Results from both tests were statistically significant, verifying the role of AID in chronic lymphocytic leukemia.
Introduction

B-cell chronic lymphocytic leukemia (B-CLL) is the most prevalent type of leukemia among aging Caucasians (Cheson, 2001) and is currently incurable. Patients with B-CLL follow one of two clinical outcomes: an indolent outcome, in which median survival is greater than 25 years, and an aggressive outcome, in which those afflicted decline relatively rapidly, with a median survival of less than 8 years (Chiorazzi et al., 2005). The ineffectiveness of treatment in prolonging lifespan (Rai et al., 2000) has led many physicians to delay aggressive treatment until the nature of the disease is evident. However, the two clinical outcomes have been associated with a biological difference: patients whose immunoglobulin heavy-chain variable region (IgVH) is relatively free of mutations follow the more aggressive outcome, while patients whose IgVH region contains considerable mutations follow the more indolent outcome (Fais et al., 1998).

One of the primary defenses of the immune system is the diverse repertoire of B cells that identify pathogens in the body. Unique antigen receptors on the surface of the B cells bind to foreign antigens, after which the B cell divides and produces millions of antibodies to destroy the pathogen. These antigen receptors are unique due to the variation in the amino acid sequence at the antigen-binding site, which consists of a variable (V) region and a constant (C) region. The distribution of V regions in immunoglobulin genes will always have a dominant clone, known as the consensus sequence. This clone accounts for approximately 95% of the V region clones. However, for this process to work, the unique variable region on each antigen receptor must have an efficient method of differentiating.

Somatic hypermutation is the process which diversifies B cells so they can recognize threats to the immune system. The enzyme AID (activation-induced deaminase) induces point mutations, usually at locations in the variable region of the DNA strand known as WRC hotspots, where W represents A or T, R represents G or A, and C is always cytosine, or at the reverse complement, GYW hotspots, where G is always guanine, Y represents a pyrimidine (C or T), and W again represents A or T. AID tends to avoid SYC cold spots, where S represents G or C. Somatic hypermutation is responsible for many of the mutations in immunoglobulin genes. This research seeks to verify, using statistical methods, whether the mutations in cells of patients with the indolent form of B-CLL are in fact caused by AID activity, since it is already known that AID is associated with CLL (Patten et al., 2012).

The DNA of leukemic and other cells has been sequenced since 1977 primarily via the chain termination sequencing method, also known as the Sanger method (Sanger et al., 1977). This method has several advantages: it can sequence strands up to five hundred base pairs in length with fairly high accuracy. However, the Sanger method also has limitations: there are few opportunities for parallel sequencing, making sequencing of large quantities of DNA very difficult. Also, sample preparation can take up to four hours (Fakruddin et al., 2012). These limitations are addressed by a relatively new method known as pyrosequencing.

Pyrosequencing has many distinct advantages over the Sanger method. Even though it can only process strands around four hundred base pairs in length, it can sequence many strands in parallel, making it more efficient for much larger quantities of DNA. It eliminates the need to label the primers, which are the synthetically prepared starting points for the DNA polymerase. Pyrosequencing is also easy to automate, the cost is lower because of the smaller strands, and preparation time is as little as fifteen minutes (Fakruddin et al., 2012). In this project, each sequence was a different B cell's variable region. Pyrosequencing was therefore the logical choice, due to the immense quantity of sequences that were each relatively short in length.

However, the pyrosequencing method is much more error prone than the Sanger method, resulting in many inaccuracies throughout the sequences (Ansorge, 2009). Another problem with this technique is that the number of sequences changes during the course of the experiment. After AID is stimulated and the results are analyzed, there is often a discrepancy between how many sequences were originally counted and how many remain. This is attributed to the death of the cells containing the DNA during the course of the experiment. There is therefore uncertainty in the exact number and type of variable region clones after AID stimulation.

These problems can be addressed by applying a statistical technique known as resampling, or bootstrapping. The term "bootstrap" was first used by Efron in 1979 (Efron, 1979). In particular, he described the nonparametric bootstrap, or resampling with replacement of size n from the original sample taken of size n (Geyer, 2006). Therefore, when a sample is resampled thousands of times, a more representative distribution of possible samples can be obtained regardless of the shape of the distribution. In this case, this technique allows for a more representative distribution of immunoglobulin mutations, allowing confirmation of AID activity in leukemic cells. Such a confirmation would be instrumental in aiding drug design for CLL.

Methods

Sequence data was obtained from collaborators at Northshore LIJ. DNA from patients with chronic lymphocytic leukemia was sequenced using Roche 454 GS FLX pyrosequencing technology, both before and after AID stimulation. Here stimulation means that different experiments were performed to create an AID expression environment. The sequencing was performed for different amounts of time for different patients, lasting up to ninety-seven days. Each day, the DNA was sequenced once from the 5'-end of the chain of bases and once from the 3'-end. The first 20 nucleotides from both ends were removed to eliminate the primer. The data from the 5'-end sequences were considered to be the variable region of the immunoglobulin gene, while the first 103 nucleotides from the 3'-end sequence were considered to be the constant region of the immunoglobulin heavy chains.

All analysis was performed using the statistical program R. In addition, the R packages Biostrings, ape, lattice, and ShortRead were utilized.

The data was first resampled using the bootstrap technique. A function, readUniqueSequences, was created to identify the unique sequences of variable regions in each patient. It returned four values: the unique sequences (vUniqueSeqs), the number of times each of those sequences appeared (vCounts), the consensus sequence (consensusSequence), and the length of the consensus sequence (blockWidth). These values were then used in the bootstrap function, getBootstrapCount. The R sample function was used, taking a sample of size 5000 from the values 1 through the length of vUniqueSeqs, with replacement, using vCounts as the probability distribution, so that the output was a set of 5000 integers ranging from 1 to the length of vUniqueSeqs. The unique values were then extracted using the R function unique, so that repeats were eliminated. Finally, each number that remained corresponded to one of the vUniqueSeqs, so the final variable calculated, bootstrapUniqueSequences, was a list of fewer than 5000 unique sequences that was representative of the original set of sequences.

To confirm AID activity in patients with CLL, the mutations in the variable region (IGHV) were compared to those in the constant region (IGHC), since if the mutations were caused by somatic hypermutation, there would be no mutations (or a low frequency of mutations) in the constant region compared to the variable region. Only mutations occurring at C:G were considered, since only these would be indicative of AID activity. A function, compareMutationsGCsites, was written to count the mutations at G or C sites in one sequence relative to the consensus sequence. The sequence was split into individual characters using the R function strsplit and assigned to the variable seq_split. To ensure the sequence and the consensus sequence were compared over the same length, the smaller of the two lengths was taken using the R function min. The variable GCsitesIndex was created, which contained only those positions in the consensus sequence at which the base was G or C. Then, to compare the two sequences, a for loop was created for i in 1:length(GCsitesIndex). For every i, the corresponding element of GCsitesIndex was assigned to the variable Pos. The variable consensusNt was created to hold the base at Pos in the consensus sequence, and the variable seqNt to hold the base at Pos in seq_split. An if statement then tested whether consensusNt and seqNt differed and, if so, added one to the output, mutationNum. This value therefore gives the total number of GC mutations in one sequence compared to the consensus sequence.

Another function, countMutationGCsites, was created to count the number of GC mutations for all of the sequences in vUniqueSeqs, instead of just one. A for loop over k in 1:length(vUniqueSeqs) applied compareMutationsGCsites to every sequence in vUniqueSeqs and compiled the mutationNum values for each sequence into one vector. The data from the variable region and constant region were then compared using this function after having been resampled. The values were then tested for statistical significance using a two sample t test.

Another way in which AID activity was measured was by counting the mutations occurring at WRC/GYW hotspots compared to all of the mutations, since it is known that AID preferentially targets such sites. To test the hypothesis that mutations in the variable region were caused by somatic hypermutation, it was verified that the mutations occurring at WRC/GYW hotspots were significant compared to all other mutations. A function counttotalMutations was created to count all of the mutations in the resampled sequences. The R function consensusMatrix was used to compare the consensus sequence with the unique bootstrapped sequences, and all of the mutations were counted. Then, to count the mutations occurring at hotspots, a function countHotSpotPosition was created to locate the hotspots in the consensus sequence. The R function matchPattern was used to locate positions on the sequence that matched the criteria for WRC/GYW hotspots. Then, another function, countHotSpotMutations, was created to count all of the hotspot mutations that occurred in the bootstrapped sequences relative to the consensus sequence. The R function consensusMatrix was used to compare the hotspot positions of the consensus sequence (obtained from the function countHotSpotPosition) with the bootstrapped sequences. The output of the function was the total number of hotspot mutations within the resampled sequences relative to the consensus sequence. The ratio of the resulting values, the total number of mutations and the total number of hotspot mutations, was then calculated. This ratio was tested for significance against the null hypothesis of one using a one sample t test.

Finally, a third test was performed to compare the number of mutations at SYC coldspots to all of the mutations in the variable region. The same procedure was used for this test as the previous one, except the pattern used in matchPattern was the coldspot sequence rather than the hotspot sequence. Also, instead of being tested for being significantly higher, the coldspot ratio was tested to see if it was significantly lower than the null hypothesis value of one.
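The resampling and counting steps described in the Methods can be sketched roughly as follows. This is a Python illustration, not the R code used in the study: the function names are hypothetical stand-ins, `random.choices` plays the part of R's `sample` with a probability vector, and a regular expression stands in for `matchPattern`. It also assumes that a hotspot mutation is scored at the targeted base only (the C of WRC, the G of GYW), which the text does not state explicitly.

```python
import random
import re

# IUPAC ambiguity codes used by the motifs: W = A/T, R = A/G, Y = C/T, S = C/G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]", "S": "[CG]"}


def bootstrap_unique_sequences(unique_seqs, counts, size=5000, seed=None):
    """Draw `size` indices with replacement, weighted by clone counts,
    and return the distinct sequences that were drawn at least once."""
    rng = random.Random(seed)
    drawn = rng.choices(range(len(unique_seqs)), weights=counts, k=size)
    return [unique_seqs[i] for i in sorted(set(drawn))]


def count_gc_site_mutations(seq, consensus):
    """Count positions where the consensus base is G or C and `seq` differs."""
    n = min(len(seq), len(consensus))  # compare only the overlapping prefix
    return sum(1 for i in range(n)
               if consensus[i] in "GC" and seq[i] != consensus[i])


def motif_positions(consensus, motif, target_offset):
    """Return 0-based positions of the targeted base for every occurrence
    (overlapping ones included) of an IUPAC motif in the consensus."""
    pattern = "(?=" + "".join(IUPAC[ch] for ch in motif) + ")"
    return [m.start() + target_offset for m in re.finditer(pattern, consensus)]


def hotspot_positions(consensus):
    """Targeted C of each WRC motif plus targeted G of each GYW motif
    (assumption: only the targeted base is scored)."""
    return sorted(set(motif_positions(consensus, "WRC", 2))
                  | set(motif_positions(consensus, "GYW", 0)))


if __name__ == "__main__":
    # Toy data for illustration only; not sequences from the study.
    consensus = "AGCTTGCAAGCT"
    clones = [consensus, "AGTTAGCAAGAT", "AGCTTGTAAGCT"]
    counts = [95, 4, 1]
    resampled = bootstrap_unique_sequences(clones, counts, size=5000, seed=0)
    print([count_gc_site_mutations(s, consensus) for s in resampled])
    print(hotspot_positions(consensus))
```

Weighting the draw by clone counts keeps the dominant clone dominant in every resample, while rare clones appear only in some of them; that is what lets the resampled sets vary from one draw to the next. Coldspot positions could be located the same way by passing the SYC motif to `motif_positions`.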
Results

Figure 1 shows the ratio between variable region mutations and constant region mutations, so if there was no AID activity, the ratio between the two would be one. If AID did cause mutations in the variable region, the ratio would be higher than one. The test to determine whether there were more mutations in the variable region compared to the constant region produced many statistically significant results. Assuming a significance level of 0.05, eight of the eleven samples of DNA rejected the null hypothesis that there was no difference in number of mutations between the variable region and constant region.

Figure 2 shows the ratio between hotspot mutations and total mutations in the variable region, so if there was no AID activity, the ratio between the two would be one. If AID was active in the variable region, the ratio would be higher than one. The test to determine the prevalence of mutations at AID hotspots versus all mutations also produced many statistically significant results. Assuming a significance level of 0.05, six of the eleven samples of DNA rejected the null hypothesis that there was no targeting of WRC/GYW hotspots in the immunoglobulin genes.

Figure 3 shows the ratio between coldspot mutations and total mutations in the variable region, so if there was no AID activity, the ratio between the two would be one. If AID was active in the variable region, the ratio would be lower than one, since AID tends to avoid such coldspots. The test to determine the prevalence of mutations at AID coldspots versus all mutations did not show as many statistically significant results as did the other two tests. Assuming a significance level of 0.05, only four of the eleven samples of DNA rejected the null hypothesis that there was no avoidance of SYC coldspots in the immunoglobulin genes.
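The comparison of each ratio against the null value of one can be illustrated with a short calculation of the one-sample t statistic. This is a standard-library Python sketch, not the study's R code; the function name `one_sample_t` is hypothetical and the ratios in the test data below are invented purely for illustration. The p-value itself would come from a t distribution with the returned degrees of freedom (in R, for instance, `t.test` with its `mu` and `alternative` arguments).

```python
import math
import statistics


def one_sample_t(values, mu0=1.0):
    """Return (t statistic, degrees of freedom) for H0: mean(values) == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("sample has zero variance; t statistic is undefined")
    return (mean - mu0) / (sd / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Invented hotspot/total ratios from five hypothetical resamples.
    ratios = [1.2, 1.5, 1.1, 1.4, 1.3]
    t_stat, df = one_sample_t(ratios, mu0=1.0)
    print(round(t_stat, 2), df)
```

For these invented ratios the statistic is about 4.24 with 4 degrees of freedom, which exceeds the one-sided 5% critical value of 2.132, so the null hypothesis of a ratio of one would be rejected in favor of a higher ratio. For the coldspot test the same statistic would be compared against the lower tail instead.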
Figure 1. Histogram of mutations occurring at variable region over constant region. The red line is the expected value 1. The more area on the right of the red line, the more likely the mutation is caused by somatic hypermutation.

Figure 2. Histogram of AID hotspot mutation over random mutation. Shows the ratio between mutations occurring at AID hotspots (WRC/GYW) and total mutations for each new clone. The red vertical line is the expected value 1, since if there is no bias for AID targeting, the expectation should be 1. The more area on the right of the red line, the more likely the mutation is targeted by AID.

Figure 3. Histogram of AID coldspot mutation over random mutation. Shows the ratio between mutations occurring at AID coldspots and total mutations for each new clone. The red vertical line is the expected value 1, since if there is no bias for AID targeting, the expectation should be 1. The more area on the left of the red line, the more likely the mutation is targeted by AID.
Discussion

The results of this study confirmed the activity of AID in patients with chronic lymphocytic leukemia, linking the mutations that occur in the variable regions of these patients with somatic hypermutation. Of the three major parameters tested, two returned promising results and the third still returned a relatively high proportion of significant samples. Both the comparison of mutations in the variable region versus constant region and the comparison of mutations occurring at AID hotspots versus all mutations in the variable region showed evidence of AID activity. Although the test for lack of mutations at AID coldspots versus all mutations in the variable region did not show results that were as promising as the other tests, several samples did show significance and a few more were only just above the significance threshold.

These findings are consistent with studies that confirm the presence of AID in IGHV mutated and unmutated CLL cells. Patten et al. (2012) showed that both mutated and unmutated CLL cells were able to produce AID mRNA and protein, confirming the presence of AID in these cells. It was therefore expected that AID would perform some function in the variable region of these cells, although no distinction between mutated and unmutated cells was made.

These findings are pivotal in advancing our knowledge of CLL. Being able to confirm the activity of AID in activated CLL cells enhances our understanding of the disease, which is necessary if any progress is to be made with curing CLL. In addition, these findings help to explain the differences between the two types of CLL, the indolent form and the aggressive form. Since the aggressive form has few mutations in the variable region, our results suggest the aggressive form is perhaps due to some lack of function in the somatic hypermutation process, which prevents the B-cells from mutating and providing some form of defense for the immune system.

One aspect of our findings that was not what we had expected was the low significance in the results of the test for lack of AID activity at AID coldspots. Although several of the results were significant, fewer than half were, and several had p-values higher than 0.5. There are several possible explanations for this result. The first, and most obvious, would be experimental error. However, since the results of the other two tests did not show any similar lack of significance, this explanation, although possible, cannot solely explain this occurrence. Another possible explanation for the results of the AID coldspot test would be that AID does not actually avoid coldspots as much as it targets hotspots. In other words, even though AID does avoid these specific DNA sequences, they cannot serve as thorough an indicator of AID activity as do AID hotspots. Although there are no indications in the literature to suggest this possibility, it could explain the results.

Although this study was important in furthering our knowledge of AID's role in CLL, the possibilities for future projects are abundant. One such project could seek to further examine the difference between mutated and unmutated CLL. It is now apparent that AID is active in mutated CLL cells. However, the reason why this occurs is still unclear. If we were to know why AID functions in the indolent form of CLL and does not in the aggressive form, we could try to change some aspect of the aggressive form so that it behaves like the indolent form, if not treat the disease entirely. Since patients with the indolent form often die with the disease and not from it, this could save countless lives.

Another study could follow up on why the AID coldspot test did not return results as significant as expected. More of the same test could be performed to confirm the results; if the results are consistent, a study could be designed to compare the relative targeting of hotspots versus coldspots. A statistical analysis of hotspot/coldspot mutation frequency could reveal whether or not there is a difference between the two. This information would advance our knowledge of AID and how it targets the variable regions in immunoglobulin genes.

The goal of our research was to verify the role of AID in patients with CLL. Based on the results of a study by Patten et al. (2012), which confirmed the presence of AID mRNA and protein in mutated and unmutated CLL cells, we believed AID would be shown to cause the mutations in the mutated CLL cells. Our results confirmed this, as many of the samples showed significant results. The tests for mutations in the variable region versus mutations in the constant region and AID hotspot mutations versus all mutations were promising, and even though the test for coldspot mutations versus total mutations did not show results as significant as were expected, a few samples did show significance. This slight discrepancy should be looked into further. This information will aid future investigations that seek to design treatment for CLL.
References

Ansorge, Wilhelm J. (2009). Next-generation DNA Sequencing Techniques. New Biotechnology, 25 (4), n. pag.
Cheson, B.D. (2001). The chronic lymphocytic leukemias. The Annals of Oncology, 13 (12), 1957–1957-a.
Chiorazzi, Nicholas, Katerina Hatzi, and Emilia Albesiano. (2005). B-Cell Chronic Lymphocytic Leukemia, a Clonal Disease of B Lymphocytes with Receptors That Vary in Specificity for (Auto)antigens. New York Academy of Sciences, 1062, 1–12.
Efron, B. (1979). Bootstrap methods: another look at the jackknife. Annals of Statistics, 7, 1–26.
Fais, F. et al. (1998). Chronic lymphocytic leukemia B cells express restricted sets of mutated and unmutated antigen receptors. The Journal of Clinical Investigation, 102, 1515–1525.
Fakruddin, M.D. et al. (2012). Pyrosequencing - Principles and Applications. International Journal of Life Science and Pharma Research, 2 (2), n. pag.
Geyer, Charles J. (2006). 5601 Notes: The Subsampling Bootstrap.
Patten, P.E. et al. (2012). IGHV-unmutated and IGHV-mutated chronic lymphocytic leukemia cells produce activation-induced deaminase protein with a full range of biologic functions. Blood, 120 (24), 4802–4811.
Rai, K.R. et al. (2000). Fludarabine compared with chlorambucil as primary therapy for chronic lymphocytic leukemia. The New England Journal of Medicine, 343, 1750–1757.
Sanger F., Nicklen S. and Coulson A.R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences of the United States of America, 74, 5463–5467.