In this research paper from the Spring 2015 semester, I described my analysis of certain genome scaffolds, or gaps within the Malaclemys terrapin genome. I examined seven of these scaffolds and determined their approximate sizes through Polymerase Chain Reaction (PCR) and Gel Electrophoresis. The DNA was then prepped to be sent for sequencing by an external source. The resulting chromatograms gave inconclusive results on the exact sequences of these scaffolds.
Suraj Jaladanki
FIRE 155
Sequencing of the Diamondback Terrapin’s Genome and its Future Applications
Introduction
There are a multitude of attributes which are exclusive to turtles, making them a unique
group to study. For one, the dorsal portion of chelonian shells, or the carapace, is formed through
vertebrae and ribs (Wang, 2013). Turtles were phylogenetically shown to be morphologically
conservative, with their distinct body formation dating back at least 210 million years (Shaffer,
2013). In addition, testudines have extremely long lifespans, many of which are reproductively
active at later stages in life (Shaffer, 2013). Other significant features of turtles are their abilities
to survive in both anoxic and freezing conditions, while encountering minimal tissue damage
(Shaffer, 2013). Wang et al. found that turtle-specific pattern formation occurred to create the
shell (Wang, 2013). Shaffer et al. studied the genome of the western painted turtle (Chrysemys
picta bellii), and found potential genes involved in the species’ capacities to resist extreme
anoxia and freezing temperatures (Shaffer, 2013).
The diamondback terrapin (Malaclemys terrapin) is another unique chelonian species
which has not yet been thoroughly studied. It is believed to be one of the few turtles in the world
that lives solely in brackish water, estuaries, and lagoons, and examining these attributes in
terrapins could explain the genetic basis for the terrapin’s capacity to live in this environment
(Basic). However, a major impediment limiting the progress of terrapin genomic studies is that
the terrapin genome has not been completely sequenced. In testudine studies, genomic analysis is
a vital component in research, as it provides scientists a means of comparison for examining the
similarities and differences of two species at a genetic level, known as comparative genomics
(Genome). Although genes account for less than 25% of the DNA in the genome, understanding
the full sequence aids in determining the roles of non-coding nucleotide regions (Genome). When
the human genome was completely sequenced after thirteen years of work, scientists discovered
that humans have approximately 20,000-25,000 genes as opposed to the predicted 100,000 genes,
an interesting result suggesting that a majority of the genome does not directly contribute to
genes (Silverman).
This study details this group’s efforts to sequence the terrapin genome so that it can be used
as a launching point for future terrapin analyses. The terrapin genome has been sequenced;
however, the programs used in sequencing are unable to produce a completed genome due to
gaps. Since the latest technology has not been fully successful in filling all gaps in the terrapin
genome, the team reverted to the older method of PCR to close the remaining gaps.
Sequencing the diamondback terrapin’s genome in particular will assist in determining how large
the terrapin genome is (in base pairs) and examining the evolutionary history between terrapins
and other chelonian species. Knowledge will be added to the work of testudine studies, leading
to a more comprehensive understanding of turtle species, aiding scientists in using turtle
properties and mechanisms to heal human ailments including cardiac infarctions and cerebral
strokes.
Materials and methods:
Sequencing and Primer Creation
One adult female M. terrapin (Diamondback terrapin) was found on Poplar Island, 2
miles off the coast of Sherwood, Maryland, and was collected along with her 15 eggs. DNA and
RNA were isolated from all organismal tissues, including blood. Through a sequencing program,
the majority of the terrapin’s DNA scaffolds were reassembled. The unknown nucleotide regions
were then analyzed to determine if the gaps were between 100 and 800bp, as this is the optimal size
of a DNA strand that can be visualized on a gel.
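The size filter described above can be illustrated with a short Python sketch. It assumes that unsequenced gaps are written as runs of the letter N in the scaffold sequence, which is a common assembly convention but is not stated in this paper. The function name find_gaps and the toy scaffold are hypothetical and are not part of the sequencing program that was actually used.

```python
import re


def find_gaps(sequence, min_len=100, max_len=800):
    """Return (start, end, length) for each run of N inside the size window.

    Assumption: gaps are encoded as runs of 'N'. The 100-800 bp window is
    the range given in the text.
    """
    gaps = []
    for match in re.finditer(r"N+", sequence.upper()):
        length = match.end() - match.start()
        if min_len <= length <= max_len:
            gaps.append((match.start(), match.end(), length))
    return gaps


# Toy scaffold (made up): one 150 bp gap, one 20 bp gap, one 900 bp gap.
toy_scaffold = (
    "ACGT" * 5 + "N" * 150 + "GGCC" * 5 + "N" * 20 + "TTAA" * 5 + "N" * 900 + "ACGT"
)
candidate_gaps = find_gaps(toy_scaffold)
print(candidate_gaps)
```

Only the 150bp gap falls inside the window, so it is the only one kept as a candidate for PCR.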
A portion of the genetic data, specifically the terrapin DNA isolated from blood, was
analyzed in this study. First, the scaffolds were compared with GenBank sequences to note if
similarities existed between the terrapin genome sequence and GenBank non-redundant
sequences, and this was accomplished through the Nucleotide Blast program which is a part of
the NCBI BLAST. After these similarities were noted, the scaffolds were utilized to design
specific PCR Primers. Utilizing the program NCBI Primer-Blast (Primer 3), primers were
designed to fill the gap, with the primer ranges based on the length of the un-sequenced gaps to
ensure that the entire gap would be copied. The primers, forward and reverse, were then ordered
for each scaffold from Integrated DNA Technologies (IDT).
Once the PCR primers were obtained, a 100uM stock was created. This was accomplished
by spinning the tubes and adding the amount of TE buffer (pH 7.4, Fluka) needed to make a
100uM stock. Several tubes of working 10x stock were created by diluting the stock 1:10 in TE
buffer. The 100uM stock and 10x stocks were placed in a -20 °C freezer until needed.
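The stock preparation above comes down to two small calculations, sketched here in Python. The primer yield of 25 nmol and the 100uL working volume are hypothetical example values, since the actual yields reported by IDT are not given in this paper. The function names are also illustrative.

```python
def te_volume_for_stock_ul(primer_nmol, stock_um=100.0):
    """Volume of TE buffer (uL) that brings a dried primer to stock_um.

    A concentration in uM is the same as pmol/uL, so the volume is
    pmol of primer divided by the target concentration.
    """
    return primer_nmol * 1000.0 / stock_um


def working_stock_volumes_ul(final_volume_ul, dilution_factor=10.0):
    """Split a final volume into (stock, TE buffer) for a 1:N dilution."""
    stock_ul = final_volume_ul / dilution_factor
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical primer yield of 25 nmol
te_ul = te_volume_for_stock_ul(25.0)
stock_ul, buffer_ul = working_stock_volumes_ul(100.0)
working_um = 100.0 / 10.0  # a 1:10 dilution of the 100 uM stock
print(te_ul, stock_ul, buffer_ul, working_um)
```

Under these example values, 250uL of TE buffer would give the 100uM stock, and 10uL of that stock in 90uL of TE buffer would give a 10uM working stock.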
PCR Protocol
PCR was used to amplify the scaffold regions to be sequenced at a later time.
Eight labeled DNase-free PCR Eppendorf tubes (0.2mL) were used, with three separate
scaffolds amplified in the first three tubes. The fourth tube acted as a positive control, and
tubes 5-8 served as negative controls: tubes 5-7 for the three sets of PCR primers (10x) and
tube 8 for the positive control. The reaction volume for each tube was 50uL. PCR mixes contained
1x GoTaq Green (Promega, Cat No. M7122, Lot# 0000140255), forward and reverse primers
(1.0uM), genomic terrapin DNA (1-2 ng/uL, isolated from blood), and nuclease-free water
(Ambion).
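The volumes needed for one 50uL reaction can be worked out as in the following sketch. Several inputs are assumptions and not values from this protocol: that GoTaq Green was supplied as a 2x master mix, that the primers were added from a 10uM working stock, and that 2uL of template DNA was added per tube. The function name is hypothetical.

```python
def pcr_mix_volumes(total_ul=50.0, master_mix_x=2.0, primer_stock_um=10.0,
                    primer_final_um=1.0, template_ul=2.0):
    """Return the volume (uL) of each component for a single reaction.

    Assumed, not taken from the protocol: a 2x master mix, a 10 uM primer
    working stock, and 2 uL of template per tube.
    """
    master_mix_ul = total_ul / master_mix_x  # diluted to 1x in the final volume
    primer_ul = total_ul * primer_final_um / primer_stock_um  # C1*V1 = C2*V2
    water_ul = total_ul - master_mix_ul - 2 * primer_ul - template_ul
    if water_ul < 0:
        raise ValueError("components exceed the total reaction volume")
    return {
        "master_mix_ul": master_mix_ul,
        "forward_primer_ul": primer_ul,
        "reverse_primer_ul": primer_ul,
        "template_ul": template_ul,
        "water_ul": water_ul,
    }


mix = pcr_mix_volumes()
print(mix)
```

For the negative control tubes, the template volume would be replaced by the same volume of nuclease-free water so that the total stays at 50uL.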
These tubes were added to the thermo-cycler (BioRad iCycler, V.3.021), which was
already pre-heated to 95 °C. PCR was performed under standard conditions (95 °C for 2 minutes,
35 cycles at 95 °C for 15 seconds, 55 °C for 30 seconds, and 72 °C for 60 seconds, followed by a
final elongation at 72 °C). Upon completion, the temperature remained at a constant 4 °C. PCR
was optimized by increasing the annealing temperature to 57 °C instead of 55 °C, with all other
conditions remaining the same. After the tubes were retrieved, they were stored at 4 °C until
needed.
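As a rough check on run length, the programmed hold times above can be summed in a few lines of Python. The total leaves out the final elongation, whose duration is not stated in this protocol, as well as the temperature ramp times and the 4 °C hold, so the real run would be longer. The variable names are illustrative.

```python
INITIAL_DENATURATION_S = 120  # 95 C for 2 minutes
CYCLES = 35
CYCLE_STEPS = [
    ("denature", 95, 15),  # (name, temperature in C, seconds)
    ("anneal", 55, 30),    # raised to 57 C in the optimized runs
    ("extend", 72, 60),
]


def programmed_seconds(initial_s, cycles, steps):
    """Sum of hold times only; ramping and final elongation are excluded."""
    per_cycle_s = sum(seconds for _, _, seconds in steps)
    return initial_s + cycles * per_cycle_s


total_s = programmed_seconds(INITIAL_DENATURATION_S, CYCLES, CYCLE_STEPS)
print(total_s, total_s / 60)
```

The hold times add up to 3795 seconds, or about 63 minutes. Changing the annealing temperature from 55 °C to 57 °C does not change this figure because the step durations stay the same.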
Gel Electrophoresis Protocol
Electrophoresis was performed after PCR to estimate the size of each amplified scaffold region
through comparison with a 1 kb DNA ladder. The PCR products were subjected to agarose gel
electrophoresis (1% in 1x TAE, BioRad Gel Electrophoresis Unit Sub Cell GT, 90 V/cm) and
stained with ethidium bromide. To estimate PCR product sizes, a ladder sample was prepared by
mixing 6uL of a 1 kb DNA ladder (New England Biolabs, Cat#N232L, Lot#1251411), 14uL TE buffer,
and 5uL gel loading dye (6x, NEB).
The gel was run for 45 minutes at 90V and at 400mA. DNA bands were visualized
(EpiChem 3 Darkroom) and photographed. These results were then compared to the size
predicted by the sequencing program. After initial outcomes, the protocol was modified to create
a 2% gel which contained 0.90g of agarose for the purpose of forming higher resolution gels, and
the remaining procedures were followed in the manner stated previously.
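Band sizes in this study were estimated by eye against the ladder. The same comparison can be made numerically, since migration distance is roughly linear in the logarithm of fragment size over a limited range. The sketch below interpolates between the two ladder bands that bracket a sample band. The ladder distances are made-up measurements for illustration only, and the function name is hypothetical.

```python
import math


def estimate_band_size_bp(ladder, band_distance_mm):
    """Interpolate log10(size) between the two ladder bands around a band.

    ladder: list of (size_bp, distance_mm) pairs measured from the same gel,
    each with a distinct distance.
    """
    points = sorted(ladder, key=lambda p: p[1])  # shortest migration first
    for (size_a, dist_a), (size_b, dist_b) in zip(points, points[1:]):
        if dist_a <= band_distance_mm <= dist_b:
            fraction = (band_distance_mm - dist_a) / (dist_b - dist_a)
            log_size = math.log10(size_a) + fraction * (
                math.log10(size_b) - math.log10(size_a)
            )
            return 10 ** log_size
    raise ValueError("band lies outside the range covered by the ladder")


# Hypothetical ladder measurements: (size in bp, distance migrated in mm)
example_ladder = [(3000, 10.0), (1000, 22.0), (500, 30.0)]
size_bp = estimate_band_size_bp(example_ladder, 26.0)
print(round(size_bp))
```

A band that runs past the smallest measured ladder band cannot be interpolated and raises an error here, which corresponds to a band that can only be reported as smaller than the smallest ladder band.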
DNA Purification Protocol
Once the sizes of the scaffolds were confirmed, the DNA obtained from PCR needed to
be purified before being sent for sequencing. This DNA was purified using Qiagen QIAquick spin
columns according to the manufacturer’s recommendations. Buffer PB acted as the binding buffer,
and Buffer PE served as the wash buffer. The sample in binding buffer was applied to a QIAquick
column and then washed with Buffer PE. DNA was eluted by applying elution buffer (10 mM
Tris-Cl, pH 8.5) to the QIAquick membrane. Results of DNA purification were visualized by
running a gel (2% in 1x TAE, 90 V/cm) with 10uL of purified DNA and 2uL of loading dye to
estimate DNA concentration, with the remaining purified DNA stored at 4 °C. Once the sample
was calculated to have a DNA concentration between 5 and 40 ng/uL, it would be used in
sequencing. If the concentration was above the limit, it could be diluted to lower the DNA
concentration to an adequate level.
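The concentration check and dilution step can be written as a small helper based on C1V1 = C2V2. The 5-40 ng/uL window is the range given above. The 20uL sample volume, the choice of the upper limit as the dilution target, and the function name are assumptions made for illustration.

```python
def diluent_to_add_ul(conc_ng_per_ul, sample_ul, low=5.0, high=40.0):
    """Return the uL of buffer to add so the sample falls within [low, high].

    Returns 0.0 if the sample is already in range. Raises ValueError if the
    sample is too dilute, since dilution cannot correct that.
    """
    if conc_ng_per_ul < low:
        raise ValueError("sample is below the minimum concentration")
    if conc_ng_per_ul <= high:
        return 0.0
    # C1 * V1 = C2 * V2, diluting down to the upper limit
    final_volume_ul = conc_ng_per_ul * sample_ul / high
    return final_volume_ul - sample_ul


# Hypothetical 20 uL sample estimated at 50 ng/uL
extra_ul = diluent_to_add_ul(50.0, 20.0)
print(extra_ul)
```

In this example, adding 5uL of buffer to the 20uL sample would bring it from 50 ng/uL down to 40 ng/uL.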
Sequencing of Scaffolds
Once the scaffolds were confirmed to have the same size both before and after
purification through analysis of gel electrophoresis results, the purified DNA was sent to the
sequencing company, Macrogen.
Results
Figure I
(i) PCR Products of 1% Gel Electrophoresis Results at 55 °C annealing temperature
(ii) PCR Products are arranged in the gel as follows: (Lane 1) scaffold00227, (Lane 2)
scaffold00228, (Lane 3) scaffold00041, (Lane 4) positive control H gene: rp, (Lane 5)
negative control scaffold00227 w/out Terrapin DNA, (Lane 6) negative control
scaffold00228 w/out Terrapin DNA, (Lane 7) negative control scaffold00041 w/out
Terrapin DNA, (Lane 8) negative control of H gene w/out Terrapin DNA, (Lane 9)
1KB Ladder
Figure II
(i) PCR Products of 1% Gel Electrophoresis Results at 57 °C annealing temperature
(ii) PCR Products are arranged in the gel as follows: (Lane 9) 1KB Ladder, (Lane 10)
scaffold00227, (Lane 12) scaffold00041, (Lane 13) positive control H gene: rp, (Lane
14) negative control scaffold00227 w/out Terrapin DNA, (Lane 15) negative control
scaffold00041 w/out Terrapin DNA, (Lane 16) negative control of H gene w/out
Terrapin DNA
Figure III
(i) 2% Gel Electrophoresis of the purified PCR products of the above PCR products
scaffold00227 and scaffold 00041
(ii) The Gel is arranged as follows: (Lane 9) 1KB Ladder, (Lane 10) scaffold00227,
(Lane 11) scaffold00041,
Figure IV
(i) PCR Products of 2% Gel Electrophoresis Results at 55 °C annealing temperature
(ii) PCR Products are arranged in the gel as follows: (Lane 9) 1KB Ladder, (Lane 10)
scaffold00219, (Lane 11) scaffold00225, (Lane 12) scaffold00229, (Lane 13)
scaffold00228, (Lane 14) positive control D gene: rp, (Lane 15) negative control
scaffold00219 w/out Terrapin DNA, (Lane 16) negative control scaffold00225 w/out
Terrapin DNA, (Lane 17) negative control scaffold00229 w/out Terrapin DNA, (Lane
18) negative control scaffold00228 w/out Terrapin DNA, (Lane 19) negative control
of D gene w/out Terrapin DNA
Scaffold ID | Expected Length (Computer-determined Gap Length) | Observed Length on First Gel | Observed Length on Second Gel | Purified PCR Product Lengths
Scaffold 00227 | 821bp | Not seen | <500bp | <500bp
Scaffold 00228 | 841bp | <500bp | Not run | Not run
Scaffold 00041 | 888bp | <500bp | <500bp | <500bp
Positive Control H (gene rp) | 631bp | <500bp | 750bp | Not run
Figure V
(i) Summary of Results from Sequencing First Three Scaffolds (scaffold 00227, scaffold
00228, scaffold00041). Scaffold 00228 was not attained from PCR due to loss of
product.
Scaffold ID | Expected Length (Computer-determined Gap Length) | Observed Length on First Gel
Scaffold 00219 | 956bp | <500bp
Scaffold 00225 | 930bp | ~500bp
Scaffold 00229 | 937bp | <500bp
Scaffold 00228 | 841bp | <500bp
Positive Control D (gene rp) | 723bp | <750bp
Figure VI
(i) Summary of Results from Sequencing Second Set of Scaffolds and Previous Scaffold
(scaffold 00219, scaffold 00225, scaffold00229, scaffold 00228)
Figure VII
(i) Sequencing Results of Scaffold 00041 from Macrogen
Figure VIII
(i) Sequencing Results of Scaffold 00219 from Macrogen
Figure IX
(i) Sequencing Results of Scaffold 00225 from Macrogen
Discussion
For the first three scaffolds (scaffolds 00227, 00228, 00041), the bands shown in Figure I
appeared significantly smaller than expected. The run was judged unreliable because the positive
control H, which was expected to be 631bp, appeared at less than 500bp, and this was not seen in
the results of other researchers, indicating an issue only in this gel run. The incorrect position
of the positive control invalidated the results of this run, and the smaller than expected bands
most likely arose from operator error, such as loading the incorrect samples in the wells. A band
did not appear in Well 1, possibly because the sample leaked out when it was loaded or because of
another form of user error.
In the next gel analysis, PCR was performed at a 57 °C annealing temperature instead of the
original 55 °C because it was possible that un-optimized annealing temperatures led
to the incorrect binding of primers, producing smaller than expected bands in the
gel electrophoresis. These results are seen in Figure II, where all loaded samples appeared. The
PCR tube containing scaffold00228 popped during PCR, leading to a major loss of sample and the
inability to load it in the gel. These results could be used in the analysis because the positive
control H appeared at 750bp, closer to its predicted value of 631bp. The change in
annealing temperature could have had an impact on the sizes of the bands seen in the gel, so this
annealing temperature continued to be used in future runs of PCR. Scaffolds 00041 and
00227 were still shown to be less than 500bp, compared to their predicted sizes of
888bp and 821bp respectively. The PI noted that the computer programs utilized to predict these
sizes could be inaccurate by several hundred base pairs, so the expected sizes should
not carry much weight in future analysis of results seen in gels.
After DNA purification, the results were not seen in a 1% gel, and this could have occurred
as a result of the lower resolution of the 1% gel. As a result, a 2% gel was created and used to
run the purified samples of scaffolds 00227 and 00041. With this agarose concentration, faint
bands were visible for both scaffolds as seen in Figure III, and they again appeared lower than
500bp, closely matching their respective sizes in their pre-purified forms. Since both pre- and
post- purification results were similar, the concentrations of the DNA in the scaffolds were
determined by comparison to the DNA ladder. The concentrations for scaffolds 00227 and 00041
were 7.5ng/uL and 15ng/uL respectively. This was well within the range of 5-40ng/uL as
requested by the DNA sequencing company, so the tubes containing these scaffolds were sent for
sequencing. Since there was no internal primer for scaffold 00227, it could not be sequenced, so
only scaffold 00041 was able to be sequenced.
With the next set of primers from IDT that were created through Primer-BLAST, the DNA
was amplified through PCR and run on the gel, as seen in Figure IV. The new scaffolds 00219,
00225, and 00229 were used in PCR. In addition, scaffold 00228, which was unable to be purified
with the first set of samples, was included with this set. All bands appeared less than 500bp, but
the positive control D, which had an established size of 723bp, appeared at 750bp, validating the
results of the run. When comparing the predicted sizes to the observed sizes, there were
differences of at least 300bp, but the predicted sizes were noted earlier to be inaccurate.
All lanes had single bands except for the lane containing scaffold 00229, which had a double band.
This double band is possible evidence of a primer dimer, which meant that this sample could not
be purified because multiple amplified segments were present. Thus, all samples
other than scaffold00229 were purified to be compared with the results seen in Figure IV.
The purified samples of scaffolds 00219, 00225, and 00228 were visualized in Figure V.
The bands seen were close to their respective bands seen in the unpurified DNA samples in
Figure IV, with all bands appearing less than the 500bp band, but in close proximity to it. Since
the results in Figure V more or less matched the data in Figure IV, the concentrations of DNA in
the samples were calculated and were found to all be 50ng/uL, for scaffolds 00219, 00225, and
00228. Although the concentrations were over the sequencing company’s range of 5-40ng/uL, it
was found that using the brightness of the bands to calculate DNA concentration could lead to
slightly inaccurate results, and a concentration marginally over the recommended limit was
sufficient to be used in sequencing.
The results of the sequenced scaffold 00041 returned and are seen in Figure VII. The
chromatogram showed numerous unevenly spaced peaks and many mixtures of
colors at a single peak location, indicating unusable results. The QV scores also confirmed that the
data for scaffold00041 could not be used in the terrapin genome’s sequencing process because
there were only 5 QV scores which were >= 16 and 4 QV scores which were >= 20, which
demonstrates low confidence in the created sequence. The poor results seen with scaffold 00041
could have arisen from a significant amount of noise, mis-called nucleotides, and
double peaks arising from single nucleotide polymorphisms (SNPs). Collectively, these factors
left this scaffold unable to be sequenced with this particular set of primers, and this scaffold
will have to be approached in the future with a different set of primers in the hopes of attaining
more promising results in the sequencing process. The three other scaffolds (00219, 00225, and
00228) contained internal primers, so they were sent to the sequencing company.
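The QV thresholds used above can be put in context with a short sketch. It assumes the QV values reported with the chromatogram are Phred-scaled, so that a score Q corresponds to an estimated base-call error probability of 10^(-Q/10). The list of scores is made up for illustration and is not the data returned for scaffold 00041.

```python
def phred_error_probability(qv):
    """Estimated probability that a base call is wrong (Phred scale assumed)."""
    return 10 ** (-qv / 10)


def count_at_least(qvs, threshold):
    """Number of base calls whose QV meets or exceeds the threshold."""
    return sum(1 for qv in qvs if qv >= threshold)


# Hypothetical per-base QV scores, NOT the actual Macrogen output
example_qvs = [4, 7, 9, 12, 16, 20, 21, 25, 30, 8]
q16_count = count_at_least(example_qvs, 16)
q20_count = count_at_least(example_qvs, 20)
print(q16_count, q20_count)
print(phred_error_probability(16), phred_error_probability(20))
```

On this scale a QV of 20 corresponds to about a 1 in 100 chance of a wrong base call and a QV of 16 to about 1 in 40, so a read with only a handful of bases at or above these thresholds gives little reliable sequence.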
The sequence for scaffold 00225 was returned from Macrogen and is seen in Figure VIII.
Similar to the results seen in Figure VII, there were numerous mis-spaced nucleotides and a
moderate level of noise throughout the chromatogram. Although the QV scores could not
be found, most likely few bases had scores >=16 and >=20 due to the
level of noise and mis-spaced nucleotides present. These features of the chromatogram made this
scaffold unsuitable for sequencing using this set of primers.
Also, the sequence for scaffold 00229 was returned from Macrogen and is seen in Figure
IX. Analogous to the results seen in other chromatograms (Figures VII and VIII), numerous mis-
spaced nucleotides and a moderate level of noise were present throughout the chromatogram.
The QV scores could not be located, but most likely few bases scored above the 16
and 20 thresholds, given the noise and mis-spaced nucleotides. These features of the
chromatogram made this scaffold unsuitable for sequencing using this set of primers.
References:
Basic Facts About Diamondback Terrapin. Defenders of Wildlife. Retrieved from
http://www.defenders.org/diamondback-terrapin/basic-facts
(2003, January 15). GENOME SEQUENCING. Genome News Network. Retrieved from
http://www.genomenewsnetwork.org/resources/whats_a_genome/Chp2_1.shtml.
Rogers, Y.C., Munk, A.C., Meincke, L.J., Han, C.S. (2005). Closing bacterial genomic sequence
gaps with adaptor-PCR. BioTechniques, Volume 39 (1), pp. 31-34.
Shaffer B., et al. (2013). The western painted turtle genome, a model for the evolution of
extreme physiological adaptations in a slowly evolving lineage. Genome Biology, 14:R28.
Silverman, J. What have we learned from the Human Genome Project? Retrieved from
http://science.howstuffworks.com/life/genetic/human-genome-project-results1.htm
Wang, Z., Pascual-Anaya, J., Zadissa, A., Li, W., Niimura, Y., Huang, Z., Li, C., White, S.,
Xiong, Z., Fang, D., Wang, B., Ming, Y., Chen, Y., Zheng, Y., Kuraku, S., Pignatelli, M.,
Herrero, L., Beal, K., Nozawa, M., Li, Q., Wang, J., Zhang, H., Yu, L., Shigenobu, S., Wang, J.,
et al. (2013). The draft genomes of soft-shell turtle and green sea turtle yield insights into the
development and evolution of the turtle-specific body plan. Nature Genetics, Volume 45, pp.
701-706.