Human identification from DNA is typically based on alleles at 13 short tandem repeat (STR) loci. Commercial kits used in forensic casework rely on the detection of these alleles in DNA samples acquired from an individual. However, the process is slow (up to 2 days for a laboratory analysis, or 1 hour with Rapid DNA systems) and has been designed to operate on pristine DNA samples. The need for fast and accurate DNA processing has spurred efforts to develop portable systems that can reduce the processing time to less than 1 hour. Such systems, however, are expected to operate on degraded DNA samples because of the architecture and process used by the instrument. Consequently, detecting the alleles in such degraded samples can be challenging. In this paper, we present an algorithm to detect allelic peaks in degraded DNA signals based on an adaptive signal processing scheme.
Detecting STR Peaks in Degraded DNA samples
1. 4th International Conference on Bioinformatics and Computational Biology (BICoB)
Emanuela Marasco, Arun Ross, Jeremy Dawson, Tina Moroose, Tanya Ambrose
Lane Department of Computer Science and Electrical Engineering & Forensic and Investigative Science Program
March 13, 2012 - Las Vegas, Nevada, USA
Thanks to Prof. Tom O’Haver
3. DNA typing
• Technologies used for DNA analysis differ in their ability to differentiate two individuals in two respects:
– Speed
– Sensitivity
• STR typing offers the best trade-off
• Our contribution: improve the sensitivity of DNA processing methodologies in the presence of degraded samples
J. Butler, Forensic DNA Typing: Biology, Technology, and Genetics of STR Markers, Second Edition
4. The human genome
• The human genome, contained in every cell, consists of 23 pairs of chromosomes
• Chromosomes are dense packets of DNA and proteins
• Four bases make up DNA: Adenine (A), Guanine (G), Cytosine (C) and Thymine (T)
• Most human identity testing is performed using markers on autosomal chromosomes
• A DNA marker or locus refers to the chromosomal position
J. Butler, Forensic DNA Typing: Biology, Technology, and Genetics of STR Markers, Second Edition
5. Alleles
• About 99.7% of our DNA is the same between people; the remaining 0.3% makes us unique individuals
• DNA variation is exhibited in the form of different alleles; an allele is a variant of the DNA sequence or length at a given locus
[Figure panels: Sequence polymorphism; Length polymorphism]
6. What is a Short Tandem Repeat (STR)?
ATCTTCTAACACATGACCGATCATGCATGCATGCATGCATGC
ATGCATGCATGCATGCATGCATGTTCCATGATAGCACAT
• An STR is a section of the sequence in which a short unit (2-5 base pairs) is repeated
• STRs are short and fast to process (all together at the same time)
• Human identification from DNA is typically based on 13 core loci (American system)
• They are very discriminative among individuals
• There is no overlap among loci
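To make the idea of a repeated unit concrete, here is a minimal sketch that scans a sequence for the longest run of back-to-back copies of a 2-5 base unit. The function name `longest_tandem_repeat` and its brute-force search are illustrative assumptions, not part of the presented work, and real STR typing is done on fragment lengths rather than by string search.

```python
def longest_tandem_repeat(seq, min_unit=2, max_unit=5):
    """Return (start, unit, copies) for the longest run of back-to-back
    copies of one unit, or None if no unit occurs twice in a row."""
    best = None
    best_len = 0
    for k in range(min_unit, max_unit + 1):
        for start in range(len(seq) - 2 * k + 1):
            unit = seq[start:start + k]
            copies = 1
            pos = start + k
            # Count consecutive copies of the unit.
            while seq[pos:pos + k] == unit:
                copies += 1
                pos += k
            if copies >= 2 and copies * k > best_len:
                best, best_len = (start, unit, copies), copies * k
    return best


# Small hand-checkable example: two flanking bases on each side of ATGC x 3.
demo = "GG" + "ATGC" * 3 + "TT"
print(longest_tandem_repeat(demo))  # (2, 'ATGC', 3)

# The sequence shown on the slide, joined into one string.
slide_seq = ("ATCTTCTAACACATGACCGATCATGCATGCATGCATGCATGC"
             "ATGCATGCATGCATGCATGCATGTTCCATGATAGCACAT")
print(longest_tandem_repeat(slide_seq))
```

Because a repeat can be read in several frames (for example ATGC versus CATG), the reported unit depends on where the run is taken to start.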
7. Steps in DNA sample processing
Biological perspective: DNA extraction → DNA quantitation → PCR amplification of multiple STR markers → PCR products
Technological perspective: Separation and detection of PCR products (STR alleles) → Sample genotype determination → DNA profile
• Amplified data are displayed as fluorescent peaks
8. DNA analysis via GeneMapperID
• The Internal Lane Standard (ILS) is used to assign sizes to the peaks
• Only peaks above an analytical threshold are considered real data (lower peaks are treated as noise)
• Actual peaks are higher than a stochastic threshold
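The two-threshold rule can be sketched as a small classifier. The function name and the numeric defaults are hypothetical placeholders chosen for illustration; the slides do not state the values used, and this is not GeneMapperID's implementation.

```python
def classify_peak(height, analytical=50.0, stochastic=200.0):
    """Label a peak height against two fixed thresholds.

    The default threshold values are illustrative assumptions only.
    """
    if height < analytical:
        return "noise"
    if height < stochastic:
        return "above analytical threshold only"
    return "above stochastic threshold"


for h in (20.0, 120.0, 900.0):
    print(h, classify_peak(h))
```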
9. Effects of degradation on STR peaks
• Degradation occurs rapidly when samples are exposed to environmental conditions: UV light, humidity, high temperature and bacterial contamination
• As samples age, DNA begins to break down (or degrade)
• Degradation can reduce the height of some peaks or make them disappear entirely
• Peaks are shifted
[Figure panels: Nondegraded; 45 secs; 75 secs]
http://www.bioforensics.com/articles/champion1/champion1.html
10. The proposed signal processing scheme
• A DNA sample is represented by a DNA signal in which the x-axis indicates the data point and the y-axis the amplitude
• A degraded DNA sample is represented by a weak signal confounded by several noise sources
Input:
• Let x = {x_j}, j = 1, …, N, be the input DNA signal
• Let NC = {NC_j}, j = 1, …, N, be the signal due to the Negative Control
• Let t = {t_j}, j = 1, …, N, be the instants at which the signals are sampled
Positive Control (Reagents, DNA, ILS)
Negative Control (Reagents, ILS)
11. Differentiation of signals
[Figure panels: Peak-type signal; Derivative]
• Given a peak-type signal such as those used in STR analysis, the location of the maximum can be computed as the location of the zero-crossing point in its first derivative
• Given two peaks of the same height, the wider peak results in a lower amplitude in the first derivative
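Both properties can be checked numerically on synthetic peaks. The sketch below assumes Gaussian-shaped peaks and a central-difference derivative; the helper names are illustrative and are not taken from the authors' code.

```python
import math


def gaussian_peak(n, center, height, width):
    """Synthetic peak sampled at integer positions 0..n-1."""
    return [height * math.exp(-((j - center) / width) ** 2) for j in range(n)]


def first_derivative(y):
    """Central differences inside, one-sided differences at the ends."""
    d = [0.0] * len(y)
    for j in range(1, len(y) - 1):
        d[j] = (y[j + 1] - y[j - 1]) / 2.0
    d[0] = y[1] - y[0]
    d[-1] = y[-1] - y[-2]
    return d


def maxima_from_derivative(d):
    """Indices where the derivative goes from positive to non-positive."""
    return [j + 1 for j in range(len(d) - 1) if d[j] > 0 and d[j + 1] <= 0]


narrow = gaussian_peak(101, center=50, height=100.0, width=3.0)
wide = gaussian_peak(101, center=50, height=100.0, width=9.0)
d_narrow = first_derivative(narrow)
d_wide = first_derivative(wide)

# Both peaks are located at the downward zero crossing of the derivative.
print(maxima_from_derivative(d_narrow), maxima_from_derivative(d_wide))
# Same height, but the wider peak has the smaller derivative amplitude.
print(max(d_narrow) > max(d_wide))
```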
12. Our proposed method
1. De-noise the DNA signal x: PC-NC
The Negative Control (NC) contains only reagents and no DNA
2. Compute the first derivative of the enhanced signal: Diff(PC-NC)
[Figure annotation: False Positive]
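Steps 1 and 2 can be sketched on a toy signal as follows. The function names and the sample values are made up for illustration, and the sketch assumes the two signals are already aligned sample by sample.

```python
def subtract_negative_control(x, nc):
    """Step 1: point-wise difference between the sample signal and the
    Negative Control signal (assumes both are sampled at the same instants)."""
    if len(x) != len(nc):
        raise ValueError("signals must have the same length")
    return [xj - ncj for xj, ncj in zip(x, nc)]


def diff(y):
    """Step 2: first difference, y[j+1] - y[j]."""
    return [y[j + 1] - y[j] for j in range(len(y) - 1)]


# Toy data: the bump around index 3 appears in both signals (for example a
# shared ILS or reagent artifact); the bump around index 7 is only in the sample.
nc = [2, 2, 3, 10, 3, 2, 2, 2, 2]
x = [2, 2, 3, 10, 3, 2, 8, 20, 7]

enhanced = subtract_negative_control(x, nc)
print(enhanced)        # [0, 0, 0, 0, 0, 0, 6, 18, 5]
print(diff(enhanced))  # [0, 0, 0, 0, 0, 6, 12, -13]
```

In this toy case the shared bump cancels, so only the sample-specific peak contributes to the derivative.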
13. Our proposed method
3. Smooth the derivative of the signal: each point in the signal is replaced with the average of m adjacent points (m is the smooth width)
http://terpconnect.umd.edu/~toh/spectrum/PeakFindingandMeasurement.htm
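A sliding-average smooth of width m can be sketched as below. Restricting m to odd values and shrinking the window at the ends of the signal are assumptions made for this sketch; the slides do not say how the edges are handled.

```python
def moving_average(y, m):
    """Replace each point with the mean of the m points centred on it.

    Near the ends the window is truncated to the available points
    (an assumption of this sketch).
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    half = m // 2
    out = []
    for j in range(len(y)):
        window = y[max(0, j - half):min(len(y), j + half + 1)]
        out.append(sum(window) / len(window))
    return out


print(moving_average([0, 0, 3, 0, 0], 3))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

A larger m suppresses more noise in the derivative but also flattens narrow peaks, so the smooth width trades noise rejection against resolution.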
14. Our proposed method
4. Peak detection based on an amplitude threshold and a slope threshold
5. Estimate peak details (location, height and width) on the un-smoothed signal by using a curve fitting function: a polynomial of degree 2 is fitted through the detected peaks. The fit involves the mean and standard deviation of a set of points in the vicinity of the detected peak
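Steps 4 and 5 might look like the following sketch. Several choices here are assumptions rather than details given in the slides: the slope test is applied to the drop of the derivative across the zero crossing, the quadratic is fitted to the logarithm of the signal (which presumes a roughly Gaussian peak and positive values) so that a width can be read off, and the abscissa is standardized with the local mean and standard deviation before fitting. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def detect_peaks(y, d_smooth, amp_threshold, slope_threshold):
    """Step 4: downward zero crossings of the (smoothed) derivative that are
    steep enough and whose signal value exceeds the amplitude threshold."""
    peaks = []
    for j in range(len(d_smooth) - 1):
        crosses_down = d_smooth[j] > 0 and d_smooth[j + 1] <= 0
        steep_enough = d_smooth[j] - d_smooth[j + 1] > slope_threshold
        idx = j if y[j] >= y[j + 1] else j + 1
        if crosses_down and steep_enough and y[idx] > amp_threshold:
            peaks.append(idx)
    return peaks


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(u, v):
    """Least-squares fit v ~ a + b*u + c*u**2 (normal equations, Cramer's rule)."""
    s = [sum(ui ** k for ui in u) for k in range(5)]
    r = [sum((ui ** k) * vi for ui, vi in zip(u, v)) for k in range(3)]
    A = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    det = _det3(A)
    coeffs = []
    for col in range(3):
        M = [row[:] for row in A]
        for i in range(3):
            M[i][col] = r[i]
        coeffs.append(_det3(M) / det)
    return tuple(coeffs)  # (a, b, c)


def measure_peak(t, y, idx, half_window=2):
    """Step 5: location, height and full width at half maximum from a
    quadratic fitted to log(y) near the peak (y must be positive there)."""
    lo, hi = max(0, idx - half_window), min(len(y), idx + half_window + 1)
    tt, yy = t[lo:hi], y[lo:hi]
    mu, sigma = mean(tt), pstdev(tt)
    u = [(ti - mu) / sigma for ti in tt]  # standardized abscissa
    a, b, c = fit_quadratic(u, [math.log(yi) for yi in yy])
    location = mu - sigma * b / (2 * c)
    height = math.exp(a - b * b / (4 * c))
    width = 2.35482 * sigma / math.sqrt(-2 * c)
    return location, height, width


# Noise-free synthetic peak: centre 50.3, height 100, standard deviation 4.
t = list(range(100))
y = [100.0 * math.exp(-((ti - 50.3) ** 2) / (2 * 4.0 ** 2)) for ti in t]
d = [y[j + 1] - y[j] for j in range(len(y) - 1)]  # no smoothing needed here
peaks = detect_peaks(y, d, amp_threshold=37.0, slope_threshold=1.0)
for idx in peaks:
    print(idx, measure_peak(t, y, idx))
```

On real, noisy data the derivative would be smoothed first and the fit would only approximate the peak parameters.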
15. Dataset 1: ultraviolet degradation
• The first dataset was collected at the WVU Department of Forensic and Investigative Sciences by performing a controlled artificial DNA degradation using ultraviolet radiation
• Positive Control
• Degraded samples obtained after UV exposures of 35 secs, 75 secs, 150 secs and 240 secs
Pang B. C. M. and Cheung B. K. K. One-step generation of degraded DNA by UV irradiation. Analytical Biochemistry 2007; 360: 163-165
16. Dataset 2: Low Copy Number (LCN) data
• Low Copy Number (LCN) samples contain less than 100 pg of DNA template
• One of the reasons for LCN is degradation
• The second dataset was provided by NIST and was obtained by varying the cycle count of the PCR processing step
• Positive Control: 1 ng/μL
• Low Copy Number (LCN): 100 pg/μL, 30 pg/μL and 10 pg/μL at 28 cycles and 31 cycles
17. Results on degraded data with UV
GeneMapper results
In the presence of degraded samples, for Th = 100:
• 1 detected peak for the degradation level obtained after a UV exposure of 75 secs
• 0 detected peaks for samples obtained after an exposure of 150 secs onward
Peak finding results
For Th = 0.37*max(x(t)):
• 7 peaks detected for the sample obtained after a UV exposure of 150 seconds
• 3 peaks for the sample obtained after a UV exposure of 240 seconds
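The difference between the fixed threshold (Th = 100) and the signal-relative threshold Th = 0.37*max(x(t)) can be illustrated on a weak toy signal. The sample values are invented for illustration and the helper name is hypothetical.

```python
def adaptive_threshold(x, fraction=0.37):
    """Threshold set to a fraction of the signal's own maximum."""
    return fraction * max(x)


# Toy weak signal whose tallest point is well below a fixed threshold of 100.
x = [5.0, 12.0, 60.0, 8.0, 30.0, 4.0]

fixed_th = 100.0
adaptive_th = adaptive_threshold(x)

above_fixed = [j for j, xj in enumerate(x) if xj > fixed_th]
above_adaptive = [j for j, xj in enumerate(x) if xj > adaptive_th]

print(above_fixed)     # []
print(above_adaptive)  # [2, 4]
```

Because the threshold scales with the signal, it remains usable when degradation lowers every peak, at the cost of being a single global value for the whole trace.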
18. Results on Low Copy Number (LCN) data
[Figure panels: GeneMapper results (Th = 100); Peak finding results]
• The success rate of the GeneMapper typing system decreases as the amount of DNA in the analyzed samples decreases
• The amount of DNA in the sample has no significant impact on the detection performance of the peak finding algorithm
19. Obtained improvement
• Under UV degradation
Sample        PC       35 secs   75 secs   150 secs   240 secs
Improvement   0%       0%        80%       100%       100%
• In the presence of LCN
DNA amount    1 ng     100 pg    30 pg     10 pg
Improvement   -50.0%   3.1%      34.4%     25.0%
• Results are reported for blue dye data, with the GeneMapperID threshold set to 100 and averaging over two samples for LCN data
• The peak detection rate improves in the presence of degraded samples since the algorithm has been designed to deal with noisy signals
20. Conclusions
• Strengths of the proposed approach:
– It uses an adaptive threshold
– High discrimination power against wider peaks, provided by differentiation
• Our experiments show the robustness of the proposed peak finding algorithm to high-level UV degradation and when dealing with critical amounts of DNA (less than 100 pg)
• Limitation: the peak detection algorithm uses a global threshold
• Coming up:
– Modeling the degradation process
– Designing a local threshold
– Since the adopted derivative was first-order, we will carry out experiments with higher-order derivatives
21. Thanks!
Any questions?
Acknowledgements
• The work was funded by CITeR
• Many thanks to Raghunandan Pasula, Lane Department of Computer Science and Electrical Engineering, West Virginia University, for his assistance during the development of the project
• Prof. Thomas C. O’Haver, Department of Chemistry and Biochemistry, University of Maryland, for his assistance with our queries regarding peak detection
• National Forensic Science Technology Center (NFSTC) for providing scientific training services
• The Peak Finding code developed by Prof. O’Haver was used in this work