Assign 2.0: software for the analysis of
Phred quality values for quality control of
HLA sequencing-based typing
D.C. Sayer
D.M. Goodridge
F.T. Christiansen
Authors’ affiliations:
D.C. Sayer (1,2,3), D.M. Goodridge (1,3), F.T. Christiansen (1,2)
1 Department of Clinical Immunology and Biochemical Genetics, Royal Perth Hospital, Wellington Street, Perth 6000, Western Australia, Australia
2 School of Surgery and Pathology, Division of Pathology, University of Western Australia, Verdun Street, Nedlands, Western Australia, Australia
3 Conexio Genomics, PO Box 1670, Applecross, Western Australia, Australia
Correspondence to:
David C. Sayer
Department of Clinical Immunology and Biochemical Genetics
Royal Perth Hospital
Wellington Street
Perth 6000
Western Australia
Australia
Tel.: +61 8 92242899
Fax: +61 8 92242920
e-mail: david.sayer@health.wa.gov.au
Abstract: As improvements to DNA sequencing technology have resulted in
increasing the throughput of DNA sequencing, the bottleneck for high
throughput DNA sequencing-based typing (SBT) has shifted to sequence
analysis, genotyping and quality control (QC). Consistent high-quality DNA
sequence is required in order to reduce manual verification and editing of
sequence electropherograms. However, identifying systematic changes in
quality is difficult to achieve without the aid of sophisticated sequence
analysis programs dedicated to this purpose. We describe a computer
software program called Assign 2.0, which integrates sequence QC analysis
and genotyping in order to facilitate high-throughput SBT. Assign 2.0
performs an analysis of Phred quality values in order to produce quality
scores for a sample and a sequencing run. This enables sample-to-sample and
run-to-run QC monitoring and provides a mechanism for the comparison of
sequence quality between various genes, various reagents and various
protocols with the aim of improving the overall quality of DNA sequence data.
This, in turn, will result in reducing sequence analysis as a bottleneck for
high-throughput SBT.
Key words: assign; quality control; resequencing; sequencing
Received 14 January 2004, revised 6 April 2004, accepted for publication 19 April 2004
Copyright © Blackwell Munksgaard 2004
doi: 10.1111/j.1399-0039.2004.00283.x
Tissue Antigens 2004: 64: 556–565
Printed in Denmark. All rights reserved

Recent advances in DNA-sequencing technology, including the introduction
of capillary DNA sequencers and improvements in dye-labelling
technology (1, 2), have simplified DNA-sequencing protocols
and have improved the ability to detect heterozygous sequences.
As a result, an increasing number of clinical and research laboratories
are using DNA sequencing in order to study genetic diversity.
This is particularly true for laboratories performing human leucocyte
antigen (HLA) typing for the matching of donors and recipients for
bone marrow transplantation. Several HLA genes are required for
matching for transplantation and each is highly polymorphic
(http://www.ebi.ac.uk/imgt/hla). Current genotyping approaches are
hierarchical and employ low typing resolution molecular techniques
that are relatively inexpensive and suitable for high throughput,
followed by DNA sequencing to provide high-resolution typing
when required. DNA sequencing is regarded as the gold standard
for HLA typing and therefore the ideal would be for DNA sequencing
to be the sole method for HLA typing. State-of-the-art DNA sequencers
provide the throughput requirements for most HLA-typing
laboratories. However, data analysis, including manual verification
of automated sequence base calling, allele assignment and quality
control (QC), is a significant impediment to high-throughput
sequencing-based typing (SBT).
HLA SBT is a complex multi-step process, which requires the
specific polymerase chain reaction (PCR) amplification of the region
to be sequenced, sequencing up to four polymorphic exons in both
directions, splicing the intron sequence and creating a single con-
catenated consensus sequence for analysis. The consensus sequence
is usually matched against a database of allele sequences in order to
identify those alleles that best match the test sequence.
Computer software programs, such as SeqScape® v2.0 Software
(SeqScape) from Applied Biosystems (Foster City, CA), perform base
calling, align forward and reverse complementary sequences, splice
intron sequence and produce a concatenated consensus sequence for
allele assignment. However, base calling may be unreliable, espe-
cially for heterozygous sequence, because an arbitrary threshold for
heterozygosity is assigned based on the percentage of one peak
within another. If the threshold is too low, the presence of any back-
ground may result in false calling of heterozygotes. If the threshold
is not set low enough, then some heterozygotes with low di-
deoxynucleotide incorporation may be incorrectly called homozygotes.
Therefore, manual verification by viewing the sequence electropherograms
(EPG) is required.
The requirement for manual sequence base-call verification and
sequence editing is highest when the quality of the sequence is poor.
The ability to obtain and maintain high-quality sequence is critical to
improving the throughput capabilities of SBT. High-quality sequence
results in improved accuracy of base calling and reduces the time
required for manual verification. As sequencing is being increasingly
used in a clinical setting, guidelines for sequence quality have been
suggested by groups, such as the Clinical Molecular Genetics Society
(http://cmgs.org/BPG/Guidelines/2002/data%20quality). However,
these guidelines tend to be subjective.
Unique and objective approaches to SBT QC are required. We
suggest that various combinations of alleles in heterozygous sam-
ples, each with its own unique sequence, are amplified in PCR and
sequencing reactions with various efficiencies, largely as a result of
the different melting temperatures and GC content. Thus, every
sample should have its own QC. Furthermore, as the sequence for
every sample is usually derived from concatenated bi-directionally
sequenced units (BSU) or exons as is the case for most HLA class-I
SBT assays (3), the basic unit of QC should be the BSU. We have
developed a computer software program that enables such QC to be
performed. We have integrated this with our allele assignment
software in order to provide a comprehensive sequence-analysis
software program, called Assign 2.0. Assign 2.0 is suitable for
high-throughput HLA SBT or any resequencing application.
The Assign 2.0 QC tools enable the analysis of several indicators
of sequence quality. However, the primary function of Assign 2.0 is
the analysis of Phred quality values (PQV) (4, 5). Phred is a software
program that estimates the probability that a base call within a
sequence is correct, using the relationship QV = -10 × log10(PE),
where PE is the probability that the base call is an error. Thus, a
PQV of 40 indicates that there is a one in 10,000 chance that the base
call is incorrect. However, this algorithm was developed for cloned
template and the same interpretations of base call accuracy may not
apply to heterozygous sequence from PCR products. Therefore, we
have investigated whether PQV can have a broader utility for the
assessment of SBT QC and provide a quality score for a sequenced
sample and a sequencing run or gel. We demonstrate a unique and
informative objective assessment of sequence quality following the
analysis of PQV that enables the setting of target specifications of
quality. As a result, we are able to monitor samples and sequencing
runs for deviations from target specifications (accuracy) and exces-
sive variability around target specifications (precision), thus meeting
the criteria for effective QC (6).
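The relationship between a Phred quality value and the base-call error probability can be checked numerically. The following is a minimal Python sketch of the standard conversion in both directions; the function names are illustrative and are not part of Phred or Assign 2.0.

```python
import math


def phred_to_error_probability(qv: float) -> float:
    """Return the base-call error probability implied by a Phred quality value."""
    return 10 ** (-qv / 10)


def error_probability_to_phred(pe: float) -> float:
    """Return the Phred quality value for a base-call error probability."""
    if not 0 < pe <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(pe)


if __name__ == "__main__":
    # A quality value of 40 corresponds to a 1 in 10,000 chance of error.
    print(phred_to_error_probability(40))
    print(error_probability_to_phred(1 / 10000))
```

Under this relationship, a heterozygous call with a quality value near 26 would nominally imply an error rate of roughly 1 in 400, although, as noted above, that interpretation may not hold for heterozygous sequence from PCR products.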
Methods
Sequencing reactions were performed by means of Applied Biosystems
BigDye® Terminator v3.0 sequencing chemistry. All sequencing
was performed on an Applied Biosystems ABI PRISM® 3730
Genetic Analyzer (AB 3730). The AB 3730 is a 48-capillary automated
DNA sequencer. HLA-A, HLA-B and HLA-C SBT protocols
were developed in house. Each locus was typed by means of DNA
sequencing following locus-specific amplification and bi-directional
sequencing of exons 2 and 3. HLA-A and HLA-C were amplified with
a single set of amplification primers and HLA-B was amplified in two
PCRs in order to amplify the HLA-B alleles in two groups character-
ized by the alternate ‘TA’ and ‘CG’ dimorphism located in intron 1 (7).
The locus names HLA-BTA and HLA-BCG have been used in order
to indicate the alternative PCR amplifications. The DNA sequences
were analysed in a two-step process. First, the sequences
were analysed with the help of ABI PRISM SeqScape® Software
(SeqScape) in order to splice intron sequence, align forward and reverse
sequence strands and assign consensus sequence quality values. The
DNA sequence files in .xml format were then imported into Assign
2.0 for allele assignment and QC analysis. The data included in the
.xml files contain the consensus sequence base calls and the consensus
sequence PQV (CSPQV). The .xml files are named according to a
strict convention, which includes the sample name, the locus being
sequenced and the sequencing primer. In addition, the .xml file
storage system is organized by means of locus and sequencing
date in order to facilitate data retrieval and enable chronological
analysis of sequence QC data. Assign 2.0 QC tools perform inde-
pendent analysis of CSPQV of automated homozygous (CSPQV-hom)
and heterozygous (CSPQV-het) base calls for a single position, a
range of positions (e.g., exon 2 or exon 3) or a selected date range
for a selected locus. We present an analysis of data from HLA-A
SBT runs from 12 February 2003 to 7 July 2003. This included
1086 samples sequenced on 76 different sequencing runs.
The Assign 2.0 allele assignment component of the software matches
the consensus test sequence against an HLA allele sequence library
generated from the IMGT/HLA database (http://www.ebi.ac.uk/imgt/hla).
The matching algorithm has been developed in order to enable high-
speed matching on multiple samples to facilitate high-throughput SBT.
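The matching algorithm used by Assign 2.0 is not described here. Purely as an illustration of the general idea of matching a heterozygous consensus sequence against pairs of library alleles, the sketch below scores every allele pair by the number of mismatched positions, using IUPAC ambiguity codes for heterozygous base calls. The allele names and sequences are hypothetical, all sequences are assumed to be aligned and of equal length, and an exhaustive pairwise search like this one makes no claim to the speed of the actual implementation.

```python
from itertools import combinations_with_replacement

# IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def count_mismatches(consensus, allele_1, allele_2):
    """Count positions where the consensus call differs from the allele pair."""
    mismatches = 0
    for call, base_1, base_2 in zip(consensus, allele_1, allele_2):
        if set(IUPAC[call]) != {base_1, base_2}:
            mismatches += 1
    return mismatches


def best_allele_pairs(consensus, library):
    """Return (mismatch count, list of allele-name pairs) for the best matches."""
    scored = [
        (count_mismatches(consensus, library[a], library[b]), (a, b))
        for a, b in combinations_with_replacement(sorted(library), 2)
    ]
    best = min(score for score, _ in scored)
    return best, [pair for score, pair in scored if score == best]


if __name__ == "__main__":
    # Toy library: hypothetical allele names and sequences, for illustration only.
    library = {"allele_1": "ACGTA", "allele_2": "ACGTG", "allele_3": "TCGTA"}
    print(best_allele_pairs("ACGTR", library))
```

Reporting all pairs that share the lowest mismatch count, rather than a single winner, mirrors the way a result table lists the allele combinations that are best matched to the test sequence.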
Results
Assign 2.0 QC tools: QC analysis of PQV
As PQV have previously been reported to be an indicator of base call
accuracy (5) and therefore sequence quality, we examined the possi-
bility that analysis of CSPQV could be extrapolated in order to
provide useful QC data for a sample and/or a sequencing run. The
hypothesis is that the mean and standard deviation (SD) of CSPQV
for all nucleotide positions will reflect the sequence quality of the
sample sequenced. Furthermore, the mean and SD CSPQV of all
nucleotides for all samples on a sequencing run will reflect the
quality of the sequencing run. However, in order to determine the
feasibility of this approach, we needed to determine the degree of
variability of base call CSPQV at the same site between samples whose
sequences appeared visually to be of good quality. It is important to
demonstrate that CSPQV only
varies because of changes in sequence quality. For this purpose, we
analysed the CSPQV at 100 conserved (and therefore homozygous)
positions within exon 2 of HLA-A SBT from 20 samples within
the same sequencing run. The results have been presented in Fig. 1.
The mean CSPQV between positions may differ slightly, but more
importantly the CSPQV at each position are reproducible between
various samples. All but three positions have SD of less than 5
CSPQV units and a coefficient of variation (CV) of 5% with a
mean CV value for all positions of 2.7%.
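The per-position reproducibility analysis described above amounts to computing a mean, SD and CV across samples for each position. The following Python sketch shows one way to do this; the input values are made up for illustration, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def position_stats(cspqv_by_sample):
    """Return a (mean, SD, CV%) tuple for each position across samples.

    cspqv_by_sample is a list with one equal-length list of CSPQV per sample.
    """
    stats = []
    for values in zip(*cspqv_by_sample):
        m = mean(values)
        sd = stdev(values)  # sample SD; an assumption, see lead-in
        stats.append((m, sd, 100 * sd / m))
    return stats


if __name__ == "__main__":
    # Illustrative CSPQV for three samples at three conserved positions.
    samples = [
        [40, 42, 38],
        [41, 42, 36],
        [39, 42, 40],
    ]
    for position, (m, sd, cv) in enumerate(position_stats(samples), start=1):
        print(f"position {position}: mean={m:.2f} SD={sd:.2f} CV={cv:.1f}%")
```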
While CSPQV are highly reproducible between samples, the
CSPQV of homozygous and heterozygous base calls are different.
This is demonstrated for two polymorphic positions (positions 165
and 170) within exon 2 of HLA-A in Fig. 2. Figure 2A shows the
frequency distribution of CSPQV-hom and CSPQV-het for position
165 of HLA-A. HLA-A alleles can be either A or G at this position.
The grey bars represent the frequency distribution of CSPQV-het
base calls (where both A and G are sequenced) and the black bars
represent the frequency distribution of homozygous base calls (this
includes both A and G base calls). Similarly, Fig. 2B is a frequency
histogram of the CSPQV at position 170. HLA-A alleles are also
either A or G at this position. For both positions, the CSPQV for
heterozygous and homozygous base calls are each normally
distributed, but the CSPQV-het values are lower than CSPQV-hom
values. At position 165, the mean CSPQV-het is 27.10 and SD is 1.14
and the mean CSPQV-hom is 40.84 and SD is 1.75. For position 170,
the mean CSPQV-het is 25.48 and SD is 1.14 and the mean CSPQV-hom is
40.86 and SD is 1.53.
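Because homozygous and heterozygous calls have different CSPQV distributions, any summary statistic has to be computed on the two groups separately. The sketch below partitions consensus quality values by base call, treating A, C, G and T as homozygous and any other IUPAC code as heterozygous. The function names and example values are illustrative and are not taken from Assign 2.0.

```python
from statistics import mean, stdev

HOMOZYGOUS_CALLS = frozenset("ACGT")


def split_cspqv(base_calls, cspqv):
    """Partition consensus quality values into homozygous and heterozygous lists."""
    hom, het = [], []
    for call, qv in zip(base_calls, cspqv):
        (hom if call in HOMOZYGOUS_CALLS else het).append(qv)
    return hom, het


def summarise(values):
    """Return (mean, SD); SD is None when fewer than two values are available."""
    if not values:
        return None, None
    return mean(values), (stdev(values) if len(values) > 1 else None)


if __name__ == "__main__":
    # Illustrative consensus: R (A/G) and Y (C/T) are heterozygous calls.
    hom, het = split_cspqv("ACRGY", [41, 40, 27, 42, 25])
    print("CSPQV-hom:", summarise(hom))
    print("CSPQV-het:", summarise(het))
```

Returning `None` for an empty group reflects the point that some samples have no heterozygous positions at all.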
[Figure 1: mean CSPQV (left axis, 0 to 50) and SD of CSPQV (right axis, 0 to 20) plotted against conserved sequence nucleotide positions within exon 2 of HLA-A.]
Fig. 1. The mean and standard deviation of
consensus sequence PQV (CSPQV) at 100
conserved (therefore, homozygous) positions
of exon 2 of HLA-A are shown from 20
consecutive unrelated samples. The mean
CSPQV (the plot in the top half of the graph) varies
between positions within the same sequence, but the
CSPQV at one position is reproducible between
samples as indicated by the low standard deviations.
This indicates that a mean value of all CSPQV-hom for
a BSU should provide an indication of sequence
quality of the BSU. BSU, bi-directionally sequenced
units; CSPQV-hom, CSPQV of automated homozygous
base calls; PQV, Phred quality values.

As a result of the findings described above, we suggest that:
1. The mean and/or SD values of CSPQV-hom of a BSU (i.e., the
various exons for HLA class-I) will provide good indicators of
sequence quality of the BSU. Some samples may not have
heterozygous positions and so CSPQV-het should not
be used as an indicator of sequence quality of a BSU.
2. The mean and/or SD values of all CSPQV-hom for all samples on a
sequencing run will provide good indicators of sequence quality of
the sequencing run.
3. Sequence quality ‘target’ (or ‘expected’) values can be calculated
from multiple data points and the mean and SD values of CSPQV
for individual BSU and sequencing runs can be compared to
expected values according to Shewhart rules for analysing con-
trols (6).
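The comparison against target values in point 3 can be sketched as a simple control-limit check. The Python example below derives a target and mean ± 2 × SD limits from historical values and flags a new value that falls outside them. The historical values are invented for illustration, the function names are hypothetical, and the full set of Shewhart rules (6) is not implemented here.

```python
from statistics import mean, stdev


def control_limits(history, n_sd=2.0):
    """Return (lower, target, upper) limits derived from historical values."""
    target = mean(history)
    spread = stdev(history)
    return target - n_sd * spread, target, target + n_sd * spread


def outside_limits(value, limits):
    """Return True when a value falls outside the lower or upper limit."""
    lower, _, upper = limits
    return value < lower or value > upper


if __name__ == "__main__":
    # Illustrative historical mean CSPQV-hom values for one BSU.
    history = [40, 41, 39, 40, 40]
    limits = control_limits(history)
    for sample_mean in (40.5, 35.01):
        print(sample_mean, "outside limits:", outside_limits(sample_mean, limits))
```

The same check applies unchanged to SD values, using limits calculated from historical SDs rather than historical means.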
In order to test these hypotheses, we performed a retrospective
analysis of SBT data for HLA obtained between 12 February 2003
and 7 July 2003.
Within-run QC analysis
The graphs shown in Figs. 3 and 5 are examples of CSPQV analysis
that can be performed by the Assign 2.0 QC tools in just a few
seconds. Analyses of CSPQV-hom data for exons 2 and 3, respec-
tively, for each of 24 samples of the HLA-A SBT run 10–05–03 have
been presented in Fig. 3(A, B). In both graphs, the mean and SD data
are mirror images such that a sample with a high mean CSPQV
usually has a low SD. Grey bars with a horizontal line through the
middle indicate the mean ± 2 × SD of CSPQV data calculated from
all runs between 12 February 2003 and 7 July 2003.
[Figure 2: frequency histograms of CSPQV; x-axis: PQV scores (1 to 49); y-axis: frequency (%). Panel (A), HLA-A exon 2 position 165: heterozygous sequence mean = 27.10, SD = 0.90; homozygous sequence mean = 40.90, SD = 1.75. Panel (B), HLA-A exon 2 position 170: heterozygous sequence mean = 25.48, SD = 1.14; homozygous sequence mean = 40.86, SD = 1.53.]
Fig. 2. The frequency histograms of
consensus sequence PQV (CSPQV) at
homozygous (black bars) and heterozygous
(grey bars) base calls have been shown for
two polymorphic positions (position 165,
Fig. 2A and position 170, Fig. 2B) within exon 2 of
HLA-A for all samples (n = 1086 samples)
sequenced between 12 February 2003 and 7
July 2003. The distribution of the CSPQV is bimodal,
with CSPQV of heterozygous base calls being
less than the CSPQV of homozygous base calls. These
results indicate that homozygous and heterozygous
CSPQV should be considered independently if CSPQV
is used as a measure of sequence quality for a sample
or a sequencing run as the number of heterozygous
positions varies between samples.

The exon 2 graph (Fig. 3A) reveals considerable variability
between samples, compared to the graph for exon 3 (Fig. 3B). This
indicates variability in sequence quality between the exon 2
sequences of the samples and consistent high-quality sequence for
exon 3 for all samples. Analysis of the sequence EPG for the forward
and reverse sequencing primers for exon 2 revealed that the
sequences from the forward sequencing primer contained high back-
ground for some samples, whereas the reverse sequencing primers
resulted in consistent good quality sequence (data not shown). The
CSPQV is deduced from PQV from both strands and poor quality
sequence on one strand is sufficient to reduce the CSPQV. The EPG
from the forward sequencing primer for some of the samples with
and without background have been shown in Fig. 4. A comparison of
the EPG and CSPQV-hom for these samples reveals that when the
background is high, i.e., the quality of sequence is poor (e.g., samples
13 and 19), the mean CSPQV-hom is low (35.01 and 33.58, respec-
tively) and SD is high (7.61 and 8.42, respectively). In samples where
there is no background, i.e., good quality sequence (e.g., samples 02,
21 and 06), the mean CSPQV is high (41.41, 41.30 and 41.03, respec-
tively) and the SD is low (2.2, 2.1 and 1.8, respectively).
These data demonstrate that mean and SD of CSPQV-hom are
sensitive and quantitative measurements of sequence quality.
With the exception of sample 3, the QC data for exon 3 indicate that
all sequence is of similar quality. Furthermore, all CSPQV-hom means
are greater than the expected mean CSPQV (horizontal line through the
middle of the grey bar) and all but one of the sample SDs are below the
expected SD. This indicates that the quality of sequence obtained for
exon 3 for all samples of this run is of greater quality than is expected.
For sample 3, only two of the 276 bases of exon 3 were included by the
SeqScape algorithm for analysis for one of the sequencing primers. As
a result, much of the sequence is single-stranded. The high PQV is an
anomaly of the SeqScape/Phred algorithm where the CSPQV may be
higher for single-strand sequence than for those with bi-directional
coverage. As a result, a SD was not calculated for this sample.
[Figure 3: (A) Exon 2 and (B) Exon 3, run 05_10_03. x-axis: sample (01 to 24); left y-axis: PQV-hom mean (0 to 48); right y-axis: PQV-hom SD (0 to 20). Annotations for panel (A): mean plot, Mean (this run) = 39.82, SD (this run) = 1.96; SD plot, Mean (this run) = 4.00, SD (this run) = 1.99. Annotations for panel (B): mean plot, Mean (this run) = 40.6, SD (this run) = 1.29; SD plot, Mean (this run) = 3.41, SD (this run) = 0.81.]
Fig. 3. The mean and standard deviation (SD) of
consensus sequence PQV (CSPQV) for
homozygous base calls within exon 2 (Fig. 3A)
and exon 3 (Fig. 3B) have been shown for each of
24 samples within an HLA-A SBT run (run
ID = 05–10–03). The mean values of each sample are
plotted on the top part of each graph and are associated
with the Y-axis on the left hand side of the graph and the
SD values are plotted on the lower half of each graph
and the values are on the Y-axis on the right hand side of
the graph. The grey bars represent the mean ± 2 × SD
limits of the mean and SD values of all samples for all
runs (n = 76 runs) between 12 February 2003 and 7 July
2003. The mean and SD plots are mirror images, such
that when the mean is high, the SD is low and vice versa.
The plots demonstrate why individual BSU, in this case
each exon, are analysed separately. The exon 2 data are
variable, indicating sequence of variable quality, with
the mean and SD CSPQV for two samples (samples
13 and 19) outside the expected limits. By contrast, the
exon 3 data are much more consistent with all values
being on or greater than the expected mean of the mean
CSPQV and all but one sample being below the mean of
the expected SD CSPQV. These data indicate a potential
problem of varying degree affecting exon 2 sequences
only. SBT, sequencing-based typing.
Between-run QC analysis
In contrast to Fig. 3, where sample-to-sample QC analysis within a
sequencing run is demonstrated, Fig. 5 demonstrates run-to-run
(between-run) QC analysis. Between-run analysis is performed by
plotting the mean and SD of CSPQV calculated from all positions
for all samples on a sequencing run. This has been demonstrated in
Fig. 5, where the CSPQV data for exons 2 and 3 are plotted for each
run between 12 February 2003 and 7 July 2003 (76 runs, 1086
samples). The grey bars represent the mean ± 2 × SD of data from
all runs. The data from the sequencing run of 10–05–03 (as demonstrated
in Fig. 3) are indicated by the arrows and do not appear to
be significantly different from data from other runs. However, the
exon 2 mean and SD data from the 19 runs after the run of 10–05–03
indicate that there has been a change in sequence quality. For nine of
the last 19 runs, the mean CSPQV-hom is below the expected mean
and four of the nine are on the lower 2 × SD limit. By contrast, only
one result of the previous 57 runs has been on the lower 2 × SD limit.
Similarly for the SD data for CSPQV-hom, 14 of the last 19 runs have
SD greater than the expected SD value. This indicates a change of
sequence quality as a result of the variable sequence obtained with
the exon 2 forward sequencing primer shown in Fig. 4. It is of interest
to note that similar changes in sequence quality are not indicated by
the CSPQV-het data. It is not clear why this is the case, but it may be
because of the smaller number of heterozygous sequence positions,
some of which may be at positions where the background does not exist.
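The between-run comparison above rests on counting, within a window of consecutive runs, how many run means fall below the expected mean and how many reach the lower control limit. A minimal Python sketch of that bookkeeping follows. The run values, target and limit are illustrative, and the streak counter is an added convenience that the text does not mention.

```python
def summarise_window(run_means, target, lower_limit):
    """Count runs below the target mean and at or below the lower control limit."""
    return {
        "runs": len(run_means),
        "below_target": sum(1 for m in run_means if m < target),
        "at_or_below_limit": sum(1 for m in run_means if m <= lower_limit),
    }


def longest_streak_below(run_means, target):
    """Return the length of the longest consecutive streak of means below target."""
    longest = current = 0
    for m in run_means:
        current = current + 1 if m < target else 0
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    # Illustrative run means of CSPQV-hom in chronological order.
    run_means = [40.5, 40.2, 39.8, 39.0, 37.9, 40.1, 39.5]
    print(summarise_window(run_means, target=40.0, lower_limit=38.0))
    print("longest streak below target:", longest_streak_below(run_means, 40.0))
```

Comparing the summary for a recent window with the summary for earlier runs gives the kind of contrast reported above for the last 19 runs versus the previous 57.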
It is of interest to note that, although unlikely to be statistically
significant, the mean CSPQV-hom for exon 2 for all runs is higher
than the mean CSPQV-hom for exon 3 for all runs (exon 2 ¼ 40.06,
exon 3 ¼ 38.93). In addition, the SD is lower (mean SD for exon 2 is
3.99 and for exon 3 the mean SD is 5.25). This indicates that the
sequence quality for exon 2 is consistently better than the sequence
quality for exon 3. It is possible that this difference is because of the
inherent sequence differences between exon 2 and exon 3. However,
this difference may suggest that the conditions are not optimal for
exon 3. Table 1 lists the mean and SD of CSPQV-hom for the BSU
(i.e., exons 2 and 3) of HLA-A, HLA-B (both the HLA-BTA and HLA-
BCG HLA-B protocols) and HLA-C sequenced during the same
period. The exon 2 BSU sequence of HLA-A has the highest mean
PQV-hom and lowest SD, compared to all the other BSU for the other
loci. This indicates that the sequence quality obtained for the HLA-A
exon 2 BSU is better than the quality of sequence for the exon 3 BSU
of HLA-A and better than the sequence for all other BSU for the other
loci. The challenge now is to understand why this is the case and
optimize the sequencing conditions for the other loci to improve the
sequence quality at least to the level of the HLA-A exon 2 BSU.
[Figure 4: electropherograms from the exon 2 forward sequencing primer for the samples listed below, shown with the mean and SD of CSPQV-hom for each sample.]
Sample      Mean PQV   SD
Sample 02   41.41      2.2
Sample 21   41.30      2.1
Sample 06   41.03      1.8
Sample 01   39.56      5.3
Sample 04   38.56      6.0
Sample 13   35.01      7.6
Sample 19   33.58      8.4
Fig. 4. The electropherogram (EPG) from a
region of exon 2 for selected samples from run
10–05–03 has been shown. The figure also
includes the mean and SD CSPQV-hom for the
samples of the EPG. When the sequence quality is
good (no background noise), the CSPQV means are
high and SDs are low. As the background noise
increases, the mean CSPQV-hom decreases and the SD
increases. CSPQV is an indicator of sequence quality.
The background noise appears as non-specific peaks
usually smaller than the specific sequence peak.
CSPQV, consensus sequence PQV; CSPQV-hom,
CSPQV of automated homozygous base calls; PQV,
Phred quality values.

Allele assignment
An example of an HLA allele assignment result page has been shown
in Fig. 6. A unique feature of Assign 2.0 is that the result page
contains important QC information in addition to the HLA allele
assignment. The allele assignment is displayed as a list of allele
combinations within the library that are best matched with the test
sequence. Mismatched positions include the sequence base call of
the test sample at this position and the expected base call for the
allele combination. Additional information, including the CSPQV of
the test sequence at the mismatched positions and whether there was
a discrepancy between forward and reverse strand base calls (FRD)
or whether the mismatched position was sequenced in a single
direction only (SS), is also shown. Base calls that have arisen from
sequencing one strand only are also indicated in the result table by
‘SS’ in the ‘Quality Values’ row (not present in the example in Fig. 6).
The QC information of the sample includes the number of bases
sequenced (e.g., n = 546 of the 546 bases which constitute exon
2 + exon 3 for HLA-A), the homozygous and heterozygous base call
CSPQV (CSPQV-hom and CSPQV-het) statistics (mean CSPQV-hom = 39.9
and SD = 4.3, mean CSPQV-het = 25.8 and SD = 2.1)
and the SS (0% for homozygous base calls, 0% for heterozygous
base calls) and FRD data (2% of homozygous and 0% of heterozygous
consensus base calls had FRD).
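The per-sample QC summary described in this section (bases sequenced, CSPQV-hom and CSPQV-het means, and the percentage of SS and FRD calls) can be sketched as a small aggregation over consensus base calls. The record layout and field names below are hypothetical and do not reflect the Assign 2.0 .xml format; the example calls are invented.

```python
from dataclasses import dataclass
from statistics import mean

HOMOZYGOUS_CALLS = frozenset("ACGT")


@dataclass
class ConsensusCall:
    base: str            # consensus base call (IUPAC code)
    cspqv: int           # consensus sequence Phred quality value
    single_strand: bool  # True if sequenced in one direction only (SS)
    frd: bool            # True if forward and reverse calls disagreed (FRD)


def percent(flags):
    """Return the percentage of True values, or 0.0 for an empty list."""
    return 100 * sum(flags) / len(flags) if flags else 0.0


def qc_summary(calls):
    """Summarise homozygous and heterozygous calls separately."""
    summary = {"bases_sequenced": len(calls)}
    groups = {
        "hom": [c for c in calls if c.base in HOMOZYGOUS_CALLS],
        "het": [c for c in calls if c.base not in HOMOZYGOUS_CALLS],
    }
    for name, group in groups.items():
        qvs = [c.cspqv for c in group]
        summary[f"{name}_mean"] = mean(qvs) if qvs else None
        summary[f"{name}_ss_pct"] = percent([c.single_strand for c in group])
        summary[f"{name}_frd_pct"] = percent([c.frd for c in group])
    return summary


if __name__ == "__main__":
    calls = [
        ConsensusCall("A", 40, False, False),
        ConsensusCall("C", 42, False, True),
        ConsensusCall("R", 26, False, False),
        ConsensusCall("G", 38, True, False),
    ]
    print(qc_summary(calls))
```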
In the example shown in Fig. 6, there are two mismatches between
the test sequence and the best-matched alleles. Both mismatches
(positions 282 and 448) are at positions where there was an FRD.
An FRD indicates a base call error when sequencing in one direction
and a high potential for an incorrect consensus base call. Such a
position is a priority for manual review. In addition, the base calls
at these positions are mismatched against all of the alleles in the
result table, indicating that the test sequence contains unique
polymorphisms or that they are incorrect base calls. By contrast, the base call
at position 258 is ‘C’ and the CSPQV at this position is 42. This
indicates that ‘C’ has been called on both strands and a CSPQV of 42
indicates sequence of high quality and very low probability of an
incorrect base call.

[Figure 5: four panels plotting the CSPQV mean (upper half, left axis) and CSPQV SD (lower half, right axis) against sequence run: homozygous base calls by SBT run, HLA-A exon 2; homozygous base calls by SBT run, HLA-A exon 3; heterozygous base calls by SBT run, HLA-A exon 2; heterozygous base calls by SBT run, HLA-A exon 3.]
Fig. 5. Between-run monitoring of sequence quality has been shown. The mean and SD CSPQV-hom and CSPQV-het for all samples of each
run (n = 76 runs) for the period 12 February 2003 to 7 July 2003 have been plotted for exons 2 and 3. The grey bars represent the
mean ± 2 × SD limits for all values on each graph. As for Fig. 3(A,B), the mean values have been shown in the top half of the graph and the SD values have been
shown in the bottom half of each graph. The arrows show the values for the run 05_10_03 (from Fig. 3A,B). Despite the poor quality sequence for the forward
sequencing primer in exon 2 for some samples in runs that follow 05_10_03, the run mean does not fall outside the 2 × SD limits (see the top left hand graph). However,
it is of interest to note that of the 19 runs following the run of 05_10_03, nine of the runs have a mean value below the expected mean and four of the nine runs
have values on the lower limit. By contrast, only one run in the previous 57 runs has been on the lower limit. This indicates a shift (decrease) in the mean CSPQV
for this assay, as a result of the suboptimal sequence obtained from the forward sequencing primer. The situation is similar for the SD values. Fourteen of the
last 19 run SD values are greater than the mean SD value for all runs, indicating a shift in the mean SD for this assay. By contrast, the exon 3 data indicate that
the quality of sequence has increased. Sixteen of the last 19 runs are above the expected mean CSPQV and 12 of the last 19 are below the expected SD. This
indicates an overall improvement of SBT of exon 3 of HLA-A. However, a specific problem exists with the forward sequencing primer of exon 2. The changes in
sequence quality demonstrated in the CSPQV-hom data are not reflected in the CSPQV-het data. CSPQV, consensus sequence PQV; CSPQV-het, CSPQV of
heterozygous base calls; CSPQV-hom, CSPQV of automated homozygous base calls; PQV, Phred quality values.

Confirmation of base calls at positions within the mismatch table
is performed by viewing the EPG in SeqScape. Any edits to the
sequence are then performed directly in Assign 2.0 and the result
table is updated without the need for re-analysing the sequence
against the allele sequence library (i.e., in real time). Following con-
firmation of all base calls, Assign 2.0 will produce a report listing the
alleles that are best matched to the test sequence. The operator can
then click to the next sample for analysis and the result table is
immediately updated with data from the next sample.
Discussion
We have described a sequence data analysis computer software
program called Assign 2.0 that combines allele assignment with a
comprehensive and effective quality control system. Thousands of
sequences can be analysed in seconds making Assign 2.0 suitable
for high throughput sequencing-based typing or any resequencing
project. We have used the sequence-based typing of the highly
polymorphic HLA-A locus to demonstrate the utility of Assign 2.0.
The unique feature of Assign 2.0 is the ability to analyse PQV in
order to provide a comprehensive QC analysis of SBT data. We have
demonstrated that the mean and SD of all CSPQV-hom within a BSU
are sensitive indicators of sequence quality for that sample. Similarly,
the CSPQV-hom data for all BSU for all samples within a sequencing
run provide QC data for that sequencing run. As a result, sample-
to-sample and run-to-run QC monitoring can be performed.
Furthermore, the normal distribution of mean PQV data indicates
that Shewhart control graphs can be used and changes in sequence
quality can be accurately monitored. These processes add very little
time to the SBT process and yet provide valuable QC data.
A retrospective analysis of all data from February 2003 to July
2003 generated in our laboratory revealed changes in sequence quality
associated with an intermittent increase in background with a single
sequencing primer in our HLA-A SBT assay. This resulted in a greater
than expected number of runs falling below the expected mean
CSPQV-hom. In addition, a comparison of CSPQV-hom data between
our HLA-A, HLA-B and HLA-C SBT assays revealed a difference in
sequence quality between the assays with HLA-A exon 2 providing
the best quality data. We are in the process of using Assign 2.0 to
re-optimize the HLA-B, HLA-C and HLA-A exon 3 assays so that
optimal-quality sequence data are obtained.
It is of interest to note that Phred was not designed to provide quality
values for heterozygous sequence (4, 5). However, the data shown in Fig. 2
demonstrate that CSPQV-het are normally distributed but with a much
lower mean than CSPQV-hom. Therefore, in theory, CSPQV-het can also
be used for monitoring sequence quality. In most cases, the mean and SD
values of CSPQV-hom were mirror images, indicating that either of these
values, or the coefficient of variation (CV (%) = SD × 100/mean) can be
used as an indicator of sequence quality. The data presented in this study
did not indicate that analysis of CSPQV-het provided as sensitive an
indicator of quality as CSPQV-hom. This is likely to be because of variable
and low numbers of heterozygous positions, compared to homozygous
positions within a sequence.
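As a worked example of the coefficient of variation given above, the following sketch applies the formula to the exon 2 means and SDs reported in Table 1. The function and variable names are illustrative only, and using the lowest CV to pick out the best-performing assay is an assumption made for this example.

```python
def cv_percent(mean_value, sd):
    """Coefficient of variation as a percentage: SD x 100 / mean."""
    return sd * 100 / mean_value


# Exon 2 CSPQV-hom (mean, SD) per locus, as reported in Table 1.
exon2 = {
    "HLA-A": (40.06, 1.05),
    "HLA-BCG": (38.70, 1.95),
    "HLA-BTA": (39.07, 2.04),
    "HLA-C": (39.33, 2.55),
}

cv = {locus: cv_percent(m, sd) for locus, (m, sd) in exon2.items()}
lowest_cv_locus = min(cv, key=cv.get)

for locus, value in cv.items():
    print(f"{locus}: CV = {value:.2f}%")
```

A high mean combined with a low SD gives a low CV, so HLA-A exon 2 has the lowest CV of the four assays, which agrees with the ranking drawn from the mean and SD separately.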
The analysis of CSPQV in the ways we have described provides the
ability to assess the effect of reagents and SBT protocols on sequence
data quality. By improving the data obtained from SBT protocols, the
data analysis component of SBT protocols will be significantly reduced
and SBT will become a high-throughput protocol for measuring diversity.
In addition, the Assign 2.0 QC tools can be used for between-laboratory
comparison of data and provide a means of standardizing
SBT assays through workshops and QA exchange programs.
The applications of DNA sequencing are moving from the ‘sequence
factories’, where cloned DNA from a single chromosome is sequenced,
to studies of genetic diversity that include the sequencing of PCR
products of highly polymorphic genes from pairs of chromosomes.
These include research studies of evolution and population migration
(8) and clinical diagnostic applications (9–11). In addition, DNA sequencing
is being used by some laboratories for low to medium throughput
SNP analysis and de novo mutation detection (Ivo Gut, CNG, Paris,
France, personal communication). Appropriate QC is critical: obtaining,
maintaining and monitoring sequence quality is required for all of
these applications. This manuscript describes a means by which
appropriate sequencing QC can be performed.
Assign v3.0 has been developed and does not require third-party
software, such as SeqScape, thus further improving the efficiency of SBT.
Table 1. Mean and standard deviation (SD) of CSPQV for homozygous base calls
(CSPQV-hom) of exon 2 and exon 3 of various HLA class-I SBT assays

             CSPQV-hom
             Exon 2            Exon 3
Locus        Mean     SD       Mean     SD
HLA-A        40.06    1.05     38.93    1.58
HLA-BCG      38.70    1.95     39.07    2.04
HLA-BTA      39.07    2.04     39.22    1.73
HLA-C        39.33    2.55     38.43    2.81

HLA-A exon 2 results in sequence quality with highest mean CSPQV and lowest SD, which may
reflect that the SBT conditions are better optimized for this BSU than the BSU of other loci. BSU,
bi-directionally sequenced units; CSPQV, consensus sequence PQV; PQV, Phred quality values;
SBT, sequencing-based typing.
Sayer et al : Quality control of SBT
Tissue Antigens 2004: 64: 556–565 563
[Fig. 6: screenshot of the Assign 2.0 result page with panels labelled A to I; see the key and caption that follow.]
A) Browse window for locating the .xml files for analysis.
B) Locus being typed. If the locus is indicated in the sample name, the locus selected in the “Locus” pane is overridden.
C) Maximum mismatch tolerance at which results are listed. Assign will list the best-matched alleles within the library, up to 31 mismatches.
D) Sample quality control information for the homozygous and heterozygous base calls. This includes the mean and standard deviation of the Phred quality
values, the amount of sequence that was from a single strand (SS) and the percentage of base calls at which there were forward/reverse strand base
call discrepancies.
E) ID of the sample for which the report is shown. The number of bases sequenced is also shown.
F) Results pane. It lists the alleles that are best matched with the test sequence, the number of sequence differences between the alleles and the test
sequence, and the base call information at positions that are discrepant between the test sequence and the best-matched alleles. This includes the
observed base calls of the test sample and the Phred quality value, which is colour coded: green for base calls of high quality that do not require review,
yellow for base calls that require review but are probably correct, and red for base calls that definitely require review because they are at a position with
single strand coverage, there is a forward/reverse strand base call discrepancy or the sequence quality is very poor.
G) Editor window, which allows confirmation of the base calls. Once the base calls are confirmed, the final result can be determined and a report is generated.
H) List of samples that have been analysed. Selecting a sample ID results in immediate viewing of the SBT details as described above. Above the
sample IDs is the date of the release of the IMGT/HLA database.
I) Control panel, which includes access to the QC tools.
Fig. 6. A typical allele assignment result page. A detailed description of the result page is given in the key. The result page contains
the list of alleles that are best matched to the test sequence, ranked in order of best match. Mismatched sequence
positions are listed across the result page in sequence number order and include the consensus sequence of the test sample, the Phred quality value of the
consensus sequence (CSPQV) base call, whether there was a forward and reverse strand base call discrepancy (FRD), whether the position was sequenced in both
directions (SS if the sequence was from a single strand only) and the corresponding sequence of the alleles within the table. The result page also includes
the total number of bases sequenced, the mean and standard deviation of CSPQV of the homozygous base calls, the CSPQV of the heterozygous base
calls, the number of positions (expressed as a percentage of the homozygous and heterozygous base calls) at which there were forward and reverse strand
base call discrepancies (FRD), and the total amount of SS sequence.
References
1. Rosenblum BB, Lee LG, Spurgeon SL et al.
New dye-labeled terminators for improved
DNA sequencing patterns. Nucleic Acids Res
1997: 25: 4500–4.
2. Lee LG, Spurgeon SL, Heiner CR et al. New
energy transfer dyes for DNA sequencing.
Nucleic Acids Res 1997: 25: 2816–22.
3. Sayer DC, Whidborne R, De Santis D,
Rozemuller E, Christiansen FT, Tilanus M. A
multi centre evaluation of single-tube
amplification protocols for SBT of HLA-DRB1
and HLA-DRB3, 4, 5 are reproducible and
robust. HLA 2002. 2003. Tissue Antigens
2004: 63(5): 412–23.
4. Ewing B, Green P. Base-calling of automated
sequencer traces using phred. II. Error
probabilities. Genome Res 1998: 8: 186–94.
5. Ewing B, Hillier L, Wendl MC, Green P. Base-
calling of automated sequencer traces using
phred. I. Accuracy assessment. Genome Res
1998: 8: 175–85.
6. Shewhart WA. Economic Control of Quality of
Manufactured Product, 1st edn. New York:
Van Nostrand, 1931.
7. Cereb N, Yang SY. Dimorphic primers derived
from intron 1 for use in the molecular typing
of HLA-B alleles. Tissue Antigens 1997: 50:
74–6.
8. Malhi RS, Mortensen HM, Eshleman JA et al.
Native American mtDNA prehistory in the
American Southwest. Am J Phys Anthropol 2003:
120: 108–24.
9. Sayer DC, Land S, Gizzarelli L et al. A quality
assessment program (QAP) for genotypic
antiretroviral testing (GART) results in an
improvement in the detection of drug
resistance mutations. J Clin Microbiol 2003:
41: 227–36.
10. Sayer D, Whidborne R, Brestovac B, Trimboli F,
Witt C, Christiansen F. HLA-DRB1 DNA
sequencing based typing: an approach
suitable for high throughput typing including
unrelated bone marrow registry donors.
Tissue Antigens 2001: 57: 46–54.
11. Pryce TM, Palladino S, Kay D, Coombs GW.
Rapid identification of fungi by sequencing
the ITS1 and ITS2 regions using an
automated capillary electrophoresis system.
Med Mycol 2003: 41: 369–81.

Assign 2.0 software for the analysis of Phred quality values for quality control of HLA sequencing-based typing.pdf

Assign 2.0: software for the analysis of Phred quality values for quality control of HLA sequencing-based typing

D.C. Sayer, D.M. Goodridge, F.T. Christiansen

Authors’ affiliations: D.C. Sayer (1,2,3), D.M. Goodridge (1,3), F.T. Christiansen (1,2)
1 Department of Clinical Immunology and Biochemical Genetics, Royal Perth Hospital, Wellington Street, Perth 6000, Western Australia, Australia
2 School of Surgery and Pathology, Division of Pathology, University of Western Australia, Verdun Street, Nedlands, Western Australia, Australia
3 Conexio Genomics, PO Box 1670, Applecross, Western Australia, Australia

Correspondence to: David C. Sayer, Department of Clinical Immunology and Biochemical Genetics, Royal Perth Hospital, Wellington Street, Perth 6000, Western Australia, Australia. Tel.: +61 8 92242899. Fax: +61 8 92242920. e-mail: david.sayer@health.wa.gov.au

Abstract: As improvements to DNA sequencing technology have resulted in increasing the throughput of DNA sequencing, the bottleneck for high throughput DNA sequencing-based typing (SBT) has shifted to sequence analysis, genotyping and quality control (QC). Consistent high-quality DNA sequence is required in order to reduce manual verification and editing of sequence electropherograms. However, identifying systematic changes in quality is difficult to achieve without the aid of sophisticated sequence analysis programs dedicated to this purpose. We describe a computer software program called Assign 2.0, which integrates sequence QC analysis and genotyping in order to facilitate high-throughput SBT. Assign 2.0 performs an analysis of Phred quality values in order to produce quality scores for a sample and a sequencing run. This enables sample-to-sample and run-to-run QC monitoring and provides a mechanism for the comparison of sequence quality between various genes, various reagents and various protocols with the aim of improving the overall quality of DNA sequence data. This, in turn, will result in reducing sequence analysis as a bottleneck for high-throughput SBT.

Key words: assign; quality control; resequencing; sequencing
Received 14 January 2004, revised 6 April 2004, accepted for publication 19 April 2004
Copyright © Blackwell Munksgaard 2004. doi: 10.1111/j.1399-0039.2004.00283.x
Tissue Antigens 2004: 64: 556–565. Printed in Denmark. All rights reserved.

Recent advances in DNA-sequencing technology, including the introduction of capillary DNA sequencers and improvements in dye-labelling technology (1, 2), have simplified DNA-sequencing protocols and have improved the ability to detect heterozygous sequences. As a result, an increasing number of clinical and research laboratories are using DNA sequencing in order to study genetic diversity. This is particularly true for laboratories performing human leucocyte antigen (HLA) typing for the matching of donors and recipients for bone marrow transplantation. Several HLA genes are required for matching for transplantation and each is highly polymorphic (http://www.ebi.ac.uk/imgt/hla). Current genotyping approaches are hierarchical and employ low typing resolution molecular techniques that are relatively inexpensive and suitable for high throughput, followed by DNA sequencing to provide high-resolution typing when required. DNA sequencing is regarded as the gold standard for HLA typing and therefore the ideal would be for DNA sequencing to be the sole method for HLA typing. State-of-the-art DNA sequencers provide the throughput requirements for most HLA-typing laboratories. However, data analysis, including manual verification of automated sequence base calling, allele assignment and quality control (QC), is a significant impediment to high-throughput sequencing-based typing (SBT).

HLA SBT is a complex multi-step process, which requires the specific polymerase chain reaction (PCR) amplification of the region to be sequenced, sequencing up to four polymorphic exons in both directions, splicing the intron sequence and creating a single concatenated consensus sequence for analysis. The consensus sequence is usually matched against a database of allele sequences in order to identify those alleles, which are best matched to the test sequence. Computer software programs, such as SeqScape® v2.0 Software (SeqScape) from Applied Biosystems (Foster City, CA), perform base calling, align forward and reverse complementary sequences, splice intron sequence and produce a concatenated consensus sequence for allele assignment. However, base calling may be unreliable, especially for heterozygous sequence, because an arbitrary threshold for heterozygosity is assigned based on the percentage of one peak within another. If the threshold is too low, the presence of any background may result in false calling of heterozygotes. If the threshold is not set low enough, then some heterozygotes with low dideoxynucleotide incorporation may be incorrectly called homozygotes. Therefore, manual verification by viewing the sequence electropherograms (EPG) is required. The requirement for manual sequence base-call verification and sequence editing is highest, when the quality of the sequence is poor. The ability to obtain and maintain high-quality sequence is critical to improving the throughput capabilities of SBT.
High-quality sequence results in improved accuracy of base calling and removes the time required for manual verification. As sequencing is being increasingly used in a clinical setting, guidelines for sequence quality have been suggested by groups such as the Clinical Molecular Genetics Society (http://cmgs.org/BPG/Guidelines/2002/data%20quality). However, these guidelines tend to be subjective. Unique and objective approaches to SBT QC are required.

We suggest that various combinations of alleles in heterozygous samples, each with its own unique sequence, are amplified in PCR and sequencing reactions with various efficiencies, largely as a result of the different melting temperatures and GC content. Thus, every sample should have its own QC. Furthermore, as the sequence for every sample is usually derived from concatenated bi-directionally sequenced units (BSU) or exons, as is the case for most HLA class-I SBT assays (3), the basic unit of QC should be the BSU. We have developed a computer software program that enables such QC to be performed. We have integrated this with our allele assignment software in order to provide a comprehensive sequence-analysis software program, called Assign 2.0. Assign 2.0 is suitable for high-throughput HLA SBT or any resequencing application.

The Assign 2.0 QC tools enable the analysis of several indicators of sequence quality. However, the primary function of Assign 2.0 is the analysis of Phred quality values (PQV) (4, 5). Phred is a software program that assigns to each base call a quality value related to the probability that the call is wrong, by using the algorithm $QV = -10 \log_{10}(P_E)$, where $P_E$ is the probability that the base call is an error. Thus, a PQV of 40 indicates that there is a one in 10,000 chance that the base call is incorrect. However, this algorithm was developed for cloned template and the same interpretations of base call accuracy may not apply to heterozygous sequence from PCR products.
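The relationship between a quality value and its error probability can be checked numerically. The following is a minimal sketch in Python of the standard Phred relationship quoted above; the function names are our own and are not part of Phred or Assign 2.0.

```python
import math


def phred_to_error_probability(qv: float) -> float:
    """Return the error probability P_E implied by a Phred quality value."""
    return 10 ** (-qv / 10)


def error_probability_to_phred(p_error: float) -> float:
    """Return QV = -10 * log10(P_E) for an error probability in (0, 1]."""
    if not 0 < p_error <= 1:
        raise ValueError("p_error must be in (0, 1]")
    return -10 * math.log10(p_error)
```

For example, a quality value of 40 maps to an error probability of about 0.0001 (one in 10,000), and a value of 20 maps to about 0.01 (one in 100).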
We therefore investigated whether PQV can have a broader utility for the assessment of SBT QC and provide a quality score for a sequenced sample and a sequencing run or gel. We demonstrate a unique and informative objective assessment of sequence quality following the analysis of PQV that enables the setting of target specifications of quality. As a result, we are able to monitor samples and sequencing runs for deviations from target specifications (accuracy) and excessive variability around target specifications (precision), thus meeting the criteria for effective QC (6).

Methods

Sequencing reactions were performed by means of Applied Biosystems Big Dye® Terminator v3.0 sequencing chemistry. All sequencing was performed on an Applied Biosystems ABI PRISM® 3730 Genetic Analyzer (AB 3730). The AB 3730 is a 48 capillary automated DNA sequencer. HLA-A, HLA-B and HLA-C SBT protocols were developed in house. Each locus was typed by means of DNA sequencing following locus-specific amplification and bi-directional sequencing of exons 2 and 3. HLA-A and HLA-C were amplified with a single set of amplification primers and HLA-B was amplified in two PCRs in order to amplify the HLA-B alleles in two groups characterized by the alternate ‘TA’ and ‘CG’ dimorphism located in intron 1 (7). The locus names HLA-BTA and HLA-BCG have been used in order to indicate the alternative PCR amplifications.

The DNA sequences were analysed in a two-step process. First, the sequences were analysed with the help of ABI PRISM SeqScape® Software (SeqScape) in order to splice intron sequence, align forward and reverse sequence strands and assign consensus sequence quality values. The DNA sequence files in .xml format were then imported into Assign 2.0 for allele assignment and QC analysis. The .xml files contain the consensus sequence base calls and the consensus sequence PQV (CSPQV). The .xml files are named according to a strict convention, which includes the sample name, the locus being sequenced and the sequencing primer. In addition, the .xml file storage system is organized by locus and sequencing date in order to facilitate data retrieval and enable chronological analysis of sequence QC data. Assign 2.0 QC tools perform independent analysis of CSPQV of automated homozygous (CSPQV-hom) and heterozygous (CSPQV-het) base calls for a single position, a range of positions (e.g., exon 2 or exon 3) or a selected date range for a selected locus. We present an analysis of data from HLA-A SBT runs from 12 February 2003 to 7 July 2003. This included 1086 samples sequenced on 76 different sequencing runs.

The Assign 2.0 allele assignment component of the software matches the consensus test sequence against an HLA allele sequence library generated from the IMGT/HLA database (http://www.ebi.ac.uk/imgt/hla). The matching algorithm has been developed in order to enable high-speed matching on multiple samples to facilitate high-throughput SBT.

Sayer et al: Quality control of SBT. Tissue Antigens 2004: 64: 556–565

Results

Assign 2.0 QC tools: QC analysis of PQV

As PQV have previously been reported to be an indicator of base call accuracy (5) and therefore sequence quality, we examined the possibility that analysis of CSPQV could be extrapolated in order to provide useful QC data for a sample and/or a sequencing run. The hypothesis is that the mean and standard deviation (SD) of CSPQV for all nucleotide positions will reflect the sequence quality of the sample sequenced. Furthermore, the mean and SD CSPQV of all nucleotides for all samples on a sequencing run will reflect the quality of the sequencing run.
However, in order to determine the feasibility of this approach, we needed to determine how much the CSPQV at the same site varies between samples whose sequences appeared visually to be of good quality. It is important to demonstrate that CSPQV only varies because of changes in sequence quality. For this purpose, we analysed the CSPQV at 100 conserved (and therefore homozygous) positions within exon 2 of HLA-A SBT from 20 samples within the same sequencing run. The results have been presented in Fig. 1. The mean CSPQV between positions may differ slightly, but more importantly the CSPQV at each position are reproducible between various samples. All but three positions have an SD of less than 5 CSPQV units and a coefficient of variation (CV) of less than 5%, with a mean CV value for all positions of 2.7%.

While CSPQV are highly reproducible between samples, the CSPQV of homozygous and heterozygous base calls are different. This is demonstrated for two polymorphic positions (positions 165 and 170) within exon 2 of HLA-A in Fig. 2. Figure 2(A) shows the frequency distribution of CSPQV-hom and CSPQV-het for position 165 of HLA-A. HLA-A alleles can be either A or G at this position. The grey bars represent the frequency distribution of CSPQV-het base calls (where both A and G are sequenced) and the black bars represent the frequency distribution of homozygous base calls (this includes both A and G base calls). Similarly, Fig. 2(B) is a frequency histogram of the CSPQV at position 170. HLA-A alleles are also either A or G at this position. For both positions, the CSPQV of heterozygous and of homozygous base calls are each normally distributed, but the CSPQV-het values are lower than the CSPQV-hom values. At position 165, the mean CSPQV-het is 27.10 and SD is 1.14 and the mean CSPQV-hom is 40.84 and SD is 1.75. For position 170, the mean CSPQV-het is 25.48 and SD is 1.14 and the mean CSPQV-hom is 40.86 and SD is 1.53.
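The summary statistics used in this section (mean, SD and CV, with homozygous and heterozygous base calls kept apart) can be sketched in a few lines of Python. This is an illustrative sketch only, not the Assign 2.0 implementation: the function names and the input layout are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def summarise(values):
    """Mean, sample SD and CV (%) of a list of CSPQV values (needs >= 2 values)."""
    m = mean(values)
    sd = stdev(values)  # assumption: sample (n - 1) standard deviation
    return {"n": len(values), "mean": m, "sd": sd, "cv_percent": 100 * sd / m}


def summarise_by_call_type(calls):
    """Summarise homozygous and heterozygous base calls independently.

    `calls` is an iterable of (is_heterozygous, cspqv) pairs (hypothetical layout).
    A group with fewer than two values is reported as None.
    """
    hom = [q for is_het, q in calls if not is_het]
    het = [q for is_het, q in calls if is_het]
    return {
        "hom": summarise(hom) if len(hom) >= 2 else None,
        "het": summarise(het) if len(het) >= 2 else None,
    }
```

Returning None for a group with too few values mirrors the point made in the text that some samples have few or no heterozygous positions, so CSPQV-het statistics are not always available.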
[Figure 1: plot of mean CSPQV (left axis, 0 to 50) and SD CSPQV (right axis, 0 to 20) against conserved sequence nucleotide positions within exon 2 of HLA-A.]
Fig. 1. The mean and standard deviation of consensus sequence PQV (CSPQV) at 100 conserved (therefore, homozygous) positions of exon 2 of HLA-A are shown from 20 consecutive unrelated samples. The mean CSPQV (the plot in the top half of the graph) varies between positions within the same sequence, but the CSPQV at one position is reproducible between samples, as indicated by the low standard deviations. This indicates that a mean value of all CSPQV-hom for a BSU should provide an indication of sequence quality of the BSU. BSU, bi-directionally sequenced units; CSPQV-hom, CSPQV of automated homozygous base calls; PQV, Phred quality values.

[Figure 2: frequency histograms (frequency, %, against PQV score) of heterozygous and homozygous sequence for (A) HLA-A exon 2 position 165 and (B) HLA-A exon 2 position 170.]
Fig. 2. The frequency histograms of consensus sequence PQV (CSPQV) at homozygous (black bars) and heterozygous (grey bars) base calls have been shown for two polymorphic positions (position 165, Fig. 2A, and position 170, Fig. 2B) within exon 2 of HLA-A for all samples (n = 1086 samples) sequenced between 12 February 2003 and 7 July 2003. The distribution of the CSPQV is bimodal, with CSPQV of heterozygous base calls being less than the CSPQV of homozygous base calls. These results indicate that homozygous and heterozygous CSPQV should be considered independently if CSPQV is used as a measure of sequence quality for a sample or a sequencing run, as the number of heterozygous positions varies between samples.

As a result of the findings described above, we suggest that:
1. The mean and/or SD values of CSPQV-hom of a BSU (i.e., the various exons for HLA class-I) will provide good indicators of sequence quality of the BSU. Some samples may not have heterozygous positions and so CSPQV-het should not be used as an indicator of sequence quality of a BSU.
2. The mean and/or SD values of all CSPQV-hom for all samples on a sequencing run will provide good indicators of sequence quality of the sequencing run.
3. Sequence quality ‘target’ (or ‘expected’) values can be calculated from multiple data points and the mean and SD values of CSPQV for individual BSU and sequencing runs can be compared to expected values according to Shewhart rules for analysing controls (6).

In order to test these hypotheses, we performed a retrospective analysis of SBT data for HLA obtained between 12 February 2003 and 7 July 2003.

Within-run QC analysis

The graphs shown in Figs. 3 and 5 are examples of CSPQV analysis that can be performed by the Assign 2.0 QC tools in just a few seconds. Analyses of CSPQV-hom data for exons 2 and 3, respectively, for each of 24 samples of the HLA-A SBT run 10–05–03 have been presented in Fig. 3(A, B). In both graphs, the mean and SD data are mirror images such that a sample with a high mean CSPQV usually has a low SD. Grey bars with a horizontal line through the middle have been used in order to indicate the mean ± 2 SD of CSPQV data calculated from all runs between 12 February 2003 and 7 July 2003. The exon 2 graph (Fig. 3A) reveals considerable variability between samples, compared to the graph for exon 3 (Fig. 3B). This indicates variability in sequence quality between the exon 2 sequences of the samples and consistent high-quality sequence for exon 3 for all samples.

[Figure 3: PQV-hom mean (left axis) and PQV-hom SD (right axis) plotted by sample (01 to 24) for (A) Exon 2, Run 05_10_03, position: exon 2, and (B) Exon 3, Run 05_10_03, position: exon 3.]
Fig. 3. The mean and standard deviation (SD) of consensus sequence PQV (CSPQV) for homozygous base calls within exon 2 (Fig. 3A) and exon 3 (Fig. 3B) have been shown for each of 24 samples within an HLA-A SBT run (run ID = 05–10–03). The mean values of each sample are plotted on the top part of each graph and are associated with the Y-axis on the left hand side of the graph; the SD values are plotted on the lower half of each graph and are associated with the Y-axis on the right hand side. The grey bars represent the mean ± 2 SD limits of the mean and SD values of all samples for all runs (n = 76 runs) between 12 February 2003 and 7 July 2003. The mean and SD plots are mirror images, such that when the mean is high, the SD is low and vice versa. The plots demonstrate why individual BSU, in this case each exon, are analysed separately. The exon 2 data are variable, indicating sequence of variable quality, with the mean and SD CSPQV for two samples (samples 13 and 19) outside the expected limits. By contrast, the exon 3 data are much more consistent, with all values being on or greater than the expected mean of the mean CSPQV and all but one sample being below the mean of the expected SD CSPQV. These data indicate a potential problem of varying degree affecting exon 2 sequences only. SBT, sequencing-based typing.

Analysis of the sequence EPG for the forward and reverse sequencing primers for exon 2 revealed that the sequences from the forward sequencing primer contained high background for some samples, whereas the reverse sequencing primers resulted in consistent good quality sequence (data not shown). The CSPQV is deduced from PQV from both strands and poor quality sequence on one strand is sufficient to reduce the CSPQV. The EPG from the forward sequencing primer for some of the samples with and without background have been shown in Fig. 4. A comparison of the EPG and CSPQV-hom for these samples reveals that when the background is high, i.e., the quality of sequence is poor (e.g., samples 13 and 19), the mean CSPQV-hom is low (35.01 and 33.58, respectively) and SD is high (7.61 and 8.42, respectively). In samples where there is no background, i.e., good quality sequence (e.g., samples 02, 21 and 06), the mean CSPQV is high (41.41, 41.30 and 41.03, respectively) and the SD is low (2.2, 2.1 and 1.8, respectively). These data demonstrate that mean and SD of CSPQV-hom are sensitive and quantitative measurements of sequence quality.

With the exception of sample 3, the QC data for exon 3 indicate that all sequence is of similar quality. Furthermore, all CSPQV-hom means are greater than the expected mean CSPQV (horizontal line through the middle of the grey bar) and all but one of the sample SDs are below the expected SD. This indicates that the quality of sequence obtained for exon 3 for all samples of this run is better than expected. For sample 3, only two of the 276 bases of exon 3 were included by the SeqScape algorithm for analysis for one of the sequencing primers. As a result, much of the sequence is single-stranded. The high PQV is an anomaly of the SeqScape/Phred algorithm, where the CSPQV may be higher for single-strand sequence than for sequence with bi-directional coverage. As a result, an SD was not calculated for this sample.
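The within-run comparison described above, in which each sample's mean and SD CSPQV-hom is set against target mean ± 2 SD limits, can be sketched as follows. This is a hedged illustration in Python rather than the Assign 2.0 code: `ControlLimits` and `flag_samples` are hypothetical names, and the target values must be supplied by the laboratory from its own historical data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlLimits:
    """Target value and spread for one QC statistic (hypothetical container)."""
    target: float
    sd: float
    k: float = 2.0  # width of the limits in SD units; the text uses 2 SD

    @property
    def lower(self) -> float:
        return self.target - self.k * self.sd

    @property
    def upper(self) -> float:
        return self.target + self.k * self.sd


def flag_samples(sample_stats, mean_limits, sd_limits):
    """Return {sample: [reasons]} for samples outside the expected limits.

    `sample_stats` maps a sample name to (mean CSPQV-hom, SD CSPQV-hom) for
    one BSU. A low mean or a high SD is treated as a sign of poor sequence.
    """
    flagged = {}
    for name, (sample_mean, sample_sd) in sample_stats.items():
        reasons = []
        if sample_mean < mean_limits.lower:
            reasons.append("mean below lower limit")
        if sample_sd > sd_limits.upper:
            reasons.append("SD above upper limit")
        if reasons:
            flagged[name] = reasons
    return flagged
```

Only the low-mean and high-SD directions are flagged here because, in the data above, poor sequence shows up as a reduced mean and an increased SD; a laboratory might also choose to review unusually high means, such as the single-stranded sample 3.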
Between-run QC analysis

In contrast to Fig. 3, where sample-to-sample QC analysis within a sequencing run is demonstrated, Fig. 5 demonstrates run-to-run (between-run) QC analysis. Between-run analysis is performed by plotting the mean and SD of CSPQV calculated from all positions for all samples on a sequencing run. This has been demonstrated in Fig. 5, where the CSPQV data for exons 2 and 3 are plotted for each run between 12 February 2003 and 7 July 2003 (76 runs, 1086 samples). The grey bars represent the mean ± 2 SD of data from all runs. The data from the sequencing run of 10–05–03 (as demonstrated in Fig. 3) are indicated by the arrows and do not appear to be significantly different from data from other runs. However, the exon 2 mean and SD data from the 19 runs after the run of 10–05–03 indicate that there has been a change in sequence quality. For nine of the last 19 runs, the mean CSPQV-hom is below the expected mean and four of the nine are on the lower 2 SD limit. By contrast, only one result of the previous 57 runs has been on the lower 2 SD limit. Similarly for the SD data for CSPQV-hom, 14 of the last 19 runs have SD greater than the expected SD value. This indicates a change of sequence quality as a result of the variable sequence obtained with the exon 2 forward sequencing primer shown in Fig. 4. It is of interest to note that similar changes in sequence quality are not indicated by the CSPQV-het data. It is not clear why this is the case, but it may be because of the smaller number of heterozygous sequence positions, some of which may be at positions where the background does not exist.

It is of interest to note that, although unlikely to be statistically significant, the mean CSPQV-hom for exon 2 for all runs is higher than the mean CSPQV-hom for exon 3 for all runs (exon 2 = 40.06, exon 3 = 38.93). In addition, the SD is lower (mean SD for exon 2 is 3.99 and for exon 3 the mean SD is 5.25).
This indicates that the sequence quality for exon 2 is consistently better than the sequence quality for exon 3. It is possible that this difference is because of the inherent sequence differences between exon 2 and exon 3. However, this difference may suggest that the conditions are not optimal for exon 3.

Table 1 lists the mean and SD of CSPQV-hom for the BSU (i.e., exons 2 and 3) of HLA-A, HLA-B (both the HLA-BTA and HLA-BCG HLA-B protocols) and HLA-C sequenced during the same period. The exon 2 BSU sequence of HLA-A has the highest mean PQV-hom and lowest SD, compared to all the other BSU for the other loci. This indicates that the sequence quality obtained for the HLA-A exon 2 BSU is better than the quality of sequence for the exon 3 BSU of HLA-A and better than the sequence for all other BSU for the other loci. The challenge now is to understand why this is the case and optimize the sequencing conditions for the other loci to improve the sequence quality at least to the level of the HLA-A exon 2 BSU.

[Figure 4: electropherograms for samples 02, 21, 06, 01, 04, 13 and 19, annotated with the CSPQV-hom mean and SD of each sample.]
Sample    Mean PQV    SD
02        41.41       2.2
21        41.30       2.1
06        41.03       1.8
01        39.56       5.3
04        38.56       6.0
13        35.01       7.6
19        33.58       8.4
Fig. 4. The electropherogram (EPG) from a region of exon 2 for selected samples from run 10–05–03 has been shown. The figure also includes the mean and SD CSPQV-hom for the samples of the EPG. When the sequence quality is good (no background noise), the CSPQV means are high and SDs are low. As the background noise increases, the mean CSPQV-hom decreases and the SD increases. CSPQV is an indicator of sequence quality. The background noise appears as non-specific peaks usually smaller than the specific sequence peak. CSPQV, consensus sequence PQV; CSPQV-hom, CSPQV of automated homozygous base calls; PQV, Phred quality values.

[Figure 5: four panels plotting CSPQV mean (upper half) and CSPQV SD (lower half) against sequence run: homozygous base calls by SBT run, HLA-A exon 2; homozygous base calls by SBT run, HLA-A exon 3; heterozygous base calls by SBT run, HLA-A exon 2; heterozygous base calls by SBT run, HLA-A exon 3.]
Fig. 5. Between-run monitoring of sequence quality has been shown. The mean and SD CSPQV-hom and CSPQV-het for all samples of each run (n = 76 runs) for the period 12 February 2003 to 7 July 2003 have been plotted for exons 2 and 3. The grey bars represent the mean ± 2 SD limits for all values on each graph. As for Fig. 3(A, B), the mean values have been shown in the top half of each graph and the SD values have been shown in the bottom half. The arrows show the values for the run 05_10_03 (from Fig. 3A, B). Despite the poor quality sequence for the forward sequencing primer in exon 2 for some samples in runs that follow 05_10_03, the run mean does not fall out of the 2 SD limits (see the top left hand graph). However, it is of interest to note that of the 19 runs following the run of 05–10–03, nine of the runs have a mean value below the expected mean and four of the nine runs have values on the lower limit. By contrast, only one run in the previous 57 runs has been on the lower limit. This indicates a shift (decrease) in the mean CSPQV for this assay, as a result of the suboptimal sequence obtained from the forward sequencing primer. The situation is similar for the SD values. Fourteen of the last 19 run SD values are greater than the mean SD value for all runs, indicating a shift in the mean SD for this assay. By contrast, the exon 3 data indicate that the quality of sequence has increased. Sixteen of the last 19 runs are above the expected mean CSPQV and 12 of the last 19 are below the expected SD. This indicates an overall improvement of SBT of exon 3 of HLA-A. However, a specific problem exists with the forward sequencing primer of exon 2. The changes in sequence quality demonstrated in the CSPQV-hom data are not reflected in the CSPQV-het data. CSPQV, consensus sequence PQV; CSPQV-het, CSPQV of heterozygous base calls; CSPQV-hom, CSPQV of automated homozygous base calls; PQV, Phred quality values.

Allele assignment

An example of an HLA allele assignment result page has been shown in Fig. 6. A unique feature of Assign 2.0 is that the result page contains important QC information in addition to the HLA allele assignment. The allele assignment is displayed as a list of allele combinations within the library that are best matched with the test sequence. Mismatched positions include the sequence base call of the test sample at this position and the expected base call for the allele combination. Additional information is also shown, including the CSPQV of the test sequence at the mismatched positions, whether there was a discrepancy between forward and reverse strand base calls (FRD) and whether the mismatched position was sequenced in a single direction only (SS). Base calls that have arisen from sequencing one strand only are also indicated in the result table by ‘SS’ in the ‘Quality Values’ row (not present in the example in Fig. 6). The QC information of the sample includes the number of bases sequenced (e.g., n = 546 of the 546 bases which constitute exon 2 + exon 3 for HLA-A), the homozygous and heterozygous base call CSPQV (CSPQV-hom and CSPQV-het) statistics (mean CSPQV-hom = 39.9 and SD = 4.3, mean CSPQV-het = 25.8 and SD = 2.1), the SS data (0% for homozygous base calls, 0% for heterozygous base calls) and the FRD data (2% of homozygous and 0% of heterozygous consensus base calls had FRD).

In the example shown in Fig. 6, there are two mismatches between the test sequence and the best-matched alleles. Both mismatches (positions 282 and 448) are at positions where there was an FRD. An FRD indicates a base call error when sequencing in one direction and a high potential for an incorrect consensus base call. Such a position is a priority for manual review. In addition, the base calls at these positions are mismatched against all of the alleles in the result table, indicating either that the test sequence contains unique polymorphisms or that the base calls are incorrect. By contrast, the base call at position 258 is ‘C’ and the CSPQV at this position is 42. This indicates that ‘C’ has been called on both strands, and a CSPQV of 42 indicates sequence of high quality and a very low probability of an incorrect base call.

Confirmation of base calls at positions within the mismatch table is performed by viewing the EPG in SeqScape. Any edits to the sequence are then performed directly in Assign 2.0 and the result table is updated without the need for re-analysing the sequence against the allele sequence library (i.e., in real time). Following confirmation of all base calls, Assign 2.0 will produce a report listing the alleles that are best matched to the test sequence. The operator can then click to the next sample for analysis and the result table is immediately updated with data from the next sample.

Discussion

We have described a sequence data analysis computer software program called Assign 2.0 that combines allele assignment with a comprehensive and effective quality control system. Thousands of sequences can be analysed in seconds, making Assign 2.0 suitable for high throughput sequencing-based typing or any resequencing project. We have used the sequence-based typing of the highly polymorphic HLA-A locus to demonstrate the utility of Assign 2.0.

The unique feature of Assign 2.0 is the ability to analyse PQV in order to provide a comprehensive QC analysis of SBT data. We have demonstrated that the mean and SD of all CSPQV-hom within a BSU are sensitive indicators of sequence quality for that sample. Similarly, the CSPQV-hom data for all BSU for all samples within a sequencing run provide QC data for that sequencing run. As a result, sample-to-sample and run-to-run QC monitoring can be performed. Furthermore, the normal distribution of mean PQV data indicates that Shewhart control graphs can be used and changes in sequence quality can be accurately monitored. These processes add very little time to the SBT process and yet provide valuable QC data.
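The run-to-run monitoring summarised above can be illustrated with a small Shewhart-style sketch in Python. It is an assumption-laden illustration, not the method implemented in Assign 2.0: limits are estimated from a baseline set of run means as mean ± k SD, later runs are compared against them, and the number of runs falling below the centre line is counted as a simple indicator of a downward shift. The function names are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline_run_means, k=2.0):
    """Return (centre, lower, upper) limits from baseline run means.

    Assumption: the centre is the mean of the baseline run means and the
    limits lie k sample standard deviations either side of it.
    """
    centre = mean(baseline_run_means)
    spread = stdev(baseline_run_means)
    return centre, centre - k * spread, centre + k * spread


def review_runs(new_run_means, centre, lower, upper):
    """Compare later runs with the limits derived from the baseline."""
    outside = [i for i, m in enumerate(new_run_means) if m < lower or m > upper]
    below_centre = sum(1 for m in new_run_means if m < centre)
    return {
        "outside_limits": outside,     # indices of runs beyond the limits
        "below_centre": below_centre,  # runs below the expected mean
        "n_runs": len(new_run_means),
    }
```

Counting runs below the centre line reflects the observation in the Results that a shift in quality may be visible as an excess of runs below the expected mean even when no single run falls outside the 2 SD limits.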
A retrospective analysis of all data from February 2003 to July 2003 generated in our laboratory revealed changes in sequence quality associated with an intermittent increase in background with a single sequencing primer in our HLA-A SBT assay. This resulted in a greater than expected number of runs falling below the expected mean CSPQV-hom. In addition, a comparison of CSPQV-hom data between our HLA-A, HLA-B and HLA-C SBT assays revealed a difference in sequence quality between the assays, with HLA-A exon 2 providing the best quality data. We are in the process of using Assign 2.0 to re-optimize the HLA-B, HLA-C and HLA-A exon 3 assays so that sequence data of optimal quality are obtained.

It is of interest to note that Phred was not designed to provide quality values for heterozygous sequence (4, 5). However, the data shown in Fig. 2 demonstrate that CSPQV-het are normally distributed but with a much lower mean than CSPQV-hom. Therefore, in theory, CSPQV-het can also be used for monitoring sequence quality. In most cases, the mean and SD values of CSPQV-hom were mirror images, indicating that either of these values, or the coefficient of variation (CV (%) = SD × 100/mean), can be used as an indicator of sequence quality. The data presented in this study did not indicate that analysis of CSPQV-het provided as sensitive an indicator of quality as CSPQV-hom. This is likely to be because of variable and low numbers of heterozygous positions, compared to homozygous positions, within a sequence.

The analysis of CSPQV in the ways we have described provides the ability to assess the effect of reagents and SBT protocols on sequence data quality. Improving the data obtained from SBT protocols will significantly reduce the data analysis component of those protocols, and SBT will become a high-throughput protocol for measuring diversity.
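The retrospective analysis reports a greater than expected number of runs falling on one side of the expected mean. One way to judge how unusual such a count is, sketched below, is a binomial tail probability. This is an illustration rather than the test used in the study: it assumes that each run falls above or below the mean independently with probability 0.5, and the function name `upper_tail` is hypothetical. The counts (14 of 19) are those quoted for the SD values in the figure legend.

```python
from math import comb

def upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 14 of the last 19 runs on the same side of the long-term mean
prob = upper_tail(14, 19)
print(f"P(at least 14 of 19 on one side) = {prob:.4f}")
```

Under these assumptions the probability is roughly 3%, which is consistent with reading such a pattern as a shift in the mean rather than chance variation.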
In addition, the Assign 2.0 QC tools can be used for between-laboratory comparison of data and provide a means of standardizing SBT assays through workshops and QA exchange programs.

The applications of DNA sequencing are moving from the ‘sequence factories’, where cloned DNA from a single chromosome is sequenced, to studies of genetic diversity that include the sequencing of PCR products of highly polymorphic genes from pairs of chromosomes. These applications include research studies of evolution and population migration (8) and clinical diagnostics (9–11). In addition, DNA sequencing is being used by some laboratories for low to medium throughput SNP analysis and de novo mutation detection (Ivo Gut, CNG, Paris, France, personal communication). Appropriate QC is critical. Obtaining, maintaining and monitoring sequence quality is required for all of these applications. This manuscript describes a means by which appropriate sequencing QC can be performed. Assign v3.0 has been developed and does not require third-party software, such as SeqScape, thus further improving the efficiency of SBT.

Table 1. Mean and standard deviation CSPQV for homozygous base calls (CSPQV-hom) of exon 2 and exon 3 of various HLA class-I SBT assays

            CSPQV-hom
            Exon 2           Exon 3
Locus       Mean    SD       Mean    SD
HLA-A       40.06   1.05     38.93   1.58
HLA-BCG     38.70   1.95     39.07   2.04
HLA-BTA     39.07   2.04     39.22   1.73
HLA-C       39.33   2.55     38.43   2.81

HLA-A exon 2 results in sequence quality with the highest mean CSPQV and lowest SD, which may reflect that the SBT conditions are better optimized for this BSU than for the BSU of other loci. BSU, bi-directionally sequenced units; CSPQV, consensus sequence PQV; PQV, Phred quality values; SBT, sequencing-based typing.
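As a worked illustration of the coefficient of variation mentioned in the Discussion (CV (%) = SD × 100/mean), the sketch below applies the formula to the Table 1 values. The dictionary layout and the function name `cv_percent` are assumptions made for this example; only the numbers come from Table 1.

```python
# Mean and SD of CSPQV-hom per assay and exon, as listed in Table 1
table1 = {
    ("HLA-A", "exon 2"): (40.06, 1.05),
    ("HLA-A", "exon 3"): (38.93, 1.58),
    ("HLA-BCG", "exon 2"): (38.70, 1.95),
    ("HLA-BCG", "exon 3"): (39.07, 2.04),
    ("HLA-BTA", "exon 2"): (39.07, 2.04),
    ("HLA-BTA", "exon 3"): (39.22, 1.73),
    ("HLA-C", "exon 2"): (39.33, 2.55),
    ("HLA-C", "exon 3"): (38.43, 2.81),
}

def cv_percent(mean, sd):
    """Coefficient of variation as a percentage: SD * 100 / mean."""
    return sd * 100 / mean

cvs = {key: cv_percent(m, s) for key, (m, s) in table1.items()}

# A lower CV means higher and more uniform quality values
best = min(cvs, key=cvs.get)
for (locus, exon), cv in sorted(cvs.items(), key=lambda item: item[1]):
    print(f"{locus:8s} {exon}: CV = {cv:.2f}%")
```

Ranking by CV gives the same conclusion as the table footnote: HLA-A exon 2 has the lowest CV, with the highest mean and the lowest SD.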
A) Browse window for locating the .xml files for analysis.
B) Locus being typed. If the locus is indicated in the sample name, the locus selected in the “Locus” pane is overridden.
C) Indicates the maximum tolerance at which results are listed. Assign will list the best matched alleles up to 31 mismatches within the library.
D) The sample quality control information for the homozygous and heterozygous base calls. This includes the mean and standard deviation of the Phred quality values, the amount of sequence that was from a single strand (SS) and the percentage of base calls at which there were forward/reverse strand base call discrepancies.
E) Contains the ID of the sample for which the report is shown. The number of bases sequenced is also shown.
F) This is the results pane. It lists the alleles which are best matched with the test sequence, the number of sequence differences between the alleles and the test sequence, and the sequence base call information at positions that are discrepant between the test sequence and the best matched alleles. This includes the observed base calls of the test sample and the Phred quality value, which is colour coded: green for base calls of high quality which do not require review; yellow for base calls which require review but which are probably correct; and red for base calls which definitely require review because they are at a position with single strand coverage, there is a forward/reverse strand base call discrepancy or the sequence quality is very poor.
G) This is the editor window, which allows confirmation of the base calls. Once the base calls are confirmed, the final result can be determined and a report is generated.
H) This is the list of samples that have been analysed. Selecting a sample ID results in immediate viewing of the SBT details as described above. Above the sample IDs is the date of the release of the IMGT/HLA database.
I) This is the control panel, which includes access to the QC tools.

Fig. 6. A typical allele assignment result page is shown. A detailed description of the result page is given in the key. The result page contains the list of alleles which are best matched to the test sequence, ranked in order of best match. Mismatched sequence positions are listed across the result page in sequence number order and include the consensus sequence of the test sample, the Phred quality value of the consensus sequence (CSPQV) base call, whether there was a forward and reverse strand base call discrepancy (FRD), whether the position was sequenced in both directions (SS if the sequence was from a single strand only) and the corresponding sequence of the alleles within the table. Also included on the result page are the total number of bases sequenced, the mean and standard deviation of CSPQV of the homozygous sequence base calls, CSPQV of the heterozygous base calls, the number of positions (expressed as a percentage of the homozygous and heterozygous base calls) at which there were forward and reverse strand sequence base call discrepancies (FRD), and the total amount of SS sequence.
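The colour coding described in the key for the results pane can be sketched as a simple classification rule. This is a hypothetical illustration only: the source does not state the CSPQV thresholds used by Assign 2.0, so `HIGH_QUALITY` and `POOR_QUALITY` are assumed values, and the `BaseCall` type and `review_flag` function are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the values used by Assign 2.0 are not given in the text
HIGH_QUALITY = 30
POOR_QUALITY = 15

@dataclass
class BaseCall:
    position: int
    base: str
    cspqv: int
    single_strand: bool = False        # SS: sequenced in one direction only
    strand_discrepancy: bool = False   # FRD: forward and reverse calls differ

def review_flag(call: BaseCall) -> str:
    """Return 'green', 'yellow' or 'red' following the key's description."""
    if call.single_strand or call.strand_discrepancy or call.cspqv < POOR_QUALITY:
        return "red"     # definitely requires review
    if call.cspqv >= HIGH_QUALITY:
        return "green"   # high quality, no review required
    return "yellow"      # requires review but is probably correct

calls = [
    BaseCall(258, "C", 42),                      # position and CSPQV from the text
    BaseCall(301, "R", 22),                      # hypothetical mid-quality call
    BaseCall(340, "G", 45, single_strand=True),  # hypothetical SS position
]
for call in calls:
    print(call.position, call.base, review_flag(call))
```

Checking the SS and FRD conditions before the quality value reflects the key's statement that such positions always require review, whatever their CSPQV.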
References

1. Rosenblum BB, Lee LG, Spurgeon SL et al. New dye-labeled terminators for improved DNA sequencing patterns. Nucleic Acids Res 1997: 25: 4500–4.
2. Lee LG, Spurgeon SL, Heiner CR et al. New energy transfer dyes for DNA sequencing. Nucleic Acids Res 1997: 25: 2816–22.
3. Sayer DC, Whidborne R, De Santis D, Rozemuller E, Christiansen FT, Tilanus M. A multi centre evaluation of single-tube amplification protocols for SBT of HLA-DRB1 and HLA-DRB3, 4, 5 are reproducible and robust. HLA 2002. 2003. Tissue Antigens 2004: 63(5): 412–23.
4. Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 1998: 8: 186–94.
5. Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 1998: 8: 175–85.
6. Shewhart WA. Economic Control of Quality of Manufactured Product, 1st edn. New York: Van Nostrand, 1931.
7. Cereb N, Yang SY. Dimorphic primers derived from intron 1 for use in the molecular typing of HLA-B alleles. Tissue Antigens 1997: 50: 74–6.
8. Malhi RS, Mortensen HM, Eshleman JA et al. Native American mtDNA prehistory in the American Southwest. Am J Phys Anthropol 2003: 120: 108–24.
9. Sayer DC, Land S, Gizzarelli L et al. A quality assessment program (QAP) for genotypic antiretroviral testing (GART) results in an improvement in the detection of drug resistance mutations. J Clin Microbiol 2003: 41: 227–36.
10. Sayer D, Whidborne R, Brestovac B, Trimboli F, Witt C, Christiansen F. HLA-DRB1 DNA sequencing based typing: an approach suitable for high throughput typing including unrelated bone marrow registry donors. Tissue Antigens 2001: 57: 46–54.
11. Pryce TM, Palladino S, Kay D, Coombs GW. Rapid identification of fungi by sequencing the ITS1 and ITS2 regions using an automated capillary electrophoresis system. Med Mycol 2003: 41: 369–81.