2. Error Processes Outlined
• Quality scores are miscalibrated by trinucleotide context:
GT* and *AC contexts are more error prone
• Non-ref reads on the forward (F) and reverse (R) strands are informative
• Coverage predicts high error rates
3. Measuring Error Processes
• IDEA: Knowledge of the second-best base
should be used in SNP calling. This calls
for extracting the four base intensities
or the four base probabilities.
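A minimal sketch of the idea, assuming each read position comes with four intensities in A, C, G, T order. The function names and the input layout are hypothetical and are not taken from any particular base caller.

```python
BASES = "ACGT"  # assumed channel order; real intensity files may differ


def rank_bases(intensities):
    """Return the four bases ordered from highest to lowest intensity."""
    if len(intensities) != len(BASES):
        raise ValueError("expected four intensities, one per base in ACGT order")
    order = sorted(range(len(BASES)), key=lambda i: intensities[i], reverse=True)
    return [BASES[i] for i in order]


def best_and_second(intensities):
    """Return (called base, second-best base) for one read position."""
    ranked = rank_bases(intensities)
    return ranked[0], ranked[1]


if __name__ == "__main__":
    # Hypothetical intensities for one cycle: G is called, A is second best.
    print(best_and_second([120.0, 15.0, 640.0, 30.0]))
```

The same ranking applies unchanged if four base probabilities are supplied instead of raw intensities.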
4. Reference allele intensity in 4-base intensity report
• At heterozygous dbSNP sites
from two lanes with SNP calls,
the reference allele has the
2nd highest intensity value
in ~31-33% of reads.
• At false positives (validation
done), the reference allele
has the 2nd highest intensity
value in ~80% of reads.
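The statistic on this slide could be computed along the following lines. This is a sketch under stated assumptions: intensities are in A, C, G, T order, and only reads whose highest-intensity base is not the reference are counted. The slides do not state which denominator was used, so that choice is a guess, and the function names are hypothetical.

```python
BASES = "ACGT"  # assumed channel order


def ranked_indices(intensities):
    """Indices of the four channels, highest intensity first."""
    return sorted(range(4), key=lambda i: intensities[i], reverse=True)


def fraction_ref_second(reads, ref_base):
    """Fraction of non-ref-called reads whose 2nd highest intensity is the ref allele.

    reads: list of four-intensity tuples for reads covering one site.
    Returns None when no read at the site has a non-ref top base.
    """
    considered = 0
    hits = 0
    for intensities in reads:
        order = ranked_indices(intensities)
        if BASES[order[0]] == ref_base:
            continue  # assumption: reads calling the reference are excluded
        considered += 1
        if BASES[order[1]] == ref_base:
            hits += 1
    return hits / considered if considered else None


if __name__ == "__main__":
    # Hypothetical reads at a site with reference base A.
    reads = [(10, 5, 100, 2), (50, 5, 100, 2), (1, 60, 100, 2), (100, 5, 10, 2)]
    print(fraction_ref_second(reads, "A"))
```

Comparing this fraction between validated true-positive and false-positive sites is the contrast the slide reports (~31-33% versus ~80%).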
5. Focusing on Examples
• Q: What does the
distribution of non-ref
intensity versus ref
intensity look like
when the non-ref allele
is called?
• A: Reference noise
intensity is
informative.
6. Zooming in on Examples
Reads at true-positive versus false-positive sites are plotted to distinguish two
distinct species of reads.
At false positives, residual reference intensity trends overall towards that of
reference-call reads.
7. Zooming in on Examples 2
Reads at true-positive versus false-positive sites are plotted to distinguish two
distinct species of reads.
At false positives, residual reference intensity trends overall towards that of
reference-call reads. Strand is informative.
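One way to look at the strand effect numerically is to summarise residual reference intensity separately for forward and reverse reads. The sketch below is illustrative only: it defines "residual reference intensity" as the reference channel's share of the total intensity, which is an assumption and not a definition given in the slides, and it assumes non-negative intensities in A, C, G, T order with strands labelled "F" and "R".

```python
from collections import defaultdict
from statistics import mean

BASES = "ACGT"  # assumed channel order


def residual_ref_fraction(intensities, ref_base):
    """Reference channel's share of total intensity (hypothetical definition)."""
    total = sum(intensities)
    if total <= 0:
        return None  # no usable signal at this position
    return intensities[BASES.index(ref_base)] / total


def mean_residual_by_strand(reads, ref_base):
    """Mean residual reference fraction per strand.

    reads: iterable of (strand, intensities) pairs, strand being "F" or "R".
    """
    groups = defaultdict(list)
    for strand, intensities in reads:
        frac = residual_ref_fraction(intensities, ref_base)
        if frac is not None:
            groups[strand].append(frac)
    return {strand: mean(values) for strand, values in groups.items()}


if __name__ == "__main__":
    # Hypothetical non-ref-called reads at a site with reference base A.
    reads = [("F", (10, 0, 90, 0)), ("F", (30, 0, 70, 0)), ("R", (50, 0, 50, 0))]
    print(mean_residual_by_strand(reads, "A"))
```

A large gap between the F and R means at a candidate site would be one signal worth inspecting alongside the plots.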