6. Current limitations
• Linear: gap cost always the same
• Affine: separate penalties for opening and extending a gap
• Using a single gap-cost model is considered state of the art
• Problem with PacBio/ONT: two different gap models required
– Sequencing error: high number of 1 bp indels
– Real indels: extending a gap more likely than opening a new one
– Sequencing error + repeats cause a single gap-cost model to fail even for real indels
AAAGAATTCA
A-A-A-T-CA
vs.
AAAGAATTCA
AAA----TCA
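The difference between the two alignments in the example can be made concrete with a minimal sketch that scores only their gaps under a linear and an affine model. The penalty values (`per_base=2`, `gap_open=4`, `gap_extend=1`) and the function names are illustrative assumptions, not parameters of any particular aligner.

```python
def gap_runs(aligned):
    """Return the lengths of the consecutive '-' runs in an aligned sequence."""
    runs, current = [], 0
    for ch in aligned:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def linear_cost(runs, per_base=2):
    """Linear model: every gap base costs the same (assumed value)."""
    return sum(per_base * k for k in runs)


def affine_cost(runs, gap_open=4, gap_extend=1):
    """Affine model: the first base of a gap pays gap_open,
    each further base pays gap_extend (assumed values)."""
    return sum(gap_open + gap_extend * (k - 1) for k in runs)


scattered = "A-A-A-T-CA"   # four separate 1 bp gaps
single = "AAA----TCA"      # one 4 bp gap

for name, aligned in (("scattered", scattered), ("single", single)):
    runs = gap_runs(aligned)
    print(name, runs, "linear:", linear_cost(runs), "affine:", affine_cost(runs))
```

Both alignments have the same six matching columns and four gap bases, so the linear model cannot tell them apart, while the affine model prefers the single 4 bp gap.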
7. Convex gap costs
• Costs for a gap follow a convex function of gap length
• Close to linear gap costs for 1-2 bp gaps
• As a gap gets longer, the penalty for "splitting" it increases
• Problem: the optimal approach takes O(nm² + n²m) time
• Heuristic implementation: O(nm)
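One possible shape for such a cost function is sketched below: the per-base extension cost starts at a maximum and shrinks with every additional gap base until it reaches a floor. The function name and the parameters `extend_max`, `extend_min` and `decay` (and their values) are assumptions chosen for illustration; the slides do not specify the exact function used.

```python
def convex_gap_cost(length, extend_max=6, extend_min=1, decay=1):
    """Total cost of a gap of `length` bases.

    The i-th gap base (0-based) costs max(extend_min, extend_max - decay * i),
    so short gaps are priced almost linearly and long gaps get cheaper per base.
    All parameter values are hypothetical.
    """
    return sum(max(extend_min, extend_max - decay * i) for i in range(length))


def split_penalty(length):
    """Extra cost of reporting one gap as two gaps of half the length."""
    half = length // 2
    return convex_gap_cost(half) + convex_gap_cost(length - half) - convex_gap_cost(length)


for length in (1, 2, 4, 8):
    print(length, convex_gap_cost(length), split_penalty(length))
```

With these values, splitting a 2 bp gap costs almost nothing extra, while splitting an 8 bp gap is clearly penalized, which is the behaviour the bullets describe.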
16. Outlook
• Finish new version of Sniffles
– Assessment of noisy alignments
• NGM-LR:
– MQ calculation
– Runtime
• Visual inspection and comparison of SV calls
Editor's Notes
Delete repeating regions + low score.
LIS: Longest increasing subsequence.
cLIS: Longest increasing subsequence that also takes the location along the read into account (run multiple times; max. 10 rounds).
In low-quality regions -> reconcile.
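The LIS step named in the notes can be sketched as follows. This is the generic O(n log n) longest-increasing-subsequence algorithm; applying it to reference positions of anchors that are already sorted by read position is an assumption about how it would be used, and the cLIS variant and the reconcile step are not shown.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return one longest strictly increasing subsequence of `values`."""
    tail_idx = []   # tail_idx[k]: index of the smallest tail of a run of length k + 1
    tail_val = []   # the values at those indices, kept sorted for bisect
    prev = [-1] * len(values)   # back-pointers for reconstruction

    for i, v in enumerate(values):
        k = bisect_left(tail_val, v)
        if k == len(tail_val):
            tail_idx.append(i)
            tail_val.append(v)
        else:
            tail_idx[k] = i
            tail_val[k] = v
        prev[i] = tail_idx[k - 1] if k > 0 else -1

    # Walk the back-pointers from the end of the longest run.
    result = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        result.append(values[i])
        i = prev[i]
    return result[::-1]


# Hypothetical reference positions of anchors, listed in read order.
ref_positions = [100, 220, 150, 300, 90, 410, 520]
print(longest_increasing_subsequence(ref_positions))
```

Anchors outside the returned chain (here 220 and 90) are the ones that break the increasing order and would be dropped.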