20121020 semi local-string_comparison_tiskin

  1. Semi-local string comparison: Algorithmic techniques and applications
     Alexander Tiskin, Department of Computer Science, University of Warwick
     http://go.warwick.ac.uk/alextiskin
     Alexander Tiskin (Warwick)   Semi-local string comparison   1 / 132
  2. Outline
     1 Introduction
     2 Matrix distance multiplication
     3 Semi-local string comparison
     4 The seaweed method
     5 Periodic string comparison
     6 The transposition network method
     7 Sparse string comparison
     8 Compressed string comparison
     9 Beyond semi-locality
     10 Conclusions and future work
  4. Introduction
     String matching: finding an exact pattern in a string
     String comparison: finding similar patterns in two strings
     Applications: computational biology, image recognition, ...
     Standard types of string comparison:
       global: whole string vs whole string
       local: substrings vs substrings
     Main focus of this work:
       semi-local: whole string vs substrings; prefixes vs suffixes
     Closely related to approximate string matching (no relation to approximation algorithms!)
     Main tool: implicit unit-Monge matrices (a.k.a. seaweed matrices)
  6. Introduction: Terminology and notation
     $x^- = x - \tfrac{1}{2}$, $x^+ = x + \tfrac{1}{2}$
     Integers: $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$
     Half-integers: $\{\ldots, -\tfrac{3}{2}, -\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{3}{2}, \tfrac{5}{2}, \ldots\} = \{\ldots, (-2)^+, (-1)^+, 0^+, 1^+, 2^+, \ldots\}$
     $(i, j) \ll (i', j')$ iff $i < i'$ and $j < j'$
     $(i, j) \gtrless (i', j')$ iff $i > i'$ and $j < j'$
     A permutation matrix is a 0/1 matrix with exactly one nonzero per row and per column, e.g.
     $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
  7. 7. IntroductionTerminology and notationGiven matrix D, its distribution matrix is made up of -dominance sums: Alexander Tiskin (Warwick) Semi-local string comparison 6 / 132
  8. 8. IntroductionTerminology and notationGiven matrix D, its distribution matrix is made up of -dominance sums:Given matrix E , its density matrix is made up of quadrangle differences:E (ˆ, ) = E (ˆ− , + ) − E (ˆ− , − ) − E (ˆ+ , + ) + E (ˆ+ , − ) ı ˆ ı ˆ ı ˆ ı ˆ ı ˆwhere D Σ , E over integers; D, E over half-integers Alexander Tiskin (Warwick) Semi-local string comparison 6 / 132
  9. 9. IntroductionTerminology and notationGiven matrix D, its distribution matrix is made up of -dominance sums:Given matrix E , its density matrix is made up of quadrangle differences:E (ˆ, ) = E (ˆ− , + ) − E (ˆ− , − ) − E (ˆ+ , + ) + E (ˆ+ , − ) ı ˆ ı ˆ ı ˆ ı ˆ ı ˆwhere D Σ , E over integers; D, E over half-integers     Σ 0 1 2 3 0 1 2 3   0 1 0 0 1 1 2 0 1 1 2 0 1 01 0 0 =  0 0 0 1 = 1 0 0      0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 Alexander Tiskin (Warwick) Semi-local string comparison 6 / 132
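  10. Introduction: Terminology and notation
     Given matrix $D$, its distribution matrix $D^\Sigma$ is made up of $\gtrless$-dominance sums:
     $D^\Sigma(i, j) = \sum_{\hat\imath > i,\ \hat\jmath < j} D(\hat\imath, \hat\jmath)$
     Given matrix $E$, its density matrix $E^\square$ is made up of quadrangle differences:
     $E^\square(\hat\imath, \hat\jmath) = E(\hat\imath^-, \hat\jmath^+) - E(\hat\imath^-, \hat\jmath^-) - E(\hat\imath^+, \hat\jmath^+) + E(\hat\imath^+, \hat\jmath^-)$
     where $D^\Sigma$, $E$ over integers; $D$, $E^\square$ over half-integers
     Example:
     $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}^\Sigma = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$
     $\begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}^\square = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
     $(D^\Sigma)^\square = D$ for all $D$
     Matrix $E$ is simple, if $(E^\square)^\Sigma = E$; equivalently, if it has all zeros in the left column and bottom row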
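The two definitions on this slide can be checked mechanically. The following is a minimal sketch, assuming a matrix over half-integers is stored as a list of lists in which entry `D[r][c]` stands for the point (r + 1/2, c + 1/2); the function names `distribution`, `density` and `is_simple` are illustrative and not taken from the slides.

```python
def distribution(D):
    """Distribution matrix: S[i][j] = sum of D[r][c] over rows r >= i and columns c < j.

    D[r][c] stands for D(r + 1/2, c + 1/2), so S has one more row and column than D.
    """
    rows, cols = len(D), len(D[0])
    S = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols):
            # inclusion-exclusion over the already computed neighbours
            S[i][j + 1] = S[i + 1][j + 1] + S[i][j] - S[i + 1][j] + D[i][j]
    return S


def density(E):
    """Density matrix: quadrangle difference around each half-integer point."""
    rows, cols = len(E) - 1, len(E[0]) - 1
    return [[E[r][c + 1] - E[r][c] - E[r + 1][c + 1] + E[r + 1][c]
             for c in range(cols)] for r in range(rows)]


def is_simple(E):
    """Simple matrix: all zeros in the left column and in the bottom row."""
    return all(row[0] == 0 for row in E) and all(x == 0 for x in E[-1])


P = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
P_sigma = distribution(P)
print(P_sigma)  # [[0, 1, 2, 3], [0, 1, 1, 2], [0, 0, 0, 1], [0, 0, 0, 0]]
print(density(P_sigma) == P, is_simple(P_sigma))  # True True
```

The printed matrix reproduces the worked example on the slide, and the round trip illustrates $(D^\Sigma)^\square = D$.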
  12. Introduction: Terminology and notation
     Matrix $E$ is Monge, if $E^\square$ is nonnegative
     Intuition: boundary-to-boundary distances in a (weighted) planar graph
     Matrix $E$ is unit-Monge, if $E^\square$ is a permutation matrix
     Intuition: boundary-to-boundary distances in a grid-like graph
     Simple unit-Monge matrix: $P^\Sigma$, where $P$ is a permutation matrix
     Seaweed matrix: $P$ used as an implicit representation of $P^\Sigma$
     $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}^\Sigma = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$
  13. Introduction: Implicit unit-Monge matrices
     Efficient $P^\Sigma$ queries: range tree on nonzeros of $P$ [Bentley: 1980]
       binary search tree by $i$-coordinate
       under every node, binary search tree by $j$-coordinate
     [Figure: the nonzeros of $P$ split recursively by $i$-coordinate, each part then split by $j$-coordinate]
  15. Introduction: Implicit unit-Monge matrices
     Efficient $P^\Sigma$ queries: (contd.)
     Every node of the range tree represents a canonical range (rectangular region), and stores its nonzero count
     Overall, $\le n \log n$ canonical ranges are non-empty
     A $P^\Sigma$ query is equivalent to $\gtrless$-dominance counting: how many nonzeros are $\gtrless$-dominated by query point?
     Answer: sum up nonzero counts in $\le \log^2 n$ disjoint canonical ranges
     Total size $O(n \log n)$, query time $O(\log^2 n)$
     There are asymptotically more efficient (but less practical) data structures
     Total size $O(n)$, query time $O\bigl(\frac{\log n}{\log\log n}\bigr)$ [JáJá+: 2004] [Chan, Pătraşcu: 2010]
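A dominance-counting structure with the stated bounds can be sketched as follows. This is not necessarily the range tree of the slides: it is a merge-sort tree (a segment tree over rows whose nodes keep sorted column lists), which also uses $O(n \log n)$ space and answers a query in $O(\log^2 n)$ time. The class name `DominanceCounter` and the convention `perm[r] = c` for a nonzero at (r + 1/2, c + 1/2) are assumptions made for illustration.

```python
from bisect import bisect_left


class DominanceCounter:
    """Answers P^Sigma(i, j) = #{nonzeros with row >= i and column < j}."""

    def __init__(self, perm):
        self.n = len(perm)
        self.size = 1
        while self.size < max(self.n, 1):
            self.size *= 2
        # tree[node] holds the sorted columns of all nonzeros in the node's row range
        self.tree = [[] for _ in range(2 * self.size)]
        for r, c in enumerate(perm):
            self.tree[self.size + r] = [c]
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = sorted(self.tree[2 * node] + self.tree[2 * node + 1])

    def query(self, i, j):
        total = 0
        lo, hi = i + self.size, self.n + self.size  # half-open range of leaves
        while lo < hi:
            if lo & 1:
                total += bisect_left(self.tree[lo], j)  # columns < j in this canonical range
                lo += 1
            if hi & 1:
                hi -= 1
                total += bisect_left(self.tree[hi], j)
            lo //= 2
            hi //= 2
        return total


counter = DominanceCounter([1, 0, 2])  # the 3x3 permutation matrix used in the examples
print(counter.query(0, 2), counter.query(1, 3), counter.query(2, 2))  # 2 2 0
```

Each visited node plays the role of a canonical range; the binary search inside it replaces the second-level tree.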
  17. Matrix distance multiplication: Seaweed braids
     Distance algebra (a.k.a. (min, +) or tropical algebra):
       addition $\oplus$ given by min
       multiplication $\odot$ given by +
     Matrix $\odot$-multiplication
     $A \odot B = C$, where $C(i, k) = \bigoplus_j A(i, j) \odot B(j, k) = \min_j \bigl(A(i, j) + B(j, k)\bigr)$
     Matrix classes closed under $\odot$-multiplication (for given $n$):
       general numerical (integer, real) matrices
       Monge matrices
       simple unit-Monge matrices (!)
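For reference, the $\odot$-product can be written directly from its definition. The sketch below is the standard cubic-time method for general matrices; the name `distance_product` is illustrative.

```python
def distance_product(A, B):
    """(min, +) product: C[i][k] = min over j of A[i][j] + B[j][k]."""
    inner = len(B)
    assert all(len(row) == inner for row in A), "inner dimensions must agree"
    return [[min(A[i][j] + B[j][k] for j in range(inner))
             for k in range(len(B[0]))]
            for i in range(len(A))]


A = [[0, 1],
     [2, 0]]
B = [[0, 3],
     [1, 0]]
print(distance_product(A, B))  # [[0, 1], [1, 0]]
```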
  18. Matrix distance multiplication: Seaweed braids
     Recall that simple unit-Monge matrices are represented implicitly by permutation (seaweed) matrices
     Define $P_A \boxdot P_B = P_C$ as $P_A^\Sigma \odot P_B^\Sigma = P_C^\Sigma$
     The seaweed monoid $T_n$:
       simple unit-Monge matrices under $\odot$
       equivalently, permutation (seaweed) matrices under $\boxdot$
     Also known as the 0-Hecke monoid of the symmetric group $H_0(S_n)$
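The definition of the implicit product can be followed literally: expand both permutation matrices into distribution matrices, multiply in the distance algebra, and take the density of the result. The sketch below does exactly that in cubic time, purely as a reference for small inputs; it is not the fast algorithm discussed later, and the helper names are illustrative.

```python
def distribution(P):
    """S[i][j] = number of nonzeros of P in rows >= i and columns < j."""
    n = len(P)
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(n):
            S[i][j + 1] = S[i + 1][j + 1] + S[i][j] - S[i + 1][j] + P[i][j]
    return S


def density(E):
    """Quadrangle differences of E, indexed by half-integer points."""
    n = len(E) - 1
    return [[E[r][c + 1] - E[r][c] - E[r + 1][c + 1] + E[r + 1][c]
             for c in range(n)] for r in range(n)]


def seaweed_product(PA, PB):
    """Naive implicit product: density of the (min, +) product of the distribution matrices."""
    SA, SB = distribution(PA), distribution(PB)
    size = len(SA)
    SC = [[min(SA[i][j] + SB[j][k] for j in range(size)) for k in range(size)]
          for i in range(size)]
    return density(SC)


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
g1 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]  # crossing of the first two seaweeds
print(seaweed_product(identity, g1) == g1)  # True
print(seaweed_product(g1, g1) == g1)        # True: crossing twice is the same as crossing once
```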
  19-22. Matrix distance multiplication: Seaweed braids
     $P_A \boxdot P_B = P_C$ can be seen as combing of seaweed braids
     [Figure sequence: the braids of $P_A$ and $P_B$ are stacked and combed into the braid of $P_C$]
  23. Matrix distance multiplication: Seaweed braids
     The seaweed monoid $T_n$:
       $n!$ elements (permutations of size $n$)
       $n - 1$ generators $g_1, g_2, \ldots, g_{n-1}$ (elementary crossings)
     Idempotence: $g_i^2 = g_i$ for all $i$
     Far commutativity: $g_i g_j = g_j g_i$ for $j - i > 1$
     Braid relations: $g_i g_j g_i = g_j g_i g_j$ for $j - i = 1$
     [Figure: each relation drawn as an equality of seaweed braids]
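  24. Matrix distance multiplication: Seaweed braids
     Identity: $1 \boxdot x = x$
     $1 = \begin{pmatrix} \bullet & \cdot & \cdot & \cdot \\ \cdot & \bullet & \cdot & \cdot \\ \cdot & \cdot & \bullet & \cdot \\ \cdot & \cdot & \cdot & \bullet \end{pmatrix}$
     Zero: $0 \boxdot x = 0$
     $0 = \begin{pmatrix} \cdot & \cdot & \cdot & \bullet \\ \cdot & \cdot & \bullet & \cdot \\ \cdot & \bullet & \cdot & \cdot \\ \bullet & \cdot & \cdot & \cdot \end{pmatrix}$
     [Figure: the corresponding seaweed braids]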
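The relations, the identity and the zero can be explored in a small model. The sketch below does not multiply seaweed matrices; it uses one standard realisation of the 0-Hecke monoid, assumed here for illustration, in which generator $g_i$ acts on a permutation by crossing the entries at positions $i$ and $i + 1$ unless they are already crossed. The function names are hypothetical.

```python
from itertools import permutations


def act(w, i):
    """Apply generator g_i (1-based): cross positions i and i + 1 unless already crossed."""
    w = list(w)
    if w[i - 1] < w[i]:
        w[i - 1], w[i] = w[i], w[i - 1]
    return tuple(w)


def apply_word(w, word):
    """Apply a sequence of generator indices from left to right."""
    for i in word:
        w = act(w, i)
    return w


def closure(n):
    """All permutations reachable from the identity; one per monoid element."""
    start = tuple(range(n))
    seen, frontier = {start}, [start]
    while frontier:
        w = frontier.pop()
        for i in range(1, n):
            x = act(w, i)
            if x not in seen:
                seen.add(x)
                frontier.append(x)
    return seen


print(len(closure(4)))  # 24
for w in permutations(range(4)):
    assert apply_word(w, [2, 2]) == apply_word(w, [2])              # idempotence
    assert apply_word(w, [1, 3]) == apply_word(w, [3, 1])           # far commutativity
    assert apply_word(w, [1, 2, 1]) == apply_word(w, [2, 1, 2])     # braid relation
print(apply_word((0, 1, 2, 3), [1, 2, 1, 3, 2, 1]))  # (3, 2, 1, 0), the zero
```

In this model the identity permutation plays the role of 1 and the fully reversed permutation plays the role of 0: no generator changes it.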
  25. Matrix distance multiplication: Seaweed braids
     Related structures:
       positive braids: far comm; braid relations
       braids: $g_i g_i^{-1} = 1$; far comm; braid relations
       Coxeter's presentation of $S_n$: $g_i^2 = 1$; far comm; braid relations
       locally free idempotent monoid: idem; far comm [Vershik+: 2000]
     Generalisations:
       general 0-Hecke monoids [Fomin, Greene: 1998; Buch+: 2008]
       Coxeter monoids [Tsaranov: 1990; Richardson, Springer: 1990]
       $\mathcal{J}$-trivial monoids [Denton+: 2011]
  29. Matrix distance multiplication: Seaweed braids
     Computation in the seaweed monoid: a confluent rewriting system can be obtained by software (Semigroupe, GAP)
     $T_3$: 1, a = $g_1$, b = $g_2$; ab, ba, aba = 0
       aa → a    bb → b    bab → 0    aba → 0
     $T_4$: 1, a = $g_1$, b = $g_2$, c = $g_3$; ab, ac, ba, bc, cb, aba, abc, acb, bac, bcb, cba, abac, abcb, acba, bacb, bcba, abacb, abcba, bacba, abacba = 0
       aa → a    ca → ac    bab → aba    cbac → bcba
       bb → b    cc → c     cbc → bcb    abacba → 0
     Easy to use, but not an efficient algorithm
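The rewriting rules for $T_4$ can be applied by a few lines of code. In this sketch, the rule abacba → 0 is read as naming the zero element, so the zero is represented by the word `abacba` itself; the empty string stands for 1. The names `RULES_T4` and `normal_form` are illustrative.

```python
from itertools import product

# Rules for T_4 as listed on the slide (the zero is kept as the word "abacba")
RULES_T4 = [("aa", "a"), ("bb", "b"), ("cc", "c"), ("ca", "ac"),
            ("bab", "aba"), ("cbc", "bcb"), ("cbac", "bcba")]


def normal_form(word, rules=RULES_T4):
    """Rewrite until no left-hand side occurs in the word."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in word:
                word = word.replace(lhs, rhs, 1)  # rewrite the first occurrence only
                changed = True
                break
    return word


forms = {normal_form("".join(w)) for k in range(7) for w in product("abc", repeat=k)}
print(len(forms))              # 24
print(normal_form("cabacba"))  # abacba
```

Every rule either shortens the word or replaces it by an alphabetically smaller word of the same length, so the loop terminates. The 24 normal forms found are the elements of $T_4$ listed on the slide.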
  31. Matrix distance multiplication: Seaweed matrix multiplication
     The implicit unit-Monge matrix $\odot$-multiplication problem
     Given permutation matrices $P_A$, $P_B$, compute $P_C$, such that $P_A^\Sigma \odot P_B^\Sigma = P_C^\Sigma$ (equivalently, $P_A \boxdot P_B = P_C$)
     Matrix $\odot$-multiplication: running time
       type                  time
       general               $O(n^3)$                                  standard
                             $O\bigl(\frac{n^3 (\log\log n)^3}{\log^2 n}\bigr)$    [Chan: 2007]
       Monge                 $O(n^2)$                                  via [Aggarwal+: 1987]
       implicit unit-Monge   $O(n^{1.5})$                              [T: 2006]
                             $O(n \log n)$                             [T: 2010]
  32-39. Matrix distance multiplication: Seaweed matrix multiplication
     [Figure sequence: given $P_A$ and $P_B$, find $P_C$; the inputs are split into $P_{A,lo}$, $P_{A,hi}$ and $P_{B,lo}$, $P_{B,hi}$; the two subproblems give $P_{C,lo} + P_{C,hi}$, which is then corrected to $P_C$]
  40. Matrix distance multiplication: Seaweed matrix multiplication
     Implicit unit-Monge matrix $\odot$-multiplication: the algorithm
     $P_C^\Sigma(i, k) = \min_j \bigl(P_A^\Sigma(i, j) + P_B^\Sigma(j, k)\bigr)$
     Divide-and-conquer on the range of $j$
     Divide $P_A$ horizontally, $P_B$ vertically: two subproblems of effective size $n/2$
       $P_{A,lo}^\Sigma \odot P_{B,lo}^\Sigma = P_{C,lo}^\Sigma$
       $P_{A,hi}^\Sigma \odot P_{B,hi}^\Sigma = P_{C,hi}^\Sigma$
     Conquer: $\gtrless$-low nonzeros of $P_{C,lo}$ and $\gtrless$-high nonzeros of $P_{C,hi}$ appear in $P_C$
     The remaining nonzeros of $P_{C,lo}$ and $P_{C,hi}$ are “wrong”, and need to be corrected to obtain the remaining nonzeros of $P_C$
     Correction can be done in time $O(n)$ using the unit-Monge property
     Overall time $O(n \log n)$
  41. Matrix distance multiplication: Bruhat order
     Comparing permutations by the “degree of sortedness”
     Bruhat order
     Permutation $A$ is lower (“more sorted”) than permutation $B$ in the Bruhat order ($A \preceq B$), if $B$ can be transformed to $A$ by successive pairwise sorting between arbitrary pairs of elements.
     Permutation matrices: $P_A \preceq P_B$, if $P_B$ can be transformed to $P_A$ by successive submatrix substitution:
     $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \to \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
  42. Matrix distance multiplication: Bruhat order
     Bruhat comparability: running time
       $O(n^2)$        folklore
       $O(n \log n)$   [T: NEW]
     $P_A \preceq P_B$ iff $P_A^\Sigma \le P_B^\Sigma$ elementwise, time $O(n^2)$ (folklore)
     $P_A \preceq P_B$ iff $P_A^R \boxdot P_B = Id^R$, time $O(n \log n)$ [T: NEW]
     where $P^R$ denotes clockwise rotation of matrix $P$
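The folklore quadratic test is short enough to write out. The sketch below assumes a permutation is given as a list `p` with `p[r] = c` for a nonzero in row `r`, column `c`, and compares the distribution matrices elementwise; it does not implement the $O(n \log n)$ method, and the function names are illustrative.

```python
def dominance_sums(p):
    """S[i][j] = number of rows r >= i with p[r] < j (the distribution matrix of p)."""
    n = len(p)
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(1, n + 1):
            S[i][j] = S[i + 1][j] + (1 if p[i] < j else 0)
    return S


def bruhat_leq(p, q):
    """True if p is lower than or equal to q in the Bruhat order (quadratic-time test)."""
    SP, SQ = dominance_sums(p), dominance_sums(q)
    return all(x <= y for row_p, row_q in zip(SP, SQ) for x, y in zip(row_p, row_q))


print(bruhat_leq([0, 1, 2], [1, 0, 2]))  # True: sorting the first pair of [1, 0, 2] gives the identity
print(bruhat_leq([0, 2, 1], [1, 0, 2]), bruhat_leq([1, 0, 2], [0, 2, 1]))  # False False: incomparable
```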
  45. Semi-local string comparison: Semi-local LCS and edit distance
     Consider strings (= sequences) over an alphabet of size $\sigma$
     Distinguish contiguous substrings and not necessarily contiguous subsequences
     Special cases of substring: prefix, suffix
     Notation: strings $a$, $b$ of length $m$, $n$ respectively
     Assume where necessary: $m \le n$; $m$, $n$ reasonably close
     The longest common subsequence (LCS) score:
       length of longest string that is a subsequence of both $a$ and $b$
       equivalently, alignment score, where score(match) = 1 and score(mismatch) = 0
     In biological terms, “loss-free alignment” (unlike “lossy” BLAST)
  47. Semi-local string comparison: Semi-local LCS and edit distance
     The LCS problem
     Give the LCS score for $a$ vs $b$
     LCS: running time
       $O(mn)$                                                [Wagner, Fischer: 1974]
       $O\bigl(\frac{mn}{\log^2 n}\bigr)$, $\sigma = O(1)$                [Masek, Paterson: 1980] [Crochemore+: 2003]
       $O\bigl(\frac{mn (\log\log n)^2}{\log^2 n}\bigr)$                    [Paterson, Dančík: 1994] [Bille, Farach-Colton: 2008]
     Running time varies depending on the RAM model version
     We assume word-RAM with word size $\log n$ (where it matters)
  48. Semi-local string comparison: Semi-local LCS and edit distance
     LCS on the alignment graph (directed, acyclic)
     [Figure: alignment graph with $b$ = “BAABCABCABACA” along the top and $a$ = “BAABCBCA” down the side; blue = 0, red = 1]
     score(“BAABCBCA”, “BAABCABCABACA”) = len(“BAABCBCA”) = 8
     LCS = highest-score path from top-left to bottom-right
  49. Semi-local string comparison: Semi-local LCS and edit distance
     LCS: dynamic programming [WF: 1974]
     Sweep cells in any $\ll$-compatible order
     Cell update: time $O(1)$
     Overall time $O(mn)$
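The dynamic programming sweep can be sketched as below, using row-by-row order as one particular $\ll$-compatible order and keeping only the previous row. The function name `lcs_score` is illustrative.

```python
def lcs_score(a, b):
    """LCS score by dynamic programming in O(mn) time and O(n) space."""
    prev = [0] * (len(b) + 1)  # scores of the previous prefix of a against every prefix of b
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)         # match: extend along the diagonal
            else:
                cur.append(max(prev[j], cur[j - 1]))  # mismatch: best of the two neighbours
        prev = cur
    return prev[-1]


print(lcs_score("BAABCBCA", "BAABCABCABACA"))  # 8
print(lcs_score("BAABCBCA", "CABCABA"))        # 5
```

Both values agree with the scores quoted on the alignment graph slides.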
  50. Semi-local string comparison: Semi-local LCS and edit distance
     ‘Begin at the beginning,’ the King said gravely, ‘and go on till you come to the end: then stop.’
     L. Carroll, Alice in Wonderland
     (The standard approach in dynamic programming)
  52. Semi-local string comparison: Semi-local LCS and edit distance
     Sometimes dynamic programming can be run from both ends for extra flexibility
     Is there a better, fully flexible alternative (e.g. for comparing compressed strings, comparing strings dynamically or in parallel, etc.)?
  53. Semi-local string comparison: Semi-local LCS and edit distance
     LCS: micro-block dynamic programming [MP: 1980; BF: 2008]
     Sweep cells in micro-blocks, in any $\ll$-compatible order
     Micro-block size:
       $t = O(\log n)$ when $\sigma = O(1)$
       $t = O\bigl(\frac{\log n}{\log\log n}\bigr)$ otherwise
     Micro-block interface:
       $O(t)$ characters, each $O(\log \sigma)$ bits, can be reduced to $O(\log t)$ bits
       $O(t)$ small integers, each $O(1)$ bits
     Micro-block update: time $O(1)$, by precomputing all possible interfaces
     Overall time $O\bigl(\frac{mn}{\log^2 n}\bigr)$ when $\sigma = O(1)$, $O\bigl(\frac{mn (\log\log n)^2}{\log^2 n}\bigr)$ otherwise
  55. Semi-local string comparison: Semi-local LCS and edit distance
     The semi-local LCS problem
     Give the (implicit) matrix of $O\bigl((m + n)^2\bigr)$ LCS scores:
       string-substring LCS: string $a$ vs every substring of $b$
       prefix-suffix LCS: every prefix of $a$ vs every suffix of $b$
       suffix-prefix LCS: every suffix of $a$ vs every prefix of $b$
       substring-string LCS: every substring of $a$ vs string $b$
     Cf.: dynamic programming gives prefix-prefix LCS
  56. Semi-local string comparison: Semi-local LCS and edit distance
     Semi-local LCS on the alignment graph
     [Figure: alignment graph with $b$ = “BAABCABCABACA” along the top and $a$ = “BAABCBCA” down the side; blue = 0, red = 1]
     score(“BAABCBCA”, “CABCABA”) = len(“ABCBA”) = 5
     String-substring LCS: all highest-score top-to-bottom paths
     Semi-local LCS: all highest-score boundary-to-boundary paths
  57. Semi-local string comparison: Score matrices and seaweed matrices
     The score matrix $H$
     $a$ = “BAABCBCA”, $b$ = “BAABCABCABACA”
     $H(i, j)$ = score($a$, $b\langle i : j \rangle$)
     $H(4, 11) = 5$
     $H(i, j) = j - i$ if $i > j$
     Rows $i = 0, \ldots, 13$ (top to bottom), columns $j = 0, \ldots, 13$ (left to right):
         0   1   2   3   4   5   6   6   7   8   8   8   8   8
        -1   0   1   2   3   4   5   5   6   7   7   7   7   7
        -2  -1   0   1   2   3   4   4   5   6   6   6   6   7
        -3  -2  -1   0   1   2   3   3   4   5   5   6   6   7
        -4  -3  -2  -1   0   1   2   2   3   4   4   5   5   6
        -5  -4  -3  -2  -1   0   1   2   3   4   4   5   5   6
        -6  -5  -4  -3  -2  -1   0   1   2   3   3   4   4   5
        -7  -6  -5  -4  -3  -2  -1   0   1   2   2   3   3   4
        -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   3   4
        -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4
       -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3
       -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2
       -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1
       -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0
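The string-substring part of the score matrix can be reproduced by brute force, running an ordinary LCS computation for every substring of $b$. This is the naive approach and is meant only as a cross-check of the table; the function names and the half-open slice `b[i:j]` for $b\langle i : j \rangle$ are assumptions made for illustration.

```python
def lcs_score(a, b):
    """LCS score by standard dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def score_matrix(a, b):
    """H[i][j] = LCS score of a vs b[i:j] for i <= j, extended by j - i for i > j."""
    n = len(b)
    return [[lcs_score(a, b[i:j]) if i <= j else j - i for j in range(n + 1)]
            for i in range(n + 1)]


H = score_matrix("BAABCBCA", "BAABCABCABACA")
print(H[0])      # [0, 1, 2, 3, 4, 5, 6, 6, 7, 8, 8, 8, 8, 8]
print(H[4][11])  # 5
```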
  58. Semi-local string comparison: Score matrices and seaweed matrices
     Semi-local LCS: output representation and running time
       size              query time
       $O(n^2)$          $O(1)$          trivial
       $O(m^{1/2} n)$    $O(\log n)$     string-substring [Alves+: 2003]
       $O(n)$            $O(n)$          string-substring [Alves+: 2005]
       $O(n \log n)$     $O(\log^2 n)$   [T: 2006]
       ... or any 2D orthogonal range counting data structure
       running time
       $O(mn^2)$                                     naive
       $O(mn)$                                       string-substring [Schmidt: 1998; Alves+: 2005]
       $O(mn)$                                       [T: 2006]
       $O\bigl(\frac{mn}{\log^{0.5} n}\bigr)$                        [T: 2006]
       $O\bigl(\frac{mn (\log\log n)^2}{\log^2 n}\bigr)$             [T: 2007]
  59. Semi-local string comparison: Score matrices and seaweed matrices
     The score matrix $H$ and the seaweed matrix $P$
     $H(i, j)$: the number of matched characters for $a$ vs substring $b\langle i : j \rangle$
     $j - i - H(i, j)$: the number of unmatched characters
     Properties of matrix $j - i - H(i, j)$:
       simple unit-Monge
       therefore, $= P^\Sigma$, where $P = -H^\square$ is a permutation matrix
     $P$ is the seaweed matrix, giving an implicit representation of $H$
     Range tree for $P$: memory $O(n \log n)$, query time $O(\log^2 n)$
  62. Semi-local string comparison: Score matrices and seaweed matrices
     The score matrix $H$ and the seaweed matrix $P$
     $a$ = “BAABCBCA”, $b$ = “BAABCABCABACA”
     $H(i, j)$ = score($a$, $b\langle i : j \rangle$); $H(4, 11) = 5$; $H(i, j) = j - i$ if $i > j$
     [Figure: the score matrix $H$ of slide 57 overlaid with the nonzeros of $P$, five of which fall inside the displayed part; blue: difference in $H$ is 0; red: difference in $H$ is 1; green: $P(i, j) = 1$]
     $H(i, j) = j - i - P^\Sigma(i, j)$
  63. Semi-local string comparison: Score matrices and seaweed matrices
     The score matrix $H$ and the seaweed matrix $P$
     $a$ = “BAABCBCA”, $b$ = “BAABCABCABACA”
     $H(4, 11) = 11 - 4 - P^\Sigma(4, 11) = 11 - 4 - 2 = 5$
     [Figure: the nonzeros of $P$; two of them are $\gtrless$-dominated by the query point $(4, 11)$]
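The relation between $H$ and $P$ can be verified numerically on the running example. The sketch below builds the string-substring part of $H$ by brute force, takes $P = -H^\square$ on that part, and answers score queries by dominance counting. It covers only the displayed 14 × 14 window, where $P$ has fewer nonzeros than rows; all function names are illustrative.

```python
def lcs_score(a, b):
    """LCS score by standard dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def score_matrix(a, b):
    """H[i][j] = LCS score of a vs b[i:j] for i <= j, and j - i for i > j."""
    n = len(b)
    return [[lcs_score(a, b[i:j]) if i <= j else j - i for j in range(n + 1)]
            for i in range(n + 1)]


def seaweed_matrix(H):
    """P = -(density of H): P[r][c] is the entry at the point (r + 1/2, c + 1/2)."""
    n = len(H) - 1
    return [[-(H[r][c + 1] - H[r][c] - H[r + 1][c + 1] + H[r + 1][c])
             for c in range(n)] for r in range(n)]


def dominance(P, i, j):
    """P^Sigma(i, j): number of nonzeros in rows >= i and columns < j (naive count)."""
    return sum(P[r][c] for r in range(i, len(P)) for c in range(j))


H = score_matrix("BAABCBCA", "BAABCABCABACA")
P = seaweed_matrix(H)
print(sum(map(sum, P)))                 # 5 nonzeros in the window
print(11 - 4 - dominance(P, 4, 11))     # 5, equal to H(4, 11)
```

In practice the naive count in `dominance` would be replaced by a range-counting structure over the nonzeros, as described on the earlier slides.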
  65. Semi-local string comparison: Score matrices and seaweed matrices
     The seaweed braid in the alignment graph
     [Figure: seaweed braid drawn over the alignment graph of $a$ = “BAABCBCA” (side) and $b$ = “BAABCABCABACA” (top)]
     $H(4, 11) = 11 - 4 - P^\Sigma(4, 11) = 11 - 4 - 2 = 5$
     $P(i, j) = 1$ corresponds to seaweed top $i$ ⇝ bottom $j$
     Also define top ⇝ right, left ⇝ right, left ⇝ bottom seaweeds
     Gives bijection between top-left and bottom-right graph boundaries
  66. Semi-local string comparison: Score matrices and seaweed matrices
     Seaweed braid: a highly symmetric object (element of the 0-Hecke monoid of the symmetric group)
     Can be built recursively by assembling subbraids from separate parts
     Highly flexible: local alignment, compression, parallel computation...
  69. Semi-local string comparison: Weighted alignment
     The LCS problem is a special case of the weighted alignment score problem with weighted matches ($w_M$), mismatches ($w_X$) and gaps ($w_G$)
       LCS score: $w_M = 1$, $w_X = w_G = 0$
       Levenshtein score: $w_M = 2$, $w_X = 1$, $w_G = 0$
     Alignment score is rational, if $w_M$, $w_X$, $w_G$ are rational numbers
     Equivalent to LCS score on blown-up strings
     Edit distance: minimum cost to transform $a$ into $b$ by weighted character edits (insertion, deletion, substitution)
     Corresponds to weighted alignment score with $w_M = 0$, insertion/deletion weight $-w_G$, substitution weight $-w_X$
  70. Semi-local string comparison: Weighted alignment
     Weighted alignment graph
     [Figure: alignment graph with $b$ = “BAABCABCABACA” along the top and $a$ = “BAABCBCA” down the side; blue = 0, red (solid) = 2, red (dotted) = 1]
     Levenshtein score(“BAABCBCA”, “CABCABA”) = 11
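A weighted alignment score can be computed by the same dynamic programming sweep as the LCS score, with the three weights as parameters. The sketch below assumes one gap weight per unaligned character; the function name `alignment_score` and its parameter names are illustrative.

```python
def alignment_score(a, b, w_match, w_mismatch, w_gap):
    """Maximum alignment score with per-character match, mismatch and gap weights."""
    prev = [j * w_gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [i * w_gap]
        for j, y in enumerate(b, 1):
            diagonal = prev[j - 1] + (w_match if x == y else w_mismatch)
            cur.append(max(diagonal, prev[j] + w_gap, cur[j - 1] + w_gap))
        prev = cur
    return prev[-1]


a, b = "BAABCBCA", "CABCABA"
print(alignment_score(a, b, 1, 0, 0))     # 5, the LCS score
print(alignment_score(a, b, 2, 1, 0))     # 11, the Levenshtein score
print(-alignment_score(a, b, 0, -1, -1))  # 4, the unit-cost edit distance
```

For these weights the Levenshtein score and the unit-cost edit distance add up to $m + n = 15$, since every aligned pair or gap accounts for its characters exactly once.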
  71. Semi-local string comparison: Weighted alignment
     Alignment graph for blown-up strings
     [Figure: alignment graph with every character x of both strings replaced by the pair “$x”; blue = 0, red = 0.5 or 1]
     Levenshtein score(“BAABCBCA”, “CABCABA”) = 2 · 5.5 = 11
  72. Semi-local string comparison: Weighted alignment
     Rational-weighted semi-local alignment reduced to semi-local LCS
     [Figure: alignment graph of the blown-up strings]
     Let $w_M = 1$, $w_X = \frac{\mu}{\nu}$, $w_G = 0$
     Increase $\times \nu^2$ in complexity (can be reduced to $\nu$)
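For the Levenshtein case ($\mu = 1$, $\nu = 2$) the blow-up shown in the figures replaces every character x by the pair “$x”, where “$” is a guard character assumed not to occur in the input. The sketch below checks on the running example that the plain LCS score of the blown-up strings gives the Levenshtein score; it covers only this special case, and the function names are illustrative.

```python
def lcs_score(a, b):
    """LCS score by standard dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def blow_up(s, guard="$"):
    """Replace every character x by guard + x (the blow-up for w_X = 1/2)."""
    assert guard not in s, "the guard character must not occur in the input"
    return "".join(guard + ch for ch in s)


a, b = "BAABCBCA", "CABCABA"
print(blow_up(b))                              # $C$A$B$C$A$B$A
print(lcs_score(blow_up(a), blow_up(b)))       # 11 = 2 * 5.5
```

A matched pair of characters contributes two to the blown-up LCS (guard and character), a mismatched pair contributes one (guard only), and a gap contributes nothing, which mirrors the weights 1, 1/2 and 0 after scaling by $\nu = 2$.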
  75-113. The seaweed method: Seaweed combing
     [Figure sequence: step-by-step combing of the seaweed braid on the alignment graph of $a$ = “BAABCBCA” (side) and $b$ = “BAABCABCABACA” (top), one cell per step]
  114. The seaweed method: Seaweed combing
     Semi-local LCS: seaweed combing [T: 2006]
     Initialise seaweed braid: crossings in all mismatch cells
     Sweep cells in any $\ll$-compatible order
     Match cell: two seaweeds uncrossed; skip
     Mismatch cell: two seaweeds cross
       if the same seaweeds crossed before, uncross them
       otherwise skip, keep seaweeds crossed
     Cell update: time $O(1)$
     Overall time $O(mn)$
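The combing sweep can be sketched in a few lines. The version below builds the combed braid directly, cell by cell in row-major order, instead of first laying out all crossings. It assumes seaweeds are numbered by their starting point, going up the left boundary and then along the top boundary from left to right; with this numbering, two seaweeds meeting in a cell have crossed before exactly when the one arriving from the left has the larger number. The function names and the numbering are illustrative choices, not taken from the slides.

```python
def comb_seaweeds(a, b):
    """Return (h, v): the seaweeds leaving through the right and bottom boundaries."""
    m, n = len(a), len(b)
    h = [m - 1 - i for i in range(m)]  # seaweed currently travelling along row i
    v = [m + j for j in range(n)]      # seaweed currently travelling down column j
    for i in range(m):
        for j in range(n):
            # match cell, or the pair has already crossed: the seaweeds do not cross here,
            # so the one from the left leaves downwards and the one from the top leaves rightwards
            if a[i] == b[j] or h[i] > v[j]:
                h[i], v[j] = v[j], h[i]
    return h, v


def string_substring_lcs(a, v, i, j):
    """LCS score of a vs b[i:j], read off the bottom boundary of the combed braid."""
    m = len(a)
    # seaweeds that start at the top at column >= i and end at the bottom at column < j
    top_to_bottom = sum(1 for d in range(i, j) if v[d] - m >= i)
    return (j - i) - top_to_bottom


a, b = "BAABCBCA", "BAABCABCABACA"
h, v = comb_seaweeds(a, b)
print(string_substring_lcs(a, v, 0, 13))  # 8
print(string_substring_lcs(a, v, 4, 11))  # 5
```

After one $O(mn)$ sweep, every string-substring score is available from `v` alone; the linear-time count in `string_substring_lcs` stands in for a dominance-counting query.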
  115-129. The seaweed method: Micro-block seaweed combing
       [Figure, shown in animation steps: the same alignment grid (strings BAABCABCABACA and BAABCBCA) combed in micro-blocks.]
  130. The seaweed method: Micro-block seaweed combing
       Semi-local LCS: micro-block seaweed combing [T: 2007]
       Initialise seaweed braid: crossings in all mismatch cells
       Sweep cells in micro-blocks, in any ≪-compatible order
       Micro-block size: $t = O\left(\frac{\log n}{\log\log n}\right)$
       Micro-block interface:
           $O(t)$ characters, each $O(\log\sigma)$ bits, can be reduced to $O(\log t)$ bits
           $O(t)$ integers, each $O(\log n)$ bits, can be reduced to $O(\log t)$ bits
       Micro-block update: time $O(1)$, by precomputing all possible interfaces
       Overall time $O\left(\frac{mn(\log\log n)^2}{\log^2 n}\right)$
  132. The seaweed method: Cyclic LCS
       The cyclic LCS problem
       Give the maximum LCS score for a vs all cyclic rotations of b
       Cyclic LCS: running time
           $O\left(\frac{mn^2}{\log n}\right)$   naive
           $O(mn\log m)$   [Maes: 1990]
           $O(mn)$   [Bunke, Bühler: 1993; Landau+: 1998; Schmidt: 1998]
           $O\left(\frac{mn(\log\log n)^2}{\log^2 n}\right)$   [T: 2007]
  133. The seaweed method: Cyclic LCS
       Cyclic LCS: the algorithm
       Micro-block seaweed combing on a vs bb, time $O\left(\frac{mn(\log\log n)^2}{\log^2 n}\right)$
       Make n string-substring LCS queries, time negligible
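The reduction on this slide (comb a against bb, then query the n windows of length n) can be illustrated as follows. This is a hedged sketch: it uses plain $O(mn)$ combing rather than the micro-block version, and a naive linear-time count per query instead of the range-tree queries, so it does not reach the running time stated above. The function name `cyclic_lcs` and the seaweed labelling are assumptions of the example.

```python
def cyclic_lcs(a, b):
    """Maximum LCS score of a against all cyclic rotations of b."""
    m, n = len(a), len(b)
    bb = b + b
    # Plain seaweed combing of a (rows) vs bb (columns). Assumed labelling:
    # left-boundary seaweeds 0..m-1 (bottom to top), top-boundary seaweeds
    # m..m+2n-1 (left to right).
    row = [m - 1 - i for i in range(m)]
    col = [m + j for j in range(2 * n)]
    for i in range(m):
        for j in range(2 * n):
            h, v = row[i], col[j]
            if a[i] == bb[j] or h > v:   # match, or the pair crossed before
                row[i], col[j] = v, h    # keep the two seaweeds uncrossed
    best = 0
    for r in range(n):                   # rotation r of b is the window bb[r:r+n]
        crossing = sum(1 for j in range(r, r + n) if col[j] >= m + r)
        best = max(best, n - crossing)
    return best
```

The braid is combed once; every rotation is then a string-substring query against the same braid, which is what makes the query phase cheap once a suitable query structure is in place.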
  135. The seaweed method: Longest repeating subsequence
       The longest repeating subsequence problem
       Find the longest subsequence of a that is a square (a repetition of two identical strings)
       Longest repeating subsequence: running time
           $O(m^3)$   naive
           $O(m^2)$   [Kosowski: 2004]
           $O\left(\frac{m^2(\log\log m)^2}{\log^2 m}\right)$   [T: 2007]
  136. The seaweed method: Longest repeating subsequence
       Longest repeating subsequence: the algorithm
       Micro-block seaweed combing on a vs a, time $O\left(\frac{m^2(\log\log m)^2}{\log^2 m}\right)$
       Make m − 1 suffix-prefix LCS queries, time negligible
  137. The seaweed method: Approximate matching
       The approximate pattern matching problem
       Give the substring closest to a by alignment score, starting at each position in b
       Assume rational alignment score
       Approximate pattern matching: running time
           $O(mn)$   [Sellers: 1980]
           $O\left(\frac{mn}{\log n}\right)$   $\sigma = O(1)$, via [Masek, Paterson: 1980]
           $O\left(\frac{mn(\log\log n)^2}{\log^2 n}\right)$   via [Bille, Farach-Colton: 2008]
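As a baseline for the problem statement, the $O(mn)$ dynamic programme in the style of [Sellers: 1980] can be sketched as below. The sketch assumes unit-cost edit distance (the slides allow any rational alignment score), and the function names are hypothetical. The classical recurrence reports the best substring ending at each position; scores for substrings starting at each position are obtained here by running it on the reversed strings.

```python
def sellers_end_scores(a, b):
    """out[e] = min edit distance between a and any substring of b ending at e."""
    m = len(a)
    prev = list(range(m + 1))   # column for the empty prefix of b: D[i][0] = i
    out = [prev[m]]
    for ch in b:
        cur = [0]               # D[0][j] = 0: a match may start anywhere in b
        for i in range(1, m + 1):
            cur.append(min(prev[i - 1] + (a[i - 1] != ch),  # match / substitute
                           prev[i] + 1,                     # skip a character of b
                           cur[i - 1] + 1))                 # skip a character of a
        prev = cur
        out.append(prev[m])
    return out


def sellers_start_scores(a, b):
    """out[s] = min edit distance between a and any substring of b starting at s."""
    return sellers_end_scores(a[::-1], b[::-1])[::-1]
```

Both functions return n + 1 values, one per boundary position of b, so index 0 and index n correspond to the two ends of the text.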
  138. The seaweed method: Approximate matching
       Approximate pattern matching: the algorithm
       Micro-block seaweed combing on a vs b (with blow-up), time $O\left(\frac{mn(\log\log n)^2}{\log^2 n}\right)$
       The implicit semi-local edit score matrix: an anti-Monge matrix
       Approximate pattern matching ∼ row minima
       Row minima in $O(n)$ element queries [Aggarwal+: 1987]
       Each query in time $O(\log^2 n)$ using the range tree representation, combined query time negligible
       Overall running time $O\left(\frac{mn(\log\log n)^2}{\log^2 n}\right)$, same as [Bille, Farach-Colton: 2008]
  141. Periodic string comparison: Wraparound seaweed combing
       The periodic string-substring LCS problem
       Give (implicit) LCS scores for a vs each substring of $b = \ldots uuu \ldots = u^{\pm\infty}$
       Let u be of length p
       May assume that every character of a occurs in u
       Only substrings of b of length at most mp (otherwise LCS score is m)
  142-183. Periodic string comparison: Wraparound seaweed combing
       [Figure, shown in animation steps: wraparound seaweed combing on an alignment grid; string labels as extracted: BAABCABCABACA and BAABCBCA.]
  184. Periodic string comparison: Wraparound seaweed combing
       Periodic string-substring LCS: wraparound seaweed combing
       Initialise seaweed braid: crossings in all mismatch cells
       Sweep cells row-by-row: each row starts at match cell, wraps at boundary
       Match cell: two seaweeds uncrossed; skip
       Mismatch cell: two seaweeds cross
           if the same seaweeds crossed before (with wrapping), uncross them
           otherwise skip, keep seaweeds crossed
       Cell update: time $O(1)$
       Overall time $O(mn)$
       String-substring LCS score: count seaweeds with multiplicities
  185. Periodic string comparison: Wraparound seaweed combing
       The tandem LCS problem
       Give LCS score for a vs $b = u^k$
       We have $n = kp$; may assume $k \le m$
       Tandem LCS: running time
           $O(mkp)$   naive
           $O(m(k + p))$   [Landau, Ziv-Ukelson: 2001]
           $O(mp)$   [T: 2009]
       Direct application of wraparound seaweed combing
  186. Periodic string comparison: Wraparound seaweed combing
       The tandem alignment problem
       Give the substring closest to a by alignment score among certain substrings of $b = u^{\pm\infty}$:
           global: substrings $u^k$ of length kp across all k
           cyclic: substrings of length kp across all k
           local: substrings of any length
       Tandem alignment: running time
           $O(m^2 p)$   all      naive
           $O(mp)$   global   [Myers, Miller: 1989]
           $O(mp\log p)$   cyclic   [Benson: 2005]
           $O(mp)$   cyclic   [T: 2009]
           $O(mp)$   local    [Myers, Miller: 1989]
  187. Periodic string comparison: Wraparound seaweed combing
       Cyclic tandem alignment: the algorithm
       Periodic seaweed combing for a vs b (with blow-up), time $O(mp)$
       For each $k \in [1 : m]$:
           solve tandem LCS (under given alignment score) for a vs $u^k$
           obtain scores for a vs p successive substrings of b of length kp by LCS batch query: time $O(1)$ per substring
       Running time $O(mp)$
  191. The transposition network method: Transposition networks
       Comparison network: a circuit of comparators
       A comparator sorts two inputs and outputs them in prescribed order
       Comparison networks traditionally used for non-branching merging/sorting
       Classical comparison networks, # comparators:
           merging   $O(n\log n)$     [Batcher: 1968]
           sorting   $O(n\log^2 n)$   [Batcher: 1968]
                     $O(n\log n)$     [Ajtai+: 1983]
       Comparison networks are visualised by wire diagrams
       Transposition network: all comparisons are between adjacent wires
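To make the definition concrete, here is a small sketch of one well-known transposition network, the odd-even transposition sorter. It is chosen only because every comparator joins adjacent wires; it is not one of the Batcher or Ajtai+ networks listed above, and it uses $O(n^2)$ comparators. The function names are assumptions of the example.

```python
def odd_even_transposition_network(n):
    """Comparators (i, i + 1) of the n-round odd-even transposition sorter."""
    # Round r compares wire pairs starting at an even or odd index in turn.
    return [(i, i + 1) for r in range(n) for i in range(r % 2, n - 1, 2)]


def run_network(comparators, values):
    """Feed values through the network; each comparator orders its two wires."""
    wires = list(values)
    for i, j in comparators:
        if wires[i] > wires[j]:
            wires[i], wires[j] = wires[j], wires[i]
    return wires
```

The network is data-independent: the list of comparators is fixed in advance and only the comparator outcomes depend on the input, which is the non-branching property the slide refers to.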
  192. The transposition network method: Transposition networks
       Seaweed combing as a transposition network
       [Figure: wire diagram on the alignment grid of strings ABCA and ACBC; input values −7, −5, −3, −1, +1, +3, +5, +7; output values as extracted: −7, −1, +3, −3, −5, +7, +5, +1.]
       Character mismatches correspond to comparators
       Inputs anti-sorted (sorted in reverse); each value traces a seaweed
  193. The transposition network method: Transposition networks
       Global LCS: transposition network with binary input
       [Figure: the same wire diagram for strings ABCA and ACBC with binary values; inputs 0, 0, 0, 0, 1, 1, 1, 1; outputs as extracted: 0, 0, 1, 0, 0, 1, 1, 1.]
       Inputs still anti-sorted, but may not be distinct
       Comparison between equal values is indeterminate
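The binary network can be simulated cell by cell, as in the sketch below. The bit convention is an assumption of this example and may differ from the figure: wires entering on the left boundary carry 0, wires entering at the top carry 1, a mismatch cell acts as a comparator, and a match cell exchanges the two wires. Under this convention the global LCS score is the number of 0s leaving at the bottom boundary.

```python
def lcs_binary_network(a, b):
    """Global LCS score of a and b via a binary transposition network."""
    m, n = len(a), len(b)
    row = [0] * m   # bits travelling horizontally, one per character of a
    col = [1] * n   # bits travelling vertically, one per character of b
    for i in range(m):
        for j in range(n):
            h, v = row[i], col[j]
            if a[i] == b[j]:
                # Match cell: no comparator, the two bits swap wires.
                row[i], col[j] = v, h
            else:
                # Mismatch cell: comparator, the larger bit continues downwards.
                # For equal bits either outcome is the same.
                row[i], col[j] = min(h, v), max(h, v)
    return n - sum(col)   # number of 0s on the bottom boundary
```

For the strings in the figure, `lcs_binary_network("ABCA", "ACBC")` evaluates the 4 × 4 grid of cells in row-major order.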
  194. The transposition network method: Parameterised string comparison
       Parameterised string comparison
       String comparison sensitive e.g. to
           low similarity: small $\lambda = \mathrm{LCS}(a, b)$
           high similarity: small $\kappa = \mathrm{dist}_{\mathrm{LCS}}(a, b) = m + n - 2\lambda$
       Can also use weighted alignment score or edit distance
       Assume $m = n$, therefore $\kappa = 2(n - \lambda)$
  195. The transposition network method: Parameterised string comparison
       Low-similarity comparison: small λ
           sparse set of matches, may need to look at them all
           preprocess matches for fast searching, time $O(n\log\sigma)$
       High-similarity comparison: small κ
           set of matches may be dense, but only need to look at small subset
           no need to preprocess, linear search is OK
       Flexible comparison: sensitive to both high and low similarity, e.g. by both comparison types running alongside each other
  196. The transposition network method: Parameterised string comparison
       Parameterised string comparison: running time
       Low-similarity, after preprocessing in $O(n\log\sigma)$
           $O(n\lambda)$   [Hirschberg: 1977] [Apostolico, Guerra: 1985] [Apostolico+: 1992]
       High-similarity, no preprocessing
           $O(n\cdot\kappa)$   [Ukkonen: 1985] [Myers: 1986]
       Flexible
           $O(\lambda\cdot\kappa\cdot\log n)$   no preproc   [Myers: 1986; Wu+: 1990]
           $O(\lambda\cdot\kappa)$   after preproc   [Rick: 1995]
  197. The transposition network method: Parameterised string comparison
       Parameterised string comparison: the waterfall algorithm
       Low-similarity: $O(n\cdot\lambda)$; high-similarity: $O(n\cdot\kappa)$
       [Figure: two binary transposition networks with 0/1 wire values, one for the low-similarity case and one for the high-similarity case.]
       Trace 0s through network in contiguous blocks and gaps
  198. The transposition network method: Dynamic string comparison
       The dynamic LCS problem
       Maintain current LCS score under updates to one or both input strings
       Both input strings are streams, updated on-line:
           appending characters at left or right
           deleting characters at left or right
       Assume for simplicity $m \approx n$, i.e. $m = \Theta(n)$
       Goal: linear time per update
           $O(n)$ per update of a ($n = |b|$)
           $O(m)$ per update of b ($m = |a|$)
  199. The transposition network method: Dynamic string comparison
       Dynamic LCS in linear time: update models
           left      right
           –         app+del   standard DP [Wagner, Fischer: 1974]
           app       app       a fixed [Landau+: 1998], [Kim, Park: 2004]
           app       app       [Ishida+: 2005]
           app+del   app+del   [T: NEW]
       Main idea:
           for append only, maintain seaweed matrix $P_{a,b}$
           for append+delete, maintain partial seaweed layout by tracing a transposition network
  200. The transposition network method: Bit-parallel string comparison
       Bit-parallel string comparison
       String comparison using standard instructions on words of size w
       Bit-parallel string comparison: running time
           $O(mn/w)$   [Allison, Dix: 1986; Myers: 1999; Crochemore+: 2001]
  202. The transposition network method: Bit-parallel string comparison
       Bit-parallel string comparison: binary transposition network
       In every cell: input bits s, c; output bits s', c'; match/mismatch flag µ
       First truth table (as extracted):
           s    0 1 0 1 0 1 0 1
           c    0 0 1 1 0 0 1 1
           µ    0 0 0 0 1 1 1 1
           s'   0 1 1 1 0 0 1 1
           c'   0 0 0 1 0 1 0 1
       Second truth table (as extracted; it differs from the first only at s = c = 1, µ = 0):
           s    0 1 0 1 0 1 0 1
           c    0 0 1 1 0 0 1 1
           µ    0 0 0 0 1 1 1 1
           s'   0 1 1 0 0 0 1 1
           c'   0 0 0 1 0 1 0 1
       $2c' + s' \leftarrow (s + (s \wedge \mu) + c) \vee (s \wedge \neg\mu)$
       $S \leftarrow (S + (S \wedge M)) \vee (S \wedge \neg M)$, where S, M are words of bits s, µ
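The word-level update $S \leftarrow (S + (S \wedge M)) \vee (S \wedge \neg M)$ can be tried out directly in Python, whose unbounded integers stand in for a sequence of w-bit machine words. The sketch below is an illustration under assumptions: bit j of every word corresponds to character j of b, S starts as all ones, M has a 1 exactly at the positions where the current character of a matches b, and the LCS score is read off as the number of 0 bits left in S. The function name is hypothetical.

```python
def lcs_bit_parallel(a, b):
    """Global LCS score of a and b using the bit-vector update from the slide."""
    n = len(b)
    full = (1 << n) - 1                 # n low-order bits set
    match = {}                          # character -> bit mask of its positions in b
    for j, ch in enumerate(b):
        match[ch] = match.get(ch, 0) | (1 << j)
    S = full
    for ch in a:
        M = match.get(ch, 0)
        # One row of cells at once: the addition propagates the carry bits c
        # along the word; masking with `full` drops the carry out of bit n-1.
        S = ((S + (S & M)) | (S & ~M)) & full
    return n - bin(S).count("1")        # number of 0 bits among the n positions
```

On real hardware the addition has to propagate carries across word boundaries, which is where the $O(mn/w)$ bound comes from; Python hides that detail.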
  205. Sparse string comparison: Semi-local LCS between permutations
       The LCS problem on permutation strings
       Give LCS score for a vs b
       In each of a, b all characters distinct: total $m = n$ matches
       Equivalent to
           longest increasing subsequence (LIS) in a string
           maximum clique in a permutation graph
           maximum planar matching in an embedded bipartite graph
       LCS on permutation strings: running time
           $O(n\log n)$   implicit in [Erdős, Szekeres: 1935] [Robinson: 1938; Knuth: 1970; Dijkstra: 1980]
           $O(n\log\log n)$   unit-RAM [Chang, Wang: 1992] [Bespamyatnikh, Segal: 2000]
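The equivalence with LIS gives a short $O(n\log n)$ routine, sketched below with the standard patience-sorting technique and binary search. It assumes that a and b are permutations of the same set of distinct characters; the function names are choices made for the example.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence, in O(n log n)."""
    tails = []   # tails[k] = smallest last element of an increasing run of length k + 1
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def lcs_permutations(a, b):
    """LCS score of two permutation strings over the same alphabet."""
    pos = {ch: i for i, ch in enumerate(a)}      # position of each character in a
    # A common subsequence is exactly an increasing run of positions along b.
    return lis_length([pos[ch] for ch in b])
```

Relabelling b by positions in a turns each of the n matches into a point of a permutation, so the LCS is the longest chain of points increasing in both coordinates.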