The document describes string comparison techniques using matrix algebra and seaweed matrices. It introduces the concept of semi-local string comparison, which involves comparing a whole string to substrings of another string. The key idea is representing string comparison matrices implicitly using seaweed matrices, which represent unit-Monge matrices. This allows developing algebraic techniques for efficiently multiplying such matrices using the algebra of braids and the seaweed monoid. These multiplication techniques can then be applied to problems like dynamic programming string comparison and comparing compressed strings.
The document discusses different single-source shortest path algorithms. It begins by defining shortest path and different variants of shortest path problems. It then describes Dijkstra's algorithm and Bellman-Ford algorithm for solving the single-source shortest paths problem, even in graphs with negative edge weights. Dijkstra's algorithm uses relaxation and a priority queue to efficiently solve the problem in graphs with non-negative edge weights. Bellman-Ford can handle graphs with negative edge weights but requires multiple relaxation passes to converge. Pseudocode and examples are provided to illustrate the algorithms.
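The relaxation-plus-priority-queue idea behind Dijkstra's algorithm, as summarized above, can be sketched in a few lines (a minimal illustration, not the pseudocode from the slides; the adjacency-list format is an assumption):

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths for non-negative edge weights.

    adj maps each vertex to a list of (neighbor, weight) pairs.
    Returns a dict of shortest distances from source.
    """
    dist = {source: 0}
    pq = [(0, source)]  # priority queue of (distance estimate, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, already improved
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):  # relaxation step
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```

Each vertex is settled when it is first popped with its final distance; later, stale entries are skipped, which is why non-negative weights are required.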
Like many topics in the analysis of algorithms, the all-pairs shortest paths problem has a long history (from the point of view of computer science). The initial results appear in the book "Studies in the Economics of Transportation" by Beckmann, McGuire, and Winsten (1956), where the matrix-multiplication-like notation that we still use was first introduced.
In these slides, we go through several solutions to the all-pairs shortest paths problem, from a slow version to Johnson's algorithm.
The document discusses algorithms for finding shortest paths between all pairs of vertices in a directed graph, including:
- Floyd-Warshall algorithm, which uses dynamic programming and matrix multiplication to compute the shortest paths matrix in O(V^3) time.
- Johnson's algorithm, which first reweights the graph to make all edge weights nonnegative, allowing it to use Dijkstra's algorithm repeatedly to solve the all-pairs shortest paths problem more efficiently for sparse graphs.
- Reweighting transforms the original graph in a way that preserves shortest path distances while ensuring nonnegative edge weights.
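The dynamic-programming core of Floyd-Warshall mentioned above fits in a triple loop; this is a minimal sketch assuming the graph is given as an n x n weight matrix with float('inf') for missing edges:

```python
def floyd_warshall(dist):
    """All-pairs shortest paths in O(V^3) time.

    dist[i][j] is the weight of edge (i, j), 0 on the diagonal,
    float('inf') where no edge exists. Returns a new matrix.
    """
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):  # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

After iteration k, d[i][j] is the shortest i-to-j distance using only intermediates from {0, ..., k}.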
The document discusses shortest path problems and algorithms. It defines the shortest path problem as finding the minimum weight path between two vertices in a weighted graph. It presents the Bellman-Ford algorithm, which can handle graphs with negative edge weights but detects negative cycles. It also presents Dijkstra's algorithm, which only works for graphs without negative edge weights. Key steps of the algorithms include initialization, relaxation of edges to update distance estimates, and ensuring the shortest path property is satisfied.
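The Bellman-Ford steps named above (initialization, V-1 relaxation passes, negative-cycle detection) can be sketched as follows; the edge-list representation is an assumption:

```python
def bellman_ford(n, edges, source):
    """edges: list of (u, v, w) triples over vertices 0..n-1.

    Returns (dist, has_negative_cycle).
    """
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    for _ in range(n - 1):  # V-1 relaxation passes
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # one more pass: any further improvement means a negative cycle
    neg = any(dist[u] + w < dist[v] for u, v, w in edges)
    return dist, neg
```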
The document discusses shortest path algorithms for graphs. It defines different variants of shortest path problems like single-source, single-destination, and all-pairs shortest paths. It presents algorithms like Bellman-Ford, Dijkstra's algorithm, and the Floyd-Warshall algorithm to solve these problems. Bellman-Ford handles negative edge weights but has a higher time complexity of O(VE) compared to Dijkstra's, which only works for non-negative edge weights. Floyd-Warshall solves the all-pairs shortest paths problem in O(V^3) time using dynamic programming and matrix multiplication.
The document discusses the longest common subsequence (LCS) problem and presents a dynamic programming approach to solve it. It defines key terms like subsequence and common subsequence. It then presents a theorem that characterizes an LCS and shows it has optimal substructure. A recursive solution and algorithm to compute the length of an LCS are provided, with a running time of O(mn). The b table constructed enables constructing an LCS in O(m+n) time.
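The O(mn) table construction and O(m+n) reconstruction described above can be sketched as follows (here the walk back reads the c table directly instead of a separate b table of arrows, which gives the same traversal):

```python
def lcs(x, y):
    """Return one longest common subsequence of strings x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # O(mn) dynamic-programming fill
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    # walk back through the table: O(m + n) steps
    out, i, j = [], m, n
    while i and j:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1]); i -= 1; j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```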
The document describes the single-source shortest paths algorithm for directed acyclic graphs (DAGs). It involves topologically sorting the vertices of the DAG, initializing distances from the source vertex s, and then relaxing edges in topologically sorted order. This guarantees that when a vertex u is relaxed, the shortest path distances from s to its neighbors will be accurate. The algorithm runs in O(V+E) time. It is used to find critical paths in PERT charts by finding the longest path after negating or reversing edge weights.
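The DAG algorithm above reduces to one relaxation sweep once a topological order is known; this sketch assumes the order is supplied (in practice it comes from a DFS-based topological sort):

```python
from collections import defaultdict

def dag_shortest_paths(order, adj, s):
    """order: vertices in topological order; adj[u] = [(v, w), ...].

    One pass of edge relaxations in topological order suffices,
    giving O(V + E) total time.
    """
    dist = defaultdict(lambda: float("inf"))
    dist[s] = 0
    for u in order:
        for v, w in adj.get(u, []):
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```

Negating all weights and running the same sweep yields longest paths, which is the critical-path trick for PERT charts.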
The document discusses algorithms for finding shortest paths in graphs. It describes Dijkstra's algorithm and Bellman-Ford algorithm for solving the single-source shortest paths problem and Floyd-Warshall algorithm for solving the all-pairs shortest paths problem. Dijkstra's algorithm uses a priority queue to efficiently find shortest paths from a single source node to all others, assuming non-negative edge weights. Bellman-Ford handles graphs with negative edge weights but is slower. Floyd-Warshall finds shortest paths between all pairs of nodes in a graph.
Algorithm Design and Complexity - Course 7, by Traian Rebedea
The document discusses algorithms for graphs, including breadth-first search (BFS) and depth-first search (DFS). BFS uses a queue to traverse nodes level-by-level from a starting node, computing the shortest path. DFS uses a stack, exploring as far as possible along each branch before backtracking, and computes discovery and finish times for nodes. Both algorithms color nodes white, gray, black to track explored status and maintain predecessor pointers to reconstruct paths. Common graph representations like adjacency lists and matrices are also covered.
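The level-by-level traversal with predecessor pointers described above can be sketched as follows (discovery alone replaces the explicit white/gray/black coloring here; the adjacency-list format is an assumption):

```python
from collections import deque

def bfs(adj, s):
    """Breadth-first search from s over adjacency lists.

    Returns (dist, pred): distance in edges from s, and
    predecessor pointers for reconstructing shortest paths.
    """
    dist, pred = {s: 0}, {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj.get(u, []):
            if v not in dist:  # first discovery ("white" vertex)
                dist[v] = dist[u] + 1
                pred[v] = u
                q.append(v)
    return dist, pred
```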
The document describes Johnson's algorithm for finding shortest paths between all pairs of vertices in a sparse graph. It discusses how the algorithm uses reweighting to compute new edge weights that preserve shortest paths while making all weights nonnegative. It shows how Dijkstra's algorithm can then be run on the reweighted graph to find shortest paths between all pairs of vertices. The key steps are: (1) adding a source node and zero-weight edges, (2) running Bellman-Ford to compute distances from the source, (3) using these distances to reweight the edges while preserving shortest paths, resulting in nonnegative weights.
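The reweighting at the heart of Johnson's algorithm, step (3) above, is a one-liner per edge; this sketch assumes the potentials h have already been computed by Bellman-Ford from the added zero-weight source:

```python
def reweight(edges, h):
    """Johnson's reweighting: w'(u, v) = w(u, v) + h(u) - h(v).

    Every u-to-v path length shifts by the same amount h(u) - h(v),
    so shortest paths are preserved; the triangle inequality on the
    Bellman-Ford distances h makes every new weight non-negative.
    """
    return [(u, v, w + h[u] - h[v]) for u, v, w in edges]
```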
This document describes the shortest path algorithm for finding the shortest distance between nodes in a weighted directed graph. It involves creating distance and sequence tables with N rows and N columns for a graph with N nodes. The algorithm iterates N-1 times to fill these tables. It works by checking if the distance between two nodes i and j can be made smaller by going through another node k, and updates the distance table with the minimum value at each step. The complexity of the algorithm is O(n^2) space and O(n^3) time for a graph with n nodes.
Algorithm Design and Complexity - Course 9, by Traian Rebedea
The document describes Kruskal's algorithm for finding the minimum spanning tree (MST) of a connected, undirected graph. Kruskal's algorithm works by sorting the edges by weight and then greedily adding edges to the MST one by one, skipping any edges that would create cycles. It uses the property that a minimum edge between disjoint components of the partial MST must be part of the overall MST. The runtime is O(ElogE) where E is the number of edges. An example run on a graph is shown step-by-step to demonstrate the algorithm.
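The sort-then-greedily-add procedure above can be sketched with a small union-find structure to detect cycles (a minimal illustration; edge tuples are assumed to be (weight, u, v)):

```python
def kruskal(n, edges):
    """Total weight of an MST of a connected graph on vertices 0..n-1.

    edges: list of (weight, u, v). Sorting dominates: O(E log E).
    """
    parent = list(range(n))

    def find(x):  # union-find root, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    for w, u, v in sorted(edges):  # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:  # skip edges that would create a cycle
            parent[ru] = rv
            total += w
    return total
```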
The document discusses algorithms for solving single-source shortest path problems on directed graphs. It begins by defining the problem and describing the Bellman-Ford algorithm, which can handle graphs with negative edge weights but runs in O(VE) time. It then describes how Dijkstra's algorithm improves this to O(E log V) time using a priority queue, but requires non-negative edge weights. It also discusses how shortest paths can be found in O(V+E) time on directed acyclic graphs by relaxing edges topologically. Finally, it discusses how difference constraints can be modeled as shortest path problems on a corresponding constraint graph.
Algorithm Design and Complexity - Course 8, by Traian Rebedea
The document discusses algorithms for finding strongly connected components (SCCs) in directed graphs. It describes Kosaraju's algorithm, which uses two depth-first searches (DFS) to find the SCCs. The first DFS computes a finishing time ordering, while the second DFS uses the transpose graph and the finishing time ordering to find the SCCs, outputting each SCC as a separate DFS tree. The algorithm runs in O(V+E) time and uses the property that the first DFS provides a topological sorting of the SCCs graph.
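The two-DFS structure of Kosaraju's algorithm described above can be sketched as follows (a compact illustration: the first DFS is recursive, so it assumes small inputs; the second pass runs iteratively on the transpose graph):

```python
def kosaraju(adj):
    """adj: dict vertex -> list of successors. Returns a list of SCCs."""
    visited, order = set(), []

    def dfs1(u):  # first DFS: record vertices by finishing time
        visited.add(u)
        for v in adj.get(u, []):
            if v not in visited:
                dfs1(v)
        order.append(u)

    for u in adj:
        if u not in visited:
            dfs1(u)

    radj = {u: [] for u in adj}  # transpose graph
    for u in adj:
        for v in adj[u]:
            radj.setdefault(v, []).append(u)

    seen, sccs = set(), []
    for u in reversed(order):  # second DFS in reverse finishing order
        if u not in seen:
            stack, comp = [u], []
            seen.add(u)
            while stack:
                x = stack.pop()
                comp.append(x)
                for y in radj.get(x, []):
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            sccs.append(sorted(comp))  # each tree is one SCC
    return sccs
```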
Dijkstra's algorithm is used to find the shortest path between a starting vertex and any other vertex in a graph with positive edge weights. It works by maintaining a distance label for each vertex, with the starting vertex's label set to 0. It then iteratively selects the unprocessed vertex with the smallest distance label and relaxes any incident edges that improve neighboring vertices' distance labels, until all vertices have been processed. Storing predecessor vertices allows reconstruction of the shortest path.
Linear Combination, Span and Linearly Independent/Dependent Sets, by Dhaval Shukla
In this presentation, the topics of linear combination, span, and linearly independent and linearly dependent sets are discussed. Worked problems for each topic are given to help viewers understand the concepts clearly.
The Bellman–Ford algorithm is an algorithm that computes shortest paths from a single source vertex to all of the other vertices in a weighted digraph.
The Floyd-Warshall algorithm is used to find the shortest paths between all pairs of vertices in a weighted graph. It creates distance and sequence tables with rows and columns for each vertex. The distance table stores the weights of edges between vertices. The sequence table stores the path vertices to reach between each pair. It iterates through the tables, updating distances and paths at each iteration based on whether a new shortest path is found through a third vertex. When complete, the final distance table shows the shortest distance between each pair of vertices, and the sequence table shows the shortest path.
The document describes Dijkstra's algorithm for finding the shortest paths between nodes in a graph. It begins with an overview of how the algorithm works by solving subproblems to find the shortest path from the source to increasingly more nodes. It then provides pseudocode for the algorithm and analyzes its time complexity of O(E+V log V) when using a Fibonacci heap. Some applications of the algorithm include traffic information systems and routing protocols like OSPF.
The document discusses lower bounds on kernelization for parameterized problems. It provides examples of known kernelization results for problems like MaxSat, Vertex Cover, Dominating Set, and Path that have linear, quadratic, cubic, and exponential kernel sizes. It then outlines techniques like the Fortnow-Santhanam theorem that can be used to rule out polynomial kernels for problems. The document explains how to define notions like the OR of a language and distillation algorithms, and how the Fortnow-Santhanam theorem can be applied to show that if a problem has a distillation algorithm, it is unlikely to have a polynomial kernel unless unlikely complexity collapses occur. It provides details on how to construct advice strings and a nondeterministic algorithm
Hand-outs of a lecture on the Aho-Corasick string matching algorithm in the Biosequence Algorithms course at the University of Eastern Finland, Kuopio, in Spring 2012.
The document discusses several problems related to the exponential time hypothesis (ETH). It states that unless ETH fails, there is no algorithm that can solve problems like Permutation Clique, Permutation Hitting Set, or Closest String in less than 2^o(k log k) time, where k is a parameter of the problem. It also discusses how 3-Colorability can be reduced to the [k]x[k] Clique problem, and that unless ETH fails, [k]x[k] Clique does not have a 2^o(k log k) time algorithm.
Single Source Shortest Paths: Bellman-Ford and Dijkstra, by Roshan Tailor
This document discusses algorithms for finding shortest paths in weighted graphs:
- Dijkstra's algorithm finds single-source shortest paths in graphs with non-negative edge weights using a greedy approach and priority queue. It runs in O(E log V) time with a binary heap, or O(E + V log V) with a Fibonacci heap.
- Bellman-Ford algorithm can handle graphs with negative edge weights by relaxing all edges V-1 times; one extra pass over the edges then detects negative cycles. It runs in O(VE) time.
- Examples are provided to illustrate the relaxation process and execution of both algorithms on sample graphs. Key properties like handling of negative weights and cycles are also explained.
Historically, the medical device industry has been highly attractive and relatively stable. As a consequence, established players have been able to compete successfully across the device spectrum, applying common business models and processes without much need for differentiation.
The future, however, is very different as disruptive change is underway. Companies will need to look at new segments and offer end-to-end solutions to secure additional revenue and maintain their profit margins.
The Skolkovo Robotics V conference will take place on April 21st, 2017 at the Skolkovo Innovation Center in Moscow. The one-day event will feature talks from leading experts on trends in robotics and AI, as well as an exhibition of robotics projects from startups and universities. Masterclasses on robotics and education technologies will also be provided. Over 2,500 visitors are expected to attend the largest robotics event in Russia, which aims to demonstrate advances in the field and facilitate collaboration between Russian scientists and international colleagues.
How do we see healthcare's digital future and its impact on our lives? By Jane Vita
"Healthcare is undergoing major changes spurred on by, but not limited to, technology.
Digitalisation is changing the way we think about health, what taking care of it really entails, our personal role in healthcare systems and the way we interact with technology in the context of health.
In many ways, we are entering a post-institutional age of increased personal responsibility, which presents healthcare service providers and other players in the field with major opportunities and great risks. Technology has the potential to empower people and help them become more active in the management of their and their families’ health. This will change the relationship of the patient and the caregiver in profound ways." Mirkka Länsisalo
A co-creation with Mirkka Länsisalo and Sala Heinänen, at Futurice.
This document discusses techniques for semi-local string comparison, which involves comparing whole strings to substrings or prefixes to suffixes. It introduces implicit unit-Monge matrices, which allow efficient querying of string distances. Several algorithms are described that use these matrices, including the seaweed method and transposition network method.
X-ray diffraction is a technique used to analyze the crystal structure of materials. It works by firing X-rays at a crystalline sample and observing the scattered rays, which are determined by the positions of atoms in the crystal structure. Bragg's law describes the conditions under which constructive interference of X-rays scattered from crystal planes occurs, producing a diffraction pattern. Analysis of diffraction patterns can provide information about a material's lattice structure, symmetry, and chemical composition. Miller indices are used to describe crystallographic planes and directions within a crystal lattice.
Wang-Landau Monte Carlo simulation is a method for calculating the density of states function which can then be used to calculate thermodynamic properties like the mean value of variables. It improves on traditional Monte Carlo methods which struggle at low temperatures due to complicated energy landscapes with many local minima separated by large barriers. The Wang-Landau algorithm calculates the density of states function directly rather than relying on sampling configurations, allowing it to overcome barriers and fully explore the configuration space even at low temperatures.
Computing Wiener Index of Fibonacci Weighted Trees, by cscpconf
Given a simple connected undirected graph G = (V, E), with |V| = n and |E| = m, the Wiener index of G is defined as half the sum of the distances d(u, v) between all pairs of vertices u, v of G. If (G, w) is an edge-weighted graph, then the Wiener index W(G, w) of (G, w) is defined as the usual Wiener index, but the distances are now computed in (G, w). The paper proposes a new algorithm for computing the Wiener index of a Fibonacci weighted tree with Fibonacci branching, in place of the available naive algorithm for the same. It is found that the time complexity of the algorithm is logarithmic.
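For contrast with the paper's logarithmic-time algorithm, the naive approach computes all pairwise distances first; a small sketch via Floyd-Warshall, assuming integer edge weights on an undirected graph:

```python
def wiener_index(n, edges):
    """Naive Wiener index: half the sum of d(u, v) over ordered pairs.

    edges: list of (u, v, w) on vertices 0..n-1, undirected.
    """
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = d[v][u] = min(d[u][v], w)
    for k in range(n):  # all-pairs shortest paths, O(n^3)
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # each unordered pair is counted twice in the double sum
    return sum(d[i][j] for i in range(n) for j in range(n)) // 2
```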
The document summarizes a computational geometry problem of finding the largest area axis-parallel rectangle contained within a polygon. The speaker presents an algorithm to solve this problem in O(n log^2 n) time. This is achieved by characterizing the possible contacts between the rectangle and polygon, developing a general framework based on rectangular visibility and matrix monotonicity, and using divide-and-conquer. Lower bounds are also established to show the problem requires Ω(n log n) time.
1. A superglue for string comparison
Alexander Tiskin
Department of Computer Science, University of Warwick
http://go.warwick.ac.uk/alextiskin
Alexander Tiskin (Warwick) Semi-local LCS 1 / 164
6. Introduction
String matching: finding an exact pattern in a string
String comparison: finding similar patterns in two strings
global: whole string a vs whole string b
semi-local: whole string a vs substrings in b (approximate matching);
prefixes in a vs suffixes in b
local: substrings in a vs substrings in b
We focus on semi-local comparison:
fundamentally important for both global and local comparison
exciting mathematical properties
7. Introduction
Overview
Standard approach to string comparison: dynamic programming
Our approach: the algebra of seaweed matrices/braids
Can be used either for dynamic programming, or for divide-and-conquer
Divide-and-conquer is more efficient for:
comparing dynamic strings (truncation, concatenation)
comparing compressed strings (e.g. LZ-compression)
comparing strings in parallel
The “conquer” step involves a magic “superglue” (efficient multiplication
of seaweed matrices/braids)
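For reference, the standard dynamic programming approach mentioned above can be sketched as a minimal global LCS computation (an illustrative sketch; the function name is ours, not from the slides):

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Classic O(len(a) * len(b)) dynamic programming over the alignment
    grid: T[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    """
    m, n = len(a), len(b)
    T = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                T[i][j] = T[i - 1][j - 1] + 1
            else:
                T[i][j] = max(T[i - 1][j], T[i][j - 1])
    return T[m][n]

print(lcs_length("ABCBDAB", "BDCABA"))  # → 4 (e.g. "BCBA")
```

The seaweed approach replaces this single-table computation with objects that can be split and glued, which is what enables divide-and-conquer.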
8. Introduction
Terminology and notation
The “squared paper” notation:
Integers: {. . . , −2, −1, 0, 1, 2, . . .}    x⁻ = x − 1/2    x⁺ = x + 1/2
Half-integers: {. . . , −3/2, −1/2, 1/2, 3/2, 5/2, . . .} = {. . . , (−2)⁺, (−1)⁺, 0⁺, 1⁺, 2⁺, . . .}
Planar dominance:
(i, j) ≪ (i′, j′) iff i < i′ and j < j′ (the “above-left” relation)
(i, j) ⋖ (i′, j′) iff i > i′ and j < j′ (the “below-left” relation)
where “above/below” follow the matrix convention (not the Cartesian one!)
9. Introduction
Terminology and notation
A permutation matrix is a 0/1 matrix with exactly one nonzero per row
and per column
0 1 0
1 0 0
0 0 1
13. Introduction
Terminology and notation
Given matrix D, its distribution matrix is made up of ⋖-dominance sums:
DΣ(i, j) = Σ_{ı̂ > i, ȷ̂ < j} D(ı̂, ȷ̂)
Given matrix E, its density matrix is made up of quadrangle differences:
E□(ı̂, ȷ̂) = E(ı̂⁻, ȷ̂⁺) − E(ı̂⁻, ȷ̂⁻) − E(ı̂⁺, ȷ̂⁺) + E(ı̂⁺, ȷ̂⁻)
where DΣ, E are indexed over integers; D, E□ over half-integers
Example:
P =
0 1 0
1 0 0
0 0 1
PΣ =
0 1 2 3
0 1 1 2
0 0 0 1
0 0 0 0
and (PΣ)□ = P
(DΣ)□ = D for all D
Matrix E is simple, if (E□)Σ = E; equivalently, if it has all zeros in the left
column and bottom row
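These definitions translate directly into code. A small sketch (helper names are ours), using the convention that the half-integer index ı̂ = r + 1/2 maps to array row r:

```python
def distribution(D):
    """Distribution matrix: DΣ(i, j) = sum of D(r̂, ĉ) over r̂ > i, ĉ < j.

    D is n x n over half-integers (array row r stands for r + 1/2);
    the result is (n+1) x (n+1) over integers.
    """
    n = len(D)
    return [[sum(D[r][c] for r in range(i, n) for c in range(j))
             for j in range(n + 1)] for i in range(n + 1)]

def density(E):
    """Density matrix: quadrangle differences of an (n+1) x (n+1) matrix E."""
    n = len(E) - 1
    return [[E[r][c + 1] - E[r][c] - E[r + 1][c + 1] + E[r + 1][c]
             for c in range(n)] for r in range(n)]

P = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
print(distribution(P))               # the 4 x 4 matrix PΣ from the slide
assert density(distribution(P)) == P # (DΣ)□ = D
```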
14. Introduction
Terminology and notation
Matrix E is Monge, if E□ is nonnegative
Intuition: boundary-to-boundary distances in a (weighted) planar graph
G. Monge (1746–1818)
[figure: a planar graph G with boundary vertices i, j numbered 0–4;
blue + blue ≤ red + red]
16. Introduction
Terminology and notation
Matrix E is unit-Monge, if E□ is a permutation matrix
Intuition: boundary-to-boundary distances in a grid-like graph (in
particular, an alignment graph for a pair of strings)
Simple unit-Monge matrix (sometimes called “rank function”): PΣ, where
P is a permutation matrix
Seaweed matrix: P used as an implicit representation of PΣ
P =
0 1 0
1 0 0
0 0 1
PΣ =
0 1 2 3
0 1 1 2
0 0 0 1
0 0 0 0
18. Introduction
Implicit unit-Monge matrices
Efficient PΣ queries: range tree on nonzeros of P [Bentley: 1980]
binary search tree by i-coordinate
under every node, binary search tree by j-coordinate
[figure: a range tree over the nonzeros of P — the point set is split
recursively by i-coordinate, and each node stores its points in a
search tree by j-coordinate]
19. Introduction
Implicit unit-Monge matrices
Efficient PΣ queries (contd.):
Every node of the range tree represents a canonical range (rectangular
region), and stores its nonzero count
Overall, ≤ n log n canonical ranges are non-empty
A PΣ query is equivalent to ⋖-dominance counting: how many nonzeros
are ⋖-dominated by the query point?
Answer: sum up nonzero counts in ≤ log² n disjoint canonical ranges
Total size O(n log n), query time O(log² n)
There are asymptotically more efficient (but less practical) data structures:
total size O(n), query time O(log n / log log n) [JáJá+: 2004]
[Chan, Pătraşcu: 2010]
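One concrete realization of this data structure is a merge-sort-tree sketch (our own construction, one of several ways to implement the range tree): a balanced tree over row ranges, where each node keeps the column values of its rows sorted, so a dominance count sums over O(log n) canonical ranges at O(log n) each.

```python
from bisect import bisect_left

class DominanceCounter:
    """Answers PΣ queries for a permutation matrix given as a permutation p,
    with nonzeros at (r + 1/2, p[r] + 1/2).

    query(i, j) returns |{r : r >= i and p[r] < j}|, i.e. the entry PΣ(i, j).
    """
    def __init__(self, p):
        self.n = len(p)
        # tree[k] holds the sorted column values of one canonical row range
        self.tree = [None] * (4 * self.n)
        self._build(1, 0, self.n - 1, p)

    def _build(self, k, lo, hi, p):
        if lo == hi:
            self.tree[k] = [p[lo]]
            return
        mid = (lo + hi) // 2
        self._build(2 * k, lo, mid, p)
        self._build(2 * k + 1, mid + 1, hi, p)
        self.tree[k] = sorted(self.tree[2 * k] + self.tree[2 * k + 1])

    def query(self, i, j, k=1, lo=0, hi=None):
        if hi is None:
            hi = self.n - 1
        if i > hi:
            return 0
        if i <= lo:  # the whole canonical range lies in [i, n): count values < j
            return bisect_left(self.tree[k], j)
        mid = (lo + hi) // 2
        return (self.query(i, j, 2 * k, lo, mid) +
                self.query(i, j, 2 * k + 1, mid + 1, hi))

# P from the earlier slides: nonzeros at rows 0, 1, 2 and columns 1, 0, 2
dc = DominanceCounter([1, 0, 2])
print([[dc.query(i, j) for j in range(4)] for i in range(4)])
# reproduces the simple unit-Monge matrix PΣ shown earlier
```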
22. Matrix distance multiplication
Seaweed braids
Distance semiring (or (min, +)-semiring, special case of tropical algebra):
addition ⊕ given by min, multiplication ⊙ given by +
Matrix ⊙-multiplication:
A ⊙ B = C, where C(i, k) = ⊕_j A(i, j) ⊙ B(j, k) = min_j (A(i, j) + B(j, k))
Intuition: gluing distances in a composition of planar graphs
[figure: two planar graphs glued along a common boundary, with boundary
vertices labelled i, j, k]
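A direct sketch of ⊙-multiplication (naive O(n³); names are ours):

```python
def minplus(A, B):
    """Matrix multiplication over the (min, +) semiring:
    C(i, k) = min over j of A(i, j) + B(j, k)."""
    n, m, p = len(A), len(B), len(B[0])
    return [[min(A[i][j] + B[j][k] for j in range(m)) for k in range(p)]
            for i in range(n)]

# gluing distances through a two-stage graph:
# A = distances from the i-boundary to the j-boundary, B from j to k
A = [[0, 2],
     [1, 0]]
B = [[0, 3],
     [5, 0]]
print(minplus(A, B))  # → [[0, 2], [1, 0]]
```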
23. Matrix distance multiplication
Seaweed braids
Matrix classes closed under ⊙-multiplication (for given n):
general (integer, real) matrices ∼ general weighted graphs
Monge matrices ∼ planar weighted graphs
simple unit-Monge matrices ∼ grid-like graphs
Implicit ⊙-multiplication: define P ⊡ Q = R by PΣ ⊙ QΣ = RΣ
The seaweed monoid Tn:
simple unit-Monge matrices under ⊙
equivalently, permutation (seaweed) matrices under ⊡
Isomorphic to the 0-Hecke monoid of the symmetric group H0(Sn)
24. Matrix distance multiplication
Seaweed braids
P ⊡ Q = R can be seen as combing of seaweed braids
(Figure, shown over several slides: the braids of P and Q stacked and
combed into the braid of R.)
28. Matrix distance multiplication
Seaweed braids
The seaweed monoid Tn:
n! elements (permutations of size n)
n − 1 generators g1, g2, . . . , gn−1 (elementary crossings)
Idempotence: gi² = gi for all i
Far commutativity: gi gj = gj gi for |j − i| > 1
Braid relations: gi gj gi = gj gi gj for |j − i| = 1
30. Matrix distance multiplication
Seaweed braids
Seaweed monoid: gi² = gi ; far comm; braid
Related structures:
classical braid group: gi gi⁻¹ = 1; far comm; braid
Sn (Coxeter presentation): gi² = 1; far comm; braid
positive braid monoid: far comm; braid
locally free idempotent monoid: gi² = gi ; far comm
nil-Hecke monoid: gi² = 0; far comm; braid
Generalisations:
0-Hecke monoid of a Coxeter group
Hecke–Kiselman monoid
J-trivial monoid
34. Matrix distance multiplication
Seaweed braids
Computation in the seaweed monoid: a confluent rewriting system can be
obtained by software (Semigroupe, GAP, Sage)
T3: 1, a = g1, b = g2; ab, ba; aba = 0
Rewriting rules: aa → a, bb → b, bab → 0, aba → 0
T4: 1, a = g1, b = g2, c = g3; ab, ac, ba, bc, cb, aba, abc, acb, bac,
bcb, cba, abac, abcb, acba, bacb, bcba, abacb, abcba, bacba; abacba = 0
Rewriting rules: aa → a, bb → b, cc → c, ca → ac, bab → aba,
cbc → bcb, cbac → bcba, abacba → 0
Easy to use as a “black box”, but not efficient
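The rewriting systems above are easy to run directly. A sketch for T4, repeatedly applying the slide's rules until a normal form is reached; the extra rules making "0" absorbing are our own addition, since the zero element swallows any factor (function and constant names are illustrative):

```python
def normal_form(word, rules):
    """Reduce a word to its normal form by repeatedly applying the first
    applicable rewriting rule (lhs -> rhs) until none applies."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in word:
                word = word.replace(lhs, rhs, 1)
                changed = True
                break
    return word

# T4 presentation from the slide; "0" stands for the zero element abacba,
# and the trailing rules (ours) make it absorbing.
T4_RULES = [("aa", "a"), ("bb", "b"), ("cc", "c"), ("ca", "ac"),
            ("bab", "aba"), ("cbc", "bcb"), ("cbac", "bcba"),
            ("abacba", "0"),
            ("a0", "0"), ("0a", "0"), ("b0", "0"), ("0b", "0"),
            ("c0", "0"), ("0c", "0")]
```

For example, caca reduces via far commutativity and idempotence: caca → acca → aca → aac → ac, i.e. g3 g1 g3 g1 = g1 g3 in T4.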
36. Matrix distance multiplication
Seaweed matrix multiplication
Seaweed matrix ⊡-multiplication
Given permutation matrices P, Q, compute R such that PΣ ⊙ QΣ = RΣ
(equivalently, P ⊡ Q = R)
Seaweed matrix ⊡-multiplication: running time
type — time:
general — O(n³), standard
general — O(n³ (log log n)³ / log² n) [Chan: 2007]
Monge — O(n²), via [Aggarwal+: 1987]
implicit unit-Monge — O(n^1.5) [T: 2006]
implicit unit-Monge — O(n log n) [T: 2010]
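The definition can be checked with a naive cubic reference implementation: expand both permutations into distribution matrices, take the (min, +) product, and recover the result permutation by cross-differencing. The convention PΣ(i, j) = #{nonzeros (i′, j′) with i′ ≥ i and j′ < j} is one common reading of the distribution matrix; this is the O(n³) baseline, not the O(n log n) algorithm, and the function names are ours.

```python
def distribution(p):
    """P^Sigma(i, j) = #{i' >= i with p[i'] < j}, for 0 <= i, j <= n,
    where the permutation p is given as p[row] = column of its nonzero."""
    n = len(p)
    return [[sum(1 for i2 in range(i, n) if p[i2] < j) for j in range(n + 1)]
            for i in range(n + 1)]

def seaweed_mult(p, q):
    """The permutation r with R^Sigma = P^Sigma (min,+) Q^Sigma,
    computed naively in O(n^3)."""
    n = len(p)
    P, Q = distribution(p), distribution(q)
    R = [[min(P[i][j] + Q[j][k] for j in range(n + 1)) for k in range(n + 1)]
         for i in range(n + 1)]
    # recover the permutation as the cross-difference of R^Sigma
    return [next(j for j in range(n)
                 if R[i][j + 1] - R[i][j] - R[i + 1][j + 1] + R[i + 1][j] == 1)
            for i in range(n)]
```

The identity permutation is the monoid identity, and an elementary crossing is idempotent (gi² = gi), both of which the sketch reproduces.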
46. Matrix distance multiplication
Seaweed matrix multiplication
Seaweed matrix ⊡-multiplication: the Steady Ant algorithm
RΣ(i, k) = min_j (PΣ(i, j) + QΣ(j, k))
Divide-and-conquer on the range of j: two subproblems of size n/2
PΣ_lo ⊙ QΣ_lo = RΣ_lo and PΣ_hi ⊙ QΣ_hi = RΣ_hi
Conquer: tracing a balanced trail from bottom-left to top-right of R
Trail invariant: equal number of ≺-dominated nonzeros in R_hi and
≻-dominating nonzeros in R_lo
All such nonzeros are “errors” in the output, and must be removed
Must compensate by placing nonzeros on the path, time O(n)
Overall time O(n log n)
48. Matrix distance multiplication
Bruhat order
Bruhat order
Permutation A is lower (“more sorted”) than permutation B in the Bruhat
order (A ≤ B) if B can be transformed into A by successive pairwise sorting
(equivalently, A into B by anti-sorting) of arbitrary pairs
Permutation matrices: P ≤ Q, if Q can be transformed into P by successive
2 × 2 submatrix sorting: ( 0 1 / 1 0 ) → ( 1 0 / 0 1 )
Plays an important role in group theory and algebraic geometry (inclusion
order of Schubert varieties)
Describes pivoting order in Gaussian elimination (matrix Bruhat
decomposition)
49. Matrix distance multiplication
Bruhat order
Bruhat comparability: running time
O(n²) [Ehresmann: 1934; Proctor: 1982; Grigoriev: 1982]
O(n log n) [T: 2013]
O(n log n / log log n) [Gawrychowski: NEW]
51. Matrix distance multiplication
Bruhat order
Seaweed criterion
P ≤ Q iff P^R ⊡ Q = Id^R, where P^R = counterclockwise rotation of P
Intuition: permutations represented by seaweed braids
P ≤ Q, iff
no pair of seaweeds is crossed in P, while the “corresponding” pair is
uncrossed in Q
equivalently, no pair is uncrossed in P^R, while the “corresponding”
pair is uncrossed in Q
equivalently, P^R ⊡ Q = Id^R
Time O(n log n) by seaweed matrix multiplication
60. Matrix distance multiplication
Bruhat order
Alternative solution: clever implementation of Ehresmann’s criterion
[Gawrychowski: 2013]
The online partial sums problem: maintain array X[1 : n], subject to
update(k, ∆): X[k] ← X[k] + ∆
prefixsum(k): return Σ_{1 ≤ i ≤ k} X[i]
Query time:
Θ(log n) in the semigroup or group model
Θ(log n / log log n) in the RAM model on integers [Pătrașcu, Demaine: 2004]
Gives Bruhat comparability in time O(n log n / log log n) in the RAM model
Open problem: seaweed multiplication in time O(n log n / log log n)?
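The Θ(log n) group-model bound is met by the classic binary indexed (Fenwick) tree, sketched below; this is a textbook structure, shown here only to make the online partial sums interface concrete.

```python
class Fenwick:
    """Online partial sums in the group model: both update(k, delta) and
    prefixsum(k) run in Theta(log n) time (binary indexed tree)."""
    def __init__(self, n):
        self.n = n
        self.t = [0] * (n + 1)          # 1-indexed implicit tree

    def update(self, k, delta):         # X[k] <- X[k] + delta
        while k <= self.n:
            self.t[k] += delta
            k += k & -k                 # jump to the next covering node

    def prefixsum(self, k):             # return X[1] + ... + X[k]
        s = 0
        while k > 0:
            s += self.t[k]
            k -= k & -k                 # strip the lowest set bit
        return s
```

The Θ(log n / log log n) RAM-model bound needs a higher-fanout structure; the Fenwick tree is the simple baseline.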
63. Semi-local string comparison
Semi-local LCS and edit distance
Consider strings (= sequences) over an alphabet of size σ
Contiguous substrings vs not necessarily contiguous subsequences
Special cases of substring: prefix, suffix
Notation: strings a, b of length m, n respectively
Assume where necessary: m ≤ n; m, n reasonably close
The longest common subsequence (LCS) score:
length of longest string that is a subsequence of both a and b
equivalently, alignment score, where score(match) = 1 and
score(mismatch) = 0
In biological terms, “loss-free alignment” (unlike efficient but “lossy”
BLAST)
65. Semi-local string comparison
Semi-local LCS and edit distance
The LCS problem
Give the LCS score for a vs b
LCS: running time
O(mn) [Wagner, Fischer: 1974]
O(mn / log² n) for σ = O(1) [Masek, Paterson: 1980] [Crochemore+: 2003]
O(mn (log log n)² / log² n) [Paterson, Dančík: 1994] [Bille, Farach-Colton: 2008]
Running time varies depending on the RAM model version
We assume word-RAM with word size log n (where it matters)
78. Semi-local string comparison
Semi-local LCS and edit distance
LCS computation by dynamic programming
lcs(a, ∅) = 0
lcs(∅, b) = 0
lcs(aα, bβ) = max(lcs(aα, b), lcs(a, bβ)) if α ≠ β
lcs(aα, bβ) = lcs(a, b) + 1 if α = β

    ∗  D  E  F  I  N  E
 ∗  0  0  0  0  0  0  0
 D  0  1  1  1  1  1  1
 E  0  1  2  2  2  2  2
 S  0  1  2  2  2  2  2
 I  0  1  2  2  3  3  3
 G  0  1  2  2  3  3  3
 N  0  1  2  2  3  4  4

lcs(“DEFINE”, “DESIGN”) = 4
LCS(a, b) can be traced back through the dynamic programming table at no
extra asymptotic time cost
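The recurrence above can be sketched directly as the classic table-filling algorithm:

```python
def lcs(a, b):
    """Classic dynamic programming for the LCS score, O(mn) time:
    T[i][j] = LCS score of the prefixes a[:i] and b[:j]."""
    m, n = len(a), len(b)
    T = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:            # match: extend the LCS
                T[i][j] = T[i - 1][j - 1] + 1
            else:                               # mismatch: drop a character
                T[i][j] = max(T[i - 1][j], T[i][j - 1])
    return T[m][n]
```

This reproduces the table in the slide: lcs("DESIGN", "DEFINE") = 4.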
79. Semi-local string comparison
Semi-local LCS and edit distance
LCS on the alignment graph: directed grid of match and mismatch cells
(Figure: alignment graph of a = “BAABCBCA” vs b = “BAABCABCABACA”;
blue edges score 0, red edges score 1.)
score(“BAABCBCA”, “BAABCABCABACA”) = len(“BAABCBCA”) = 8
LCS = highest-score path from top-left to bottom-right
80. Semi-local string comparison
Semi-local LCS and edit distance
LCS: dynamic programming [WF: 1974]
Sweep cells in any ≪-compatible (topological) order
Cell update: time O(1)
Overall time O(mn)
81. Semi-local string comparison
Semi-local LCS and edit distance
LCS: micro-block dynamic programming [MP: 1980; BF: 2008]
Sweep cells in micro-blocks, in any ≪-compatible (topological) order
Micro-block size:
t = O(log n) when σ = O(1)
t = O(log n / log log n) otherwise
Micro-block interface:
O(t) characters, each O(log σ) bits, can be reduced to O(log t) bits
O(t) small integers, each O(1) bits
Micro-block update: time O(1), by precomputing all possible interfaces
Overall time O(mn / log² n) when σ = O(1), O(mn (log log n)² / log² n) otherwise
83. Semi-local string comparison
Semi-local LCS and edit distance
‘Begin at the beginning,’ the King said gravely, ‘and
go on till you come to the end: then stop.’
L. Carroll, Alice in Wonderland
Dynamic programming: begins at empty strings,
proceeds by appending characters, then stops
What about:
prepending/deleting characters (dynamic LCS)
concatenating strings (LCS on compressed
strings; parallel LCS)
taking substrings (= local alignment)
84. Semi-local string comparison
Semi-local LCS and edit distance
Dynamic programming from both ends:
better by a factor of 2, but still not good enough
Is dynamic programming strictly necessary to
solve sequence alignment problems?
[Eppstein+: Efficient algorithms for sequence analysis, 1991]
85. Semi-local string comparison
Semi-local LCS and edit distance
The semi-local LCS problem
Give the (implicit) matrix of O((m + n)²) LCS scores:
string-substring LCS: string a vs every substring of b
prefix-suffix LCS: every prefix of a vs every suffix of b
suffix-prefix LCS: every suffix of a vs every prefix of b
substring-string LCS: every substring of a vs string b
87. Semi-local string comparison
Semi-local LCS and edit distance
Semi-local LCS on the alignment graph
(Figure: alignment graph of a = “BAABCBCA” vs b = “BAABCABCABACA”;
blue edges score 0, red edges score 1.)
score(“BAABCBCA”, “CABCABA”) = len(“ABCBA”) = 5
String-substring LCS: all highest-score top-to-bottom paths
Semi-local LCS: all highest-score boundary-to-boundary paths
89. Semi-local string comparison
Score matrices and seaweed matrices
Semi-local LCS: output representation and running time
size — query time:
O(n²) — O(1), trivial
O(m^1/2 n) — O(log n), string-substring [Alves+: 2003]
O(n) — O(n), string-substring [Alves+: 2005]
O(n log n) — O(log² n) [T: 2006]
. . . or any 2D orthogonal range counting data structure
running time:
O(mn²) naive
O(mn) string-substring [Schmidt: 1998; Alves+: 2005]
O(mn) [T: 2006]
O(mn / log^0.5 n) [T: 2006]
O(mn (log log n)² / log² n) [T: 2007]
90. Semi-local string comparison
Score matrices and seaweed matrices
The score matrix H and the seaweed matrix P
H(i, j): the number of matched characters for a vs substring b⟨i : j⟩
j − i − H(i, j): the number of unmatched characters
Properties of the matrix j − i − H(i, j):
simple unit-Monge
therefore, j − i − H(i, j) = PΣ(i, j), where P (the negated
cross-difference of H) is a permutation matrix
P is the seaweed matrix, giving an implicit representation of H
Range tree for P: memory O(n log n), query time O(log² n)
94. Semi-local string comparison
Score matrices and seaweed matrices
The score matrix H and the seaweed matrix P
a = “BAABCBCA”
b = “BAABCABCABACA”
H(4, 11) = 11 − 4 − PΣ(4, 11) = 11 − 4 − 2 = 5
96. Semi-local string comparison
Score matrices and seaweed matrices
The (combed) seaweed braid in the alignment graph
(Figure: combed seaweed braid in the alignment graph of a = “BAABCBCA”
vs b = “BAABCABCABACA”.)
H(4, 11) = 11 − 4 − PΣ(4, 11) = 11 − 4 − 2 = 5
P(i, j) = 1 corresponds to a seaweed from top i to bottom j
Also define seaweeds top → right, left → right, left → bottom
These represent implicitly the semi-local LCS for each prefix of a vs b
97. Semi-local string comparison
Score matrices and seaweed matrices
Seaweed braid: a highly symmetric object (element of H0(Sn))
Can be built by assembling subbraids: divide-and-conquer
Flexible approach to local alignment, compressed approximate matching,
parallel computation. . .
100. Semi-local string comparison
Weighted alignment
The LCS problem is a special case of the weighted alignment problem
Scoring scheme: match score wM, mismatch score wX, gap score wG
LCS score: wM = 1, wX = wG = 0
Levenshtein score: wM = 2, wX = 1, wG = 0
A scoring scheme is rational, if wM, wX, wG are rational numbers
Edit distance: minimum cost to transform a into b by weighted character
edits (insertion, deletion, substitution)
Corresponds to a scoring scheme with wM = 0:
insertion/deletion cost −wG
substitution cost −wX
101. Semi-local string comparison
Weighted alignment
Weighted alignment graph for a, b
(Figure: weighted alignment graph of a = “BAABCBCA” vs
b = “BAABCABCABACA”; blue edges score 0, solid red edges score 2,
dotted red edges score 1.)
Levenshtein(“BAABCBCA”, “CABCABA”) = 11
102. Semi-local string comparison
Weighted alignment
Reduction: ordinary alignment graph for blown-up a, b
(Figure: alignment graph of the blown-up strings “$B$A$A$B$C$B$C$A” vs
“$C$A$B$C$A$B$A”; blue edges score 0, red edges score 1 or 2.)
Levenshtein(“BAABCBCA”, “CABCABA”) =
lcs(“$B$A$A$B$C$B$C$A”, “$C$A$B$C$A$B$A”) = 11
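The blow-up reduction is easy to verify in code: compute the Levenshtein alignment score (match 2, mismatch 1, gap 0) directly by DP, then as the plain LCS of the blown-up strings. The function names and the use of '$' as the padding symbol follow the slide; the sketch assumes '$' does not occur in the input strings.

```python
def lcs_score(a, b):
    """Plain LCS score by dynamic programming."""
    m, n = len(a), len(b)
    T = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            T[i][j] = (T[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1]
                       else max(T[i - 1][j], T[i][j - 1]))
    return T[m][n]

def lev_score(a, b):
    """Levenshtein alignment score (wM = 2, wX = 1, wG = 0) by DP."""
    m, n = len(a), len(b)
    T = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            T[i][j] = max(T[i - 1][j], T[i][j - 1],
                          T[i - 1][j - 1] + (2 if a[i - 1] == b[j - 1] else 1))
    return T[m][n]

def lev_score_via_lcs(a, b):
    """The same score as the plain LCS of the blown-up strings."""
    blow = lambda s: "".join("$" + c for c in s)
    return lcs_score(blow(a), blow(b))
```

On the slide's example both routes give 11.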
103. Semi-local string comparison
Weighted alignment
Reduction: rational-weighted semi-local alignment to semi-local LCS
(Figure: alignment graph of the blown-up strings, as before.)
Let wM = 1, wX = μ/ν, wG = 0
The blow-up increases the complexity by a factor of ν² (can be reduced to ν)
105. The seaweed method
Seaweed combing
(Figure, shown over a number of slides: step-by-step combing of the seaweed
braid on the alignment graph of a = “BAABCBCA” vs b = “BAABCABCABACA”.)
144. The seaweed method
Seaweed combing
Semi-local LCS: seaweed combing [T: 2006]
Initialise uncombed seaweed braid: mismatch cell = crossing
Sweep cells in any ≪-compatible (topological) order
match cell: skip (keep uncrossed)
mismatch cell: comb (uncross) iff the same seaweed pair has already
crossed before
Cell update: time O(1)
Overall time O(mn)
Correctness: by the seaweed monoid relations
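One concrete array reading of the combing sweep (an assumed rendering, not the slides' own code): h[j] holds the seaweed currently descending column j, v the seaweed moving right along the current row. In a match cell the pair bounces (swap); in a mismatch cell it crosses unless it has crossed before, which in this representation amounts to keeping the pair sorted. The final h recovers the string-substring part of the semi-local scores; the names h, v, comb, substring_lcs are ours.

```python
def comb(a, b):
    """Combing sweep in array form, O(mn) time."""
    m, n = len(a), len(b)
    h = list(range(n + 1))              # h[0] unused; seaweed ids 1..n
    for i in range(m):
        v = 0                           # seaweed entering row i from the left
        for j in range(1, n + 1):
            if a[i] == b[j - 1]:        # match cell: bounce (always swap)
                h[j], v = v, h[j]
            else:                       # mismatch: uncross iff crossed before
                h[j], v = max(h[j], v), min(h[j], v)
    return h

def substring_lcs(h, l, r):
    """lcs(a, b[l:r]) from the combed seaweeds: count seaweeds ending in
    (l, r] that started at position <= l."""
    return sum(1 for k in range(l + 1, r + 1) if h[k] <= l)
```

On the slides' running example this reproduces H(4, 11) = 5, i.e. lcs("BAABCBCA", "CABCABA") = 5.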
145. The seaweed method
Micro-block seaweed combing
(Figure, shown over a number of slides: the combing sweep performed in
micro-blocks on the alignment graph of a = “BAABCBCA” vs
b = “BAABCABCABACA”.)
160. The seaweed method
Micro-block seaweed combing
Semi-local LCS: micro-block seaweed combing [T: 2007]
Initialise uncombed seaweed braid: mismatch cell = crossing
Sweep cells in micro-blocks, in any ≪-compatible (topological) order
Micro-block size: t = O(log n / log log n)
Micro-block interface:
O(t) characters, each O(log σ) bits, can be reduced to O(log t) bits
O(t) integers, each O(log n) bits, can be reduced to O(log t) bits
Micro-block update: time O(1), by precomputing all possible interfaces
Overall time O(mn (log log n)² / log² n)
162. The seaweed method
Cyclic LCS
The cyclic LCS problem
Give the maximum LCS score for a vs all cyclic rotations of b
Cyclic LCS: running time
O(mn² / log n) naive
O(mn log m) [Maes: 1990]
O(mn) [Bunke, Bühler: 1993; Landau+: 1998; Schmidt: 1998]
O(mn (log log n)² / log² n) [T: 2007]
163. The seaweed method
Cyclic LCS
Cyclic LCS: the algorithm
Micro-block seaweed combing on a vs bb, time O(mn (log log n)² / log² n)
Make n string-substring LCS queries, time negligible
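The doubling trick above is easy to sketch: every rotation of b is a length-|b| substring of bb, so take the best window. For clarity this sketch answers each window with a plain quadratic LCS (the naive route); the slide's combing answers all n windows from one semi-local computation.

```python
def lcs(a, b):
    """Plain LCS score by dynamic programming."""
    m, n = len(a), len(b)
    T = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            T[i][j] = (T[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1]
                       else max(T[i - 1][j], T[i][j - 1]))
    return T[m][n]

def cyclic_lcs(a, b):
    """Maximum LCS score of a vs all cyclic rotations of b, via bb."""
    n = len(b)
    bb = b + b                          # every rotation is a window of bb
    return max(lcs(a, bb[i:i + n]) for i in range(n))
```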
165. The seaweed method
Longest repeating subsequence
The longest repeating subsequence problem
Find the longest subsequence of a that is a square (a repetition of two
identical strings)
Longest repeating subsequence: running time
O(m³) naive
O(m²) [Kosowski: 2004]
O(m² (log log m)² / log² m) [T: 2007]
166. The seaweed method
Longest repeating subsequence
Longest repeating subsequence: the algorithm
Micro-block seaweed combing on a vs a, time O(m² (log log m)² / log² m)
Make m − 1 prefix-suffix LCS queries, time negligible
Open question: generalise to the longest subsequence that is a cube, etc.
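The reduction behind the m − 1 queries can be sketched naively: a square subsequence splits at some position k, with its two identical halves forming a common subsequence of the prefix a[:k] and the suffix a[k:]; take the best split. This is the O(m³) baseline from the table, with a plain DP per query.

```python
def lcs(a, b):
    """Plain LCS score by dynamic programming."""
    m, n = len(a), len(b)
    T = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            T[i][j] = (T[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1]
                       else max(T[i - 1][j], T[i][j - 1]))
    return T[m][n]

def longest_square_subsequence(a):
    """Length of the longest subsequence of a that is a square:
    best over all split points of twice the prefix-suffix LCS."""
    m = len(a)
    return max((2 * lcs(a[:k], a[k:]) for k in range(1, m)), default=0)
```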
167. The seaweed method
Approximate matching
The approximate pattern matching problem
Give the substring closest to a by alignment score, starting at each
position in b
Assume a rational scoring scheme
Approximate pattern matching: running time
O(mn) [Sellers: 1980]
O(mn / log n) for σ = O(1), via [Masek, Paterson: 1980]
O(mn (log log n)² / log² n) via [Bille, Farach-Colton: 2008]
168. The seaweed method
Approximate matching
Approximate pattern matching: the algorithm
Micro-block seaweed combing on a vs b (with blow-up), time
O mn(log log n)2
log2
n
The implicit semi-local edit score matrix:
an anti-Monge matrix
approximate pattern matching ∼ row minima
Row minima in O(n) element queries [Aggarwal+: 1987]
Each query in time O(log² n) using the range tree representation; combined query time negligible
Overall running time O(mn (log log n)² / log² n), same as [Bille, Farach-Colton: 2008]
Periodic string comparison
Wraparound seaweed combing
The periodic string-substring LCS problem
Give (implicit) LCS scores for a vs each substring of b = …uuu… = u^{±∞}
Let u be of length p
May assume that every character of a occurs in u (otherwise delete it)
Only substrings of b of length ≤ mp need be considered (otherwise the LCS score is m)
Periodic string-substring LCS: Wraparound seaweed combing
Initialise the uncombed seaweed braid: mismatch cell = crossing
Sweep cells row by row: each row starts at a match cell, wraps at the boundary
Sweep cells in any dependency-compatible order:
match cell: skip (keep uncrossed)
mismatch cell: comb (uncross) iff the same seaweed pair has already crossed before, possibly in a different period
Cell update: time O(1)
Overall time O(mn)
String-substring LCS score: count seaweeds with multiplicities
The tandem LCS problem
Give the LCS score for a vs b = u^k
We have n = kp; may assume k ≤ m
Tandem LCS: running time
O(mkp) naive
O(m(k + p)) [Landau, Ziv-Ukelson: 2001]
O(mp) [T: 2009]
Direct application of wraparound seaweed combing
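The naive O(mkp) bound above is simply the standard LCS DP on the fully expanded text; a sketch for contrast with the O(mp) combing approach (the `lcs` helper is a standard DP, included for self-containment):

```python
def lcs(a, b):
    """Standard O(|a|*|b|) dynamic-programming LCS length."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                d[i][j] = d[i - 1][j - 1] + 1
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])
    return d[m][n]

def tandem_lcs_naive(a, u, k):
    """Naive tandem LCS: expand b = u^k explicitly, time O(m*k*p).
    Wraparound seaweed combing avoids the expansion, giving O(m*p)."""
    return lcs(a, u * k)
```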
The tandem alignment problem
Give the substring closest to a in b = u^{±∞} by alignment score, among:
global: substrings u^k of length kp, across all k
cyclic: substrings of length kp, across all k
local: substrings of any length
Tandem alignment: running time
O(m²p) all naive
O(mp) global [Myers, Miller: 1989]
O(mp log p) cyclic [Benson: 2005]
O(mp) cyclic [T: 2009]
O(mp) local [Myers, Miller: 1989]
Cyclic tandem alignment: the algorithm
Periodic seaweed combing for a vs b (with blow-up), time O(mp)
For each k ∈ [1 : m]:
solve tandem LCS (under the given alignment score) for a vs u^k
obtain scores for a vs p successive substrings of b of length kp by LCS
batch query: time O(1) per substring
Running time O(mp)
Sparse string comparison
Semi-local LCS between permutations
The LCS problem on permutation strings (LCSP)
Give LCS score for a vs b; in each of a, b all characters distinct
Equivalent to
longest increasing subsequence (LIS) in a string
maximum clique in a permutation graph
maximum planar matching in an embedded bipartite graph
LCSP: running time
O(n log n) implicit in [Erdős, Szekeres: 1935]
[Robinson: 1938; Knuth: 1970; Dijkstra: 1980]
O(n log log n) unit-RAM [Chang, Wang: 1992]
[Bespamyatnikh, Segal: 2000]
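The O(n log n) bound for LIS (and hence LCSP) is achieved by patience sorting with binary search; a minimal sketch:

```python
import bisect

def lis_length(seq):
    """O(n log n) longest strictly increasing subsequence:
    tails[k] holds the smallest possible tail of an increasing
    subsequence of length k + 1; each element updates one slot."""
    tails = []
    for x in seq:
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)
```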
Semi-local LCSP
Give semi-local LCS scores for a vs b; in each of a, b all characters distinct
Equivalent to
longest increasing subsequence (LIS) in every substring of a string
Semi-local LCSP: running time
O(n² log n) naive
O(n²) restricted [Albert+: 2003; Chen+: 2005]
O(n^1.5 log n) randomised, restricted [Albert+: 2007]
O(n^1.5) [T: 2006]
O(n log² n) [T: 2010]
[Figure: four stages of the semi-local LCSP computation on the permutation strings "CFAEDHGB" vs "DEHCBAFG"]
Semi-local LCSP: the algorithm
Divide-and-conquer on the alignment graph
Divide graph (say) horizontally; two subproblems of effective size n/2
Conquer: seaweed matrix (min, +)-multiplication, time O(n log n)
Overall time O(n log² n)
Longest piecewise monotone subsequences
A k-increasing sequence: a concatenation of k increasing sequences
A (k − 1)-modal sequence: a concatenation of k alternating increasing and
decreasing sequences
The longest k-increasing (or (k − 1)-modal) subsequence problem
Give the longest k-increasing ((k − 1)-modal) subsequence of string b
Longest k-increasing (or (k − 1)-modal) subsequence: running time
O(nk log n) (k − 1)-modal [Demange+: 2007]
O(nk log n) via [Hunt, Szymanski: 1977]
O(n log² n) [T: 2010]
Main idea: lcs(id^k, b) (respectively, lcs((id id)^{k/2}, b))
Longest k-increasing subsequence: algorithm A
Sparse LCS for id^k vs b: time O(nk log n)
Longest k-increasing subsequence: algorithm B
Semi-local LCSP for id vs b: time O(n log² n)
Extract string-substring LCSP for id vs b
String-substring LCS for id^k vs b by at most 2 log k instances of square/multiply: time log k · O(n log n) = O(n log² n)
Query the LCS score for id^k vs b: time negligible
Overall time O(n log² n)
Algorithm B faster than Algorithm A for k ≥ log n
Maximum clique in a circle graph
The maximum clique problem in a circle graph
Given a circle with n chords, find the maximum-size subset of pairwise
intersecting chords
Standard reduction to an interval model: cut the circle and lay it out on
the line; chords become intervals (here drawn as square diagonals)
Chords intersect iff intervals overlap, i.e. intersect without containment
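A small brute-force sketch built on these two facts: by the Helly property (used in the algorithm below) a clique's intervals share a common point, and intervals covering a common point pairwise overlap (intersect without containment) exactly when, sorted by left endpoint, their right endpoints are strictly increasing, which is an LIS. This is only an illustrative quadratic-time baseline, not the O(n log² n) algorithm; all names are mine.

```python
import bisect

def max_clique_circle(intervals):
    """Maximum clique in a circle graph given its interval model
    (all 2n chord endpoints assumed distinct)."""
    best = 0
    for c in sorted(e for iv in intervals for e in iv):
        # intervals covering the candidate common point c
        covering = sorted((l, r) for (l, r) in intervals if l <= c <= r)
        rights = [r for (_, r) in covering]
        # strict LIS of right endpoints = largest pairwise-overlapping subset
        tails = []
        for x in rights:
            k = bisect.bisect_left(tails, x)
            if k == len(tails):
                tails.append(x)
            else:
                tails[k] = x
        best = max(best, len(tails))
    return best
```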
Maximum clique in a circle graph: running time
exp(n) naive
O(n³) [Gavril: 1973]
O(n²) [Rotem, Urrutia: 1981; Hsu: 1985]
[Masuda+: 1990; Apostolico+: 1992]
O(n1.5) [T: 2006]
O(n log² n) [T: 2010]
Maximum clique in a circle graph: the algorithm
Helly property: if a set of intervals intersects pairwise, then they all intersect at a common point
Compute seaweed matrix, build range tree: time O(n log² n)
Run through all 2n + 1 possible common intersection points
For each point, find a maximum subset of covering overlapping segments by a prefix-suffix LCS query: time (2n + 1) · O(log² n) = O(n log² n)
Overall time O(n log² n) + O(n log² n) = O(n log² n)
Parameterised maximum clique in a circle graph
The maximum clique problem in a circle graph, sensitive e.g. to
number e of edges; e ≤ n²
size l of maximum clique; l ≤ n
cutwidth d of interval model (max number of intervals covering a
point); l ≤ d ≤ n
Parameterised maximum clique in a circle graph: running time
O(n log n + e) [Apostolico+: 1992]
O(n log n + nl log(n/l)) [Apostolico+: 1992]
O(n log n + n log² d) NEW
Parameterised maximum clique in a circle graph: the algorithm
For each diagonal block of size d, compute seaweed matrix, build range tree: time n/d · O(d log² d) = O(n log² d)
Extend each diagonal block to a quadrant: time O(n log² d)
Run through all 2n + 1 possible common intersection points
For each point, find a maximum subset of covering overlapping segments by a prefix-suffix LCS query: time O(n log² d)
Overall time O(n log² d)
Compressed string comparison
Grammar compression
Notation: pattern p of length m; text t of length n
A GC-string (grammar-compressed string) t is a straight-line program (context-free grammar) generating t = t_n̄ by n̄ assignments of the form
t_k = α, where α is an alphabet character, or
t_k = uv, where each of u, v is an alphabet character or t_i for some i < k
In general, n = O(2^n̄)
Example: Fibonacci string “ABAABABAABAAB”
t1 = A t2 = t1B t3 = t2t1 t4 = t3t2 t5 = t4t3 t6 = t5t4
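The assignments above can be evaluated mechanically; a small sketch (the encoding of operands as either characters or 0-based indices of earlier variables is my own convention):

```python
def expand_slp(assignments):
    """Evaluate a straight-line program: each assignment is either a
    single character (t_k = alpha), or a pair of operands (t_k = uv)
    where an operand is a character or an earlier variable's 0-based index."""
    vals = []
    for rhs in assignments:
        if isinstance(rhs, str):
            vals.append(rhs)
        else:
            u, v = rhs
            vals.append((vals[u] if isinstance(u, int) else u) +
                        (vals[v] if isinstance(v, int) else v))
    return vals[-1]

# The Fibonacci-string program from the slide: t1=A, t2=t1 B, t3=t2 t1, ...
fib = expand_slp(["A", (0, "B"), (1, 0), (2, 1), (3, 2), (4, 3)])
```

Note that 6 assignments generate a string of length 13; the length roughly multiplies by the golden ratio per assignment, illustrating n = O(2^n̄).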
Grammar compression covers various compression types, e.g. LZ78 and LZW (but not LZ77 directly)
Simplifying assumption: arithmetic up to n runs in O(1)
This assumption can be removed by careful index remapping
Extended substring-string LCS on GC-strings
LCS: running time (r = m + n, r̄ = m̄ + n̄)
p plain, t plain:
O(mn) [Wagner, Fischer: 1974]
O(mn / log² m) [Masek, Paterson: 1980], [Crochemore+: 2003]
p plain, t GC:
O(m³ n̄ + …) gen. CFG [Myers: 1995]
O(m^1.5 n̄) ext subs-s [T: 2008]
O(m log m · n̄) ext subs-s [T: 2010]
p GC, t GC:
NP-hard [Lifshits: 2005]
O(r^1.2 r̄^1.4) R weights [Hermelin+: 2009]
O(r log r · r̄) [T: 2010]
O(r log(r/r̄) · r̄) [Hermelin+: 2010]
O(r log^{1/2}(r/r̄) · r̄) [Gawrychowski: 2012]
Substring-string LCS (plain pattern, GC text): the algorithm
For every k, compute by recursion the appropriate part of semi-local LCS for p vs t_k, using seaweed matrix (min, +)-multiplication: time O(m log m · n̄)
Overall time O(m log m · n̄)
Subsequence recognition on GC-strings
The global subsequence recognition problem
Does pattern p appear in text t as a subsequence?
Global subsequence recognition: running time
p plain, t plain: O(n) greedy
p plain, t GC: O(mn̄) greedy
p GC, t GC: NP-hard [Lifshits: 2005]
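The O(n) greedy algorithm is a single left-to-right scan of t, advancing in p on each match; a minimal sketch:

```python
def is_subsequence(p, t):
    """Greedy O(n) global subsequence recognition: scan t once;
    'c in it' consumes the iterator up to and including the first
    occurrence of c, so characters of p are matched left to right."""
    it = iter(t)
    return all(c in it for c in p)
```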
The local subsequence recognition problem
Find all minimally matching substrings of t with respect to p
A substring of t is matching if p is a subsequence of it
A matching substring of t is minimally matching if none of its proper substrings are matching
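For intuition, a naive solution: let e(i) be the least e with p a subsequence of t[i:e]; e(i) is non-decreasing in i, and the window (i, e(i)) is minimally matching exactly when moving the start to i + 1 strictly increases the end. A sketch (names mine, time O(mn)):

```python
def minimally_matching(p, t):
    """All minimally matching substrings of t w.r.t. p, as (start, end)
    index pairs with t[start:end] matching."""
    def min_end(i):
        # greedy earliest embedding of p into t starting at position i
        j = i
        for c in p:
            while j < len(t) and t[j] != c:
                j += 1
            if j == len(t):
                return None
            j += 1
        return j
    ends = [min_end(i) for i in range(len(t) + 1)]
    return [(i, ends[i]) for i in range(len(t))
            if ends[i] is not None
            and (ends[i + 1] is None or ends[i + 1] > ends[i])]
```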
Local subsequence recognition: running time (+ output)
p plain, t plain:
O(mn) [Mannila+: 1995]
O(mn / log m) [Das+: 1997]
O(cm + n) [Boasson+: 2001]
O(m + nσ) [Troniček: 2001]
p plain, t GC:
O(m² log m · n̄) [Cégielski+: 2006]
O(m^1.5 n̄) [T: 2008]
O(m log m · n̄) [T: 2010]
O(m n̄) [Yamamoto+: 2011]
p GC, t GC: NP-hard [Lifshits: 2005]
[Figure: seaweed diagram marking boundary positions î₀+, î₁+, î₂+ and ĵ₀+, ĵ₁+, ĵ₂+ over 0 … n]
b⟨i : j⟩ is matching iff the box [i : j] is not pierced left to right
This is determined by a chain of maximal seaweeds
b⟨i : j⟩ is minimally matching iff (i, j) lies in the interleaved skeleton chain
Local subsequence recognition (plain pattern, GC text): the algorithm
For every k, compute by recursion the appropriate part of semi-local LCS for p vs t_k, using seaweed matrix (min, +)-multiplication: time O(m log m · n̄)
Given an assignment t = t′t″, find by recursion
minimally matching substrings in t′
minimally matching substrings in t″
Then, find the chain of maximal seaweeds in time n̄ · O(m) = O(mn̄)
Its skeleton chain gives the minimally matching substrings in t overlapping both t′ and t″
Overall time O(m log m · n̄) + O(mn̄) = O(m log m · n̄)
Threshold approximate matching
The threshold approximate matching problem
Find all matching substrings of t with respect to p, according to a
threshold k
A substring of t is matching if the edit distance of p vs that substring is at most k
Threshold approximate matching: running time (+ output)
p plain, t plain:
O(mn) [Sellers: 1980]
O(mk) [Landau, Vishkin: 1989]
O(m + n + nk⁴/m) [Cole, Hariharan: 2002]
p plain, t GC:
O(mn̄k²) [Kärkkäinen+: 2003]
O(mn̄k + n̄ log n) [LV: 1989] via [Bille+: 2010]
O(mn̄ + n̄k⁴ + n̄ log n) [CH: 2002] via [Bille+: 2010]
O(m log m · n̄) [T: NEW]
p GC, t GC: NP-hard [Lifshits: 2005]
(Also many specialised variants for LZ compression)
Blow-up: weighted alignment on strings p, t of size m, n is equivalent to LCS on blown-up strings p′, t′ of size m′ = νm, n′ = νn
Threshold approx matching (plain pattern, GC text): the algorithm
Algorithm structure similar to local subsequence recognition
seaweed chains are replaced by m × m submatrices
Extra ingredients:
the blow-up technique: reduction of edit distances to LCS scores
implicit matrix searching, replacing chain interleaving
Monge row minima ("SMAWK"): O(m) [Aggarwal+: 1987]
Implicit unit-Monge row minima: O(m log log m) [T: 2012]; O(m) [Gawrychowski: 2012]
Overall time O(m log m · n̄) + O(mn̄) = O(m log m · n̄)
Parallel string comparison
Parallel LCS
Bulk-Synchronous Parallel (BSP) computer [Valiant: 1990]
Simple, realistic general-purpose parallel model
[Figure: BSP machine: p processor/memory pairs (P, M), numbered 0 to p − 1, attached to a communication environment with parameters (g, l)]
Contains
p processors, each with local memory (1 time unit/operation)
communication environment, including a network and an external
memory (g time units/data unit communicated)
barrier synchronisation mechanism (l time units/synchronisation)
BSP computation: sequence of parallel supersteps
[Figure: superstep timeline across processors 0 … p − 1]
Asynchronous computation/communication within supersteps (includes data exchange with external memory)
Synchronisation before/after each superstep
Cf. CSP: parallel collection of sequential processes
Compositional cost model
For individual processor proc in superstep sstep:
comp(sstep, proc): the amount of local computation and local
memory operations by processor proc in superstep sstep
comm(sstep, proc): the amount of data sent and received by
processor proc in superstep sstep
For the whole BSP computer in one superstep sstep:
comp(sstep) = max_{0 ≤ proc < p} comp(sstep, proc)
comm(sstep) = max_{0 ≤ proc < p} comm(sstep, proc)
cost(sstep) = comp(sstep) + comm(sstep) · g + l
For the whole BSP computation with sync supersteps:
comp = Σ_{0 ≤ sstep < sync} comp(sstep)
comm = Σ_{0 ≤ sstep < sync} comm(sstep)
cost = Σ_{0 ≤ sstep < sync} cost(sstep) = comp + comm · g + sync · l
The input/output data are stored in the external memory; the cost of
input/output is included in comm
E.g. for a particular linear system solver with an n × n matrix:
comp = O(n³/p), comm = O(n²/p^{1/2}), sync = O(p^{1/2})
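The compositional cost model is simple enough to compute directly; a sketch of the bookkeeping (representation of supersteps as per-processor (comp, comm) pairs is my own):

```python
def bsp_cost(supersteps, g, l):
    """Total BSP cost: sum over supersteps of comp + comm*g + l,
    where a superstep's comp and comm are maxima over its processors.
    Each superstep is a list of per-processor (comp, comm) pairs."""
    total = 0
    for procs in supersteps:
        comp = max(w for w, _ in procs)   # slowest local computation
        comm = max(h for _, h in procs)   # largest data volume sent/received
        total += comp + comm * g + l
    return total
```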
The ordered 2D grid dag grid2(n):
nodes arranged in an n × n grid
edges directed top-to-bottom and left-to-right
≤ 2n inputs (to the left/top borders)
≤ 2n outputs (from the right/bottom borders)
size n², depth 2n − 1
Applications: triangular linear system; discretised PDE via Gauss–Seidel
iteration (single step); 1D cellular automata; dynamic programming
Sequential work O(n²)
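The grid dag's structure is what makes dynamic programming parallelisable: all cells on one anti-diagonal depend only on the two previous anti-diagonals, so they are mutually independent. A sequential sketch of the LCS DP swept by anti-diagonals (the inner loop over i is the part that could run in parallel):

```python
def lcs_wavefront(a, b):
    """LCS DP over the 2D grid dag, swept by anti-diagonals i + j = s.
    Each diagonal reads only the two preceding diagonals, so its cells
    are independent of each other."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for s in range(2, m + n + 1):
        for i in range(max(1, s - n), min(m, s - 1) + 1):
            j = s - i
            if a[i - 1] == b[j - 1]:
                d[i][j] = d[i - 1][j - 1] + 1
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])
    return d[m][n]
```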
Parallel ordered 2D grid computation
grid2(n) consists of a p × p grid of blocks, each isomorphic to grid2(n/p)
The blocks can be arranged into 2p − 1 anti-diagonal layers, with ≤ p independent blocks in each layer
Parallel ordered 2D grid computation (contd.)
The computation proceeds in 2p − 1 stages, each computing a layer of blocks. In a stage:
every block is assigned to a different processor (some processors idle)
the processor reads the 2n/p block inputs, computes the block, and writes back the 2n/p block outputs
comp: (2p − 1) · O((n/p)²) = O(p · n²/p²) = O(n²/p)
comm: (2p − 1) · O(n/p) = O(n)
For n ≥ p: comp O(n²/p), comm O(n), sync O(p)
Parallel LCS computation
The 2D grid algorithm solves the LCS problem (and many others) by
dynamic programming
comp = O(n²/p), comm = O(n), sync = O(p)
comm is not scalable (i.e. does not decrease with increasing p) :-(
Can scalable comm be achieved for the LCS problem?
Parallel LCS computation
Solve the more general semi-local LCS problem:
each string vs all substrings of the other string
all prefixes of each string vs all suffixes of the other string
Divide-and-conquer on substrings of a, b: log p recursion levels
Conquer phase: assemble substring semi-local LCS from smaller ones by parallel seaweed multiplication
Base level: p semi-local LCS subproblems on (n/p^{1/2})-strings
Sequential time still O(n²)
Parallel LCS computation (cont.)
Parallel semi-local LCS: comp, comm, sync
2D grid: comp O(n²/p), comm O(n), sync O(p)
[Krusche, T: 2007]: comp O(n²/p), comm O(n/p^{1/2}), sync O(log² p)
[Krusche, T: 2010]: comp O(n²/p), comm O(n log p/p^{1/2}), sync O(log p)
[T: NEW]: comp O(n²/p), comm O(n/p^{1/2}), sync O(log p)
Based on the parallel Steady Ant algorithm
Seaweed matrix (min, +)-multiplication: parallel Steady Ant
RΣ(i, k) = min_j (PΣ(i, j) + QΣ(j, k))
Divide-and-conquer on the range of j: two subproblems on n/2-matrices
PΣ_lo ⊙ QΣ_lo = RΣ_lo and PΣ_hi ⊙ QΣ_hi = RΣ_hi
Conquer: tracing a balanced trail from bottom-left to top-right of R
Partition the n × n index space into a regular p × p grid of blocks
Sample R at the block corners: p² samples in total
The Steady Ant's trail passes through ≤ 2p blocks
Partition blocks across processors: comp, comm = O(n/p)
comp = O(n log n / p), comm = O(n log p / p), sync = O(log p)
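For reference, the product being computed is the ordinary (min, +) matrix product; a naive cubic-time sketch of the defining formula (which the Steady Ant algorithm computes in O(n log n) for seaweed, i.e. unit-Monge, matrices):

```python
def minplus(P, Q):
    """Naive (min, +) matrix product: R[i][k] = min over j of
    P[i][j] + Q[j][k], in time O(n^3)."""
    n = len(P)
    return [[min(P[i][j] + Q[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]
```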
Seaweed matrix (min, +)-multiplication: generalised parallel Steady Ant
RΣ(i, k) = min_j (PΣ(i, j) + QΣ(j, k))
Divide-and-conquer on the range of j: p^ε subproblems on (n/p^ε)-matrices
Conquer: tracing p^ε balanced trails from bottom-left to top-right of R
Partition the n × n index space into a regular p × p grid of blocks
Sample R at the block corners: p² samples in total
The Steady Ant's trails pass through ≤ 2p^{1+ε} blocks
Partition blocks across processors: comp, comm = O(n/p^{1−ε})
comp = O(n log n / p + n/p^{1−ε}), comm = O(n/p^{1−ε}), sync = O(1/ε)
Parallel LCS computation (cont.)
Divide-and-conquer on substrings of a, b: log p recursion levels
In every level, semi-local LCS subproblems on input substrings
Conquer phase: generalised parallel Steady Ant algorithm, ε = 1/3
base level: p = p^{1/2} × p^{1/2} subproblems on (n/p^{1/2})-substrings; one proc/subproblem; comm = O(n/p^{1/2})
middle level: p^{1/2} = p^{1/4} × p^{1/4} subproblems on (n/p^{1/4})-substrings; p^{1/2} procs/subproblem; comm = O((n/p^{1/4}) / (p^{1/2})^{1−1/3}) = O(n/p^{7/12})
top level: one subproblem on n-strings; p procs; comm = O(n/p^{1−1/3}) = O(n/p^{2/3})
sync = O(1/(1/3)) = O(1) in every level
Parallel LCS computation (cont.)
comm dominated by the base level:
O(n/p^{1/2}) + … + O(n/p^{7/12}) + … + O(n/p^{2/3}) = O(n/p^{1/2})
comp = O(n²/p), comm = O(n/p^{1/2}), sync = O(log p)
Parallel LCS on permutation strings (LCSP = LIS)
Sequential time O(n log n); no good way known to parallelise directly
Solution via the more general semi-local LCSP is no longer efficient: sequential time O(n log² n)
Divide-and-conquer on substrings/subsequences of a, b
Parallelising normal O(n log n) seaweed multiplication [Krusche, T: 2010]:
comp O(n log² n / p), comm O(n log p / p), sync O(log² p)
Open problem: can we achieve
comp O(n log n / p) for LCSP?
comp O(n log² n / p), comm O(n/p), sync O(log² p) for semi-local LCSP?
The transposition network method
Transposition networks
Comparison network: a circuit of comparators
A comparator sorts two inputs and outputs them in prescribed order
Comparison networks traditionally used for non-branching merging/sorting
Classical comparison networks: number of comparators
merging: O(n log n) [Batcher: 1968]
sorting: O(n log² n) [Batcher: 1968]
sorting: O(n log n) [Ajtai+: 1983]
Comparison networks are visualised by wire diagrams
Transposition network: all comparisons are between adjacent wires
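A classic example of a transposition network is odd-even transposition sort: n rounds of comparators, alternating between odd and even adjacent pairs, sort any input of length n. A sketch simulating the network sequentially:

```python
def odd_even_transposition_sort(a):
    """Simulate an odd-even transposition network: every comparator
    acts on adjacent wires (i, i+1), alternating parity per round;
    n rounds suffice to sort any input of length n."""
    a = list(a)
    n = len(a)
    for r in range(n):
        for i in range(r % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```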
Seaweed combing as a transposition network
[Figure: transposition network for combing "ACBC" vs "ABCA"; wires carry the anti-sorted values +7, +5, +3, +1, −1, −3, −5, −7]
Character mismatches correspond to comparators
Inputs anti-sorted (sorted in reverse); each value traces a seaweed
Global LCS: transposition network with binary input
[Figure: the same network with binary wire values 0/1]
Inputs still anti-sorted, but may not be distinct
Comparison between equal values is indeterminate
Parameterised string comparison
Parameterised string comparison
String comparison sensitive e.g. to
low similarity: small λ = LCS(a, b)
high similarity: small κ = dist_LCS(a, b) = m + n − 2λ
Can also use weighted alignment score or edit distance
Assume m = n, therefore κ = 2(n − λ)
Low-similarity comparison: small λ
sparse set of matches, may need to look at them all
preprocess matches for fast searching, time O(n log σ)
High-similarity comparison: small κ
set of matches may be dense, but only need to look at small subset
no need to preprocess, linear search is OK
Flexible comparison: sensitive to both high and low similarity, e.g. by running both comparison types alongside each other
Parameterised string comparison: running time
Low-similarity, after preprocessing in O(n log σ)
O(nλ) [Hirschberg: 1977]
[Apostolico, Guerra: 1985]
[Apostolico+: 1992]
High-similarity, no preprocessing
O(n · κ) [Ukkonen: 1985]
[Myers: 1986]
Flexible
O(λ · κ · log n) no preproc [Myers: 1986; Wu+: 1990]
O(λ · κ) after preproc [Rick: 1995]
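The high-similarity O(nκ) idea can be sketched as a banded DP in the style of [Ukkonen: 1985]: if the edit distance is at most k, the optimal alignment path stays within the diagonal band |i − j| ≤ k, so only O(min(m, n) · k) cells need be computed. A sketch under that assumption (names mine; returns None when the distance exceeds the threshold):

```python
def edit_distance_banded(a, b, k):
    """Edit distance restricted to the diagonal band |i - j| <= k:
    returns the exact distance if it is <= k, else None.
    Cells outside the band are treated as infinite."""
    m, n = len(a), len(b)
    if abs(m - n) > k:
        return None
    INF = k + 1
    prev = {j: j for j in range(0, min(n, k) + 1)}   # DP row 0 within the band
    for i in range(1, m + 1):
        cur = {}
        for j in range(max(0, i - k), min(n, i + k) + 1):
            if j == 0:
                cur[j] = i
                continue
            best = prev.get(j - 1, INF) + (a[i - 1] != b[j - 1])
            best = min(best, prev.get(j, INF) + 1, cur.get(j - 1, INF) + 1)
            cur[j] = best
        prev = cur
    d = prev.get(n, INF)
    return d if d <= k else None
```

Doubling k and retrying until the call succeeds yields the O(n · κ) flavour of the table above.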