The document discusses comparing genomes through permutations. It begins by introducing genomes and how they can be represented as signed permutations if they only differ by gene order. It then discusses two main ways of comparing genomes: by identifying common segments and by measuring evolutionary distances. The remainder of the talk will overview problems related to comparing signed and unsigned permutations, counting problems, reconstructing evolution, and open questions.
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Comparing Genomes through Permutations
1. Introduction
Comparing signed permutations
Comparing unsigned permutations
Counting problems
Median problems
Conclusions
Permutations in comparative genomics
Anthony Labarre
Anthony.Labarre@cs.kuleuven.be
Katholieke Universiteit Leuven (K.U.L.)
February 21st, 2012
Anthony Labarre Permutations in comparative genomics
2. Introduction
Comparing signed permutations Motivation
Comparing unsigned permutations From genomes to (signed) permutations
Counting problems Comparing genomes
Median problems Overview of the talk
Conclusions
A few definitions
Deoxyribonucleic acide: double
helix of nucleotides (A, C, G, T);
Complementarity (A-T, C-G): one
strand is enough;
Gene = sequence of nucleotides
(that codes for a specific protein);
Chromosome = ordered set of
genes;
Genome = set of chromosomes;
Goal: compare genomes;
Anthony Labarre Permutations in comparative genomics
3. Introduction
Comparing signed permutations Motivation
Comparing unsigned permutations From genomes to (signed) permutations
Counting problems Comparing genomes
Median problems Overview of the talk
Conclusions
Comparing genomes at the nucleotide level
Most common comparisons: at the nucleotide level;
Example (sequence alignment)
S1 : · · · T C C G C C A − − C T A ···
| | | | | |
S2 : · · · T C G G A C T G G C − A ···
Matches, substitutions, insertions and deletions;
Correspond to mutations;
Anthony Labarre Permutations in comparative genomics
4. Introduction
Comparing signed permutations Motivation
Comparing unsigned permutations From genomes to (signed) permutations
Counting problems Comparing genomes
Median problems Overview of the talk
Conclusions
At the gene level
Some mutations involve whole segments of nucleotides;
Genomes = signed permutations if genomes:
1 are totally ordered sets of genes, and
2 only differ by gene order (no duplications, no deletions).
Anthony Labarre Permutations in comparative genomics
5. Introduction
Comparing signed permutations Motivation
Comparing unsigned permutations From genomes to (signed) permutations
Counting problems Comparing genomes
Median problems Overview of the talk
Conclusions
At the gene level
Some mutations involve whole segments of nucleotides;
Genomes = signed permutations if genomes:
1 are totally ordered sets of genes, and
2 only differ by gene order (no duplications, no deletions).
Example (genomes → signed permutations)
(A)
(B)
Anthony Labarre Permutations in comparative genomics
6. Introduction
Comparing signed permutations Motivation
Comparing unsigned permutations From genomes to (signed) permutations
Counting problems Comparing genomes
Median problems Overview of the talk
Conclusions
At the gene level
Some mutations involve whole segments of nucleotides;
Genomes = signed permutations if genomes:
1 are totally ordered sets of genes, and
2 only differ by gene order (no duplications, no deletions).
Example (genomes → signed permutations)
(A)
+1 +2 +3 +4 +5 +6 +7 (B)
Anthony Labarre Permutations in comparative genomics
7. Introduction
Comparing signed permutations Motivation
Comparing unsigned permutations From genomes to (signed) permutations
Counting problems Comparing genomes
Median problems Overview of the talk
Conclusions
At the gene level
Some mutations involve whole segments of nucleotides;
Genomes = signed permutations if genomes:
1 are totally ordered sets of genes, and
2 only differ by gene order (no duplications, no deletions).
Example (genomes → signed permutations)
−5 +1 +2 +4 −7 −3 +6 (A)
+1 +2 +3 +4 +5 +6 +7 (B)
Anthony Labarre Permutations in comparative genomics
8. Introduction
Comparing signed permutations Motivation
Comparing unsigned permutations From genomes to (signed) permutations
Counting problems Comparing genomes
Median problems Overview of the talk
Conclusions
Comparing genomes
Two main ways of comparing genomes:
1 by identifying “common” or close content;
2 by measuring their distance according to evolutionary events;
Both approaches yield measures of (dis)similarity between
permutations;
(Many other measures are available, but they’re generally not
biologically relevant (see e.g. [Estivill-Castro and Wood, 1992]));
Anthony Labarre Permutations in comparative genomics
9. Introduction
Comparing signed permutations Motivation
Comparing unsigned permutations From genomes to (signed) permutations
Counting problems Comparing genomes
Median problems Overview of the talk
Conclusions
What lies ahead
I will give an overview of several representative problems:
comparing signed and unsigned permutations;
enumeration problems, with applications;
using comparisons to reconstruct evolution;
Links with other interesting areas;
Open problems and suggestions for future research;
Anthony Labarre Permutations in comparative genomics
10. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Measuring similarity
Search for segments that are “alike”;
This is a form of approximate pattern matching:
1 at the nucleotide level: look for subsequences that are “almost
the same”;
(this is how genomes are partitioned to yield permutations)
2 at the gene level: look for subsequences that have exactly the
same content BUT in a possibly different order;
Point 2 motivates the next definition;
Anthony Labarre Permutations in comparative genomics
11. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Common intervals
Definition
±
A common interval of permutations π 1 , π 2 , . . . , π m in Sn is a
subset of {1, 2, . . . , n} whose elements form a substring of
π 1 , π 2 , . . . , π m (up to reordering and sign changes).
Biological motivation: genes that stuck together in an ancestor
and in the present species are not likely to have been separated
during evolution.
Anthony Labarre Permutations in comparative genomics
12. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Common intervals
Definition
±
A common interval of permutations π 1 , π 2 , . . . , π m in Sn is a
subset of {1, 2, . . . , n} whose elements form a substring of
π 1 , π 2 , . . . , π m (up to reordering and sign changes).
Biological motivation: genes that stuck together in an ancestor
and in the present species are not likely to have been separated
during evolution.
Example (some common intervals of two given permutations)
9 −8 4 −5 −6 7 1 2 3 {4, 5, 6, 7, 8, 9}, {1, 2, 3}
1 2 3 −8 7 −4 −5 6 −9 {4, 5, 8}: NO
Anthony Labarre Permutations in comparative genomics
13. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Intervals and measures
Other variants exist (e.g. conserved intervals);
Measures of (dis)similarity can be built and computed
efficiently [Bergeron and Stoye, 2006];
simple to compute;
biologically relevant;
× do not correspond to evolutionary events;
... which is why we’ll now have a look at “event-based”
measures;
Anthony Labarre Permutations in comparative genomics
14. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Genome rearrangements
Genomes evolve by point mutations, but also by means of
mutations involving whole segments;
Parsimony principle in biology: a shorter scenario of
mutations is more likely;
This motivates the study of the following problem;
Anthony Labarre Permutations in comparative genomics
15. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Genome rearrangements
Problem (pairwise genome rearrangement problem)
± ±
Given: permutations π, σ in Sn , and a generating set S of Sn .
Find: a sequence of t elements of S that:
1 transforms π into σ (or conversely), and
2 such that t is as small as possible.
This yields a distance dS (·, ·) between permutations;
Anthony Labarre Permutations in comparative genomics
16. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Genome rearrangements
Problem (pairwise genome rearrangement problem)
± ±
Given: permutations π, σ in Sn , and a generating set S of Sn .
Find: a sequence of t elements of S that:
1 transforms π into σ (or conversely), and
2 such that t is as small as possible.
This yields a distance dS (·, ·) between permutations;
dS (·, ·) is usually left-invariant, so we assume σ = ι;
Anthony Labarre Permutations in comparative genomics
17. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Genome rearrangements
Problem (pairwise genome rearrangement problem)
± ±
Given: permutations π, σ in Sn , and a generating set S of Sn .
Find: a sequence of t elements of S that:
1 transforms π into σ (or conversely), and
2 such that t is as small as possible.
This yields a distance dS (·, ·) between permutations;
dS (·, ·) is usually left-invariant, so we assume σ = ι;
S is closed under inverses (x ∈ S ⇔ x −1 ∈ S);
Anthony Labarre Permutations in comparative genomics
18. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Genome rearrangements
Problem (pairwise genome rearrangement problem)
± ±
Given: permutations π, σ in Sn , and a generating set S of Sn .
Find: a sequence of t elements of S that:
1 transforms π into σ (or conversely), and
2 such that t is as small as possible.
This yields a distance dS (·, ·) between permutations;
dS (·, ·) is usually left-invariant, so we assume σ = ι;
S is closed under inverses (x ∈ S ⇔ x −1 ∈ S);
⇒ find a minimum-length factorisation of π into the product
of elements of S;
Anthony Labarre Permutations in comparative genomics
19. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Example: sorting by signed reversals
π= −5 +1 +2 +4 −7 −3 +6
σ= +1 +2 +3 +4 +5 +6 +7
Anthony Labarre Permutations in comparative genomics
20. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Example: sorting by signed reversals
π= −5 +1 +2 +4 −7 −3 +6
σ= +1 +2 +3 +4 +5 +6 +7
Anthony Labarre Permutations in comparative genomics
21. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Example: sorting by signed reversals
π= −5 +1 +2 +4 −7 −3 +6
−5 +1 +2 −4 −7 −3 +6
σ= +1 +2 +3 +4 +5 +6 +7
Anthony Labarre Permutations in comparative genomics
22. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Example: sorting by signed reversals
π= −5 +1 +2 +4 −7 −3 +6
−5 +1 +2 −4 −7 −3 +6
σ= +1 +2 +3 +4 +5 +6 +7
Anthony Labarre Permutations in comparative genomics
23. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Example: sorting by signed reversals
π= −5 +1 +2 +4 −7 −3 +6
−5 +1 +2 −4 −7 −3 +6
−5 +1 +2 −4 −7 −6 +3
σ= +1 +2 +3 +4 +5 +6 +7
Anthony Labarre Permutations in comparative genomics
24. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Example: sorting by signed reversals
π= −5 +1 +2 +4 −7 −3 +6
−5 +1 +2 −4 −7 −3 +6
−5 +1 +2 −4 −7 −6 +3
σ= +1 +2 +3 +4 +5 +6 +7
Anthony Labarre Permutations in comparative genomics
25. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Example: sorting by signed reversals
π= −5 +1 +2 +4 −7 −3 +6
−5 +1 +2 −4 −7 −3 +6
−5 +1 +2 −4 −7 −6 +3
−5 +1 +2 −4 −3 +6 +7
σ= +1 +2 +3 +4 +5 +6 +7
Anthony Labarre Permutations in comparative genomics
26. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Example: sorting by signed reversals
π= −5 +1 +2 +4 −7 −3 +6
−5 +1 +2 −4 −7 −3 +6
−5 +1 +2 −4 −7 −6 +3
−5 +1 +2 −4 −3 +6 +7
σ= +1 +2 +3 +4 +5 +6 +7
Anthony Labarre Permutations in comparative genomics
27. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Example: sorting by signed reversals
π= −5 +1 +2 +4 −7 −3 +6
−5 +1 +2 −4 −7 −3 +6
−5 +1 +2 −4 −7 −6 +3
−5 +1 +2 −4 −3 +6 +7
−5 +1 +2 +3 +4 +6 +7
σ= +1 +2 +3 +4 +5 +6 +7
Anthony Labarre Permutations in comparative genomics
28. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Example: sorting by signed reversals
π= −5 +1 +2 +4 −7 −3 +6
−5 +1 +2 −4 −7 −3 +6
−5 +1 +2 −4 −7 −6 +3
−5 +1 +2 −4 −3 +6 +7
−5 +1 +2 +3 +4 +6 +7
σ= +1 +2 +3 +4 +5 +6 +7
Anthony Labarre Permutations in comparative genomics
29. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Example: sorting by signed reversals
π= −5 +1 +2 +4 −7 −3 +6
−5 +1 +2 −4 −7 −3 +6
−5 +1 +2 −4 −7 −6 +3
−5 +1 +2 −4 −3 +6 +7
−5 +1 +2 +3 +4 +6 +7
−5 −4 −3 −2 −1 +6 +7
σ= +1 +2 +3 +4 +5 +6 +7
Anthony Labarre Permutations in comparative genomics
30. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Example: sorting by signed reversals
π= −5 +1 +2 +4 −7 −3 +6
−5 +1 +2 −4 −7 −3 +6
−5 +1 +2 −4 −7 −6 +3
−5 +1 +2 −4 −3 +6 +7
−5 +1 +2 +3 +4 +6 +7
−5 −4 −3 −2 −1 +6 +7
σ= +1 +2 +3 +4 +5 +6 +7
srd(π, σ) ≤ 6
Anthony Labarre Permutations in comparative genomics
31. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Computing genome rearrangement distances
Proving upper bounds is “easy” (find a sequence that works);
But how do we guarantee that a given sequence is optimal?
We usually rely on variants of the breakpoint graph, first
introduced by [Bafna and Pevzner, 1996];
That structure proved very useful in:
1 obtaining bounds and approximations;
2 computing distances exactly in polynomial time;
3 proving complexity results;
Anthony Labarre Permutations in comparative genomics
32. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
The breakpoint graph
Going back to our example:
π= −5 +1 +2 +4 −7 −3 +6
Anthony Labarre Permutations in comparative genomics
33. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
The breakpoint graph
Going back to our example:
π= −5 +1 +2 +4 −7 −3 +6
π = 0 10 9 1 2 3 4 7 8 14 13 6 5 11 12 15
1 double π’s elements
(i → {2|i| − 1, 2|i|}) and
add 0 and 2n + 1
Anthony Labarre Permutations in comparative genomics
34. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
The breakpoint graph
Going back to our example:
π= −5 +1 +2 +4 −7 −3 +6
π = 0 10 9 1 2 3 4 7 8 14 13 6 5 11 12 15
1 double π’s elements π14 π15 π0
15
(i → {2|i| − 1, 2|i|}) and π13 12 0 π1
11 10
add 0 and 2n + 1 π12 π2
5 9
2 elements of π = vertices
π11 6 1 π3
π10 13 2π
4
14 3
π9 8 7 4 π5
π8 π π6
7
Anthony Labarre Permutations in comparative genomics
35. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
The breakpoint graph
Going back to our example:
π= −5 +1 +2 +4 −7 −3 +6
π = 0 10 9 1 2 3 4 7 8 14 13 6 5 11 12 15
1 double π’s elements π14 π15 π0
15
(i → {2|i| − 1, 2|i|}) and π13 12 0 π1
11 10
add 0 and 2n + 1 π12 π2
5 9
2 elements of π = vertices
π11 6 1 π3
3 black edges connect
distinct adjacent genes π10 13 2π
4
14 3
π9 8 7 4 π5
π8 π π6
7
Anthony Labarre Permutations in comparative genomics
36. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
The breakpoint graph
Going back to our example:
π= −5 +1 +2 +4 −7 −3 +6
π = 0 10 9 1 2 3 4 7 8 14 13 6 5 11 12 15
1 double π’s elements π14 π15 π0
15
(i → {2|i| − 1, 2|i|}) and π13 12 0 π1
11 10
add 0 and 2n + 1 π12 π2
5 9
2 elements of π = vertices
π11 6 1 π3
3 black edges connect
distinct adjacent genes π10 13 2π
4
4 grey edges connect distinct 14 3
π9 8 7 4 π5
consecutive genes
π8 π π6
7
Anthony Labarre Permutations in comparative genomics
37. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Using the breakpoint graph
The breakpoint graph is 2-regular and decomposes as such
into alternating cycles in a unique way;
The breakpoint graph of 1 2 · · · n contains the largest
number of cycles:
15 15
12 0 14 0
11 10 13 1
5 9 12 2
6 1 11 3
13 2 10 4
14 3 9 5
8 4 8 6
7 7
⇒ goal: create new cycles in as few moves as possible;
Anthony Labarre Permutations in comparative genomics
38. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
A lower bound on the signed reversal distance
A signed reversal involves black edges belonging to at most
two cycles;
Anthony Labarre Permutations in comparative genomics
39. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
A lower bound on the signed reversal distance
A signed reversal involves black edges belonging to at most
two cycles;
The only way to increase c(BG (π)) is to split cycles:
··· ···
π2i π2i+1 π2j π2j+1 π2i π2j π2i+1 π2j+1
Anthony Labarre Permutations in comparative genomics
40. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
A lower bound on the signed reversal distance
A signed reversal involves black edges belonging to at most
two cycles;
The only way to increase c(BG (π)) is to split cycles:
··· ···
π2i π2i+1 π2j π2j+1 π2i π2j π2i+1 π2j+1
±
Therefore, for all π in Sn :
srd(π) ≥ n + 1 − c(BG (π)).
Anthony Labarre Permutations in comparative genomics
41. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
An exact formula for computing the signed reversal distance
It is not always possible to split cycles;
Anthony Labarre Permutations in comparative genomics
42. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
An exact formula for computing the signed reversal distance
It is not always possible to split cycles;
Worse: some collections of cycles in the breakpoint graph,
called hurdles, each require an additional move;
0 13 14 9 10 4 3 6 5 11 12 1 2 7 8 15 16 21 22 19 20 17 18 23
splittable ⇒ component is “ok” hurdle (no splittable cycle)
Anthony Labarre Permutations in comparative genomics
43. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
An exact formula for computing the signed reversal distance
It is not always possible to split cycles;
Worse: some collections of cycles in the breakpoint graph,
called hurdles, each require an additional move;
0 13 14 9 10 4 3 2 1 12 11 5 6 7 8 15 16 21 22 19 20 17 18 23
splittable ⇒ component is “ok” hurdle (no splittable cycle)
Anthony Labarre Permutations in comparative genomics
44. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
An exact formula for computing the signed reversal distance
It is not always possible to split cycles;
Worse: some collections of cycles in the breakpoint graph,
called hurdles, each require an additional move;
Theorem ([Hannenhalli and Pevzner, 1999])
±
For all π in Sn :
srd(π) = n + 1 − c(BG (π)) + h(BG (π)) + f (BG (π)) .
number of hurdles special “fortress” case
Anthony Labarre Permutations in comparative genomics
45. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
An exact formula for computing the signed reversal distance
It is not always possible to split cycles;
Worse: some collections of cycles in the breakpoint graph,
called hurdles, each require an additional move;
Theorem ([Hannenhalli and Pevzner, 1999])
±
For all π in Sn :
srd(π) = n + 1 − c(BG (π)) + h(BG (π)) + f (BG (π)) .
number of hurdles special “fortress” case
Computing srd(·) / sorting can be done in polynomial time;
Anthony Labarre Permutations in comparative genomics
46. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Integrating other constraints: “perfection”
(recall Mathilde Bouvel’s talk)
As seen before, some genes “stick together”;
Problem (perfect sorting by signed reversals)
±
Given: a permutation π in Sn , and a set S of intervals of π.
Find: a minimum-length sequence of signed reversals that sorts π
and whose elements do not overlap with intervals from S.
Example (not sorting sequences)
−3 2 − 1 4 5 −3 2 − 1 4 5
invalid valid
Anthony Labarre Permutations in comparative genomics
47. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
A more general view: double cut-and-join
The “double-cut-and-join” (DCJ) operation is defined directly
on the breakpoint graph:
Idea: cut two black edges and join their endpoints;
Simulates signed reversals and block-interchanges (2 DCJs for
the latter);
⇒ sorting by DCJs ≡ sorting by signed reversals with weight 1
and block-interchanges with weight 2;
Theorem ([Yancopoulos et al., 2005])
±
For any π in Sn : dcj(π) = n + 1 − c(BG (π)).
Anthony Labarre Permutations in comparative genomics
48. Introduction
Searching for similarities
Comparing signed permutations
Genome rearrangements
Comparing unsigned permutations
Sorting by signed reversals
Counting problems
The breakpoint graph and its uses
Median problems
Selected results and open problems
Conclusions
Some results and some questions (non exhaustive)
What has been done:
Operation Sorting Distance Best approximation
signed reversal O(n3/2 ) [Han, 2006] O(n) [Bader et al., 2001] 1
perfect signed reversal NP-hard [Figeac and Varr´, 2004]
e ?
prefix signed reversal ? ? 2 [Cohen and Blum, 1995]
double cut and join O(n) [Yancopoulos et al., 2005] 1
What could be done:
Prefix signed reversals:
1 complexity of sorting / computing the distance?
2 largest value the distance can reach?
3 “better-than-2”-approximation?
Characterise “hard instances” using pattern
matching/avoidance;
Anthony Labarre Permutations in comparative genomics
49. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Unsigned permutations
In some situations, gene orientation may be unknown or can
be disregarded;
Genomes are then modelled by the more traditional unsigned
permutations;
Of course, rearrangements in that setting do not affect
orientation;
Anthony Labarre Permutations in comparative genomics
50. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
The breakpoint graph in the unsigned case
More direct construction than in the signed case;
Let’s build the breakpoint graph of π = 4 1 6 2 5 7 3 :
Anthony Labarre Permutations in comparative genomics
51. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
The breakpoint graph in the unsigned case
More direct construction than in the signed case;
Let’s build the breakpoint graph of π = 4 1 6 2 5 7 3 :
0
1 build the ordered vertex set
3 4 (π0 = 0, π1 , π2 , . . . , πn );
7 1
5 6
2
Anthony Labarre Permutations in comparative genomics
52. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
The breakpoint graph in the unsigned case
More direct construction than in the signed case;
Let’s build the breakpoint graph of π = 4 1 6 2 5 7 3 :
0
1 build the ordered vertex set
3 4 (π0 = 0, π1 , π2 , . . . , πn );
2 add black arcs for every
7 1 ordered pair
(πi , πi−1 (mod n+1) );
5 6
2
Anthony Labarre Permutations in comparative genomics
53. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
The breakpoint graph in the unsigned case
More direct construction than in the signed case;
Let’s build the breakpoint graph of π = 4 1 6 2 5 7 3 :
0
1 build the ordered vertex set
3 4 (π0 = 0, π1 , π2 , . . . , πn );
2 add black arcs for every
7 1 ordered pair
(πi , πi−1 (mod n+1) );
3 add grey arcs for every ordered
5 6 pair (i, i + 1 (mod n + 1));
2
Anthony Labarre Permutations in comparative genomics
54. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
The breakpoint graph in the unsigned case
More direct construction than in the signed case;
Let’s build the breakpoint graph of π = 4 1 6 2 5 7 3 :
0
1 build the ordered vertex set
3 4 (π0 = 0, π1 , π2 , . . . , πn );
2 add black arcs for every
7 1 ordered pair
(πi , πi−1 (mod n+1) );
3 add grey arcs for every ordered
5 6 pair (i, i + 1 (mod n + 1));
2
BG (π) decomposes in a unique way into alternating cycles
Anthony Labarre Permutations in comparative genomics
55. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Breakpoint graphs as permutations
Nice property of BG (π): its alternating cycle decomposition
matches the traditional disjoint cycle decomposition of
π = (0, 1, 2, . . . , n) ◦ (0, πn , πn−1 , . . . , π1 );
As a consequence, we can express the action of any
rearrangement σ on π using π:
Lemma ([Labarre, 2012])
For all π, σ in Sn , we have π ◦ σ = π ◦ σ π .
... and we can recycle known results in this context;
Anthony Labarre Permutations in comparative genomics
56. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Obtaining lower bounds
Lower bounds can be automatically obtained as follows:
π π ◦ x1 π ◦ x1 ◦ x2 ι
Initial problem: ···
computing dS (π):
Anthony Labarre Permutations in comparative genomics
57. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Obtaining lower bounds
Lower bounds can be automatically obtained as follows:
π π ◦ x1 π ◦ x1 ◦ x2 ι
Initial problem: ···
computing dS (π):
New problem:
computing dC (π): π ι=ι
Anthony Labarre Permutations in comparative genomics
58. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Obtaining lower bounds
Lower bounds can be automatically obtained as follows:
π π ◦ x1 π ◦ x1 ◦ x2 ι
Initial problem: ···
computing dS (π):
dC (π) ≤ dS (π)
New problem: ···
computing dC (π): π π ◦ y1 π ◦ y1 ◦ y2 ι=ι
Anthony Labarre Permutations in comparative genomics
59. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Obtaining lower bounds
Lower bounds can be automatically obtained as follows:
π π ◦ x1 π ◦ x1 ◦ x2 ι
Initial problem: ···
computing dS (π):
dC (π) ≤ dS (π)
New problem: ···
computing dC (π): π π ◦ y1 π ◦ y1 ◦ y2 ι=ι
This yields tight lower bounds on:
1 the block-interchange distance [Christie, 1996];
2 the transposition distance [Bafna and Pevzner, 1998];
3 the prefix transposition distance [Labarre, 2012];
Anthony Labarre Permutations in comparative genomics
60. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Three examples
How one can obtain bounds on the following distances:
1 block-interchange distance (bid(·)):
Anthony Labarre Permutations in comparative genomics
61. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Three examples
How one can obtain bounds on the following distances:
1 block-interchange distance (bid(·)):
1 block-interchange = pair of 2-cycles;
Anthony Labarre Permutations in comparative genomics
62. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Three examples
How one can obtain bounds on the following distances:
1 block-interchange distance (bid(·)):
1 block-interchange = pair of 2-cycles;
2 bid(π) ≥ dC (π) = n+1−c(BG (π)) ;
2
Anthony Labarre Permutations in comparative genomics
63. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Three examples
How one can obtain bounds on the following distances:
1 block-interchange distance (bid(·)):
1 block-interchange = pair of 2-cycles;
2 bid(π) ≥ dC (π) = n+1−c(BG (π)) ;
2
2 transposition distance (td(·)):
Anthony Labarre Permutations in comparative genomics
64. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Three examples
How one can obtain bounds on the following distances:
1 block-interchange distance (bid(·)):
1 block-interchange = pair of 2-cycles;
2 bid(π) ≥ dC (π) = n+1−c(BG (π)) ;
2
2 transposition distance (td(·)):
1 transposition = 3-cycle;
Anthony Labarre Permutations in comparative genomics
65. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Three examples
How one can obtain bounds on the following distances:
1 block-interchange distance (bid(·)):
1 block-interchange = pair of 2-cycles;
2 bid(π) ≥ dC (π) = n+1−c(BG (π)) ;
2
2 transposition distance (td(·)):
1 transposition = 3-cycle;
2 td(π) ≥ dC (π) = n+1−codd (BG (π)) ;
2
Anthony Labarre Permutations in comparative genomics
66. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Three examples
How one can obtain bounds on the following distances:
1 block-interchange distance (bid(·)):
1 block-interchange = pair of 2-cycles;
2 bid(π) ≥ dC (π) = n+1−c(BG (π)) ;
2
2 transposition distance (td(·)):
1 transposition = 3-cycle;
2 td(π) ≥ dC (π) = n+1−codd (BG (π)) ;
2
3 prefix transposition distance (ptd(·)):
Anthony Labarre Permutations in comparative genomics
67. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Three examples
How one can obtain bounds on the following distances:
1 block-interchange distance (bid(·)):
1 block-interchange = pair of 2-cycles;
2 bid(π) ≥ dC (π) = n+1−c(BG (π)) ;
2
2 transposition distance (td(·)):
1 transposition = 3-cycle;
2 td(π) ≥ dC (π) = n+1−codd (BG (π)) ;
2
3 prefix transposition distance (ptd(·)):
1 prefix transposition = 3-cycle containing 0;
Anthony Labarre Permutations in comparative genomics
68. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Three examples
How one can obtain bounds on the following distances:
1 block-interchange distance (bid(·)):
1 block-interchange = pair of 2-cycles;
2 bid(π) ≥ dC (π) = n+1−c(BG (π)) ;
2
2 transposition distance (td(·)):
1 transposition = 3-cycle;
2 td(π) ≥ dC (π) = n+1−codd (BG (π)) ;
2
3 prefix transposition distance (ptd(·)):
1 prefix transposition = 3-cycle containing 0;
2 ptd(π) ≥ dC (π) =
n+1+c(BG (π)) 0 if π1 = 1,
2
− 2c1 (BG (π)) − ;
1 otherwise.
Anthony Labarre Permutations in comparative genomics
69. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
A word of caution
Don’t think that sorting unsigned permutations is trivial;
bid(·) and exc(·) are indeed easy to compute, but:
1 sorting by (prefix or arbitrary) reversals becomes NP-hard;
2 sorting by transpositions is NP-hard;
3 sorting by double cut-and-joins is NP-hard;
4 sorting by prefix transpositions is open;
Anthony Labarre Permutations in comparative genomics
70. Introduction
Comparing signed permutations
An “unsigned variant” of the breakpoint graph
Comparing unsigned permutations
Lower bounds on unsigned distances
Counting problems
Selected results and open problems
Median problems
Conclusions
Results on sorting unsigned permutations
What has been done:
Operation Sorting Distance Best approximation
exchange O(n) [Knuth, 1995] 1
block-interchange O(n) [Christie, 1996] 1
double cut-and-joins NP-hard [Chen, 2010] ?
reversal NP-hard [Caprara, 1999b] 11/8 [Berman et al., 2002]
transposition NP-hard [Bulteau et al., 2011b] 11/8 [Elias and Hartman, 2006]
exchange O(n) [Akers et al., 1987] 1
prefix
reversal NP-hard [Bulteau et al., 2011a] 2 [Fischer and Ginzinger, 2005]
transposition ? ? 2 [Dias and Meidanis, 2002]
What could be done:
better approximations;
complexity of prefix transposition problems?
Anthony Labarre Permutations in comparative genomics
71. Introduction
Comparing signed permutations
Hultman numbers
Comparing unsigned permutations
Approximating distance distributions
Counting problems
“Listing” problems
Median problems
Conclusions
Counting problems
What is the distribution of a given rearrangement distance?
Most tight bounds on rearrangement distances are obtained
using the breakpoint graph and its cycles;
Use the distribution of cycles in the breakpoint graph to
approximately answer the above;
Anthony Labarre Permutations in comparative genomics
72. Introduction
Comparing signed permutations
Hultman numbers
Comparing unsigned permutations
Approximating distance distributions
Counting problems
“Listing” problems
Median problems
Conclusions
Hultman numbers
Hultman numbers count permutations whose breakpoint
graph contains k cycles:
SH (n, k) = |{π ∈ Sn | c(BG (π)) = k}| unsigned case
± ±
SH (n, k) = |{π ∈ Sn | c(BG (π)) = k}| signed case
Similar in spirit to Stirling numbers of the first kind;
Explicit formulas are available for computing SH (n, k) and
±
SH (n, k) (see e.g. [Grusea and Labarre, 2011]);
Also available: generating functions, expected value and
variance;
Anthony Labarre Permutations in comparative genomics
73. Introduction
Comparing signed permutations
Hultman numbers
Comparing unsigned permutations
Approximating distance distributions
Counting problems
“Listing” problems
Median problems
Conclusions
Approximating distance distributions using Hultman numbers
unsigned permutations signed permutations
·109 ·109 ·109
|td(·) = k| 3 |ptd(·) = k| 1.5 |psrd(·) = k|
SH (n, n + 1 − 2k) SH (n, n + 2 − 2k) ±
3 SH (n, n + 3 − k)
(n = 13) (n = 13) (n = 10)
2 1
2
1 0.5
1
0 0 0
0 2 4 6 8 0 2 4 6 8 10 0 5 10 15
·109 ·109 ·109
3 |rd(·) = k| 3 |bid(·) = k| 1.5 |srd(·) = k|
SH (n, n + 3 − 2k) SH (n, n + 1 − 2k) ±
SH (n, n + 1 − k)
(n = 13) (n = 13) (n = 10)
2 2 1
distributions match 0.5 distributions are close
1 1
0 0 0
0 2 4 6 8 10 12 0 2 4 6 0 2 4 6 8 10 12
(distance distributions from [Galv˜o and Dias, 2011])
a
Anthony Labarre Permutations in comparative genomics
74. Introduction
Comparing signed permutations
Hultman numbers
Comparing unsigned permutations
Approximating distance distributions
Counting problems
“Listing” problems
Median problems
Conclusions
Experiments wrap-up and perspectives
Some distance distributions are extremely well approximated
using (some function of) the Hultman numbers;
Can bounds be tightened by trying to minimise the difference
between the distributions?
Can the proximity of distributions be used to argue that exact
computation of hard distances is overrated?
Anthony Labarre Permutations in comparative genomics
75. Introduction
Comparing signed permutations
Hultman numbers
Comparing unsigned permutations
Approximating distance distributions
Counting problems
“Listing” problems
Median problems
Conclusions
Listing optimal sequences
Distances are informative;
Anthony Labarre Permutations in comparative genomics
76. Introduction
Comparing signed permutations
Hultman numbers
Comparing unsigned permutations
Approximating distance distributions
Counting problems
“Listing” problems
Median problems
Conclusions
Listing optimal sequences
Distances are informative;
... but an actual sorting sequence is more informative;
Anthony Labarre Permutations in comparative genomics
77. Introduction
Comparing signed permutations
Hultman numbers
Comparing unsigned permutations
Approximating distance distributions
Counting problems
“Listing” problems
Median problems
Conclusions
Listing optimal sequences
Distances are informative;
... but an actual sorting sequence is more informative;
What if the given sequence we found makes no biological
sense?
Anthony Labarre Permutations in comparative genomics
78. Introduction
Comparing signed permutations
Hultman numbers
Comparing unsigned permutations
Approximating distance distributions
Counting problems
“Listing” problems
Median problems
Conclusions
Listing optimal sequences
Distances are informative;
... but an actual sorting sequence is more informative;
What if the given sequence we found makes no biological
sense?
⇒ can we list all optimal sequences for a given instance?
Anthony Labarre Permutations in comparative genomics
79. Introduction
Comparing signed permutations
Hultman numbers
Comparing unsigned permutations
Approximating distance distributions
Counting problems
“Listing” problems
Median problems
Conclusions
Listing optimal sequences
Distances are informative;
... but an actual sorting sequence is more informative;
What if the given sequence we found makes no biological
sense?
⇒ can we list all optimal sequences for a given instance?
Efficient algorithms exist for listing:
all optimal signed reversals [Swenson et al., 2011];
all optimal sequences [Badr et al., 2011];
Anthony Labarre Permutations in comparative genomics
80. Introduction
Comparing signed permutations
Motivation
Comparing unsigned permutations
Bounds
Counting problems
Selected results
Median problems
Conclusions
Median problems
Measures of similarities between genomes are useful in
reconstructing phylogenies;
Example (phylogeny from distance matrix)
a b c d e c
a 0 2 3 6 6 2
d
b 2 0 3 6 6 a 1 1
1 2
c 3 3 0 5 5
d 6 6 5 0 4 b 1
2
e 6 6 5 4 0 e
(The matrix must satisfy some conditions [Buneman, 1971]);
Anthony Labarre Permutations in comparative genomics
81. Introduction
Comparing signed permutations
Motivation
Comparing unsigned permutations
Bounds
Counting problems
Selected results
Median problems
Conclusions
Median problems
Parsimony again: search for a tree that minimises the total
number of evolutionary events (i.e. the sum of all edge
weights);
In its simplest form, the problem we want to solve is:
Problem (median of three)
± ± ±
Given: π, σ, τ in Sn ; a distance d : Sn × Sn → N.
Find: a permutation µ in Sn ± that minimises
w (µ) = d(π, µ) + d(σ, µ) + d(τ, µ).
Can be generalised to more than three input permutations;
Anthony Labarre Permutations in comparative genomics
82. Introduction
Comparing signed permutations
Motivation
Comparing unsigned permutations
Bounds
Counting problems
Selected results
Median problems
Conclusions
Generic bounds [Siepel and Moret, 2001]
Generic lower and upper bounds for any distance:
π
d(π, σ) d(π, τ )
σ τ
d(σ, τ )
Anthony Labarre Permutations in comparative genomics
83. Introduction
Comparing signed permutations
Motivation
Comparing unsigned permutations
Bounds
Counting problems
Selected results
Median problems
Conclusions
Generic bounds [Siepel and Moret, 2001]
Generic lower and upper bounds for any distance:
π
d(π, σ) d(π, τ )
σ τ
d(σ, τ )
if µ=π if µ=σ if µ=τ
w (µ) ≤ min{d(π, σ) + d(π, τ ), d(π, σ) + d(σ, τ ), d(π, τ ) + d(σ, τ )}.
Anthony Labarre Permutations in comparative genomics