Introduction
    Comparing signed permutations
  Comparing unsigned permutations
                Counting problems
                 Median problems
                       Conclusions




Permutations in comparative genomics

              Anthony Labarre
      Anthony.Labarre@cs.kuleuven.be

            Katholieke Universiteit Leuven (K.U.L.)


                     February 21st, 2012




                  Anthony Labarre    Permutations in comparative genomics
Introduction
             Comparing signed permutations    Motivation
           Comparing unsigned permutations    From genomes to (signed) permutations
                         Counting problems    Comparing genomes
                          Median problems     Overview of the talk
                                Conclusions


A few definitions


      Deoxyribonucleic acid (DNA): double helix of nucleotides (A, C, G, T);
      Complementarity (A-T, C-G): one strand is enough to determine the other;
      Gene = sequence of nucleotides (that codes for a specific protein);
      Chromosome = ordered set of genes;
      Genome = set of chromosomes;
      Goal: compare genomes;

Comparing genomes at the nucleotide level

      Most common comparisons: at the nucleotide level;

  Example (sequence alignment)

    S1 : ··· T C C G C C A − − C T A ···
             | |   |   |       |   |
    S2 : ··· T C G G A C T G G C − A ···

      Matches, substitutions, insertions and deletions;
      Correspond to mutations;
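
The four kinds of alignment columns can be counted mechanically. The following is a minimal sketch, not part of the original slides: the function name `classify_columns` is hypothetical, gaps are written with an ASCII `-`, and it assumes the convention that a gap in S1 counts as an insertion and a gap in S2 as a deletion.

```python
def classify_columns(s1, s2, gap="-"):
    """Count matches, substitutions, insertions and deletions in an alignment.

    Assumed convention (not from the slides): a gap in s1 is an insertion,
    a gap in s2 is a deletion, both relative to s1.
    """
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have the same length")
    counts = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    for a, b in zip(s1, s2):
        if a == gap and b == gap:
            raise ValueError("a column cannot consist of two gaps")
        if a == gap:
            counts["insertion"] += 1
        elif b == gap:
            counts["deletion"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    return counts


# The two aligned fragments from the example above.
example_counts = classify_columns("TCCGCCA--CTA", "TCGGACTGGC-A")
```

On the fragment shown in the example, the six vertical bars correspond to the six columns counted as matches.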


At the gene level

      Some mutations involve whole segments of nucleotides;
      Genomes can be modelled as signed permutations if they:
        1    are totally ordered sets of genes, and
        2    differ only in gene order and orientation (no duplications, no deletions).

  Example (genomes → signed permutations)
    (A)   −5   +1   +2   +4   −7   −3   +6
    (B)   +1   +2   +3   +4   +5   +6   +7
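
The encoding step can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure used in the talk: the gene names `g1` to `g7` and the function `to_signed_permutation` are hypothetical, genome (B) is taken as the reference with every gene on the `+` strand, and each gene of (A) carries its orientation relative to that reference.

```python
def to_signed_permutation(genome, reference):
    """Encode `genome` as a signed permutation relative to `reference`.

    genome:    list of (gene_name, strand) pairs, strand being "+" or "-"
    reference: list of gene names, read as +1, +2, ..., +n
    Both must contain exactly the same genes (no duplications, no deletions).
    """
    if sorted(name for name, _ in genome) != sorted(reference):
        raise ValueError("genomes must have identical gene content")
    index = {name: i + 1 for i, name in enumerate(reference)}
    return [index[name] if strand == "+" else -index[name]
            for name, strand in genome]


# Hypothetical gene names chosen to reproduce the example above.
genome_b = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
genome_a = [("g5", "-"), ("g1", "+"), ("g2", "+"), ("g4", "+"),
            ("g7", "-"), ("g3", "-"), ("g6", "+")]
signed_a = to_signed_permutation(genome_a, genome_b)
```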


Comparing genomes



     Two main ways of comparing genomes:
       1   by identifying “common” or close content;
       2   by measuring their distance according to evolutionary events;
     Both approaches yield measures of (dis)similarity between
     permutations;
     Many other measures are available, but they are generally not
     biologically relevant (see e.g. [Estivill-Castro and Wood, 1992]);




What lies ahead



     I will give an overview of several representative problems:
         comparing signed and unsigned permutations;
         enumeration problems, with applications;
         using comparisons to reconstruct evolution;
     Links with other interesting areas;
     Open problems and suggestions for future research;




Introduction
                                                 Searching for similarities
                Comparing signed permutations
                                                 Genome rearrangements
              Comparing unsigned permutations
                                                 Sorting by signed reversals
                            Counting problems
                                                 The breakpoint graph and its uses
                             Median problems
                                                 Selected results and open problems
                                   Conclusions


Measuring similarity



      Search for segments that are “alike”;
      This is a form of approximate pattern matching:
        1   at the nucleotide level: look for subsequences that are “almost
            the same”;
            (this is how genomes are partitioned to yield permutations)
        2   at the gene level: look for subsequences that have exactly the
            same content BUT in a possibly different order;
      Point 2 motivates the next definition;




Common intervals

  Definition
  A common interval of permutations π^1, π^2, ..., π^m in S_n^± is a
  subset of {1, 2, ..., n} whose elements form a substring of each of
  π^1, π^2, ..., π^m (up to reordering and sign changes).

  Biological motivation: genes that are grouped together both in an
  ancestor and in the present species are unlikely to have been
  separated during evolution.
  Example (some common intervals of two given permutations)
    9   −8    4   −5   −6    7    1    2    3
    1    2    3   −8    7   −4   −5    6   −9
    Common intervals: {4, 5, 6, 7, 8, 9} and {1, 2, 3}
    Not a common interval: {4, 5, 8}
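
The definition can be checked by brute force. The sketch below is only illustrative (it is cubic in n, whereas efficient algorithms exist); the function names are hypothetical, and it includes the trivial intervals (singletons and the whole set) in its output.

```python
def intervals(perm):
    """All sets of elements occupying consecutive positions of perm, signs ignored."""
    values = [abs(x) for x in perm]
    n = len(values)
    return {frozenset(values[i:j]) for i in range(n) for j in range(i + 1, n + 1)}


def common_intervals(*perms):
    """Sets of elements that form a substring of every permutation in perms."""
    result = intervals(perms[0])
    for perm in perms[1:]:
        result &= intervals(perm)
    return result


# The two permutations from the example above.
p1 = [9, -8, 4, -5, -6, 7, 1, 2, 3]
p2 = [1, 2, 3, -8, 7, -4, -5, 6, -9]
found = common_intervals(p1, p2)
```

Here {4, 5, 8} is rejected because 8, 4, 5 are consecutive in the first permutation but 7 separates 8 from 4 and 5 in the second.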




Intervals and measures



      Other variants exist (e.g. conserved intervals);
      Measures of (dis)similarity can be built and computed
      efficiently [Bergeron and Stoye, 2006];
       ✓ simple to compute;
       ✓ biologically relevant;
       × do not correspond to evolutionary events;
      ... which is why we’ll now have a look at “event-based”
      measures;






Genome rearrangements



     Genomes evolve by point mutations, but also by means of
     mutations involving whole segments;
     Parsimony principle in biology: a shorter scenario of
     mutations is more likely;
     This motivates the study of the following problem;






Genome rearrangements

  Problem (pairwise genome rearrangement problem)
  Given: permutations π, σ in S_n^±, and a generating set S of S_n^±.
  Find: a sequence of t elements of S that:
    1   transforms π into σ (or conversely), and
    2   is as short as possible (t is minimum).

        This yields a distance d_S(·, ·) between permutations;
        d_S(·, ·) is usually left-invariant, so we may assume σ = ι (the identity permutation);
        S is closed under inverses (x ∈ S ⇔ x^{-1} ∈ S);
        ⇒ find a minimum-length factorisation of π into a product
        of elements of S;
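
For very small n, the distance d_S can be computed by breadth-first search over all signed permutations. The following is a sketch under assumptions, not the method discussed in the talk: it takes the set of all signed reversals as the generating set S, the function names are hypothetical, and the search space has 2^n · n! elements, so it is only usable on toy instances.

```python
from collections import deque


def signed_reversals(perm):
    """Yield every permutation obtained from perm by one signed reversal."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            flipped = tuple(-x for x in reversed(perm[i:j + 1]))
            yield perm[:i] + flipped + perm[j + 1:]


def distance(pi, sigma, neighbours=signed_reversals):
    """Minimum number of operations transforming pi into sigma (brute force)."""
    pi, sigma = tuple(pi), tuple(sigma)
    seen = {pi: 0}
    queue = deque([pi])
    while queue:
        current = queue.popleft()
        if current == sigma:
            return seen[current]
        for nxt in neighbours(current):
            if nxt not in seen:
                seen[nxt] = seen[current] + 1
                queue.append(nxt)
    raise ValueError("sigma cannot be reached from pi with these operations")


d = distance((2, 1), (1, 2))
```

Because every signed reversal is its own inverse, this S is closed under inverses and the distance is symmetric.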



Example: sorting by signed reversals
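      Each row is obtained from the previous one by a single signed
      reversal: a segment is reversed and the signs of its elements are flipped.

      π =   −5   +1   +2   +4   −7   −3   +6
            −5   +1   +2   −4   −7   −3   +6
            −5   +1   +2   −4   −7   −6   +3
            −5   +1   +2   −4   −3   +6   +7
            −5   +1   +2   +3   +4   +6   +7
            −5   −4   −3   −2   −1   +6   +7
      σ =   +1   +2   +3   +4   +5   +6   +7

      Six reversals suffice, so the signed reversal distance satisfies srd(π, σ) ≤ 6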
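
The scenario above can be replayed programmatically. In this sketch the helper `apply_signed_reversal` is a hypothetical name, and the list of intervals (1-based, inclusive positions) was read off by comparing consecutive rows of the example; it only confirms the upper bound, not optimality.

```python
def apply_signed_reversal(perm, i, j):
    """Reverse positions i..j of perm (1-based, inclusive) and flip their signs."""
    segment = [-x for x in reversed(perm[i - 1:j])]
    return perm[:i - 1] + segment + perm[j:]


pi = [-5, 1, 2, 4, -7, -3, 6]
# Intervals reversed at each step, deduced from the rows of the example.
scenario = [(4, 4), (6, 7), (5, 7), (4, 5), (2, 5), (1, 5)]

steps = [pi]
for i, j in scenario:
    steps.append(apply_signed_reversal(steps[-1], i, j))

for row in steps:
    print(" ".join(f"{x:+d}" for x in row))
```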
                           Anthony Labarre    Permutations in comparative genomics
Introduction
                                                Searching for similarities
               Comparing signed permutations
                                                Genome rearrangements
             Comparing unsigned permutations
                                                Sorting by signed reversals
                           Counting problems
                                                The breakpoint graph and its uses
                            Median problems
                                                Selected results and open problems
                                  Conclusions


Computing genome rearrangement distances


     Proving upper bounds is “easy” (find a sequence that works);
     But how do we guarantee that a given sequence is optimal?
     We usually rely on variants of the breakpoint graph, first
     introduced by [Bafna and Pevzner, 1996];
     That structure proved very useful in:
       1   obtaining bounds and approximations;
       2   computing distances exactly in polynomial time;
       3   proving complexity results;






The breakpoint graph
        Going back to our example:
    π  =       −5     +1     +2     +4      −7      −3      +6
    π′ = 0 10    9 1    2 3    4 7    8 14    13 6    5 11    12 15

    1    double π’s elements (i → {2|i| − 1, 2|i|}) and add 0 and 2n + 1
    2    elements of π′ = vertices
    3    black edges connect distinct adjacent genes
    4    grey edges connect distinct consecutive genes

    [Figure: the vertices π′0, π′1, . . . , π′15 = 0, 10, 9, 1, 2, 3, 4, 7, 8, 14, 13, 6, 5, 11, 12, 15
     placed clockwise on a circle and joined by black and grey edges.]
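
The construction steps can be sketched as follows. This is a minimal illustration under assumptions read off the example: +i becomes the pair (2i − 1, 2i) and −i the pair (2i, 2i − 1), black edges join positions 2k and 2k + 1 of the doubled sequence, and grey edges join the values 2k and 2k + 1. The names `double` and `breakpoint_graph_edges` are hypothetical.

```python
def double(perm):
    """Map a signed permutation to its doubled version, framed by 0 and 2n + 1."""
    doubled = [0]
    for x in perm:
        if x > 0:
            doubled += [2 * x - 1, 2 * x]      # +i -> (2i - 1, 2i)
        else:
            doubled += [-2 * x, -2 * x - 1]    # -i -> (2i, 2i - 1)
    doubled.append(2 * len(perm) + 1)
    return doubled


def breakpoint_graph_edges(perm):
    """Return (black, grey) edge lists of the breakpoint graph of perm."""
    d = double(perm)
    # Black edges: adjacent entries of the doubled sequence that come from distinct genes.
    black = [(d[k], d[k + 1]) for k in range(0, len(d), 2)]
    # Grey edges: consecutive values 2k and 2k + 1.
    grey = [(v, v + 1) for v in range(0, len(d), 2)]
    return black, grey


pi = [-5, +1, +2, +4, -7, -3, +6]
print(double(pi))
black, grey = breakpoint_graph_edges(pi)
print(black)
print(grey)
```

On the example, `double(pi)` reproduces the sequence 0 10 9 1 2 3 4 7 8 14 13 6 5 11 12 15 given above.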



Using the breakpoint graph

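      The breakpoint graph is 2-regular and therefore decomposes in a
      unique way into alternating cycles;
      The breakpoint graph of 1 2 · · · n contains the largest
      number of cycles (n + 1):

      [Figure: two breakpoint graphs drawn on a circle. Left: vertices in the
       clockwise order 0 10 9 1 2 3 4 7 8 14 13 6 5 11 12 15. Right (identity):
       vertices in the clockwise order 0 1 2 · · · 15.]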

      ⇒ goal: create new cycles in as few moves as possible;



A lower bound on the signed reversal distance

       A signed reversal involves black edges belonging to at most
       two cycles;
       The only way to increase c(BG (π)) is to split cycles:
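
       [Figure: a signed reversal replaces the black edges (π2i, π2i+1) and
        (π2j, π2j+1) of the doubled permutation with (π2i, π2j) and (π2i+1, π2j+1).]

       A signed reversal thus increases c(BG(π)) by at most 1, and the
       breakpoint graph of the identity has n + 1 cycles.
       Hence, for all π in S_n^±:

                              srd(π) ≥ n + 1 − c(BG(π)).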
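
The bound is easy to evaluate once the cycles are counted. The sketch below assumes the doubling convention suggested by the running example (+i → (2i − 1, 2i), −i → (2i, 2i − 1)) and grey edges {2k, 2k + 1}; `count_cycles` and `reversal_lower_bound` are hypothetical names.

```python
def count_cycles(perm):
    """Count the alternating cycles of the breakpoint graph of a signed permutation."""
    doubled = [0]
    for x in perm:
        doubled += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    doubled.append(2 * len(perm) + 1)

    # Black edges join positions 2k and 2k + 1 of the doubled sequence.
    black = {}
    for k in range(0, len(doubled), 2):
        a, b = doubled[k], doubled[k + 1]
        black[a], black[b] = b, a

    seen, cycles = set(), 0
    for start in range(len(doubled)):
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]   # follow the black edge
            seen.add(w)
            v = w ^ 1      # follow the grey edge {2k, 2k + 1}
    return cycles


def reversal_lower_bound(perm):
    """Evaluate n + 1 - c(BG(perm))."""
    return len(perm) + 1 - count_cycles(perm)


pi = [-5, +1, +2, +4, -7, -3, +6]
print(count_cycles(pi), reversal_lower_bound(pi))
```

For the running example this finds 2 cycles, so the bound is 8 − 2 = 6. Since a sorting sequence of 6 signed reversals was exhibited earlier, that sequence is optimal.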



An exact formula for computing the signed reversal distance

       It is not always possible to split cycles;
       Worse: some collections of cycles in the breakpoint graph,
       called hurdles, each require an additional move;

  [Figure: breakpoint graph drawn over a doubled permutation, shown twice:
      0 13 14 9 10 4 3 6 5 11 12 1 2 7 8 15 16 21 22 19 20 17 18 23
      0 13 14 9 10 4 3 2 1 12 11 5 6 7 8 15 16 21 22 19 20 17 18 23
   The second row is obtained from the first by reversing the segment
   6 5 11 12 1 2. The left component is labelled “splittable ⇒ component
   is ok”, the right one “hurdle (no splittable cycle)”.]

  Theorem ([Hannenhalli and Pevzner, 1999])
  For all π in S_n^±:

      srd(π) = n + 1 − c(BG(π)) + h(BG(π)) + f(BG(π)),

  where h(BG(π)) is the number of hurdles and f(BG(π)) accounts for the
  special “fortress” case.

       Computing srd(·) / sorting can be done in polynomial time;



Integrating other constraints: “perfection”

      (recall Mathilde Bouvel’s talk)
      As seen before, some genes “stick together”;
  Problem (perfect sorting by signed reversals)
  Given: a permutation π in S_n^±, and a set S of intervals of π.
  Find: a minimum-length sequence of signed reversals that sorts π
         and whose elements do not overlap with intervals from S.

  Example (not sorting sequences)
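  [Figure: the permutation −3 2 −1 4 5 shown twice with a reversal marked on
   each copy; the left one is labelled invalid, the right one valid.]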
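
The validity test can be sketched as below. The talk does not define "overlap" on this slide, so the code assumes the usual reading: two intervals overlap when they intersect and neither contains the other. Intervals are given as hypothetical 0-based inclusive position pairs, and the set S used here is an illustration, not the one from the slide.

```python
def overlaps(a, b):
    """True if intervals a and b intersect but neither contains the other (assumed definition)."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    intersect = a_lo <= b_hi and b_lo <= a_hi
    a_in_b = b_lo <= a_lo and a_hi <= b_hi
    b_in_a = a_lo <= b_lo and b_hi <= a_hi
    return intersect and not (a_in_b or b_in_a)


def is_perfect(reversal, intervals):
    """True if the reversal overlaps no interval of the given set."""
    return not any(overlaps(reversal, s) for s in intervals)


# pi = -3 2 -1 4 5; illustrative S: positions 0..2 (genes 3, 2, 1) must stay together.
S = [(0, 2)]
print(is_perfect((2, 3), S))  # reverses "-1 4": cuts through the interval
print(is_perfect((0, 2), S))  # reverses the whole interval
print(is_perfect((1, 2), S))  # stays inside the interval
```

A reversal nested inside an interval, or containing it entirely, keeps the genes of that interval together, which is why containment is not counted as overlap here.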



A more general view: double cut-and-join


      The “double-cut-and-join” (DCJ) operation is defined directly
      on the breakpoint graph:
      Idea: cut two black edges and join their endpoints;
           Simulates signed reversals and block-interchanges (2 DCJs for
           the latter);
           ⇒ sorting by DCJs ≡ sorting by signed reversals with weight 1
           and block-interchanges with weight 2;

  Theorem ([Yancopoulos et al., 2005])
  For any π in S_n^±: dcj(π) = n + 1 − c(BG(π)).
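
The "cut two black edges and join their endpoints" idea can be sketched directly on the edge list. This is an illustration only: black edges are unordered pairs, grey edges are assumed to be {2k, 2k + 1}, and `count_cycles` and `dcj` are hypothetical names. The black edges are those of the breakpoint graph of −5 +1 +2 +4 −7 −3 +6 under the doubling used earlier.

```python
def count_cycles(black_edges):
    """Count alternating cycles; grey edges are assumed to be {2k, 2k + 1}."""
    black = {}
    for a, b in black_edges:
        black[a], black[b] = b, a
    seen, cycles = set(), 0
    for start in black:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]   # follow the black edge
            seen.add(w)
            v = w ^ 1      # follow the grey edge
    return cycles


def dcj(black_edges, e1, e2):
    """Cut black edges e1 = (a, b) and e2 = (c, d), then join them as (a, c) and (b, d)."""
    (a, b), (c, d) = e1, e2
    rest = [e for e in black_edges if e not in (e1, e2)]
    return rest + [(a, c), (b, d)]


black = [(0, 10), (9, 1), (2, 3), (4, 7), (8, 14), (13, 6), (5, 11), (12, 15)]
after = dcj(black, (0, 10), (5, 11))

print(count_cycles(black))  # 2 cycles before the operation
print(count_cycles(after))  # 3 cycles after: the new black edge (10, 11) closes a cycle
```

Each such operation changes the cycle count by at most one, which is consistent with the formula dcj(π) = n + 1 − c(BG(π)); here n + 1 − c = 8 − 2 = 6.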





Some results and some questions (non-exhaustive)

        What has been done:
   Operation                  Sorting                            Distance                           Best approximation
   signed reversal            O(n^{3/2}) [Han, 2006]             O(n) [Bader et al., 2001]          1
   perfect signed reversal    NP-hard [Figeac and Varré, 2004]   NP-hard [Figeac and Varré, 2004]   ?
   prefix signed reversal     ?                                  ?                                  2 [Cohen and Blum, 1995]
   double cut and join        O(n) [Yancopoulos et al., 2005]    O(n) [Yancopoulos et al., 2005]    1

        What could be done:
                Prefix signed reversals:
                   1   complexity of sorting / computing the distance?
                   2   largest value the distance can reach?
                   3   “better-than-2”-approximation?
                Characterise “hard instances” using pattern
                matching/avoidance;


Introduction
             Comparing signed permutations
                                              An “unsigned variant” of the breakpoint graph
           Comparing unsigned permutations
                                              Lower bounds on unsigned distances
                         Counting problems
                                              Selected results and open problems
                          Median problems
                                Conclusions


Unsigned permutations



     In some situations, gene orientation may be unknown or can
     be disregarded;
     Genomes are then modelled by the more traditional unsigned
     permutations;
     Of course, rearrangements in that setting do not affect
     orientation;






The breakpoint graph in the unsigned case

      More direct construction than in the signed case;
      Let’s build the breakpoint graph of π = 4 1 6 2 5 7 3 :
      1   build the ordered vertex set (π0 = 0, π1, π2, . . . , πn);
      2   add black arcs for every ordered pair (π_i, π_{(i−1) mod (n+1)});
      3   add grey arcs for every ordered pair (i, (i + 1) mod (n + 1));

      [Figure: the vertices 0, 4, 1, 6, 2, 5, 7, 3 placed clockwise on a circle,
       joined by black and grey arcs.]

      BG(π) decomposes in a unique way into alternating cycles.
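
The three construction steps and the cycle decomposition can be sketched as follows. This is a minimal illustration, assuming that an alternating cycle is traversed by following a black arc and then a grey arc; `unsigned_breakpoint_graph` and `alternating_cycles` are hypothetical names.

```python
def unsigned_breakpoint_graph(perm):
    """Return (black, grey) arc maps for the unsigned permutation perm."""
    seq = [0] + list(perm)                                   # step 1: (pi_0 = 0, pi_1, ..., pi_n)
    m = len(seq)                                             # m = n + 1
    black = {seq[i]: seq[(i - 1) % m] for i in range(m)}     # step 2: pi_i -> pi_{i-1}
    grey = {v: (v + 1) % m for v in range(m)}                # step 3: i -> i + 1
    return black, grey


def alternating_cycles(perm):
    """List each alternating cycle by the vertices its black arcs leave from."""
    black, grey = unsigned_breakpoint_graph(perm)
    seen, cycles = set(), []
    for start in range(len(black)):
        if start in seen:
            continue
        cycle, v = [], start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = grey[black[v]]   # one black arc, then one grey arc
        cycles.append(cycle)
    return cycles


print(alternating_cycles([4, 1, 6, 2, 5, 7, 3]))
# [[0, 4, 1, 5, 3], [2, 7, 6]]
```

For π = 4 1 6 2 5 7 3 the graph therefore has two alternating cycles.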


Breakpoint graphs as permutations

       Nice property of BG(π): its alternating cycle decomposition
       matches the traditional disjoint cycle decomposition of

                   $\overline{\pi} = (0, 1, 2, \ldots, n) \circ (0, \pi_n, \pi_{n-1}, \ldots, \pi_1)$;

       As a consequence, we can express the action of any
       rearrangement σ on π in terms of $\overline{\pi}$:

  Lemma ([Labarre, 2012])
  For all π, σ in S_n, we have $\overline{\pi \circ \sigma} = \overline{\pi} \circ \overline{\sigma}^{\pi}$.

       ... and we can recycle known results in this context;
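
       The lemma can be checked by brute force on small cases. The sketch below assumes two conventions that the slide does not spell out: composition acts right to left, so (p ∘ q)(i) = p(q(i)), and the exponent denotes conjugation, so that the conjugate of x by π is π ∘ x ∘ π⁻¹. Permutations are stored as tuples (0, π_1, ..., π_n); all function names are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """Return p ∘ q, the map i -> p[q[i]]."""
    return tuple(p[i] for i in q)


def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)


def bar(pi):
    """Return (0, 1, ..., n) ∘ (0, pi_n, ..., pi_1) for pi = (0, pi_1, ..., pi_n)."""
    m = len(pi)
    inv = inverse(pi)
    # The right factor sends pi_i to pi_{i-1}; the left factor then adds 1 modulo n + 1.
    return tuple((pi[(inv[v] - 1) % m] + 1) % m for v in range(m))


def lemma_holds(pi, sigma):
    """Check bar(pi ∘ sigma) == bar(pi) ∘ (pi ∘ bar(sigma) ∘ pi^{-1})."""
    conjugate = compose(compose(pi, bar(sigma)), inverse(pi))
    return bar(compose(pi, sigma)) == compose(bar(pi), conjugate)


def check_all(n):
    """Check the identity for every pair of permutations of {1, ..., n}."""
    perms = [(0,) + p for p in permutations(range(1, n + 1))]
    return all(lemma_holds(p, s) for p in perms for s in perms)


print(check_all(4))
```

       Under these conventions the identity follows from writing the barred permutation as C ∘ π ∘ C⁻¹ ∘ π⁻¹ with C = (0, 1, ..., n), which is why the exhaustive check succeeds.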



Obtaining lower bounds

       Lower bounds can be automatically obtained as follows:
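          Initial problem, computing $d_S(\pi)$:
              $\pi \to \pi \circ x_1 \to \pi \circ x_1 \circ x_2 \to \cdots \to \iota$
          New problem, computing $d_C(\overline{\pi})$:
              $\overline{\pi} \to \overline{\pi} \circ y_1 \to \overline{\pi} \circ y_1 \circ y_2 \to \cdots \to \overline{\iota} = \iota$
          Hence $d_C(\overline{\pi}) \le d_S(\pi)$;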

       This yields tight lower bounds on:
         1   the block-interchange distance [Christie, 1996];
         2   the transposition distance [Bafna and Pevzner, 1998];
         3   the prefix transposition distance [Labarre, 2012];




Three examples

     How one can obtain bounds on the following distances:
       1   block-interchange distance (bid(·)):
             1   block-interchange = pair of 2-cycles;
             2   $bid(\pi) \ge d_C(\overline{\pi}) = \frac{n+1-c(BG(\pi))}{2}$;
       2   transposition distance (td(·)):
             1   transposition = 3-cycle;
             2   $td(\pi) \ge d_C(\overline{\pi}) = \frac{n+1-c_{odd}(BG(\pi))}{2}$;
       3   prefix transposition distance (ptd(·)):
             1   prefix transposition = 3-cycle containing 0;
             2   $ptd(\pi) \ge d_C(\overline{\pi}) = \frac{n+1+c(BG(\pi))}{2} - c_1(BG(\pi)) - \begin{cases} 0 & \text{if } \pi_1 = 1, \\ 1 & \text{otherwise}. \end{cases}$
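
     The first two bounds only need the cycle lengths of the breakpoint graph, so they are easy to evaluate. The sketch below is an illustration under the construction described earlier in the talk (black arcs π_i → π_{i−1}, grey arcs i → i+1); the function names are hypothetical.

```python
def cycle_lengths(perm):
    """Lengths of the alternating cycles of BG(perm), counted in black arcs."""
    seq = [0] + list(perm)
    m = len(seq)                                   # m = n + 1
    pos = {v: i for i, v in enumerate(seq)}
    seen, lengths = set(), []
    for start in range(m):
        if start in seen:
            continue
        length, v = 0, start
        while v not in seen:
            seen.add(v)
            length += 1
            v = (seq[(pos[v] - 1) % m] + 1) % m    # black arc, then grey arc
        lengths.append(length)
    return lengths


def bid_lower_bound(perm):
    """(n + 1 - c(BG(perm))) / 2, the block-interchange bound."""
    lengths = cycle_lengths(perm)
    return (len(perm) + 1 - len(lengths)) // 2


def td_lower_bound(perm):
    """(n + 1 - c_odd(BG(perm))) / 2, the transposition bound."""
    odd = sum(1 for length in cycle_lengths(perm) if length % 2 == 1)
    return (len(perm) + 1 - odd) // 2


pi = [4, 1, 6, 2, 5, 7, 3]
print(bid_lower_bound(pi), td_lower_bound(pi))
```

     Both numerators are always even, because the barred permutation is a product of two (n+1)-cycles and hence even, so the integer division loses nothing.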





A word of caution



     Don’t think that sorting unsigned permutations is trivial;
     bid(·) and exc(·) are indeed easy to compute, but:
       1   sorting by (prefix or arbitrary) reversals becomes NP-hard;
       2   sorting by transpositions is NP-hard;
       3   sorting by double cut-and-joins is NP-hard;
       4   sorting by prefix transpositions is open;






Results on sorting unsigned permutations

           What has been done:
           Operation                 Sorting / distance                  Best approximation
           exchange                  O(n) [Knuth, 1995]                  1
           block-interchange         O(n) [Christie, 1996]               1
           double cut-and-joins      NP-hard [Chen, 2010]                ?
           reversal                  NP-hard [Caprara, 1999b]            11/8 [Berman et al., 2002]
           transposition             NP-hard [Bulteau et al., 2011b]     11/8 [Elias and Hartman, 2006]
           prefix exchange           O(n) [Akers et al., 1987]           1
           prefix reversal           NP-hard [Bulteau et al., 2011a]     2 [Fischer and Ginzinger, 2005]
           prefix transposition      ? / ?                               2 [Dias and Meidanis, 2002]

           What could be done:
                 better approximations;
                 complexity of prefix transposition problems?
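
           As an illustration of the O(n) entry for exchanges in the table above, the exchange distance of a permutation equals n minus the number of cycles in its ordinary disjoint cycle decomposition (fixed points included). The sketch below relies on that classical fact; the function name is hypothetical.

```python
def exchange_distance(perm):
    """Minimum number of exchanges (swaps of two elements) that sort perm."""
    n = len(perm)
    seen = [False] * (n + 1)
    cycles = 0
    for start in range(1, n + 1):
        if seen[start]:
            continue
        cycles += 1
        v = start
        while not seen[v]:          # walk the cycle start -> perm[start] -> ...
            seen[v] = True
            v = perm[v - 1]
    return n - cycles


print(exchange_distance([4, 1, 6, 2, 5, 7, 3]))
```

           Each element is visited once, which gives the linear running time.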


Introduction
              Comparing signed permutations
                                               Hultman numbers
            Comparing unsigned permutations
                                               Approximating distance distributions
                          Counting problems
                                               “Listing” problems
                           Median problems
                                 Conclusions


Counting problems



     What is the distribution of a given rearrangement distance?
     Most tight bounds on rearrangement distances are obtained
     using the breakpoint graph and its cycles;
     Use the distribution of cycles in the breakpoint graph to
     approximately answer the above;






Hultman numbers

     Hultman numbers count permutations whose breakpoint
     graph contains k cycles:

       $S_H(n, k) = |\{\pi \in S_n \mid c(BG(\pi)) = k\}|$                  unsigned case
       $S_H^{\pm}(n, k) = |\{\pi \in S_n^{\pm} \mid c(BG(\pi)) = k\}|$      signed case

     Similar in spirit to Stirling numbers of the first kind;
     Explicit formulas are available for computing $S_H(n, k)$ and
     $S_H^{\pm}(n, k)$ (see e.g. [Grusea and Labarre, 2011]);
     Also available: generating functions, expected value and
     variance;
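
     For small n the unsigned Hultman numbers can be tabulated directly from the definition. The sketch below is a brute-force enumeration, not the explicit formulas cited on this slide; it assumes the unsigned breakpoint graph construction given earlier in the talk, and its function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def count_cycles(perm):
    """Number of alternating cycles of the unsigned breakpoint graph of perm."""
    seq = [0] + list(perm)
    m = len(seq)
    pos = {v: i for i, v in enumerate(seq)}
    seen, cycles = set(), 0
    for start in range(m):
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            v = (seq[(pos[v] - 1) % m] + 1) % m    # black arc, then grey arc
    return cycles


def hultman_numbers(n):
    """Map k -> S_H(n, k), computed by enumerating all n! permutations."""
    counts = Counter(count_cycles(p) for p in permutations(range(1, n + 1)))
    return dict(counts)


for n in range(1, 6):
    print(n, sorted(hultman_numbers(n).items()))
```

     The enumeration takes time proportional to n · n!, so it is only meant for checking small values.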




Approximating distance distributions using Hultman numbers

     [Figure: six plots comparing the number of permutations at distance k with Hultman numbers.
      Unsigned permutations (n = 13):
        |td(·) = k|  vs  S_H(n, n + 1 − 2k);
        |ptd(·) = k| vs  S_H(n, n + 2 − 2k);
        |rd(·) = k|  vs  S_H(n, n + 3 − 2k);
        |bid(·) = k| vs  S_H(n, n + 1 − 2k)    (distributions match).
      Signed permutations (n = 10):
        |psrd(·) = k| vs S_H^±(n, n + 3 − k);
        |srd(·) = k|  vs S_H^±(n, n + 1 − k)   (distributions are close).]

     (distance distributions from [Galvão and Dias, 2011])




Experiments wrap-up and perspectives



     Some distance distributions are extremely well approximated
     using (some function of) the Hultman numbers;
     Can bounds be tightened by trying to minimise the difference
     between the distributions?
     Can the proximity of distributions be used to argue that exact
     computation of hard distances is overrated?






Listing optimal sequences


      Distances are informative;
      ... but an actual sorting sequence is more informative;
      What if the given sequence we found makes no biological
      sense?
      ⇒ can we list all optimal sequences for a given instance?
      Efficient algorithms exist for listing:
          all optimal signed reversals [Swenson et al., 2011];
          all optimal sequences [Badr et al., 2011];
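
      To make the listing question concrete, here is a brute-force sketch for signed reversals on very small instances. It is not one of the efficient algorithms cited above: it explores all 2^n · n! signed permutations by breadth-first search, and every function name is hypothetical. A reversal is reported as the pair (i, j) of 0-based positions delimiting the reversed interval.

```python
from collections import deque


def reversals(perm):
    """Yield (i, j, result) for every signed reversal of the interval perm[i..j]."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            middle = tuple(-x for x in reversed(perm[i:j + 1]))
            yield i, j, perm[:i] + middle + perm[j + 1:]


def distances_to_identity(n):
    """Breadth-first search from the identity; reversals are involutions, so distances are symmetric."""
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        cur = queue.popleft()
        for _, _, nxt in reversals(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def optimal_sequences(perm):
    """List every shortest sequence of signed reversals that sorts perm."""
    perm = tuple(perm)
    dist = distances_to_identity(len(perm))

    def extend(cur):
        if dist[cur] == 0:
            yield []
            return
        for i, j, nxt in reversals(cur):
            if dist[nxt] == dist[cur] - 1:      # keep only steps that get strictly closer
                for rest in extend(nxt):
                    yield [(i, j)] + rest

    return list(extend(perm))


for sequence in optimal_sequences((2, 1)):
    print(sequence)
```

      Even this two-element instance has several optimal sequences, which illustrates why a single sorting sequence may not be the biologically relevant one.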




Introduction
                Comparing signed permutations
                                                      Motivation
              Comparing unsigned permutations
                                                      Bounds
                            Counting problems
                                                      Selected results
                             Median problems
                                   Conclusions


Median problems

      Measures of similarities between genomes are useful in
      reconstructing phylogenies;

  Example (phylogeny from distance matrix)
          a   b   c   d   e
      a   0   2   3   6   6
      b   2   0   3   6   6
      c   3   3   0   5   5
      d   6   6   5   0   4
      e   6   6   5   4   0

      [Figure: tree with leaves a, b, c, d, e and edge weights 1, 1, 1, 1, 2, 2, 2 whose path lengths realise the matrix]


      (The matrix must satisfy some conditions [Buneman, 1971]);
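
      Assuming the condition meant here is the classical four-point condition (for every four taxa, the two largest of the three pairwise sums are equal), it can be tested directly on the example matrix. The sketch below is illustrative and its function name is hypothetical.

```python
from itertools import combinations


def satisfies_four_point_condition(d):
    """Check that, for every quadruple, the two largest pairwise sums coincide."""
    n = len(d)
    for a, b, c, e in combinations(range(n), 4):
        sums = sorted([d[a][b] + d[c][e],
                       d[a][c] + d[b][e],
                       d[a][e] + d[b][c]])
        if sums[1] != sums[2]:
            return False
    return True


# Distance matrix of the example, rows and columns in the order a, b, c, d, e.
matrix = [
    [0, 2, 3, 6, 6],
    [2, 0, 3, 6, 6],
    [3, 3, 0, 5, 5],
    [6, 6, 5, 0, 4],
    [6, 6, 5, 4, 0],
]
print(satisfies_four_point_condition(matrix))
```

      The example matrix passes the test, consistent with the fact that a weighted tree realising it exists.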



Median problems

      Parsimony again: search for a tree that minimises the total
      number of evolutionary events (i.e. the sum of all edge
      weights);
      In its simplest form, the problem we want to solve is:

  Problem (median of three)
  Given: π, σ, τ in $S_n^{\pm}$; a distance $d : S_n^{\pm} \times S_n^{\pm} \to \mathbb{N}$.
  Find: a permutation µ in $S_n^{\pm}$ that minimises

                  w (µ) = d(π, µ) + d(σ, µ) + d(τ, µ).

      Can be generalised to more than three input permutations;



Generic bounds [Siepel and Moret, 2001]
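      Generic lower and upper bounds for any distance:
      [Figure: triangle with vertices π, σ, τ and edges labelled d(π, σ), d(π, τ), d(σ, τ)]

      Choosing µ among the three inputs gives the upper bound:

      w(µ) ≤ min{ d(π, σ) + d(π, τ)   [if µ = π],
                  d(π, σ) + d(σ, τ)   [if µ = σ],
                  d(π, τ) + d(σ, τ)   [if µ = τ] }.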
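
      The upper bound can be compared with an exact median on a toy instance. The sketch below makes two simplifying assumptions that are not in the talk: it works with unsigned permutations, and it uses the exchange distance as the metric d because it is easy to compute. The median is found by exhaustive search, and all names are hypothetical.

```python
from itertools import permutations


def exchange_distance(p, q):
    """Minimum number of exchanges turning p into q: n minus the cycles of the map p_i -> q_i."""
    target = {p[i]: q[i] for i in range(len(p))}
    seen, cycles = set(), 0
    for v in target:
        if v in seen:
            continue
        cycles += 1
        while v not in seen:
            seen.add(v)
            v = target[v]
    return len(p) - cycles


def weight(mu, inputs, d):
    """w(mu): sum of the distances from each input permutation to mu."""
    return sum(d(x, mu) for x in inputs)


def brute_force_median(inputs, d):
    """Exhaustive search over all n! candidates; only usable for tiny n."""
    n = len(inputs[0])
    return min(permutations(range(1, n + 1)), key=lambda mu: weight(mu, inputs, d))


def generic_upper_bound(pi, sigma, tau, d):
    """Weight of the best median chosen among the three inputs themselves."""
    return min(d(pi, sigma) + d(pi, tau),
               d(pi, sigma) + d(sigma, tau),
               d(pi, tau) + d(sigma, tau))


pi, sigma, tau = (1, 2, 3, 4), (2, 1, 3, 4), (1, 2, 4, 3)
mu = brute_force_median([pi, sigma, tau], exchange_distance)
print(weight(mu, [pi, sigma, tau], exchange_distance),
      generic_upper_bound(pi, sigma, tau, exchange_distance))
```

      On this instance the exact median weight and the generic upper bound coincide, because π itself is an optimal median.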



                              Anthony Labarre       Permutations in comparative genomics
Comparing Genomes through Permutations
Comparing Genomes through Permutations
Comparing Genomes through Permutations
Comparing Genomes through Permutations
Comparing Genomes through Permutations
Comparing Genomes through Permutations
Comparing Genomes through Permutations
Comparing Genomes through Permutations
Comparing Genomes through Permutations
Comparing Genomes through Permutations
Comparing Genomes through Permutations
Comparing Genomes through Permutations
Comparing Genomes through Permutations


  • 1. Permutations in comparative genomics. Anthony Labarre (Anthony.Labarre@cs.kuleuven.be), Katholieke Universiteit Leuven (K.U.L.), February 21st, 2012. Outline: Introduction; Comparing signed permutations; Comparing unsigned permutations; Counting problems; Median problems; Conclusions.
  • 2. A few definitions. Deoxyribonucleic acid: double helix of nucleotides (A, C, G, T); Complementarity (A-T, C-G): one strand is enough; Gene = sequence of nucleotides (that codes for a specific protein); Chromosome = ordered set of genes; Genome = set of chromosomes; Goal: compare genomes.
  • 3. Comparing genomes at the nucleotide level. Most common comparisons: at the nucleotide level. Example (sequence alignment):
      S1 : · · · T C C G C C A − − C T A · · ·
                 | |   |   |       |   |
      S2 : · · · T C G G A C T G G C − A · · ·
    Matches, substitutions, insertions and deletions; these correspond to mutations.
  • 4-7. At the gene level. Some mutations involve whole segments of nucleotides; Genomes = signed permutations if genomes: (1) are totally ordered sets of genes, and (2) only differ by gene order (no duplications, no deletions). Example (genomes → signed permutations): −5 +1 +2 +4 −7 −3 +6 (A) +1 +2 +3 +4 +5 +6 +7 (B)
  • 8. Comparing genomes. Two main ways of comparing genomes: (1) by identifying “common” or close content; (2) by measuring their distance according to evolutionary events. Both approaches yield measures of (dis)similarity between permutations. (Many other measures are available, but they are generally not biologically relevant; see e.g. [Estivill-Castro and Wood, 1992].)
  • 9. What lies ahead. I will give an overview of several representative problems: comparing signed and unsigned permutations; enumeration problems, with applications; using comparisons to reconstruct evolution. Links with other interesting areas. Open problems and suggestions for future research.
  • 10. Measuring similarity. Search for segments that are “alike”. This is a form of approximate pattern matching: (1) at the nucleotide level: look for subsequences that are “almost the same” (this is how genomes are partitioned to yield permutations); (2) at the gene level: look for subsequences that have exactly the same content BUT in a possibly different order. Point 2 motivates the next definition.
  • 11-12. Common intervals. Definition: a common interval of permutations π^1, π^2, ..., π^m in S_n^± is a subset of {1, 2, ..., n} whose elements form a substring of π^1, π^2, ..., π^m (up to reordering and sign changes). Biological motivation: genes that stuck together in an ancestor and in the present species are not likely to have been separated during evolution. Example (some common intervals of two given permutations):
      9 −8 4 −5 −6 7 1 2 3
      1 2 3 −8 7 −4 −5 6 −9
    Common intervals: {4, 5, 6, 7, 8, 9}, {1, 2, 3}. {4, 5, 8}: NO.
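The definition can be checked mechanically: a set of genes is a common interval when its members sit in consecutive positions in every permutation, whatever their order and signs. A minimal Python sketch of that test on the slide's example; the function name `is_common_interval` is an illustrative choice, and this brute-force check is not one of the efficient algorithms from the literature.

```python
def is_common_interval(genes, perms):
    """True if `genes` occupies consecutive positions in every permutation,
    ignoring order and signs."""
    genes = set(genes)
    if not genes:
        return False
    for perm in perms:
        # Positions (in increasing order) of the genes of interest, signs dropped.
        positions = [i for i, x in enumerate(perm) if abs(x) in genes]
        if len(positions) != len(genes):
            return False  # some gene is missing or repeated
        if positions[-1] - positions[0] + 1 != len(genes):
            return False  # a foreign gene sits between two members
    return True


p1 = [9, -8, 4, -5, -6, 7, 1, 2, 3]
p2 = [1, 2, 3, -8, 7, -4, -5, 6, -9]
print(is_common_interval({4, 5, 6, 7, 8, 9}, [p1, p2]))
print(is_common_interval({1, 2, 3}, [p1, p2]))
print(is_common_interval({4, 5, 8}, [p1, p2]))
```

The set {4, 5, 8} fails because 7 separates −8 from −4 in the second permutation, even though the three genes are contiguous in the first.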
  • 13. Intervals and measures. Other variants exist (e.g. conserved intervals). Measures of (dis)similarity can be built and computed efficiently [Bergeron and Stoye, 2006]. Pros: simple to compute; biologically relevant. Con (×): do not correspond to evolutionary events, which is why we’ll now have a look at “event-based” measures.
  • 14. Genome rearrangements. Genomes evolve by point mutations, but also by means of mutations involving whole segments. Parsimony principle in biology: a shorter scenario of mutations is more likely. This motivates the study of the following problem.
  • 15-18. Genome rearrangements. Problem (pairwise genome rearrangement problem). Given: permutations π, σ in S_n^±, and a generating set S of S_n^±. Find: a sequence of t elements of S that: (1) transforms π into σ (or conversely), and (2) such that t is as small as possible. This yields a distance d_S(·, ·) between permutations; d_S(·, ·) is usually left-invariant, so we assume σ = ι; S is closed under inverses (x ∈ S ⇔ x^(−1) ∈ S); ⇒ find a minimum-length factorisation of π into the product of elements of S.
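For very small n, the distance defined by this problem can be computed by exhaustive search. The Python sketch below takes S to be the set of signed reversals (the example the talk turns to next) and runs a breadth-first search from π until it reaches the identity ι. The names `signed_reversals` and `reversal_distance` are illustrative assumptions. The search visits up to 2^n · n! states, so it is only a sanity check and not a practical algorithm.

```python
from collections import deque


def signed_reversals(perm):
    """Yield every permutation reachable from `perm` by one signed reversal."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            # Reverse the segment perm[i..j] and flip the sign of each element.
            yield perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def reversal_distance(perm):
    """Minimum number of signed reversals turning `perm` into the identity (BFS)."""
    start = tuple(perm)
    identity = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == identity:
            return dist[current]
        for nxt in signed_reversals(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    raise ValueError("input is not a signed permutation of 1..n")


print(reversal_distance((1, 2, 3)))
print(reversal_distance((-2, -1)))
print(reversal_distance((2, 1)))
```

Because every signed reversal is its own inverse, searching from π towards ι gives the same value as searching from ι towards π.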
  • 19-30. Example: sorting by signed reversals
      π = −5 +1 +2 +4 −7 −3 +6
          −5 +1 +2 −4 −7 −3 +6
          −5 +1 +2 −4 −7 −6 +3
          −5 +1 +2 −4 −3 +6 +7
          −5 +1 +2 +3 +4 +6 +7
          −5 −4 −3 −2 −1 +6 +7
      σ = +1 +2 +3 +4 +5 +6 +7
      srd(π, σ) ≤ 6
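The scenario above can be replayed step by step. In the Python sketch below, the 0-based intervals are my own reading of which segment is reversed between consecutive lines of the slide, and `signed_reversal` is an illustrative helper name. The sketch only confirms that six reversals suffice; it says nothing about optimality.

```python
def signed_reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive, 0-based) and flip the sign of each element."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


pi = [-5, 1, 2, 4, -7, -3, 6]
# Intervals inferred from consecutive lines of the example (assumption).
intervals = [(3, 3), (5, 6), (4, 6), (3, 4), (1, 4), (0, 4)]

current = pi
for i, j in intervals:
    current = signed_reversal(current, i, j)
    print(current)
```

Each printed line should match the corresponding line of the slide, ending with the identity permutation σ.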
  • 31. Computing genome rearrangement distances. Proving upper bounds is “easy” (find a sequence that works). But how do we guarantee that a given sequence is optimal? We usually rely on variants of the breakpoint graph, first introduced by [Bafna and Pevzner, 1996]. That structure proved very useful in: (1) obtaining bounds and approximations; (2) computing distances exactly in polynomial time; (3) proving complexity results.
  • 36. The breakpoint graph. Going back to our example: π = −5 +1 +2 +4 −7 −3 +6, which becomes π′ = 0 10 9 1 2 3 4 7 8 14 13 6 5 11 12 15. Construction: (1) double π’s elements (i → {2|i| − 1, 2|i|}, in that order if i is positive and reversed if i is negative) and add 0 and 2n + 1; (2) the elements of π′ are the vertices; (3) black edges connect adjacent elements of π′ that belong to distinct genes; (4) grey edges connect consecutive values that belong to distinct genes. [Figure: the 16 vertices π′0, ..., π′15 placed on a circle with their black and grey edges.]
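The construction on this slide can be sketched in Python as follows. This is a minimal illustration rather than code from the talk: the names `extend_signed` and `count_cycles_signed` are hypothetical, and the sketch assumes the usual convention that black edges join positions 2i and 2i + 1 of the doubled sequence while grey edges join the values 2k and 2k + 1.

```python
def extend_signed(perm):
    """Map each signed gene to a pair of unsigned values, framed by 0 and 2n + 1."""
    ext = [0]
    for gene in perm:
        low, high = 2 * abs(gene) - 1, 2 * abs(gene)
        # A positive gene reads (2i - 1, 2i); a negative gene reads (2i, 2i - 1).
        ext.extend((low, high) if gene > 0 else (high, low))
    ext.append(2 * len(perm) + 1)
    return ext


def count_cycles_signed(perm):
    """Count the alternating cycles of the breakpoint graph of a signed permutation."""
    ext = extend_signed(perm)
    black = {}
    for i in range(0, len(ext), 2):
        u, v = ext[i], ext[i + 1]
        black[u], black[v] = v, u
    seen, cycles = set(), 0
    for start in ext:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]   # follow the black edge
            seen.add(w)
            v = w ^ 1      # follow the grey edge: 2k <-> 2k + 1
    return cycles


pi = [-5, 1, 2, 4, -7, -3, 6]
print(extend_signed(pi))        # [0, 10, 9, 1, 2, 3, 4, 7, 8, 14, 13, 6, 5, 11, 12, 15]
print(count_cycles_signed(pi))  # 2
```

On the identity +1 +2 ... +7 the same function returns 8 = n + 1 cycles, one per black edge.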
  • 37. Using the breakpoint graph. The breakpoint graph is 2-regular and therefore decomposes into alternating cycles in a unique way. The breakpoint graph of 1 2 · · · n contains the largest number of cycles. [Figure: the breakpoint graph of the example next to the breakpoint graph of the identity.] ⇒ goal: create new cycles in as few moves as possible.
  • 40. A lower bound on the signed reversal distance. A signed reversal involves black edges belonging to at most two cycles. The only way to increase c(BG(π)) is to split cycles, so a single reversal creates at most one new cycle. [Figure: a reversal replaces the black edges (π′2i, π′2i+1) and (π′2j, π′2j+1) with (π′2i, π′2j) and (π′2i+1, π′2j+1).] Therefore, for all π in S±n: srd(π) ≥ n + 1 − c(BG(π)).
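As a sanity check of this argument, the sketch below applies every signed reversal to the running example and records how the number of cycles changes. It is an illustration only: the helper names (`count_cycles`, `signed_reversal`, `lower_bound`) are hypothetical, and the cycle counter assumes the doubling convention i → (2|i| − 1, 2|i|), reversed for negative genes.

```python
def count_cycles(perm):
    """Number of alternating cycles in the breakpoint graph of a signed permutation."""
    ext = [0]
    for gene in perm:
        low, high = 2 * abs(gene) - 1, 2 * abs(gene)
        ext += [low, high] if gene > 0 else [high, low]
    ext.append(2 * len(perm) + 1)
    black = {}
    for i in range(0, len(ext), 2):
        black[ext[i]], black[ext[i + 1]] = ext[i + 1], ext[i]
    seen, cycles = set(), 0
    for start in ext:
        if start not in seen:
            cycles += 1
            v = start
            while v not in seen:
                seen.update((v, black[v]))
                v = black[v] ^ 1  # grey edge: 2k <-> 2k + 1
    return cycles


def signed_reversal(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) and flip the sign of each element."""
    return perm[:i] + [-g for g in reversed(perm[i:j + 1])] + perm[j + 1:]


def lower_bound(perm):
    """The bound n + 1 - c(BG(perm)) on the signed reversal distance."""
    return len(perm) + 1 - count_cycles(perm)


pi = [-5, 1, 2, 4, -7, -3, 6]
deltas = {
    count_cycles(signed_reversal(pi, i, j)) - count_cycles(pi)
    for i in range(len(pi))
    for j in range(i, len(pi))
}
print(sorted(deltas))   # every value lies in {-1, 0, 1}
print(lower_bound(pi))  # 6
```

Since no reversal gains more than one cycle and the identity has n + 1 of them, at least n + 1 − c(BG(π)) reversals are needed.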
  • 43. An exact formula for computing the signed reversal distance. It is not always possible to split cycles. Worse: some collections of cycles in the breakpoint graph, called hurdles, each require an additional move. [Figure: breakpoint graph of 0 13 14 9 10 4 3 6 5 11 12 1 2 7 8 15 16 21 22 19 20 17 18 23, with one component that contains a splittable cycle (so the component is “ok”) and one hurdle (no splittable cycle); after the reversal of 6 5 11 12 1 2 the sequence reads 0 13 14 9 10 4 3 2 1 12 11 5 6 7 8 15 16 21 22 19 20 17 18 23.]
  • 45. An exact formula for computing the signed reversal distance (continued). Theorem ([Hannenhalli and Pevzner, 1999]): for all π in S±n, srd(π) = n + 1 − c(BG(π)) + h(BG(π)) + f(BG(π)), where h(BG(π)) is the number of hurdles and f(BG(π)) accounts for the special “fortress” case. Computing srd(·) and sorting can both be done in polynomial time.
  • 46. Integrating other constraints: “perfection” (recall Mathilde Bouvel’s talk). As seen before, some genes “stick together”. Problem (perfect sorting by signed reversals). Given: a permutation π in S±n and a set S of intervals of π. Find: a minimum-length sequence of signed reversals that sorts π and whose elements do not overlap with intervals from S. Example (not sorting sequences): [Figure: two reversals applied to −3 2 −1 4 5, one marked invalid and one marked valid.]
  • 47. A more general view: double cut-and-join. The “double cut-and-join” (DCJ) operation is defined directly on the breakpoint graph. Idea: cut two black edges and join their endpoints. It simulates signed reversals and block-interchanges (2 DCJs for the latter) ⇒ sorting by DCJs ≡ sorting by signed reversals with weight 1 and block-interchanges with weight 2. Theorem ([Yancopoulos et al., 2005]): for any π in S±n, dcj(π) = n + 1 − c(BG(π)).
  • 48. Some results and some questions (non-exhaustive). What has been done, per operation:
      signed reversal: sorting in O(n^(3/2)) [Han, 2006]; distance in O(n) [Bader et al., 2001]; best approximation 1
      perfect signed reversal: NP-hard [Figeac and Varré, 2004]; best approximation ?
      prefix signed reversal: sorting ?; distance ?; best approximation 2 [Cohen and Blum, 1995]
      double cut and join: O(n) [Yancopoulos et al., 2005]; best approximation 1
    What could be done: for prefix signed reversals, (1) complexity of sorting / computing the distance? (2) largest value the distance can reach? (3) a “better-than-2” approximation? Also: characterise “hard instances” using pattern matching/avoidance.
  • 49. Unsigned permutations. In some situations, gene orientation may be unknown or can be disregarded. Genomes are then modelled by the more traditional unsigned permutations. Of course, rearrangements in that setting do not affect orientation.
  • 54. The breakpoint graph in the unsigned case. More direct construction than in the signed case. Let’s build the breakpoint graph of π = 4 1 6 2 5 7 3: (1) build the ordered vertex set (π0 = 0, π1, π2, . . . , πn); (2) add black arcs for every ordered pair (πi, π(i−1) mod (n+1)); (3) add grey arcs for every ordered pair (i, (i + 1) mod (n + 1)). BG(π) decomposes in a unique way into alternating cycles. [Figure: the vertices 0, 4, 1, 6, 2, 5, 7, 3 placed on a circle with their black and grey arcs.]
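The three construction steps translate almost literally into code. The sketch below is only an illustration with a hypothetical function name; it reads an alternating cycle as "follow a black arc, then a grey arc, and repeat", which is an assumption about how the figure on the slide is traversed.

```python
def breakpoint_graph_cycles(perm):
    """Return the alternating cycles of BG(perm) for an unsigned permutation of 1..n."""
    ext = [0] + list(perm)                                  # step 1: ordered vertex set
    m = len(ext)                                            # m = n + 1
    black = {ext[i]: ext[(i - 1) % m] for i in range(m)}    # step 2: arcs (pi_i, pi_{i-1})
    grey = {v: (v + 1) % m for v in range(m)}               # step 3: arcs (i, i + 1)
    seen, cycles = set(), []
    for start in range(m):
        if start in seen:
            continue
        cycle, v = [], start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = grey[black[v]]  # one black arc followed by one grey arc
        cycles.append(cycle)
    return cycles


print(breakpoint_graph_cycles([4, 1, 6, 2, 5, 7, 3]))
# [[0, 4, 1, 5, 3], [2, 7, 6]]
```

Each list records the vertices from which a black arc of the cycle leaves, so the example has two alternating cycles, with 5 and 3 black arcs respectively.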
  • 55. Breakpoint graphs as permutations. Nice property of BG(π): its alternating cycle decomposition matches the traditional disjoint cycle decomposition of the permutation π̄ = (0, 1, 2, . . . , n) ◦ (0, πn, πn−1, . . . , π1). As a consequence, we can express the action of any rearrangement σ on π using π̄. Lemma ([Labarre, 2012]): for all π, σ in Sn, we have \overline{π ◦ σ} = π̄ ◦ σ̄^π, where σ̄^π denotes the conjugate π ◦ σ̄ ◦ π⁻¹. ... and we can recycle known results in this context.
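The permutation built from BG(π) and the lemma can be checked numerically. The sketch below makes several assumptions that are not spelled out on the slide: permutations are composed right to left, π is extended with the fixed point 0, and the exponent in the lemma is read as conjugation, i.e. the right-hand side is bar(π) ◦ (π ◦ bar(σ) ◦ π⁻¹). All function names are hypothetical.

```python
import random


def compose(f, g):
    """Return f ∘ g: apply g first, then f (permutations as tuples on 0..n)."""
    return tuple(f[g[x]] for x in range(len(g)))


def inverse(f):
    inv = [0] * len(f)
    for x, y in enumerate(f):
        inv[y] = x
    return tuple(inv)


def extend(perm):
    """Add the fixed element 0 in front of a permutation of 1..n."""
    return (0,) + tuple(perm)


def bar(perm):
    """The permutation (0, 1, ..., n) ∘ (0, pi_n, ..., pi_1) associated with BG(perm)."""
    p = extend(perm)
    m = len(p)
    shift = tuple((x + 1) % m for x in range(m))            # the cycle (0, 1, ..., n)
    back = compose(p, compose(inverse(shift), inverse(p)))  # the cycle (0, pi_n, ..., pi_1)
    return compose(shift, back)


def lemma_holds(pi, sigma):
    """Check bar(pi ∘ sigma) == bar(pi) ∘ (pi ∘ bar(sigma) ∘ pi^-1)."""
    p, s = extend(pi), extend(sigma)
    left = bar(compose(p, s)[1:])
    conjugate = compose(p, compose(bar(sigma), inverse(p)))
    return left == compose(bar(pi), conjugate)


print(bar((4, 1, 6, 2, 5, 7, 3)))  # (4, 5, 7, 0, 1, 3, 2, 6)
rng = random.Random(0)
samples = [(rng.sample(range(1, 8), 7), rng.sample(range(1, 8), 7)) for _ in range(100)]
print(all(lemma_holds(pi, sigma) for pi, sigma in samples))
```

For π = 4 1 6 2 5 7 3 the result has the disjoint cycles (0 4 1 5 3) and (2 7 6), one per alternating cycle of BG(π).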
  • 59. Obtaining lower bounds. Lower bounds can be automatically obtained as follows. Initial problem: computing dS(π), the length of a shortest sequence π, π ◦ x1, π ◦ x1 ◦ x2, · · · , ι. New problem: computing dC(π̄), the length of a shortest sequence π̄, π̄ ◦ y1, π̄ ◦ y1 ◦ y2, · · · , ῑ = ι. Every sequence for the initial problem maps to a sequence for the new problem, hence dC(π̄) ≤ dS(π). This yields tight lower bounds on: (1) the block-interchange distance [Christie, 1996]; (2) the transposition distance [Bafna and Pevzner, 1998]; (3) the prefix transposition distance [Labarre, 2012].
  • 68. Three examples. How one can obtain bounds on the following distances: (1) block-interchange distance (bid(·)): a block-interchange = a pair of 2-cycles, so bid(π) ≥ dC(π̄) = (n + 1 − c(BG(π)))/2; (2) transposition distance (td(·)): a transposition = a 3-cycle, so td(π) ≥ dC(π̄) = (n + 1 − codd(BG(π)))/2; (3) prefix transposition distance (ptd(·)): a prefix transposition = a 3-cycle containing 0, so ptd(π) ≥ dC(π̄) = (n + 1 + c(BG(π)))/2 − c1(BG(π)) − δ, where δ = 0 if π1 = 1 and δ = 1 otherwise.
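The first two bounds only need the cycle lengths of the breakpoint graph. The sketch below is a hedged illustration with hypothetical function names; it assumes that the length of a cycle is its number of black arcs and that codd counts the cycles of odd length.

```python
def cycle_lengths(perm):
    """Lengths (numbers of black arcs) of the alternating cycles of BG(perm)."""
    ext = [0] + list(perm)
    m = len(ext)
    position = {value: index for index, value in enumerate(ext)}
    seen, lengths = set(), []
    for start in range(m):
        if start in seen:
            continue
        length, v = 0, start
        while v not in seen:
            seen.add(v)
            length += 1
            # Black arc to the element on the left of v, then grey arc to its successor.
            v = (ext[(position[v] - 1) % m] + 1) % m
        lengths.append(length)
    return lengths


def bid_lower_bound(perm):
    """(n + 1 - c(BG(perm))) / 2, the bound on the block-interchange distance."""
    return (len(perm) + 1 - len(cycle_lengths(perm))) // 2


def td_lower_bound(perm):
    """(n + 1 - c_odd(BG(perm))) / 2, the bound on the transposition distance."""
    odd = sum(1 for length in cycle_lengths(perm) if length % 2 == 1)
    return (len(perm) + 1 - odd) // 2


pi = [4, 1, 6, 2, 5, 7, 3]
print(cycle_lengths(pi))    # [5, 3]
print(bid_lower_bound(pi))  # 3
print(td_lower_bound(pi))   # 3
```

Both numerators are always even, so the integer division loses nothing.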
  • 69. A word of caution. Don’t think that sorting unsigned permutations is trivial; bid(·) and exc(·) are indeed easy to compute, but: (1) sorting by (prefix or arbitrary) reversals becomes NP-hard; (2) sorting by transpositions is NP-hard; (3) sorting by double cut-and-joins is NP-hard; (4) sorting by prefix transpositions is open.
  • 70. Results on sorting unsigned permutations. What has been done, per operation:
      exchange: O(n) [Knuth, 1995]; best approximation 1
      block-interchange: O(n) [Christie, 1996]; best approximation 1
      double cut-and-joins: NP-hard [Chen, 2010]; best approximation ?
      reversal: NP-hard [Caprara, 1999b]; best approximation 11/8 [Berman et al., 2002]
      transposition: NP-hard [Bulteau et al., 2011b]; best approximation 11/8 [Elias and Hartman, 2006]
      prefix exchange: O(n) [Akers et al., 1987]; best approximation 1
      prefix reversal: NP-hard [Bulteau et al., 2011a]; best approximation 2 [Fischer and Ginzinger, 2005]
      prefix transposition: sorting ?; distance ?; best approximation 2 [Dias and Meidanis, 2002]
    What could be done: better approximations; complexity of prefix transposition problems?
  • 71. Counting problems. What is the distribution of a given rearrangement distance? Most tight bounds on rearrangement distances are obtained using the breakpoint graph and its cycles. Use the distribution of cycles in the breakpoint graph to approximately answer the above.
  • 72. Hultman numbers. Hultman numbers count permutations whose breakpoint graph contains k cycles: SH(n, k) = |{π ∈ Sn | c(BG(π)) = k}| (unsigned case); S±H(n, k) = |{π ∈ S±n | c(BG(π)) = k}| (signed case). Similar in spirit to Stirling numbers of the first kind. Explicit formulas are available for computing SH(n, k) and S±H(n, k) (see e.g. [Grusea and Labarre, 2011]). Also available: generating functions, expected value and variance.
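For small n the unsigned Hultman numbers can be obtained by brute force straight from the definition. The sketch below does exactly that; it is not the explicit formula cited on the slide, the function names are hypothetical, and the enumeration over all n! permutations is only practical for very small n.

```python
from collections import Counter
from itertools import permutations


def count_cycles(perm):
    """Number of alternating cycles of the unsigned breakpoint graph BG(perm)."""
    ext = [0] + list(perm)
    m = len(ext)
    position = {value: index for index, value in enumerate(ext)}
    seen, cycles = set(), 0
    for start in range(m):
        if start not in seen:
            cycles += 1
            v = start
            while v not in seen:
                seen.add(v)
                # Black arc to the left neighbour, then grey arc to its successor.
                v = (ext[(position[v] - 1) % m] + 1) % m
    return cycles


def hultman_numbers(n):
    """Map each k to S_H(n, k), counted by exhaustive enumeration of S_n."""
    return dict(Counter(count_cycles(p) for p in permutations(range(1, n + 1))))


print(hultman_numbers(3))  # {4: 1, 2: 5}
print(hultman_numbers(4))
```

Only values of k with the same parity as n + 1 appear, and k = n + 1 is reached by the identity alone.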
  • 73. Approximating distance distributions using Hultman numbers. [Figure: six histograms, each comparing a distance distribution with Hultman numbers and annotated either “distributions match” or “distributions are close”. Unsigned permutations (n = 13): |td(·) = k| against SH(n, n + 1 − 2k); |ptd(·) = k| against SH(n, n + 2 − 2k); |rd(·) = k| against SH(n, n + 3 − 2k); |bid(·) = k| against SH(n, n + 1 − 2k). Signed permutations (n = 10): |psrd(·) = k| against S±H(n, n + 3 − k); |srd(·) = k| against S±H(n, n + 1 − k).] (Distance distributions from [Galvão and Dias, 2011].)
  • 74. Experiments wrap-up and perspectives. Some distance distributions are extremely well approximated using (some function of) the Hultman numbers. Can bounds be tightened by trying to minimise the difference between the distributions? Can the proximity of distributions be used to argue that exact computation of hard distances is overrated?
  • 79. Listing optimal sequences. Distances are informative ... but an actual sorting sequence is more informative. What if the sequence we found makes no biological sense? ⇒ can we list all optimal sequences for a given instance? Efficient algorithms exist for listing: all optimal signed reversals [Swenson et al., 2011]; all optimal sequences [Badr et al., 2011].
  • 80. Median problems. Measures of similarities between genomes are useful in reconstructing phylogenies. Example (phylogeny from distance matrix):
          a  b  c  d  e
      a   0  2  3  6  6
      b   2  0  3  6  6
      c   3  3  0  5  5
      d   6  6  5  0  4
      e   6  6  5  4  0
    [Figure: a tree with leaves a, b, c, d, e whose edge weights (1 or 2) realise these distances.] (The matrix must satisfy some conditions [Buneman, 1971].)
  • 81. Median problems. Parsimony again: search for a tree that minimises the total number of evolutionary events (i.e. the sum of all edge weights). In its simplest form, the problem we want to solve is: Problem (median of three). Given: π, σ, τ in S±n; a distance d : S±n × S±n → N. Find: a permutation µ in S±n that minimises w(µ) = d(π, µ) + d(σ, µ) + d(τ, µ). Can be generalised to more than three input permutations.
  • 83. Generic bounds [Siepel and Moret, 2001]. Generic lower and upper bounds for any distance. [Figure: a triangle with vertices π, σ, τ whose sides are labelled d(π, σ), d(π, τ) and d(σ, τ).] Choosing µ among the three inputs gives w(µ) ≤ min{d(π, σ) + d(π, τ) (if µ = π), d(π, σ) + d(σ, τ) (if µ = σ), d(π, τ) + d(σ, τ) (if µ = τ)}.
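The upper bound can be compared with a brute-force median on a tiny instance. The sketch below swaps in the exchange distance on unsigned permutations (the minimum number of swaps of two elements) purely because it is easy to compute; the slides state the problem for signed permutations and an arbitrary distance, and all names here are hypothetical.

```python
from itertools import permutations


def exchange_distance(p, q):
    """Minimum number of exchanges turning p into q: n minus the cycles of p^-1 ∘ q."""
    n = len(p)
    position = {value: index for index, value in enumerate(p)}
    relabelled = [position[value] for value in q]  # q expressed relative to p
    seen, cycles = [False] * n, 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = relabelled[j]
    return n - cycles


def upper_bound(p, q, t, d):
    """Best score obtained by choosing one of the three inputs as the median."""
    return min(d(p, q) + d(p, t), d(p, q) + d(q, t), d(p, t) + d(q, t))


def best_median_score(p, q, t, d):
    """Exhaustive search over all candidate medians (only feasible for tiny n)."""
    return min(d(p, m) + d(q, m) + d(t, m) for m in permutations(sorted(p)))


pi, sigma, tau = (1, 2, 3, 4), (2, 1, 4, 3), (4, 3, 2, 1)
print(upper_bound(pi, sigma, tau, exchange_distance))
print(best_median_score(pi, sigma, tau, exchange_distance))
```

The second value can never exceed the first, since the exhaustive search also considers the three inputs themselves.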