Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

De Bruijn Superwalk with Multiplicities Problem is NP-hard

6,289 views

Published on

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

De Bruijn Superwalk with Multiplicities Problem is NP-hard

  1. 1. De Bruijn Superwalk with Multiplicities Problem is NP-hard Evgeny Kapun Fedor TsarevGenome Assembly Algorithms LaboratoryUniversity ITMO, St. Petersburg, Russia April 11, 2013
  2. 2. Genome Assembly Models Shortest Common Superstring – NP-hard (Gallant et al., 1980). Shortest de Bruijn Superwalk – NP-hard (Medvedev et al., 2007).
  3. 3. Genome Assembly Models Maximum likelihood approach – to find edges’ multiplicities (Medvedev and Brudno, 2009; Varna et al., 2011). De Bruijn Superwalk with Multiplicities – complexity was not known. This talk – we prove it is NP-hard.
  4. 4. De Bruijn Superwalk withMultiplicities Problem Find a walk in the de Bruijn graph containing several walks as subwalks and passing through each edge the exactly predefined number of times.
  5. 5. Example Reads = subwalks: AAGT, AGTCA, TCAA 2 AA AG 2 1 1 CA GT 2 TC 2 Superwalk: AAGTCAGTCAAG
  6. 6. NP-hardness Proof Outline 1. Reduce Shortest Common Superstring problem to Common Superstring with Multiplicities problem. 2. Reduce Common Superstring with Multiplicities problem to De Bruijn Superwalk with Multiplicities problem.
  7. 7. NP-hardness Proof Outline 1. Reduce Shortest Common Superstring problem to Common Superstring with Multiplicities problem. 2. Reduce Common Superstring with Multiplicities problem to De Bruijn Superwalk with Multiplicities problem.
  8. 8. Common Superstring withMultiplicities Problem Find a string containing several strings as substrings and containing each character the exactly predefined number of times.
  9. 9. Example Strings: AAGT, AGTCA, TCAA Multiplicities: m(A) = 5, m(C) = 2, m(G) = 3, m(T) = 2 Solution for SCS: AAGTCAA Solution for CSM: AAGTCAGTCAAG or just AAGTCAAACGGT
  10. 10. Reducing SCS to CSM Given an instance of SCS with Σ = {0, 1} in decision form (“Is there such a string that . . . ”), substitute 0 → T0 = 000111 1 → T1 = 001011 and make the multiplicities of 0 and 1 equal to 3 times the desired superstring length.
  11. 11. Properties of T0 and T1 T0 and T1 have the same length. Furthermore, number of occurrences of each character is the same in T0 and T1. No proper suffix of either T0 or T1 is equal to any of the proper prefixes of either T0 or T1.
  12. 12. Properties of T0 and T1 As a result, all overlaps of the transformed strings are aligned. 000111 001011 000111 001011 000111 001011
  13. 13. Properties of T0 and T1 Unaligned overlaps are impossible because no proper prefix of T0 and T1 is equal to any proper suffix. 000111 001011 000111 ? ?
  14. 14. Reducing SCS to CSM As a result, the shortest common superstring of the transformed strings would be equal to the transformed shortest common superstring of the original strings.
  15. 15. NP-hardness Proof Outline1. Reduce Shortest Common Superstring problem to CommonSuperstring with Multiplicities problem. 2. Reduce Common Superstring with Multiplicities problem to De Bruijn Superwalk with Multiplicities problem.
  16. 16. Reducing CSM to DBSM Trivial reduction (Σ = {0, 1}, k = 0): 0 1 Strings become walks, multiplicities of characters become multiplicities of edges.
  17. 17. Reducing CSM to DBSM Generalization for any k: 0k−1 10 0k−1 1 0k−2 10 0k 1 0k+1 0k 10k 10k−1 010k−2 010k−1
  18. 18. Result De Bruijn Superwalk with Multiplicities problem is NP-hard for any |Σ| ≥ 2 and any k. Since the case |Σ| = 1 is trivial, the problem is NP-hard for all nontrivial cases.
  19. 19. Ackowledgements Funding: Ministry of Education and Science of Russian Federation (contract 16.740.11.0495, agreement 14.B37.21.0562) University ITMO (research project 610455)
  20. 20. Thank you! Questions?

×