Successfully reported this slideshow.
Upcoming SlideShare
×

# De Bruijn Superwalk with Multiplicities Problem is NP-hard

6,279 views

Published on

Published in: Technology, Education
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

### De Bruijn Superwalk with Multiplicities Problem is NP-hard

1. 1. De Bruijn Superwalk with Multiplicities Problem is NP-hard Evgeny Kapun Fedor TsarevGenome Assembly Algorithms LaboratoryUniversity ITMO, St. Petersburg, Russia April 11, 2013
2. 2. Genome Assembly Models Shortest Common Superstring – NP-hard (Gallant et al., 1980). Shortest de Bruijn Superwalk – NP-hard (Medvedev et al., 2007).
3. 3. Genome Assembly Models Maximum likelihood approach – to ﬁnd edges’ multiplicities (Medvedev and Brudno, 2009; Varna et al., 2011). De Bruijn Superwalk with Multiplicities – complexity was not known. This talk – we prove it is NP-hard.
4. 4. De Bruijn Superwalk withMultiplicities Problem Find a walk in the de Bruijn graph containing several walks as subwalks and passing through each edge the exactly predeﬁned number of times.
5. 5. Example Reads = subwalks: AAGT, AGTCA, TCAA 2 AA AG 2 1 1 CA GT 2 TC 2 Superwalk: AAGTCAGTCAAG
6. 6. NP-hardness Proof Outline 1. Reduce Shortest Common Superstring problem to Common Superstring with Multiplicities problem. 2. Reduce Common Superstring with Multiplicities problem to De Bruijn Superwalk with Multiplicities problem.
7. 7. NP-hardness Proof Outline 1. Reduce Shortest Common Superstring problem to Common Superstring with Multiplicities problem. 2. Reduce Common Superstring with Multiplicities problem to De Bruijn Superwalk with Multiplicities problem.
8. 8. Common Superstring withMultiplicities Problem Find a string containing several strings as substrings and containing each character the exactly predeﬁned number of times.
9. 9. Example Strings: AAGT, AGTCA, TCAA Multiplicities: m(A) = 5, m(C) = 2, m(G) = 3, m(T) = 2 Solution for SCS: AAGTCAA Solution for CSM: AAGTCAGTCAAG or just AAGTCAAACGGT
10. 10. Reducing SCS to CSM Given an instance of SCS with Σ = {0, 1} in decision form (“Is there such a string that . . . ”), substitute 0 → T0 = 000111 1 → T1 = 001011 and make the multiplicities of 0 and 1 equal to 3 times the desired superstring length.
11. 11. Properties of T0 and T1 T0 and T1 have the same length. Furthermore, number of occurrences of each character is the same in T0 and T1. No proper suﬃx of either T0 or T1 is equal to any of the proper preﬁxes of either T0 or T1.
12. 12. Properties of T0 and T1 As a result, all overlaps of the transformed strings are aligned. 000111 001011 000111 001011 000111 001011
13. 13. Properties of T0 and T1 Unaligned overlaps are impossible because no proper preﬁx of T0 and T1 is equal to any proper suﬃx. 000111 001011 000111 ? ?
14. 14. Reducing SCS to CSM As a result, the shortest common superstring of the transformed strings would be equal to the transformed shortest common superstring of the original strings.
15. 15. NP-hardness Proof Outline1. Reduce Shortest Common Superstring problem to CommonSuperstring with Multiplicities problem. 2. Reduce Common Superstring with Multiplicities problem to De Bruijn Superwalk with Multiplicities problem.
16. 16. Reducing CSM to DBSM Trivial reduction (Σ = {0, 1}, k = 0): 0 1 Strings become walks, multiplicities of characters become multiplicities of edges.
17. 17. Reducing CSM to DBSM Generalization for any k: 0k−1 10 0k−1 1 0k−2 10 0k 1 0k+1 0k 10k 10k−1 010k−2 010k−1
18. 18. Result De Bruijn Superwalk with Multiplicities problem is NP-hard for any |Σ| ≥ 2 and any k. Since the case |Σ| = 1 is trivial, the problem is NP-hard for all nontrivial cases.
19. 19. Ackowledgements Funding: Ministry of Education and Science of Russian Federation (contract 16.740.11.0495, agreement 14.B37.21.0562) University ITMO (research project 610455)
20. 20. Thank you! Questions?