On NP-Hardness of the
Paired de Bruijn Sound Cycle
Problem
Evgeny Kapun Fedor Tsarev
Genome Assembly Algorithms Laboratory
University ITMO, St. Petersburg, Russia
September 2, 2013
Genome assembly models
Shortest common superstring –
NP-hard.
Shortest common superwalk in a de
Bruijn graph – NP-hard.
Superwalk in a de Bruijn graph with
known edge multiplicities – NP-hard.
Path in a paired de Bruijn graph?
Paired de Bruijn graph
TT
CT
TA
TT
AG
TC
GC
CT
CA
TT
AT
TC
TTA
CTT
TAG
TTC
AGC
TCT
GCA
CTT
CAT
TTC
ATT
TCT
TAT
TTC
CAG
TTC
Sound path
Sound path: strings match with shift d = 6.
TAGCTCACCCGTTGGT
ACCCGTTGGTAATTGC
Sound cycle: cyclic strings match with shift
d = 6.
TGATAAGTAGGCTAAG
GTAGGCTAAGTGATAA
Paired de Bruijn Sound Cycle
Problem
Given a paired de Bruijn graph G and an
integer d (represented in unary coding), find
if G has a sound cycle with respect to shift d.
Paired de Bruijn Covering Sound
Cycle Problem
Given a paired de Bruijn graph G and an
integer d (represented in unary coding), find
if G has a covering sound cycle with respect
to shift d.
Parameters
|Σ|: size of the alphabet.
k: length of vertex labels.
|V |, |E|: size of the graph (bounded in
terms of |Σ| and k).
d: shift distance.
Simple cases
|Σ| = 1: at most one vertex, at most
one edge.
k = 0: at most one vertex, at most |Σ|2
edges, reduces to the problem of
computing strongly connected
components.
d is fixed: find a cycle in a graph of
|V ||Σ|d
states.
Interesting case: k = 1
With fixed k = 1, the problem is NP-hard.
Proof outline:
1. Reduce Hamiltonian Cycle Problem,
which is NP-hard, to an intermediate
problem.
2. Reduce that problem to Paired de Bruijn
(Covering) Sound Cycle Problem.
The intermediate problem
Given an undirected graph,
if it contains a hamiltonian cycle, output
1.
if it doesn’t contain hamiltonian paths,
output 0.
otherwise, invoke undefined behavior.
Solution, step 1
G1 G2
a1
a2
a3
b1
b2
b3
Properties of a graph with a
hamiltonian cycle
Such graph contains hamiltonian paths
ending at any vertex.
passing through any edge.
for any edge {i, j} and vertex k = i, j,
passing through {i, j} such that j is
between i and k on the path.
Solution, step 2
V2n+2
s1
s2
V1
s2
s3
V2
. . . . . . s1. . . . . . . . . . . . s2. . . . . .
. . . . . . s2. . . . . . . . . . . . s3. . . . . .
Solution, step 2
In V1, V2n+2:
i
j
−→
i
j
Solution, step 2
In V3. . . V2n+1:
i
How to make a covering cycle
Pass along
the loop
multiple
times,
covering
more and
more edges
with each
iteration.
Interesting case: |Σ| = 2
With fixed |Σ| = 2, the problem is NP-hard.
The proof is done by reduction from the case
k = 1. The characters are replaced with
binary sequences, and a transformation is
done to avoid undesired overlaps.
Fix both k and |Σ|
If both k and |Σ| are fixed, the number of
different de Bruijn graphs is finite. The only
argument which can take infinitely many
values is d, and it is an integer represented in
unary coding. As a result, the number of
valid problem instances having any fixed
length is bounded. So, the language defined
by the problem is sparse. Therefore, the
problem is not NP-hard unless P=NP.
Results
Paired de Bruijn (Covering) Sound Cycle
Problem is
NP-hard for any fixed k ≥ 1 (can be
reduced from k = 1).
NP-hard for any fixed |Σ| ≥ 2 (trivially
reduced from |Σ| = 2).
NP-hard in the general case.
Results
Paired de Bruijn (Covering) Sound Cycle
Problem is
Not NP-hard if both k and |Σ| are fixed,
unless P=NP.
Solvable in polynomial time if k = 0,
|Σ| = 1, or d is fixed.
Thank you!

On NP-Hardness of the Paired de Bruijn Sound Cycle Problem

  • 1.
    On NP-Hardness ofthe Paired de Bruijn Sound Cycle Problem Evgeny Kapun Fedor Tsarev Genome Assembly Algorithms Laboratory University ITMO, St. Petersburg, Russia September 2, 2013
  • 2.
    Genome assembly models Shortestcommon superstring – NP-hard. Shortest common superwalk in a de Bruijn graph – NP-hard. Superwalk in a de Bruijn graph with known edge multiplicities – NP-hard. Path in a paired de Bruijn graph?
  • 3.
    Paired de Bruijngraph TT CT TA TT AG TC GC CT CA TT AT TC TTA CTT TAG TTC AGC TCT GCA CTT CAT TTC ATT TCT TAT TTC CAG TTC
  • 4.
    Sound path Sound path:strings match with shift d = 6. TAGCTCACCCGTTGGT ACCCGTTGGTAATTGC Sound cycle: cyclic strings match with shift d = 6. TGATAAGTAGGCTAAG GTAGGCTAAGTGATAA
  • 5.
    Paired de BruijnSound Cycle Problem Given a paired de Bruijn graph G and an integer d (represented in unary coding), find if G has a sound cycle with respect to shift d.
  • 6.
    Paired de BruijnCovering Sound Cycle Problem Given a paired de Bruijn graph G and an integer d (represented in unary coding), find if G has a covering sound cycle with respect to shift d.
  • 7.
    Parameters |Σ|: size ofthe alphabet. k: length of vertex labels. |V |, |E|: size of the graph (bounded in terms of |Σ| and k). d: shift distance.
  • 8.
    Simple cases |Σ| =1: at most one vertex, at most one edge. k = 0: at most one vertex, at most |Σ|2 edges, reduces to the problem of computing strongly connected components. d is fixed: find a cycle in a graph of |V ||Σ|d states.
  • 9.
    Interesting case: k= 1 With fixed k = 1, the problem is NP-hard. Proof outline: 1. Reduce Hamiltonian Cycle Problem, which is NP-hard, to an intermediate problem. 2. Reduce that problem to Paired de Bruijn (Covering) Sound Cycle Problem.
  • 10.
    The intermediate problem Givenan undirected graph, if it contains a hamiltonian cycle, output 1. if it doesn’t contain hamiltonian paths, output 0. otherwise, invoke undefined behavior.
  • 11.
    Solution, step 1 G1G2 a1 a2 a3 b1 b2 b3
  • 12.
    Properties of agraph with a hamiltonian cycle Such graph contains hamiltonian paths ending at any vertex. passing through any edge. for any edge {i, j} and vertex k = i, j, passing through {i, j} such that j is between i and k on the path.
  • 13.
    Solution, step 2 V2n+2 s1 s2 V1 s2 s3 V2 .. . . . . s1. . . . . . . . . . . . s2. . . . . . . . . . . . s2. . . . . . . . . . . . s3. . . . . .
  • 14.
    Solution, step 2 InV1, V2n+2: i j −→ i j
  • 15.
    Solution, step 2 InV3. . . V2n+1: i
  • 16.
    How to makea covering cycle Pass along the loop multiple times, covering more and more edges with each iteration.
  • 17.
    Interesting case: |Σ|= 2 With fixed |Σ| = 2, the problem is NP-hard. The proof is done by reduction from the case k = 1. The characters are replaced with binary sequences, and a transformation is done to avoid undesired overlaps.
  • 18.
    Fix both kand |Σ| If both k and |Σ| are fixed, the number of different de Bruijn graphs is finite. The only argument which can take infinitely many values is d, and it is an integer represented in unary coding. As a result, the number of valid problem instances having any fixed length is bounded. So, the language defined by the problem is sparse. Therefore, the problem is not NP-hard unless P=NP.
  • 19.
    Results Paired de Bruijn(Covering) Sound Cycle Problem is NP-hard for any fixed k ≥ 1 (can be reduced from k = 1). NP-hard for any fixed |Σ| ≥ 2 (trivially reduced from |Σ| = 2). NP-hard in the general case.
  • 20.
    Results Paired de Bruijn(Covering) Sound Cycle Problem is Not NP-hard if both k and |Σ| are fixed, unless P=NP. Solvable in polynomial time if k = 0, |Σ| = 1, or d is fixed.
  • 21.