A coverage criterion for spaced seeds and its applications to SVM string-kernels and k-mer distances - presentation
A coverage criterion for spaced seeds
and its applications to SVM string-kernels and
k-mer distances
Laurent Noé, Donald E. K. Martin
LIFL (UMR 8022 Lille 1/CNRS) - Inria Lille, Villeneuve d'Ascq, France
Department of Statistics, North Carolina State University, Raleigh, NC, USA
SeqBio 2014
November 4-5, 2014 - Montpellier
Laurent Noé, Donald E. K. Martin A coverage criterion for spaced seeds and its applications
Spaced seed: defined as a binary word over the alphabet {1, *}:
1 : accepts only the match symbol |,
* : accepts all alignment symbols (joker).
s : span (length), w : weight (number of 1 symbols).
Example
π = 111*1*11
ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA
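The definition above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `seed_hits` is hypothetical, and the alignment is assumed to be gapless, so that a hit is any offset where every 1 of the seed faces a pair of identical nucleotides.

```python
def seed_hits(seed: str, top: str, bottom: str) -> list[int]:
    """Return every offset where `seed` hits the gapless alignment top/bottom."""
    if len(top) != len(bottom):
        raise ValueError("gapless alignment: both sequences must have the same length")
    # Offsets inside the seed where a match is required (the '1' symbols).
    # '*' positions are jokers and are never checked.
    required = [i for i, c in enumerate(seed) if c == "1"]
    hits = []
    for start in range(len(top) - len(seed) + 1):
        if all(top[start + i] == bottom[start + i] for i in required):
            hits.append(start)
    return hits


seed = "111*1*11"
span, weight = len(seed), seed.count("1")   # s = 8, w = 6
top = "ATCAGTGCGAATGCGCAAGA"
bottom = "ATCAGCGCAAATGCTCAAGA"
hits = seed_hits(seed, top, bottom)
print(span, weight, hits)  # 8 6 [0, 9, 11]
```

On the example alignment the seed hits at offsets 0, 9 and 11 (0-based); the mismatches at offsets 5, 12... are tolerated only when they fall under a joker.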
Recent work related to spaced seeds
1 Alignment-free distances
[Leimeister et al., 2014, Horwege et al., 2014, Boden et al., 2013]
2 SVM classification
[Onodera and Shibuya, 2013, Ghandi et al., 2014]
3 Read clustering
[Bao et al., 2011, Chong et al., 2012, Hauser et al., 2013]
4 Metagenomic classification, ...
New Uses for Old Things
little boy => frying pan (1)
ATCAGTGCGAATGCGCAAGA
|||||.||.|||||.|||||
ATCAGCGCAAATGCTCAAGA
111*1*11
=>
ATCAGTGCGAATGCGCAAGA
|||||.||.|||||.|||||
ATCAGCGCAAATGCTCAAGA
111*1*11
         111*1*11
           111*1*11
(1) http://arch5541.wordpress.com/2012/11/16/and-then-there-was-teflon/
Coverage: definition
Number of match symbols covered by at least one 1 symbol from any
seed hit [Benson and Mak, 2008, Martin, 2013]
Example
ATCAGTGCGAATGCGCAAGA
|||||.||.|||||.|||||
ATCAGCGCAAATGCTCAAGA
111*1*11
         111*1*11
           111*1*11
Coverage is 15
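The count of 15 can be checked with a short Python sketch. It is an illustration under stated assumptions rather than the authors' code: the function name `coverage` is hypothetical, and the alignment is given directly as its row of match (`|`) and mismatch (`.`) symbols.

```python
def coverage(seed: str, alignment: str) -> int:
    """Number of match symbols covered by a '1' of at least one seed hit."""
    # Offsets of the '1' symbols inside the seed; '*' positions cover nothing.
    ones = [i for i, c in enumerate(seed) if c == "1"]
    covered = set()
    for start in range(len(alignment) - len(seed) + 1):
        # The seed hits at `start` if all its '1' symbols face a match.
        if all(alignment[start + i] == "|" for i in ones):
            covered.update(start + i for i in ones)
    return len(covered)


print(coverage("111*1*11", "|||||.||.|||||.|||||"))  # 15
```

Positions covered by several overlapping hits are counted once, which is why three hits of weight 6 give 15 rather than 18.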
Coverage measure for a seed / a set of seeds
alignment : x = 101111001011111
Example
seed : π = 11*1
occ1         1 1 * 1
occ2                         1 1 * 1
occ3                           1 1 * 1
x =      1 0 1 1 1 1 0 0 1 0 1 1 1 1 1
set of seeds : {π1, π2} = {11*1, 1*1*1}
π2 occ1  1 * 1 * 1
π1 occ2      1 1 * 1
π2 occ3                  1 * 1 * 1
π1 occ4                      1 1 * 1
π2 occ5                      1 * 1 * 1
π1 occ6                        1 1 * 1
x =      1 0 1 1 1 1 0 0 1 0 1 1 1 1 1
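The example above can be replayed with the following Python sketch. It is an illustration, not the authors' method: the function names and the labels `pi1`, `pi2` are hypothetical, x is taken as a string of 1 (match) and 0 (mismatch), and occurrences are ordered by end position (then start position), which is an assumption about how the slide numbers them.

```python
def hit_positions(seed: str, x: str) -> list[int]:
    """Start offsets where every '1' of `seed` faces a '1' (match) in `x`."""
    ones = [i for i, c in enumerate(seed) if c == "1"]
    return [p for p in range(len(x) - len(seed) + 1)
            if all(x[p + i] == "1" for i in ones)]


def occurrences(seeds: dict[str, str], x: str) -> list[tuple[str, int]]:
    """All hits of all seeds as (label, start), ordered by end, then start."""
    occ = [(p + len(seed) - 1, p, label)
           for label, seed in seeds.items()
           for p in hit_positions(seed, x)]
    return [(label, p) for _end, p, label in sorted(occ)]


def set_coverage(seeds: dict[str, str], x: str) -> int:
    """Matches of `x` covered by a '1' of at least one hit of any seed."""
    covered = set()
    for seed in seeds.values():
        ones = [i for i, c in enumerate(seed) if c == "1"]
        for p in hit_positions(seed, x):
            covered.update(p + i for i in ones)
    return len(covered)


x = "101111001011111"
single = {"pi1": "11*1"}
both = {"pi1": "11*1", "pi2": "1*1*1"}
print(set_coverage(single, x))  # 8
print(set_coverage(both, x))    # 11
print(occurrences(both, x))
```

Under these assumptions the single seed covers 8 of the 11 match symbols of x, and the two seeds together cover all 11, with six occurrences in total.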
Coverage measure for a seed / a set of seeds
That's how coverage can be measured,
estimated, computed on several models...
But... is coverage useful?
[…] classifiers?
Yes: see [Onodera and Shibuya, 2013, Ghandi et al., 2014]
Which spaced seed patterns are better? Does coverage
help here?
[…] classifiers
1 RFAM 11.0 database (50% training, 50% testing)
2 Single/double seeds of weight w = 3 ... 4, span up to w + 4
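To give an idea of the size of this search space, the candidate seeds can be enumerated as below. This is a sketch under assumptions not stated on the slide: seeds are taken to start and end with a 1, "span up to w + 4" is read as span <= w + 4, a double seed is any unordered pair of distinct single seeds of the same weight, and the function name `enumerate_seeds` is hypothetical.

```python
from itertools import combinations


def enumerate_seeds(weight: int, max_extra: int) -> list[str]:
    """All seeds of the given weight with weight <= span <= weight + max_extra.

    Assumption: a seed starts and ends with '1', so only the inner
    positions are free.
    """
    if weight < 2:
        raise ValueError("this sketch needs weight >= 2")
    seeds = []
    for span in range(weight, weight + max_extra + 1):
        # Choose where the remaining weight - 2 '1' symbols go inside the seed.
        for inner in combinations(range(1, span - 1), weight - 2):
            ones = {0, span - 1, *inner}
            seeds.append("".join("1" if i in ones else "*" for i in range(span)))
    return seeds


for w in (3, 4):
    single = enumerate_seeds(w, 4)
    double = list(combinations(single, 2))
    print(w, len(single), len(double))
# 3 15 105
# 4 35 595
```

With these assumptions there are 15 single seeds of weight 3 and 35 of weight 4, small enough to evaluate every candidate exhaustively.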