Smaller fully-functional
bidirectional BWT indexes
Djamal Belazzougui1
Fabio Cunial2,3
1 DTISI, CERIST, Algiers.
2 Max Planck Institute for Molecular Cell Biology and Genetics, Dresden.
3 Center for Systems Biology Dresden
Constant-space descriptor of a string W:
Bidirectional index (synchronous)
Application: variable-order de Bruijn graph encoding all orders
and all frequencies simultaneously
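A minimal Python sketch of what a constant-space descriptor can look like. The descriptor's definition did not survive on the slide, so the triple used here (suffix-array interval of W in the text, interval of the reverse of W in the reversed text, and |W|) is an assumption, and all function names are hypothetical. The intervals are found by brute force, which is nothing like the compact index of the talk; the point is only that a fixed number of integers identifies W in both directions.

```python
def suffix_array(t):
    """Starting positions of the suffixes of t in lexicographic order (naive)."""
    return sorted(range(len(t)), key=lambda i: t[i:])


def interval(t, w):
    """Half-open suffix-array range [lo, hi) of the suffixes of t starting with w."""
    hits = [k for k, i in enumerate(suffix_array(t)) if t.startswith(w, i)]
    return (hits[0], hits[-1] + 1) if hits else (0, 0)


def descriptor(s, w):
    """Assumed descriptor of w: (forward interval, reverse interval, length)."""
    forward = s + "#"          # '#' is an assumed terminator, smaller than A, C, G, T
    backward = s[::-1] + "#"
    return (interval(forward, w), interval(backward, w[::-1]), len(w))


def extend_right(s, w, a):
    """Descriptor of wa, recomputed from scratch for illustration."""
    return descriptor(s, w + a)


def extend_left(s, w, a):
    """Descriptor of aw, recomputed from scratch for illustration."""
    return descriptor(s, a + w)


print(descriptor("ACAC", "AC"))   # ((1, 3), (3, 5), 2)
```

Both intervals always have the same size, since each counts the occurrences of W in the text.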
Bidirectional index (synchronous)
[Table: Time / Space / Practical. Rows cite [1], [1] and [2], and [1], followed by two rows marked New; space is given in words or bits, and one entry is constant. The bounds themselves are not recoverable.]
[1] Belazzougui, Cunial. Fully-functional bidirectional Burrows-Wheeler indexes and infinite-order de Bruijn graphs. CPM 2019.
[2] Gagie, Navarro, Prezza. Fully functional suffix trees and optimal text searching in BWT-runs bounded space. JACM 2020.
[Figure: suffix tree ST of an example string, with edge labels over A, C, and #]
Maximal repeats
= 8
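As a reminder of the definition, a maximal repeat is a substring that occurs at least twice and that cannot be extended by one character to the left or to the right without losing occurrences. The brute-force Python sketch below enumerates them for a short string; the sentinels `^` and `$` and the function name are assumptions made for the example, and the cubic running time has nothing to do with the index described in the talk.

```python
from collections import defaultdict


def maximal_repeats(s):
    """Substrings of s preceded by >= 2 distinct characters and followed by >= 2."""
    t = "^" + s + "$"              # assumed sentinels, must not occur in s
    left = defaultdict(set)        # characters seen just before each substring
    right = defaultdict(set)       # characters seen just after each substring
    for i in range(1, len(t) - 1):
        for j in range(i + 1, len(t)):
            w = t[i:j]
            left[w].add(t[i - 1])
            right[w].add(t[j])
    return sorted(w for w in left if len(left[w]) > 1 and len(right[w]) > 1)


print(maximal_repeats("banana"))   # ['a', 'ana']
```

A substring that occurs only once has a single left and a single right context, so the two set-size tests also enforce the "occurs at least twice" condition.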
Right-extensions of maximal repeats
≥
≥
Depth of the maximal repeat tree
= 3
String depth of the maximal repeat tree
= 6
Frontier maximal repeats
Rightmost maximal repeats
Background
Left-contraction from right-maximal
Suffix link
[Figure: suffix link, panels (a), (b), (c), with labels CG, CGCG, G, GG]
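The easy case of left-contraction can be checked by brute force. The Python sketch below, with hypothetical helper names and `#` as an assumed terminator, removes the first character of every right-maximal substring and confirms that the result is right-maximal again, which is why a suffix link taken from a right-maximal string lands on an explicit suffix tree node. It does not model the index itself.

```python
def right_extensions(t, w):
    """Characters that follow some occurrence of w in t."""
    return {t[i + len(w)] for i in range(len(t) - len(w)) if t.startswith(w, i)}


def is_right_maximal(t, w):
    """True if w is followed by at least two distinct characters in t."""
    return len(right_extensions(t, w)) >= 2


def suffix_link(w):
    """Left-contraction: drop the first character of w."""
    return w[1:]


def check_suffix_links(s):
    """Check that left-contraction preserves right-maximality in s + '#'."""
    t = s + "#"
    substrings = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    return all(is_right_maximal(t, suffix_link(w))
               for w in substrings if is_right_maximal(t, w))


print(check_suffix_links("ACACGTACG"))   # True
```

The check always succeeds: every occurrence of W followed by a character a contains an occurrence of the contracted string followed by the same a.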
Left-contraction from non-right-maximal
Belazzougui, Cunial. Fully-functional bidirectional Burrows-Wheeler indexes and infinite-order de Bruijn graphs. CPM 2019.
Suffix link
Checking if a node is a maximal repeat
Length of a maximal repeat
Weighted level ancestor queries
on maximal repeat subgraph
ee t
word
Pruning the ST topology
ST
...
Size of the pruned topology
Run-length encoded takes words
There are at most red nodes
CCCCCCCCCCCCCCCCCCCCCCCAAAAAAAAAACCCAAAAA
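Run-length encoding, as mentioned above, stores one (symbol, length) pair per run, so its size depends on the number of runs rather than on the length of the sequence. A minimal Python sketch on the string shown on the slide; what the symbols C and A stand for there is not recoverable, so the string is treated as plain text, and the function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(seq):
    """Collapse each maximal run of equal symbols into a (symbol, length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(seq)]


def run_length_decode(pairs):
    """Inverse of run_length_encode."""
    return "".join(symbol * length for symbol, length in pairs)


slide_string = "CCCCCCCCCCCCCCCCCCCCCCCAAAAAAAAAACCCAAAAA"
runs = run_length_encode(slide_string)
print(len(slide_string), len(runs))
print([symbol for symbol, _ in runs])   # ['C', 'A', 'C', 'A']
```

The encoding keeps four pairs for the whole string, however long the individual runs are.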
Suffix link with the pruned topology
[Figure: panels (a), (b), (c), with labels CG, CGCG, G, GG]
n h n ed
Left-contraction from non-right-maximal
parent of a red node
Supported by
Suffix link
Checking if a node is a maximal repeat
Length of a maximal repeat
Weighted level ancestor queries
on maximal repeat subgraph
Lowest maximal repeat ancestor
ee t
Computing the reverse interval
Reduces to implementing
on the reverse RLBWT (from a maximal repeat W),
and exploiting the isomorphism of subtrees of ST.
in RLBWT can be implemented via
from a maximal repeat can also be implemented
in RLBWT.
Nontrivial only when W is left-maximal but not right-maximal
by traversing the forward maximal repeat tree top down
time, words.
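For readers unfamiliar with the RLBWT mentioned above: it is the Burrows-Wheeler transform stored as runs of equal characters, and the reverse RLBWT is the same structure built on the reversed text. A naive Python sketch follows, built from sorted rotations and so only suitable for tiny inputs. The function names and the `#` terminator are assumptions, and this is not the construction used in the talk.

```python
from itertools import groupby


def bwt(s):
    """BWT of s + '#' via sorted rotations (quadratic space, illustration only)."""
    t = s + "#"
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rotation[-1] for rotation in rotations)


def rlbwt(s):
    """Run-length encoded BWT as a list of (character, run length) pairs."""
    return [(c, len(list(run))) for c, run in groupby(bwt(s))]


def reverse_rlbwt(s):
    """RLBWT of the reversed text."""
    return rlbwt(s[::-1])


print(bwt("banana"))     # annb#aa
print(rlbwt("banana"))   # [('a', 1), ('n', 2), ('b', 1), ('#', 1), ('a', 2)]
```

The number of runs can differ between the two directions: on this input the forward transform has five runs and the reverse one has four.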
word
Compressing the pruned topology
CCCAAACCCCGTTTCAAAAACCCAAACCCCC
Size of the compressed topology
might still contain unary paths of black nodes.
Every node in such a path can be charged to a distinct run boundary.
The deepest nodes in (black) are rightmost maximal repeats, so they can be charged to distinct run boundaries.
Thus, there are black nodes with more than one maximal repeat child.
Every blue node can be charged to its black descendant.
Red nodes can still be charged to a constant number of runs or of blue/black nodes.
Run-length encoded takes words
Suffix link with the compressed topology
After the LCA, we might end up in a blue node,
but we want the interval of the highest node
in ST with string depth at least
Let denote this problem with tuple
[Figures: ST, Weiner link; space in words. The remaining labels are not recoverable.]
Several other cases are possible
not left-maximal
is blue
is a subpath of a blue path
straddles multiple blue paths
straddles itself
It can be shown that all cases can be handled with
time
After at most H Weiner links, every maximal repeat loses
its right-maximality permanently.
So the length of the ST path between the first interval of an instance
and the solution interval becomes zero.
Suffix link with the compressed topology
After the LCA from interval , we might end up in a
red node of that is the child of a blue node
but we want the interval of the red node of that
contains , and of its blue parent
Can be solved with a similar recursive procedure
Computing the reverse interval
Nontrivial only when W is left-maximal but not right-maximal
and the locus of W is a blue node of
Simple solution: issue a known number of
queries on the reverse index
time
Still needs on the reverse index
from a maximal repeat W.
time
Unidirectional operations
Time, space (words), forward direction
(from maximal repeat)
(needs ID of longest
left-maximal suffix)
amortized
Application: variable-order de Bruijn graph
that supports just one direction
