Pioneering Scientific Intelligence

DNA/Small RNA Alignment in Avadis NGS 1.3

Strictly Confidential   © Strand Life Sciences

Questions we will seek to answer in this presentation:
What is an Alignment algorithm?
What issues must an Alignment algorithm consider?
How do Alignment algorithms work?
How does CoBWeb work?
How does CoBWeb compare with other algorithms?
How is CoBWeb exposed in Avadis NGS?
What is the future evolution of CoBWeb?

What is an Alignment algorithm?

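Subject’s Genome:
AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC

Reference Genome, close to but not quite the same as the Subject’s Genome:
AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC

An Alignment algorithm must locate sequenced reads from the Subject’s Genome in the Reference Genome despite such differences.
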
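These two sequences have equal length, so they can be compared base by base. The minimal Python sketch below assumes there are no gaps (true for this pair, not in general) and lists every position where the Subject's base differs from the Reference base.

```python
def mismatches(subject: str, reference: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, subject base, reference base) wherever two
    equal-length sequences differ; a gap-free, position-by-position comparison."""
    if len(subject) != len(reference):
        raise ValueError("a gap-free comparison needs sequences of equal length")
    return [(i, s, r) for i, (s, r) in enumerate(zip(subject, reference)) if s != r]


SUBJECT = "AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC"
REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"

if __name__ == "__main__":
    for pos, s, r in mismatches(SUBJECT, REFERENCE):
        print(f"position {pos}: subject {s}, reference {r}")
```

The pair differs at three single positions (0-based 11, 20 and 28); everything else matches exactly.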
What issues must an Alignment algorithm consider?

Mismatches and Gaps
[Figure: Reads aligned against the Reference Genome. A SNP in the subject appears as a mismatch, and a Deletion appears as a gap.]

Handling paired reads
[Figure: a read pair from the Subject’s Genome placed on the Reference Genome, which contains two copies of a Repeat Region; one candidate placement is marked ×.]

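The point of the figure, that a mate falling inside a Repeat Region can be placed with the help of its partner, can be sketched as a filter over candidate placements. This is an illustrative Python sketch, not CoBWeb's method: the insert-size bounds, the positions and the forward-strand-only simplification are all assumptions.

```python
from itertools import product


def consistent_pairs(mate1_hits, mate2_hits, min_insert, max_insert):
    """Keep (pos1, pos2) placements whose spacing is plausible for one fragment.

    mate1_hits / mate2_hits: candidate 0-based start positions of each mate on
    the forward strand of the reference (strand handling is omitted here).
    min_insert / max_insert: hypothetical bounds on pos2 - pos1.
    """
    return [
        (p1, p2)
        for p1, p2 in product(mate1_hits, mate2_hits)
        if min_insert <= p2 - p1 <= max_insert
    ]


if __name__ == "__main__":
    # Mate 1 lies in a repeat present twice (positions 1000 and 5000);
    # mate 2 maps uniquely at 5300, so only one placement of mate 1 fits.
    print(consistent_pairs([1000, 5000], [5300], min_insert=200, max_insert=500))
```

With these made-up numbers, only the placement at 5000 survives, which resolves the repeat.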
A variety of Read Lengths
Short reads: ~50 bases, with few mismatches and gaps.
Long reads: a few hundred to a few thousand bases, with many more mismatches and gaps.

Speed and Memory
Handle billions of reads.
Run in 4 GB of RAM.
Scale speed with more memory.
Allow use of multiple cores/processors.

How do Alignment algorithms work?

Indexing the Genome to find Seed Matches
Scanning the Reference for each Read takes too long. Instead, a Reference Index very quickly yields the locations in the Reference where some part (a seed) of the Read matches.
[Figure: The Reference Index. One seed of the Read occurs at Reference locations x1, x2…; another occurs at Reference locations x3, x4…]

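A Reference Index can be as simple as a hash table from every k-mer to its positions. The Python sketch below uses that simpler structure purely to illustrate seed lookup; CoBWeb itself uses a Burrows-Wheeler based index. The choice k = 6 and the 16-base read (taken from the Subject's Genome shown earlier) are illustrative assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring (k-mer) of the reference to its 0-based starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def seed_hits(read: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Look up every k-mer seed of the read in the index.

    Returns (offset in read, position in reference) pairs; position - offset
    is the implied start of the whole read, i.e. a candidate location.
    """
    hits = []
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            hits.append((offset, pos))
    return hits


if __name__ == "__main__":
    reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
    read = "TCCCATAAAGACCCAC"  # 16 bases from the Subject's Genome
    k = 6
    hits = seed_hits(read, build_kmer_index(reference, k), k)
    print(hits)
    print(sorted({pos - off for off, pos in hits}))  # candidate read starts: [12]
```

Building the table takes one pass over the Reference; afterwards each seed costs a single hash lookup rather than a scan. Seeds that overlap the mismatch in this read find no hit, while the others all vote for the same candidate start, which is then checked in detail.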
Detailed Alignment at Seed Match Locations
[Figure: a Read placed on the Reference around a Seed Match.]
How many Mismatches and Gaps are needed for the Read to match around the Seed? This is answered by Smith-Waterman alignment, a Dynamic Programming algorithm.

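As a minimal sketch of this step, here is textbook Smith-Waterman local alignment with a linear gap penalty in Python. The scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions, not CoBWeb's parameters; the read would be aligned against a small Reference window around the Seed Match, and the traceback shows where mismatches and gaps fall.

```python
def smith_waterman(read: str, ref: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment by Smith-Waterman with a linear gap penalty.

    Returns (best score, aligned read, aligned reference), with '-' marking gaps.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at read[i-1], ref[j-1]

    def sub(i: int, j: int) -> int:
        return match if read[i - 1] == ref[j - 1] else mismatch

    best, bi, bj = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0, H[i - 1][j - 1] + sub(i, j), H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the best cell until a zero-score cell ends the local alignment.
    a_read, a_ref = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):  # match or mismatch
            a_read.append(read[i - 1])
            a_ref.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:  # read base absent from the reference
            a_read.append(read[i - 1])
            a_ref.append("-")
            i -= 1
        else:  # reference base absent from the read (a deletion)
            a_read.append("-")
            a_ref.append(ref[j - 1])
            j -= 1
    return best, "".join(reversed(a_read)), "".join(reversed(a_ref))


if __name__ == "__main__":
    # The read lacks one T that the reference window has: score 16 = 9 matches, 1 gap.
    print(smith_waterman("ACGTACGGA", "ACGTTACGGA"))
```

Because the dynamic program considers every cell, it is quadratic in the window size, which is why it is run only around seed locations rather than over the whole Reference.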
The Burrows-Wheeler based Index
The original Reference: C G A C $

All its circular shifts, sorted lexicographically (in this ordering, $ sorts after A, C and G):

Circular Shift Index   Sorted circular shift
2                      A C $ C G
0                      C G A C $
3                      C $ C G A
1                      G A C $ C
4                      $ C G A C

The last column (G $ A C C) is the BWT. The Index comprises the BWT and the Circular Shift Indices, along with some housekeeping data structures. The Circular Shift Indices can be sampled to fit into reduced memory, at the expense of speed but without sacrificing correctness.

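The table, and the claim that sampling the Circular Shift Indices costs speed but not correctness, can be reproduced in a few lines of Python. This is a naive sketch for illustration, not CoBWeb's code: the rank order `ORDER` is an assumption chosen to match the table ($ sorts last), and real indexes build the BWT with suffix-array algorithms rather than by sorting every shift. A missing index is recovered by LF-mapping, which steps to the shift that starts one position earlier.

```python
# Rank order used in the table: $ sorts after the bases.
ORDER = {"A": 0, "C": 1, "G": 2, "T": 3, "$": 4}


def bwt(text: str) -> tuple[str, list[int]]:
    """Return (BWT, circular shift indices) for text ending in a unique '$'.

    Naive O(n^2 log n) construction: every circular shift is sorted explicitly.
    """
    n = len(text)
    idx = sorted(range(n), key=lambda i: [ORDER[c] for c in text[i:] + text[:i]])
    last_column = "".join(text[(i - 1) % n] for i in idx)
    return last_column, idx


def sample_indices(idx: list[int], step: int) -> dict[int, int]:
    """Keep only the circular shift indices that are multiples of step."""
    return {row: pos for row, pos in enumerate(idx) if pos % step == 0}


def locate(row: int, last_column: str, sampled: dict[int, int]) -> int:
    """Recover a row's circular shift index from the sampled subset.

    Each LF step moves to the shift starting one position earlier, so walking to
    a sampled row and adding the step count gives the exact answer: sparser
    samples mean more steps (slower) but the same result.
    """
    # C[c]: number of characters that sort strictly before c.
    C = {c: sum(1 for x in last_column if ORDER[x] < ORDER[c]) for c in set(last_column)}
    steps = 0
    while row not in sampled:
        ch = last_column[row]
        row = C[ch] + last_column[:row].count(ch)  # LF mapping
        steps += 1
    return (sampled[row] + steps) % len(last_column)


if __name__ == "__main__":
    last_column, idx = bwt("CGAC$")
    print(last_column, idx)  # G$ACC [2, 0, 3, 1, 4]
    sampled = sample_indices(idx, step=2)  # keeps rows 0, 1 and 4 only
    print([locate(r, last_column, sampled) for r in range(len(idx))])  # [2, 0, 3, 1, 4]
```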
The Burrows-Wheeler based Index
[Figure: an EXACT Match of a Read in the Reference.]
All Exact Matches of a Read (no Mismatches or Gaps) in the Reference can be found in time proportional to the length of the Read, and largely independent of the size of the Reference.

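Exact-match lookup on a BW-based index is classically done by FM-index backward search, sketched below in Python. It assumes a Reference terminated by '$' and the same $-last ordering as the table; it is the textbook technique, not CoBWeb's code, and CoBWeb's additional data structures are not modelled. Each Read base updates a range of sorted rotations with two table lookups, so the loop runs once per base regardless of Reference size.

```python
ORDER = "ACGT$"  # rank order matching the table: $ sorts after the bases
RANK = {c: r for r, c in enumerate(ORDER)}


def build_fm_index(text: str):
    """Return (suffix array, BWT, C table, occurrence table) for text ending in '$'."""
    n = len(text)
    sa = sorted(range(n), key=lambda i: [RANK[c] for c in text[i:] + text[:i]])
    bwt = "".join(text[(i - 1) % n] for i in sa)
    C, total = {}, 0
    for c in ORDER:  # C[c]: number of characters in text that sort before c
        C[c] = total
        total += text.count(c)
    occ = {c: [0] * (n + 1) for c in ORDER}  # occ[c][i]: count of c in bwt[:i]
    for i, ch in enumerate(bwt):
        for c in ORDER:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, bwt, C, occ


def exact_matches(read: str, index) -> list[int]:
    """Backward search: one constant-time range update per base of the read."""
    sa, _bwt, C, occ = index
    lo, hi = 0, len(sa)  # half-open range of sorted rotations matching the suffix so far
    for ch in reversed(read):
        if ch not in RANK or ch == "$":
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
    index = build_fm_index(reference + "$")
    print(exact_matches("CCCA", index))  # [13, 23]
    print(exact_matches("CATT", index))  # [] (not in the reference)
```

For clarity this keeps the full occurrence table and suffix array; a memory-conscious index would store only samples of both, trading lookup speed for space as described on the previous slide.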
How does CoBWeb work?

Seeding Strategy
Use the BW-based index, augmented with additional data structures for speed, to find one or more Long Seed Matches in the Reference.
[Figure: one 15-mer of a Read occurs at locations x1, x2…; another 15-mer occurs at locations x3, x4…; the whole 30-mer occurs at location x5.]
Justification: Most long Reads do not have Mismatches and Gaps strewn across their length; there are usually long stretches that match exactly. And Long Seeds will have few matching locations.

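The effect of not fixing the seed length in advance can be seen with a simple greedy sketch that extends each seed for as long as it still matches exactly. Here plain substring search stands in for the BW-based index, and the greedy restart rule and `min_len` are hypothetical choices, not CoBWeb's documented strategy.

```python
def occurrences(ref: str, s: str) -> list[int]:
    """All 0-based positions where s occurs exactly in ref (naive scan)."""
    return [i for i in range(len(ref) - len(s) + 1) if ref.startswith(s, i)]


def long_seeds(read: str, ref: str, min_len: int) -> list[tuple[int, str, list[int]]]:
    """Greedy left-to-right seeding with no fixed seed length.

    From each start, extend the seed one base at a time while it still occurs
    somewhere in ref; keep seeds of at least min_len bases, then restart just
    past the base that stopped the extension (typically a mismatch).
    """
    seeds = []
    i = 0
    while i < len(read):
        length = 0
        while i + length < len(read) and read[i:i + length + 1] in ref:
            length += 1
        if length >= min_len:
            seed = read[i:i + length]
            seeds.append((i, seed, occurrences(ref, seed)))
        i += length + 1
    return seeds


if __name__ == "__main__":
    reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
    read = "AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC"
    for offset, seed, hits in long_seeds(read, reference, min_len=5):
        print(offset, seed, hits)
    print(occurrences(reference, "CCCA"))  # a short seed has more hits: [13, 23]
```

On the Subject and Reference sequences shown earlier, the three mismatches split the read into four exact stretches of 11, 8, 7 and 9 bases, each found at a single location, whereas the 4-mer CCCA alone matches twice.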
Advantages
Seed length is not specified in advance, so Long and Short reads can be handled seamlessly.
Separating the Smith-Waterman phase from the BW Index search allows an unlimited number of gaps and mismatches.

How does CoBWeb compare with other algorithms?

Comparison with BWA
CoBWeb: 94% Alignment Score, with up to 2 Gaps.
BWA: 4% error + 1 gap of possibly multiple length.
[Figure: comparison at Read Length 50 and Read Length 150.]
CoBWeb is a little faster than BWA, with comparable results.

How is CoBWeb exposed in Avadis NGS?

Entry
Two new experiment types: DNA Alignment and Small-RNA Alignment.

The Alignment Workflow
Run Alignment, and then create a DNA Variant or ChIP-Seq Experiment from the results.

Alignment Parameters
Specify the number of Mismatches and Gaps, and the handling of Multiple Matching (reads that match at more than one location).
Specify Adaptor Trimming (only for Small RNA) and 3’/5’ trimming based on quality.
Screen against Contaminant Databases.

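The slide does not spell out the trimming rules. As one hypothetical reading, the Python sketch below clips a 3’ adaptor (either a full match or a partial adaptor prefix at the read end) and then trims low-quality bases from both ends. The adaptor sequence, the Phred threshold of 20 and the minimum overlap of 5 are illustrative assumptions, not Avadis NGS defaults.

```python
def trim_adaptor(seq: str, adaptor: str, min_overlap: int = 5) -> str:
    """Cut the read at the first full adaptor match; otherwise remove a partial
    adaptor prefix (at least min_overlap bases) sitting at the 3' end."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    pos = seq.find(adaptor)
    if pos != -1:
        return seq[:pos]
    for k in range(min(len(adaptor) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adaptor[:k]):
            return seq[:-k]
    return seq


def quality_trim(seq: str, quals: list[int], min_q: int = 20) -> tuple[str, list[int]]:
    """Drop bases from the 5' end, then the 3' end, while their Phred quality < min_q."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


if __name__ == "__main__":
    ADAPTOR = "TGGAATTCTCGG"  # illustrative adaptor sequence (assumption)
    insert = "TGAGGTAGTAGGTTGTATAGTT"  # illustrative small-RNA insert
    print(trim_adaptor(insert + ADAPTOR + "ACGT", ADAPTOR))  # full adaptor present
    print(trim_adaptor(insert + ADAPTOR[:6], ADAPTOR))  # adaptor cut off at the 3' end
    print(quality_trim("ACGTACGT", [5, 30, 30, 30, 30, 30, 10, 2]))
```

Adaptor clipping runs first here because a small-RNA insert is usually shorter than the read, so the adaptor, not low quality, is often what occupies the 3’ end.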
What is the future evolution of CoBWeb?

ToDos
Chimeric Reads
RNA-Seq Alignment
Base Quality recalibration
Affine Gap Costs

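With a linear gap cost, every gapped base costs the same amount g. An affine cost instead charges extra for opening a gap, so one long gap is preferred over several short ones. In one common convention, for a gap of length k ≥ 1 with gap-open cost o and gap-extension cost e (symbols introduced here for illustration):

```latex
w_{\mathrm{linear}}(k) = k\,g
\qquad\text{vs.}\qquad
w_{\mathrm{affine}}(k) = o + (k-1)\,e, \quad k \ge 1,\ \text{typically } o > e > 0
```

For example, with o = 5 and e = 1 (illustrative values), a single 3-base gap costs 5 + 2 × 1 = 7, whereas three separate 1-base gaps cost 3 × 5 = 15, so the aligner favours one contiguous gap for a multi-base insertion or deletion.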
http://www.avadis-ngs.com



