Velvet / Curtain
Matthias Haimel




                   EBI is an Outstation of the European Molecular Biology Laboratory.



2   25.04.11   Velvet / Curtain
Overview
    • De Bruijn Graph
    • Velvet
               • Theory
               • Practice
    • Data formats and quality
    • Velvet
               • Simulation data
               • Multiple insert lengths
    • Curtain
               • Theory
               • Practice


De Bruijn graph
    • A concept in combinatorial mathematics
               • In combinatorics, a de Bruijn graph is usually fully connected
               • http://en.wikipedia.org/wiki/De_Bruijn_graph
    • De Bruijn sequence
               • Related concept
               • Path through graph




    • Velvet
               • de Bruijn inspired graph structure




De Bruijn graph (Velvet)
    • Representation of
               • a sequence based on short words (k-mers)
               • overlaps between words
    • K-mer: word of length k
    • K=5
               • k-1 overlap

    GCCTTCCA
    GCCTT
     CCTTC
      CTTCC
       TTCCA
        ...
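
The decomposition above can be sketched in a few lines of Python. This is only an illustration of the idea, not Velvet code; the function name kmers is made up for the example.

```python
def kmers(sequence, k):
    """Return every word of length k in the sequence, in order of position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


words = kmers("GCCTTCCA", 5)

# Print the words as a staircase, one position further right each time.
for offset, word in enumerate(words):
    print(" " * offset + word)

# Consecutive words overlap by k-1 = 4 letters.
for left, right in zip(words, words[1:]):
    assert left[1:] == right[:-1]
```

A sequence of length n yields n - k + 1 words, so the 8-letter example gives four 5-mers.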

De Bruijn graph (Velvet)
                            GCCTTCCAATTT
                            GCCTTCAAATTT

    GCCTT -> CCTTC -> CTTCC -> TTCCA -> ..... -> CAATT -> AATTT
             CCTTC -> CTTCA -> TTCAA -> ..... -> AAATT -> AATTT




De Bruijn graph representations (Velvet)
                                                TTCA
                                         ATTC          TCAG
    Error free, no repeat,
    no polymorphism



    Repeat > kmer length



    SNP, variant, < kmer length



    Structural variant, inversion
    Structural variant, deletion…
    …


Example
                      TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG
                       AGTCGAG CTTTAGA CGATGAG CTTTAGA
                        GTCGAGG TTAGATC ATGAGGC      GAGACAG
                           GAGGCTC   ATCCGAT AGGCTTT GAGACAG
                       AGTCGAG    TAGATCC ATGAGGC TAGAGAA
                      TAGTCGA CTTTAGA CCGATGA     TTAGAGA
                          CGAGGCT AGATCCG TGAGGCT AGAGACA
                      TAGTCGA GCTTTAG TCCGATG GCTCTAG
                         TCGACGC    GATCCGA GAGGCTT AGAGACA
                      TAGTCGA    TTAGATC GATGAGG TTTAGAG
                        GTCGAGG TCTAGAT   ATGAGGC TAGAGAC
                            AGGCTTT ATCCGAT AGGCTTT GAGACAG
                       AGTCGAG   TTAGATT ATGAGGC    AGAGACA
                             GGCTTTA TCCGATG     TTTAGAG
                          CGAGGCT TAGATCC TGAGGCT    GAGACAG
                       AGTCGAG TTTAGATC ATGAGGC TTAGAGA
                           GAGGCTT GATCCGA GAGGCTT GAGACAG


Example

               Read: GTCGAGG




                   GTCG
                   (1x)




Example

                Read: GTCGAGG




                    GTCG     TCGA
                    (1x)     (1x)




Example

                Read: GTCGAGG




                    GTCG     TCGA      CGAG
                    (1x)     (1x)      (1x)




Example

                Read: GTCGAGG




                    GTCG     TCGA      CGAG   GAGG
                    (1x)     (1x)      (1x)   (1x)




Example

         New read: CGAGGCT




                GTCG     TCGA      CGAG   GAGG
                (1x)     (1x)      (2x)   (1x)




Example

                Read: CGAGGCT




                    GTCG     TCGA      CGAG   GAGG
                    (1x)     (1x)      (2x)   (2x)




Example

                Read: CGAGGCT




                    GTCG     TCGA      CGAG   GAGG   AGGC
                    (1x)     (1x)      (2x)   (2x)   (1x)




Example

                Read: CGAGGCT




                    GTCG     TCGA      CGAG   GAGG   AGGC   GGCT
                    (1x)     (1x)      (2x)   (2x)   (1x)   (1x)




Example

                New read: TCGACGC




                    GTCG     TCGA      CGAG   GAGG   AGGC
                    (1x)     (2x)      (2x)   (2x)   (1x)




Example

                Read: TCGACGC




                    GTCG     TCGA      CGAG   GAGG   AGGC
                    (1x)     (2x)      (2x)   (2x)   (1x)



                                       CGAC   GACG   ACGC
                                       (1x)   (1x)   (1x)
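
The counting shown in the slides above can be reproduced with a short Python sketch. It uses k = 4 and the three reads from the example; the function name count_kmers is hypothetical and the sketch ignores the reverse strand.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count how often each word of length k occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


reads = ["GTCGAGG", "CGAGGCT", "TCGACGC"]
counts = count_kmers(reads, 4)

# Print in the same "WORD (Nx)" style as the slides.
for word, n in counts.items():
    print(f"{word} ({n}x)")
```

Words seen in more than one read (TCGA, CGAG, GAGG) end up with a count of 2, which is the coverage information used later to separate real sequence from errors.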




Example

                   etc…
                                                                                                                GATT
                                                                                                                (1x)




                                                        TGAG     ATGA   GATG   CGAT   CCGA   TCCG     ATCC     GATC     AGAT
                                                        (9x)     (8x)   (5x)   (6x)   (7x)   (7x)     (7x)     (8x)     (8x)

                                                                                                                                              AGAA
                                                                                                                                              (1x)

                                                                                   GCTC   CTCT      TCTA     CTAG
                                                                                   (2x)   (1x)      (2x)     (2x)

                TAGT   AGTC   GTCG     TCGA      CGAG    GAGG      AGGC    GGCT                                       TAGA     AGAG   GAGA    AGAC   GACA   ACAG
                (3x)   (7x)   (9x)     (10x)     (8x)    (16x)     (16x)   (11x)                                      (16x)    (9x)   (12x)   (9x)   (8x)   (5x)
                                                                                   GCTT   CTTT      TTTA     TTAG
                                                                                   (8x)   (8x)      (8x)     (12x)
                                                 CGAC    GACG       ACGC
                                                 (1x)    (1x)       (1x)




Example

                  After simplification…


                                                      GATT
                                                                  AGAT

                                                  GATCCGATGAG                             AGAA
                                                                GCTCTAG
                TAGTCGA    CGAG

                                             GAGGCT    GGCT               TAGA   AGAGA   AGACAG
                                                                GCTTTAG
                          CGACGC
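
A minimal sketch of the simplification step, assuming it means merging chains of nodes that have no branching: a node is joined to its successor when it has exactly one successor and that successor has exactly one predecessor. This is not Velvet's implementation; the names build_graph and simplify are made up, and the reverse strand and coverage are ignored.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Nodes are k-mers; an edge links k-mers that are adjacent in a read."""
    nodes, succ, pred = set(), defaultdict(set), defaultdict(set)
    for read in reads:
        words = [read[i:i + k] for i in range(len(read) - k + 1)]
        nodes.update(words)
        for a, b in zip(words, words[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return nodes, succ, pred


def simplify(nodes, succ, pred):
    """Merge every unbranched chain of nodes into one longer sequence."""
    def starts_chain(node):
        preds = pred.get(node, set())
        if len(preds) != 1:
            return True
        (only_pred,) = preds
        return len(succ.get(only_pred, set())) != 1

    merged = []
    for node in sorted(nodes):
        if not starts_chain(node):
            continue
        sequence, current = node, node
        while len(succ.get(current, ())) == 1:
            (nxt,) = succ[current]
            if len(pred.get(nxt, ())) != 1:
                break
            sequence += nxt[-1]  # each step adds one new letter
            current = nxt
        merged.append(sequence)
    return merged


nodes, succ, pred = build_graph(["GTCGAGG", "CGAGGCT", "TCGACGC"], 4)
print(simplify(nodes, succ, pred))
```

With only the first three reads of the example, the branch at TCGA splits the graph into three merged nodes, one of which is the CGACGC tip visible in the slide above.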




Example

                  Tips removed…


                                                                  AGAT

                                                  GATCCGATGAG
                                                                GCTCTAG
                TAGTCGA    CGAG

                                             GAGGCT    GGCT               TAGA   AGAGA   AGACAG
                                                                GCTTTAG




De Bruijn graph biology extensions (Velvet)
     • Handling of reverse strand
                • DNA is read in two directions
                • Paired-end data
     • Handling small differences, which are “uninteresting”
                • Errors in sequencing technology
     • Memory
                • regularly uses 80-100 GB of real memory
                • easily reaches 1 TB of real memory for large data sets




Read variety
     • Short reads                      ~75bp
                • Illumina / Solexa
                • SOLiD (colour space)
     • Long reads                     500-1000 bp
                • 454 read
                • Sanger capillary reads
     • Paired-end reads
                • Short reads
                • short insert length
     • Mate pair reads
                • Short reads
                • long insert length

Paired-End




                                    Mate Pair




Short paired-end / mate pair reads


                                     ?

Velvet expects Illumina paired-end orientation: (L-> <-R)

                    L                              R       paired-end




Short paired-end / mate pair reads


Illumina mate-pair orientation: (<-L R->)
                     L                                     R
                                                               mate pair

                                      reverse complement



                     L                                     R   paired-end
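
The conversion above can be sketched as follows: reverse-complementing both reads of a mate pair turns the outward-facing orientation (<-L R->) into the inward-facing one (L-> <-R). The function names are hypothetical, and a real conversion would also have to reverse the quality strings.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mate_pair_to_paired_end(left, right):
    """Flip both reads so that (<-L R->) becomes (L-> <-R)."""
    return reverse_complement(left), reverse_complement(right)


print(reverse_complement("GCCTT"))
print(mate_pair_to_paired_end("GCCTT", "TTCCA"))
```

Applying the function twice gives back the original reads, which is a quick sanity check on any implementation.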




Velvet algorithms
     • Remove Bubbles
                • Tour Bus




     • Velvet parameters
                • -max_branch_length
                • -max_divergence
                • -max_gap_count




Example




                                                                          AGAT

                                                  GATCCGATGAG
                                                                        GCTCTAG
                TAGTCGA    CGAG

                                             GAGGCT    GGCT                               TAGA         AGAGA   AGACAG
                                                                        GCTTTAG



                                                                 GCTC    CTCT   TCTA   CTAG
                                                                 (2x)    (1x)   (2x)   (2x)

                                                         GGCT                                  TAGA
                                                         (11x)                                 (16x)
                                                                 GCTT    CTTT   TTTA   TTAG
                                                                 (8x)    (8x)   (8x)   (12x)
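
A much simplified sketch of the decision made for a bubble like the one above: if the two branches are similar enough, keep the one with the higher coverage. Tour Bus in Velvet is more elaborate (it also handles gaps and branch length); the function names are made up, the parameter name max_divergence is borrowed from the slide, and the value 0.2 is only an illustrative threshold.

```python
def divergence(path_a, path_b):
    """Fraction of mismatching positions between two equal-length paths."""
    if len(path_a) != len(path_b):
        raise ValueError("this sketch only compares paths of equal length")
    mismatches = sum(a != b for a, b in zip(path_a, path_b))
    return mismatches / len(path_a)


def resolve_bubble(branch_a, branch_b, max_divergence):
    """Each branch is (sequence, list of k-mer coverages).

    Returns the sequence to keep, or None if the branches differ too much.
    """
    seq_a, cov_a = branch_a
    seq_b, cov_b = branch_b
    if divergence(seq_a, seq_b) > max_divergence:
        return None
    mean_a = sum(cov_a) / len(cov_a)
    mean_b = sum(cov_b) / len(cov_b)
    return seq_a if mean_a >= mean_b else seq_b


low = ("GCTCTAG", [2, 1, 2, 2])
high = ("GCTTTAG", [8, 8, 8, 12])
print(resolve_bubble(low, high, max_divergence=0.2))
```

The two branches differ at one position out of seven, so under this threshold the low-coverage branch is treated as an error and the high-coverage branch is kept.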


Example

                  Bubbles removed… by Tour Bus


                                                                 AGAT

                                                  GATCCGATGAG


                TAGTCGA    CGAG

                                             GAGGCT    GGCT     GCTTTAG   TAGA   AGAGA   AGACAG




Example

                Final simplification…


                                          AGATCCGATGAG



                      TAGTCGAG            GAGGCTTTAGA    AGAGACAG




Example
                              TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG


                Final simplification…


                                          AGATCCGATGAG



                      TAGTCGAG            GAGGCTTTAGA     AGAGACAG


     One possible walk through the graph ...
                              TAGTCGAG
                                   GAGGCTTTAGA
                                           AGATCCGATGAG
                                                    GAGGCTTTAGA
                                                            AGAGACAG

N50
     • Total

     • N90



     • N50



     • N10




N50
     • Total
          • 4,295,113bp
     • N90
          • 439bp


     • N50
          • 3,119bp


     • N10
          • 13,519bp




N50
     • N50 is the length of the smallest contig in the set that
                • contains the fewest (largest) contigs
                • whose combined length represents at least 50% of the assembly
     • N10
                • same definition, with at least 10% of the assembly


           http://www.broadinstitute.org/crd/wiki/index.php/N50
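
The definition above translates into a short Python sketch: sort the contigs from largest to smallest and walk down the list until the running total reaches the requested share of the assembly. The function name nx and the contig lengths are made up for the example.

```python
def nx(contig_lengths, x):
    """Length of the smallest contig in the smallest set of largest contigs
    whose combined length is at least x percent of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 100 >= total * x:
            return length
    raise ValueError("no contigs given")


lengths = [2, 2, 2, 3, 3, 4, 8, 8]   # hypothetical assembly, 32 bp in total
for x in (10, 50, 90):
    print(f"N{x} = {nx(lengths, x)}")
```

Because the walk starts at the largest contig, N10 is always at least as large as N50, and N50 at least as large as N90.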




Velvet practical: Part 1
     • Compile
     • Single end (ERX001300)
                • K-mer length
                • Coverage cut-offs
     • Whole genome sequence as input???
                • Staphylococcus aureus MRSA252




Velvet algorithms
     • Long read information
                • Rock Band




     • Velvet parameters
                • -long_mult_cutoff




Velvet algorithms
     • Paired-end information
                • Pebble




     • Velvet parameters
                • -min_pair_count




                                         Once all distances and variances are computed:
                                         simple greedy extension outwards from the main contigs



Paired-end in Velvet
     • Hugely improves quality of assembly
     • Insert length greater than repeat
                • greater than the length of the most common genomic repeat
     • Mixed insert length improves results
                • Short: helps for local assembly
                • Long: get over repeats
     • Large genomes
                • Very memory intensive
                • Calculation intensive




Data formats and quality
     • Fasta                                            • Fastq
                • .fasta                                  • .fastq
                • .fa                                     • .fq
                • ?                                       • ?

                   >read_1             (header)           @SEQ_ID             (header)
                   TATAATATTTAT...     (sequence)         GATTTGGGGTTCAAAGC   (sequence)
                                                          +
                                                          !''*((((***+))%%%   (quality)
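
A minimal sketch of reading the Fastq layout above, assuming every record takes exactly four lines (header, sequence, separator, quality) with no line wrapping. The function name parse_fastq is hypothetical.

```python
def parse_fastq(text):
    """Split Fastq text into (name, sequence, quality) tuples."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per record")
    records = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        records.append((header[1:], sequence, quality))
    return records


example = """@SEQ_ID
GATTTGGGGTTCAAAGC
+
!''*((((***+))%%%
"""
for name, sequence, quality in parse_fastq(example):
    print(name, len(sequence), len(quality))
```

The length check matters because the format stores one quality character per base.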




FASTQ paired
     Read pair suffixes: .F .R or /1 /2

                             @SRR022863.1.F
                             ATATAGATGTACATAAATTAGTTGAAGTATATGAACG
                             +
                             IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIAIIII
                             @SRR022863.1.R
                             TTCACCCATTTTATCCATGATTTTGTTCTTTCTCTTC
                             +
                             IIIIIHIIIIIIII3III.,IIII&II6II-))&'I0


                 @SRR022863.1.F                          @SRR022863.1.R
                 ATATAGATGTACATAAATTAGT...               TTCACCCATTTTATCCATGATTTTGTT...
                 +                                       +
                 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIAIIII   IIIIIHIIIIIIII3III.,IIII&II6II-))&'I0
                 @SRR022863.2.F                          @SRR022863.2.R
                 TTATGAATTATTAATAAGTGCT...               CATAAAAAAAGAAAATGTACTCTTTAC...
                 +                                       +
                 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII   IIII)0&A,%.&9$8I4+A;I)4II)&%-I$I%#)II
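
The two layouts above (one interleaved file, or two separate files) can be converted into each other. The sketch below merges two lists of records into interleaved order and checks that the names match once the pair suffix is removed. The function names are made up, and the records are shortened placeholders rather than real reads.

```python
PAIR_SUFFIXES = (".F", ".R", "/1", "/2")


def pair_id(name):
    """Strip a trailing pair suffix such as .F, .R, /1 or /2."""
    for suffix in PAIR_SUFFIXES:
        if name.endswith(suffix):
            return name[:-len(suffix)]
    return name


def interleave(forward, reverse):
    """Merge two lists of (name, sequence, quality) into F, R, F, R order."""
    if len(forward) != len(reverse):
        raise ValueError("files contain different numbers of reads")
    merged = []
    for f, r in zip(forward, reverse):
        if pair_id(f[0]) != pair_id(r[0]):
            raise ValueError(f"reads out of sync: {f[0]} vs {r[0]}")
        merged.extend([f, r])
    return merged


forward = [("SRR022863.1.F", "ATAT", "IIII"), ("SRR022863.2.F", "TTAT", "IIII")]
reverse = [("SRR022863.1.R", "TTCA", "IIII"), ("SRR022863.2.R", "CATA", "IIII")]
for name, _, _ in interleave(forward, reverse):
    print(name)
```

Checking the names is worthwhile because quality filtering applied to one file only can silently shift the pairing.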



Quality score
     • Velvet does NOT use quality score!!!
                • Error correction of de Bruijn graph
     • p
                • the probability that the corresponding base call is incorrect

     • Phred quality score
                • 10 -> 1 in 10
                • 40 -> 1 in 10,000.

     • Odds ratio
                • earlier versions of the Solexa pipeline
                • differs mainly at lower levels
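
The relationship between p and the two scores can be written out as a small sketch. The Phred score is Q = -10 log10(p); the odds-based Solexa score replaces p with p / (1 - p). The function names are hypothetical.

```python
import math


def phred_to_error_probability(q):
    """Invert Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred quality score for an error probability p."""
    return -10 * math.log10(p)


def solexa_to_error_probability(q):
    """Invert Q = -10 * log10(p / (1 - p))."""
    odds = 10 ** (-q / 10)
    return odds / (1 + odds)


# Compare both scales at low and high quality values.
for q in (0, 10, 40):
    print(q, phred_to_error_probability(q), solexa_to_error_probability(q))
```

At high scores the odds p / (1 - p) are almost equal to p, which is why the two scales differ mainly at the lower end.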



Quality encoding
     • !''*((((***+))%%%
                • One value per base
                • Integer mapping based on ASCII encoding
                • probability of incorrect base call



     • Sanger format                           • Illumina 1.5+
                •   Phred score                   •   Phred score
                •   ASCII 33 – 126 -> 0 – 93      •   ASCII 64 – 126 -> 0 – 62
                •   Rarely exceeds 60             •   Only 2 – 40 expected
                •   ! = 33 -> 0                   •   ! = 33 -> (does not exist)
                •   B = 66 -> 33                  •   B = 66 -> 2
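
Decoding a quality string is a matter of subtracting the encoding offset from each ASCII value: 33 for the Sanger format and 64 for Illumina 1.5+. A short sketch, with a hypothetical function name:

```python
SANGER_OFFSET = 33
ILLUMINA_OFFSET = 64


def decode_quality(quality, offset):
    """Map each character to its integer score (ASCII value minus offset)."""
    scores = [ord(ch) - offset for ch in quality]
    if any(score < 0 for score in scores):
        raise ValueError("character below the offset: wrong encoding?")
    return scores


print(decode_quality("!''*((((***+))%%%", SANGER_OFFSET))
print(decode_quality("B", SANGER_OFFSET), decode_quality("B", ILLUMINA_OFFSET))
```

Characters below the offset are a useful hint that a file uses the other encoding, which is why the sketch raises an error instead of returning negative scores.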

Quality encoding
     • wikipedia




Quality trimming

     [Plot: quality score against bp position in read. Good / Bad?]

Quality trimming
     • Fixed length trimming
                • Cut-off at position x
     • Adaptive trimming
                • Quality score cut-off
                • Minimum sequence length
     • Sliding window
                • Window size
                • Quality score cut-off
                • Use average quality value of window
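
The three strategies above can be sketched as functions that take a list of per-base quality scores and return how many bases to keep from the start of the read. The function names and the example scores are made up; real trimming tools differ in details such as which end they trim from.

```python
def trim_fixed(quals, position):
    """Fixed length trimming: cut every read at the same position."""
    return min(position, len(quals))


def trim_adaptive(quals, cutoff, min_length):
    """Cut at the first base below the cut-off; drop reads that get too short."""
    keep = len(quals)
    for i, q in enumerate(quals):
        if q < cutoff:
            keep = i
            break
    return keep if keep >= min_length else 0


def trim_sliding_window(quals, window, cutoff):
    """Cut at the start of the first window whose average falls below the cut-off.

    Reads shorter than the window are returned untrimmed.
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < cutoff:
            return start
    return len(quals)


quals = [30, 30, 30, 30, 10, 10, 10, 10]
print(trim_fixed(quals, 6))
print(trim_adaptive(quals, cutoff=20, min_length=3))
print(trim_sliding_window(quals, window=4, cutoff=20))
```

The sliding window reacts to a run of poor bases rather than a single one, at the price of cutting slightly before the quality actually drops.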




Velvet practical: Part 2
     • Paired-end (SRX008042)
                • Explore parameters
                • Set cut-offs
     • Analyse quality score (SRX008042)
                • Trimming reads




Velvet modules
     • Columbus (since Velvet 1.0)
                •   use reference sequence
                •   assist with alignment information
                •   local re-sequencing
                •   structural variants




Velvet modules
     • Oases
                • De novo transcriptome assembler
                • uses preliminary Velvet assembly
                • clusters contigs into loci
                • construct transcript isoforms using paired-end / long read
                  information
                • confidence score: describes uniqueness of a transcript in a locus




Read Simulation - Why?
     • Controlling the data
                •   Contamination
                •   Coverage distribution
                •   Sequencing errors
                •   Genome size
                •   Insert length
                •   Insert length distribution




Read Simulation - Why?
     • Make results comparable
                •   Assemblers
                •   Parameters
                •   Algorithms
                •   Assembly strategies
                •   Genome specific “features”
     • Robust
                • Introduce errors
                • Simulate SNPs
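
A toy read simulator illustrating the controls listed above: coverage, insert length and its distribution, and sequencing errors. It is a sketch under simple assumptions (uniform coverage, normally distributed insert lengths, substitution errors only) and not the simulator used for the practical; all names are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def add_errors(seq, error_rate, rng):
    """Replace each base by a different one with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_pairs(genome, read_len, insert_mean, insert_sd, coverage,
                   error_rate, seed=0):
    """Return (left, right) read pairs in paired-end orientation (L-> <-R).

    Assumes insert_mean is well below the genome length.
    """
    rng = random.Random(seed)
    n_pairs = int(coverage * len(genome) / (2 * read_len))
    pairs = []
    while len(pairs) < n_pairs:
        insert = int(round(rng.gauss(insert_mean, insert_sd)))
        if insert < read_len or insert > len(genome):
            continue  # draw again: fragment impossible for this genome
        start = rng.randint(0, len(genome) - insert)
        fragment = genome[start:start + insert]
        left = add_errors(fragment[:read_len], error_rate, rng)
        right = add_errors(reverse_complement(fragment[-read_len:]),
                           error_rate, rng)
        pairs.append((left, right))
    return pairs


rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(1000))
pairs = simulate_pairs(genome, read_len=36, insert_mean=200, insert_sd=20,
                       coverage=10, error_rate=0.01)
print(len(pairs), pairs[0])
```

Fixing the seed makes a simulated data set reproducible, so different assemblers and parameters can be compared on exactly the same reads.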




Real data vs. simulation




                                   Mario Caccamo


Velvet practical: Part 3
     • Velvet
                • Long Reads
                • Hybrid Assembly
                • Mixed insert length libraries




Curtain
     •     assembly pipeline
     •     Paired-end assembly for large genomes
     •     Group related Contigs
     •     Uses velvet to assemble groups of related reads
     •     Iterative approach




Curtain

                   Genome assembly Pipeline


     Curtain: Contigs -> Map Reads -> Group Contigs -> Fill Bins -> Assemble -> Collect




Curtain
     Curtain: Contigs -> Map Reads -> Group Contigs -> Fill Bins -> Assemble -> Collect

     • Set of input Contigs
     • Use established assemblers
                •   Velvet unpaired
                •   Cortex
                •   SGA
                •   ...




Curtain
     Curtain: Contigs -> Map Reads -> Group Contigs -> Fill Bins -> Assemble -> Collect

     • Map reads to input contigs
     • SAM file support
                • bwa
                • maq




Curtain
     Curtain: Contigs -> Map Reads -> Group Contigs -> Fill Bins -> Assemble -> Collect


     • Group Contigs using Paired-end information

                1                      2   3             4                        5




                     bin mapping read & read pair
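
One way to picture the grouping step is as connected components: two contigs fall into the same group when enough read pairs have one read mapped to each of them. The slides do not describe how Curtain actually implements this, so the sketch below is only an assumption; the function name, the min_pairs threshold and the mappings are all made up.

```python
from collections import Counter


def group_contigs(pair_mappings, min_pairs=1):
    """pair_mappings: (contig of left read, contig of right read) per read pair.

    Returns sorted groups of contigs linked by at least min_pairs read pairs.
    """
    links = Counter()
    contigs = set()
    for a, b in pair_mappings:
        contigs.update((a, b))
        if a != b:
            links[frozenset((a, b))] += 1

    parent = {c: c for c in contigs}

    def find(c):
        # Follow parent pointers to the group representative.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for link, count in links.items():
        if count >= min_pairs:
            a, b = link
            parent[find(a)] = find(b)

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), set()).add(c)
    return sorted(sorted(group) for group in groups.values())


mappings = [(1, 2), (1, 2), (2, 3), (4, 5), (4, 5), (3, 3)]
print(group_contigs(mappings, min_pairs=1))
print(group_contigs(mappings, min_pairs=2))
```

Raising the threshold keeps weakly supported links, such as the single pair between contigs 2 and 3, from merging groups.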




Curtain
     Curtain: Contigs -> Map Reads -> Group Contigs -> Fill Bins -> Assemble -> Collect

     • Assemble each bin
                • Run velvet using paired-end information
                • bin specific parameters
     •     Run each bin individually
     •     Highly parallelizable
     •     Collect results
     •     Start next iteration

     [Figure: one velvet run per bin, collected into Results]


Curtain
     •     Low memory footprint
     •     Scalable for large genomes
     •     Make use of cluster
     •     Available
                • www.ebi.ac.uk/egt
                • http://code.google.com/p/curtain/
     • Future announcements
                • http://groups.google.com/group/curtain-assembler
     • Future work
                • Long read support




Curtain practical
     • Run Curtain for Staphylococcus
                • Simulation data




Thanks ...




PEPSICO Presentation to CAGNY Conference Feb 2024
Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
SpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Lily Ray
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
Rajiv Jayarajah, MAppComm, ACC
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Christy Abraham Joy
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
Vit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
MindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
RachelPearson36
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

2011-04-26_01-velvet-curtain-presentation

  • 1. Velvet / Curtain Matthias Haimel EBI is an Outstation of the European Molecular Biology Laboratory.
  • 2.  2 25.04.11 Velvet / Curtain
  • 3. Overview • De Bruijn Graph • Velvet • Theory • Practice • Data formats and quality • Velvet • Simulation data • Multiple insert lengths • Curtain • Theory • Practice
  • 4. De Bruijn graph • A concept in combinatorial mathematics • In combinatorics, a de Bruijn graph is usually fully connected • http://en.wikipedia.org/wiki/De_Bruijn_graph • de Bruijn sequence • Related concept • Path through graph • Velvet • de Bruijn inspired graph structure
  • 5. De Bruijn graph (Velvet) • Representation of • a sequence based on short words (k-mers) • overlaps between words • K-mer: word of length k • K=5 • k-1 overlap • Example: GCCTTCCA -> GCCTT, CCTTC, CTTCC, TTCCA, ...
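The k-mer decomposition on slide 5 can be sketched in a few lines of Python. This is only an illustration of the idea, not Velvet's implementation; the function name `kmers` is made up for this example.

```python
def kmers(sequence, k):
    """Return all overlapping words of length k, in order of occurrence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


words = kmers("GCCTTCCA", 5)
print(words)

# Consecutive k-mers share k-1 characters: the suffix of one word
# is the prefix of the next.
for left, right in zip(words, words[1:]):
    assert left[1:] == right[:-1]
```

With K=5 the eight-base example sequence yields four words, each shifted by one base.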
  • 6. De Bruijn graph (Velvet) • Figure: graph for the two sequences GCCTTCCAATTT and GCCTTCAAATTT; both paths pass through CTTC, split into one branch through TTCC ..... CAATT and another through TTCA ..... AAATT, and rejoin at AATTT
  • 7. De Bruijn graph representations (Velvet) • Figure panels (example nodes: TTCA, ATTC, TCAG) • Error free, no repeat, no polymorphism • Repeat > k-mer length • SNP, variant < k-mer length • Structural variant, inversion • Structural variant, deletion • …
  • 8. Example • Sequence: TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG • Reads: AGTCGAG CTTTAGA CGATGAG CTTTAGA GTCGAGG TTAGATC ATGAGGC GAGACAG GAGGCTC ATCCGAT AGGCTTT GAGACAG AGTCGAG TAGATCC ATGAGGC TAGAGAA TAGTCGA CTTTAGA CCGATGA TTAGAGA CGAGGCT AGATCCG TGAGGCT AGAGACA TAGTCGA GCTTTAG TCCGATG GCTCTAG TCGACGC GATCCGA GAGGCTT AGAGACA TAGTCGA TTAGATC GATGAGG TTTAGAG GTCGAGG TCTAGAT ATGAGGC TAGAGAC AGGCTTT ATCCGAT AGGCTTT GAGACAG AGTCGAG TTAGATT ATGAGGC AGAGACA GGCTTTA TCCGATG TTTAGAG CGAGGCT TAGATCC TGAGGCT GAGACAG AGTCGAG TTTAGATC ATGAGGC TTAGAGA GAGGCTT GATCCGA GAGGCTT GAGACAG
  • 9. Example • Read: GTCGAGG • GTCG (1x)
  • 10. Example • Read: GTCGAGG • GTCG (1x) -> TCGA (1x)
  • 11. Example • Read: GTCGAGG • GTCG (1x) -> TCGA (1x) -> CGAG (1x)
  • 12. Example • Read: GTCGAGG • GTCG (1x) -> TCGA (1x) -> CGAG (1x) -> GAGG (1x)
  • 13. Example • New read: CGAGGCT • GTCG (1x) -> TCGA (1x) -> CGAG (2x) -> GAGG (1x)
  • 14. Example • Read: CGAGGCT • GTCG (1x) -> TCGA (1x) -> CGAG (2x) -> GAGG (2x)
  • 15. Example • Read: CGAGGCT • GTCG (1x) -> TCGA (1x) -> CGAG (2x) -> GAGG (2x) -> AGGC (1x)
  • 16. Example • Read: CGAGGCT • GTCG (1x) -> TCGA (1x) -> CGAG (2x) -> GAGG (2x) -> AGGC (1x) -> GGCT (1x)
  • 17. Example • New read: TCGACGC • GTCG (1x) -> TCGA (2x) -> CGAG (2x) -> GAGG (2x) -> AGGC (1x)
  • 18. Example • Read: TCGACGC • GTCG (1x) -> TCGA (2x) -> CGAG (2x) -> GAGG (2x) -> AGGC (1x) • Branch from TCGA: CGAC (1x) -> GACG (1x) -> ACGC (1x)
  • 19. Example etc… • GATT (1x) • TGAG (9x), ATGA (8x), GATG (5x), CGAT (6x), CCGA (7x), TCCG (7x), ATCC (7x), GATC (8x), AGAT (8x) • AGAA (1x) • GCTC (2x), CTCT (1x), TCTA (2x), CTAG (2x) • TAGT (3x), AGTC (7x), GTCG (9x), TCGA (10x), CGAG (8x), GAGG (16x), AGGC (16x), GGCT (11x), TAGA (16x), AGAG (9x), GAGA (12x), AGAC (9x), GACA (8x), ACAG (5x) • GCTT (8x), CTTT (8x), TTTA (8x), TTAG (12x) • CGAC (1x), GACG (1x), ACGC (1x)
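The read-by-read counting on slides 9 to 18 can be reproduced with a small counter. This is a sketch for illustration, assuming k=4 as in the slides; the helper name `count_kmers` is hypothetical and the code ignores reverse strands and graph edges.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# The three reads added one after another on slides 9 to 18.
counts = count_kmers(["GTCGAGG", "CGAGGCT", "TCGACGC"], 4)
for kmer, n in counts.items():
    print(f"{kmer} ({n}x)")
```

K-mers seen in more than one read (TCGA, CGAG, GAGG) get a count of 2, while the branch created by the third read (CGAC, GACG, ACGC) stays at 1.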
  • 20. Example • After simplification… • Nodes: GATT, AGAT, GATCCGATGAG, AGAA, GCTCTAG, TAGTCGA, CGAG, GAGGCT, GGCT, TAGA, AGAGA, AGACAG, GCTTTAG, CGACGC
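Simplification merges chains of k-mers that have no branching into single nodes. The following is a minimal sketch of that idea, not Velvet's code: the functions `build_graph` and `compact` are hypothetical, edges are taken from consecutive k-mers within each read, reverse strands are ignored, and chains that form an isolated cycle are not reported.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Nodes are k-mers; an edge links k-mers that are adjacent in a read."""
    nodes, succ, pred = set(), defaultdict(set), defaultdict(set)
    for read in reads:
        words = [read[i:i + k] for i in range(len(read) - k + 1)]
        nodes.update(words)
        for a, b in zip(words, words[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return nodes, succ, pred


def compact(nodes, succ, pred):
    """Merge every unbranched chain of k-mers into one longer sequence."""
    def starts_chain(node):
        if len(pred[node]) != 1:
            return True
        (parent,) = pred[node]
        return len(succ[parent]) != 1  # parent branches, so a new chain starts

    merged = []
    for node in sorted(nodes):
        if not starts_chain(node):
            continue
        sequence, current = node, node
        while len(succ[current]) == 1:
            (nxt,) = succ[current]
            if len(pred[nxt]) != 1:
                break  # nxt is shared with another path
            sequence += nxt[-1]
            current = nxt
        merged.append(sequence)
    return merged


nodes, succ, pred = build_graph(["GTCGAGG", "CGAGGCT", "TCGACGC"], 4)
print(compact(nodes, succ, pred))
```

For the three reads from slides 9 to 18, the branch at TCGA splits the graph into three merged nodes, one of which is the low-coverage tip CGACGC that also appears on slide 20.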
  • 21. Example • Tips removed… • Nodes: AGAT, GATCCGATGAG, GCTCTAG, TAGTCGA, CGAG, GAGGCT, GGCT, TAGA, AGAGA, AGACAG, GCTTTAG
  • 22. De Bruijn graph biology extensions (Velvet) • Handling of reverse strand • DNA is read in two directions • Paired-end data • Handling small differences, which are “uninteresting” • Errors in sequencing technology • Memory • regularly uses 80 to 100 GB of real memory • can easily reach 1 TB real memory requirements
  • 23. Read variety • Short reads ~75bp • Illumina / Solexa • SOLiD (colour space) • Long reads 500-1000 bp • 454 reads • Sanger capillary reads • Paired-end reads • Short reads • short insert length • Mate pair reads • Short reads • long insert length
  • 24. Paired-End • Mate Pair
  • 25. Short paired-end / mate pair reads • Velvet expects Illumina paired-end orientation: (L-> <-R)
  • 26. Short paired-end / mate pair reads • Illumina mate-pair orientation: (<-L R->) • Mate pair -> reverse complement -> paired-end
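Converting the mate-pair orientation (<-L R->) into the paired-end orientation (L-> <-R) amounts to reverse complementing each read of the pair. A minimal sketch, with hypothetical function names; for FASTQ input the quality string of each read would also have to be reversed, which is not shown here.

```python
# Translation table for the complement of each base; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Complement every base, then reverse the string."""
    return sequence.translate(_COMPLEMENT)[::-1]


def mate_pair_to_paired_end(left, right):
    """Flip both reads so that an outward-facing pair faces inward."""
    return reverse_complement(left), reverse_complement(right)


print(reverse_complement("GATTC"))
print(mate_pair_to_paired_end("AAC", "GGT"))
```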
  • 27. Velvet algorithms • Remove Bubbles • Tour Bus • Velvet parameters • -max_branch_length • -max_divergence • -max_gap_count
  • 28. Example • Nodes: AGAT, GATCCGATGAG, GCTCTAG, TAGTCGA, CGAG, GAGGCT, GGCT, TAGA, AGAGA, AGACAG, GCTTTAG • Bubble between GGCT (11x) and TAGA (16x) • Path 1: GCTC (2x), CTCT (1x), TCTA (2x), CTAG (2x) • Path 2: GCTT (8x), CTTT (8x), TTTA (8x), TTAG (12x)
  • 29. Example • Bubbles removed… by TourBus • Nodes: AGAT, GATCCGATGAG, TAGTCGA, CGAG, GAGGCT, GGCT, GCTTTAG, TAGA, AGAGA, AGACAG
  • 30. Example • Final simplification… • Nodes: AGATCCGATGAG, TAGTCGAG, GAGGCTTTAGA, AGAGACAG
  • 31. Example • Sequence: TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG • Final simplification… • Nodes: AGATCCGATGAG, TAGTCGAG, GAGGCTTTAGA, AGAGACAG • One possible walk through the graph ... • TAGTCGAG -> GAGGCTTTAGA -> AGATCCGATGAG -> GAGGCTTTAGA -> AGAGACAG
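The walk on slide 31 can be checked by gluing the nodes together on their k-1 = 3 base overlaps. This is a small sketch with a hypothetical helper `spell_walk`; it assumes k=4 as in the example and that neighbouring nodes overlap by exactly k-1 bases.

```python
def spell_walk(nodes, k):
    """Concatenate nodes along a walk, merging the k-1 base overlaps."""
    overlap = k - 1
    sequence = nodes[0]
    for node in nodes[1:]:
        if sequence[-overlap:] != node[:overlap]:
            raise ValueError(f"{node} does not overlap the walk so far")
        sequence += node[overlap:]
    return sequence


walk = ["TAGTCGAG", "GAGGCTTTAGA", "AGATCCGATGAG", "GAGGCTTTAGA", "AGAGACAG"]
original = "TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG"
print(spell_walk(walk, k=4) == original)
```

The node GAGGCTTTAGA is visited twice, which is how the repeat in the original sequence is represented in the graph.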
  • 32. N50 • Total • N90 • N50 • N10
  • 33. N50 • Total • 4,295,113bp • N90 • 439bp • N50 • 3,119bp • N10 • 13,519bp
  • 34. N50 • N50 is the length of the smallest contig in the set that contains the fewest (largest) contigs whose combined length represents at least 50% of the assembly • N10 • the same definition with at least 10% of the assembly, so it is determined by the largest contigs • http://www.broadinstitute.org/crd/wiki/index.php/N50
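The N50 definition translates directly into code: sort the contigs from longest to shortest and add them up until the chosen fraction of the total assembly length is reached. A minimal sketch; the function name `n_statistic` and the toy contig lengths are made up for illustration.

```python
def n_statistic(lengths, fraction=0.5):
    """Length of the smallest contig in the smallest set of largest
    contigs that covers at least `fraction` of the total length."""
    threshold = fraction * sum(lengths)
    running_total = 0
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= threshold:
            return length
    raise ValueError("no contigs given")


contig_lengths = [10, 8, 5, 2]  # toy assembly, 25 bp in total
print("N10:", n_statistic(contig_lengths, 0.1))
print("N50:", n_statistic(contig_lengths, 0.5))
print("N90:", n_statistic(contig_lengths, 0.9))
```

As on slide 33, N10 is always at least as large as N50, which in turn is at least as large as N90.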
  • 35. Velvet practical: Part 1 • Compile • Single end (ERX001300) • K-mer length • Coverage cut-offs • Whole genome sequence as input??? • Staphylococcus aureus MRSA252
  • 36. Velvet algorithms • Long read information • Rock Band • Velvet parameters • -long_mult_cutoff
  • 37. Velvet algorithms • Paired-end information • Pebble • Velvet parameters • -min_pair_count • Once all distances and variances are computed, simple greedy extension outward from the main contigs
  • 38. Paired-end in Velvet • Hugely improves quality of assembly • Insert length greater than repeat • greater than the length of the most common genomic repeat • Mixed insert length improves results • Short: helps for local assembly • Long: get over repeats • Large genomes • Very memory intensive • Calculation intensive
  • 39. Data formats and quality • Fasta • Extensions: .fasta, .fa, ? • Header: >read_1 • Sequence: TATAATATTTAT... • Fastq • Extensions: .fastq, .fq, ? • Header: @SEQ_ID • Sequence: GATTTGGGGTTCAAAGC • Separator: + • Quality: !''*((((***+))%%%
  • 40. FASTQ paired • Pair members are marked in the read name by .F / .R or /1 / /2 • Single file, both reads of a pair in consecutive records: @SRR022863.1.F ATATAGATGTACATAAATTAGTTGAAGTATATGAACG + IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIAIIII @SRR022863.1.R TTCACCCATTTTATCCATGATTTTGTTCTTTCTCTTC + IIIIIHIIIIIIII3III.,IIII&II6II-))&'I0 • Two files, forward reads: @SRR022863.1.F ATATAGATGTACATAAATTAGT... + IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIAIIII @SRR022863.2.F TTATGAATTATTAATAAGTGCT... + IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII • Two files, reverse reads: @SRR022863.1.R TTCACCCATTTTATCCATGATTTTGTT... + IIIIIHIIIIIIII3III.,IIII&II6II-))&'I0 @SRR022863.2.R CATAAAAAAAGAAAATGTACTCTTTAC... + IIII)0&A,%.&9$8I4+A;I)4II)&%-I$I%#)II
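When forward and reverse reads come in two separate files but the assembler expects both members of a pair in consecutive records of one file, the two files have to be interleaved. The sketch below shows the idea on lists of lines; the function names are hypothetical, and it assumes plain four-line FASTQ records in the same order in both inputs.

```python
def fastq_records(lines):
    """Group FASTQ lines into 4-line records: header, sequence, '+', quality."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain 4 lines per record")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def interleave(forward_lines, reverse_lines):
    """Alternate forward and reverse records: F1, R1, F2, R2, ..."""
    forward = fastq_records(forward_lines)
    reverse = fastq_records(reverse_lines)
    if len(forward) != len(reverse):
        raise ValueError("both files must contain the same number of reads")
    merged = []
    for fwd, rev in zip(forward, reverse):
        merged.extend(fwd)
        merged.extend(rev)
    return merged


forward = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "TTGA", "+", "IIII"]
reverse = ["@read1/2", "GGCA", "+", "IIII", "@read2/2", "CCAT", "+", "IIII"]
print("\n".join(interleave(forward, reverse)))
```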
  • 41. Quality score • Velvet does NOT use quality score!!! • Error correction of de Bruijn graph • p • the probability that the corresponding base call is incorrect • Phred quality score • 10 -> 1 in 10 • 40 -> 1 in 10,000 • Odds ratio • earlier versions of the Solexa pipeline • differs from Phred mainly at low quality values
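The two scores relate to the error probability p as Q_phred = -10 log10(p) and, for the odds-ratio variant, Q_solexa = -10 log10(p / (1 - p)). A short sketch of these conversions, with function names chosen for this example only:

```python
import math


def phred_to_error_probability(q):
    """Phred score -> probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score: -10 * log10(p)."""
    return -10 * math.log10(p)


def error_probability_to_solexa(p):
    """Odds-ratio based score: -10 * log10(p / (1 - p))."""
    return -10 * math.log10(p / (1 - p))


for p in (0.5, 0.1, 0.0001):
    print(p, round(error_probability_to_phred(p), 2),
          round(error_probability_to_solexa(p), 2))
```

For small p the two scores are nearly identical; they diverge only when the error probability is high, which is the low-quality end of the scale.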
  • 42. Quality encoding • !''*((((***+))%%% • One value per base • Integer mapping based on ASCII encoding • probability of incorrect base call • Sanger format • Phred score • ASCII 33 – 126 -> 0 – 93 • Rarely exceeds 60 • ! = 33 -> 0 • B = 66 -> 33 • Illumina 1.5+ • Phred score • ASCII 59 – 126 -> -5 – 62 • Only 2 – 40 expected • ! = 33 -> (does not exist) • B = 66 -> 2
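Decoding a quality string means subtracting the format's ASCII offset from each character code: 33 for the Sanger format and 64 for Illumina 1.5+, consistent with the mappings listed on the slide. A minimal sketch with a hypothetical function name:

```python
SANGER_OFFSET = 33
ILLUMINA_15_OFFSET = 64


def decode_qualities(quality_string, offset):
    """Turn each quality character into its integer score."""
    return [ord(character) - offset for character in quality_string]


print(decode_qualities("!''*((((***+))%%%", SANGER_OFFSET))
print(decode_qualities("B", SANGER_OFFSET), decode_qualities("B", ILLUMINA_15_OFFSET))
```

The same character therefore means very different qualities depending on the encoding, so the format has to be known before any trimming or filtering.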
  • 43. Quality encoding • (figure from Wikipedia)
  • 44. Quality trimming • Good / Bad? • Figure: quality score plotted against bp position in read
  • 45. Quality trimming • Fixed length trimming • Cut-off at position x • Adaptive trimming • Quality score cut-off • Minimum sequence length • Sliding window • Window size • Quality score cut-off • Use average quality value of window
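The sliding-window variant can be sketched as follows: move a window along the read, cut the read where the average quality of the window first drops below the cut-off, and discard the read if what remains is shorter than the minimum length. The function name, the default values, and the choice to cut at the start of the failing window are assumptions made for this illustration.

```python
def sliding_window_trim(sequence, qualities, window=4, cutoff=20, min_length=5):
    """Trim at the first window whose mean quality is below `cutoff`.

    Returns the trimmed sequence, or None if it is shorter than `min_length`.
    """
    keep = len(sequence)
    for start in range(max(len(sequence) - window + 1, 0)):
        window_scores = qualities[start:start + window]
        if sum(window_scores) / len(window_scores) < cutoff:
            keep = start  # cut where the low-quality window begins
            break
    trimmed = sequence[:keep]
    return trimmed if len(trimmed) >= min_length else None


read = "ACGTACGTAC"
scores = [40, 40, 40, 40, 40, 40, 10, 10, 10, 10]
print(sliding_window_trim(read, scores))
```

Averaging over a window makes the trimming tolerant of a single poor base in an otherwise good stretch, which a plain per-base cut-off is not.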
  • 46. Velvet practical: Part 2 • Paired-end (SRX008042) • Explore parameters • Set cut-offs • Analyse quality score (SRX008042) • Trimming reads
  • 47. Velvet modules • Columbus (since Velvet 1.0) • use reference sequence • assist with alignment information • local re-sequencing • structural variants
  • 48. Velvet modules • Oases • De novo transcriptome assembler • uses preliminary Velvet assembly • clusters contigs into loci • construct transcript isoforms using paired-end / long read information • confidence score: describes uniqueness of a transcript in a locus
  • 49. Read Simulation - Why? • Controlling the data • Contamination • Coverage distribution • Sequencing errors • Genome size • Insert length • Insert length distribution
  • 50. Read Simulation - Why? • Make results comparable • Assemblers • Parameters • Algorithms • Assembly strategies • Genome specific “features” • Robust • Introduce errors • Simulate SNPs
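A very small read simulator illustrates how coverage and sequencing errors can be controlled. This is a toy sketch under simplifying assumptions (uniform start positions, single-end reads, substitution errors only); the function name and parameters are made up, and it is not the simulator used for the practicals.

```python
import random


def simulate_reads(genome, read_length, coverage, error_rate, seed=0):
    """Sample reads uniformly from `genome` and add substitution errors."""
    rng = random.Random(seed)  # fixed seed keeps runs comparable
    n_reads = (coverage * len(genome)) // read_length
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        bases = list(genome[start:start + read_length])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # replace the base with one of the other three
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(bases))
    return reads


genome = "TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG"
reads = simulate_reads(genome, read_length=7, coverage=10, error_rate=0.01)
print(len(reads), reads[:3])
```

Because every parameter is explicit, the same genome can be re-sampled with a different coverage or error rate to compare assemblers or parameter settings on equal terms.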
  • 51. Real data vs. simulation • Mario Caccamo
  • 52. Real data vs. simulation • Mario Caccamo
  • 53. Velvet practical: Part 3 • Velvet • Long Reads • Hybrid Assembly • Mixed insert length libraries
  • 54. Curtain • Assembly pipeline • Paired-end assembly for large genomes • Group related contigs • Uses Velvet to assemble groups of related reads • Iterative approach
  • 55. Curtain • Genome assembly pipeline: Contigs -> Map Reads -> Group Contigs -> Fill Bins -> Assemble -> Collect
  • 56. Curtain • Set of input contigs • Use established assemblers • Velvet unpaired • Cortex • SGA • ...
  • 57. Curtain • Map reads to input contigs • SAM file support • bwa • maq
  • 58. Curtain • Group contigs using paired-end information • Figure: contigs 1 to 5 assigned to bins via mapping reads and read pairs
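One way to picture the grouping step is to treat contigs as nodes and to link two contigs whenever the two reads of a pair map to them; each connected group then forms a bin. The union-find sketch below illustrates that idea only. It is an assumption about how such grouping can be done, not Curtain's actual implementation, and a real pipeline would likely require a minimum number of supporting pairs per link.

```python
def group_contigs(contigs, pair_links):
    """Group contigs into bins of contigs connected by read pairs.

    `pair_links` holds (contig_a, contig_b) tuples, one per read pair
    whose two reads map to different contigs.
    """
    parent = {contig: contig for contig in contigs}

    def find(contig):
        # follow parent pointers to the representative, halving the path
        while parent[contig] != contig:
            parent[contig] = parent[parent[contig]]
            contig = parent[contig]
        return contig

    for a, b in pair_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    bins = {}
    for contig in contigs:
        bins.setdefault(find(contig), []).append(contig)
    return sorted(bins.values())


print(group_contigs([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
```

Each resulting bin can then be assembled on its own, which is what keeps the memory footprint per job small.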
  • 59. Curtain • Assemble each bin • Run Velvet using paired-end information • Bin-specific parameters • Run Velvet on each bin individually • Highly parallelizable • Collect results • Start next iteration
  • 60. Curtain • Low memory footprint • Scalable for large genomes • Make use of cluster • Available • www.ebi.ac.uk/egt • http://code.google.com/p/curtain/ • Future announcements • http://groups.google.com/group/curtain-assembler • Future work • Long read support
  • 61. Curtain practical • Run Curtain for Staphylococcus • Simulation data
  • 62. Thanks ...