Whole Genome Analysis

Bone-Net Workshop - 21 January 2012



       Ir Stéphane Wenric – s.wenric@dnavision.be




Whole genome analysis: why is it such a hot topic today?




[Figure: human whole genome sequencing: price and time]




                   Microarray          Exome                      Whole Genome
 What is covered   Only known SNPs     Only the coding regions    The complete DNA
                   (~ 900 000)         of the genome              sequence
 Share of the      Up to ~ 0.03 %      ~ 1 %                      ~ 80 %
 human genome
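
The microarray share follows from simple arithmetic: about 900 000 positions out of roughly 3.1 billion base pairs is a fraction of about 0.0003, that is, about 0.03 %. A minimal sketch of that calculation, where the genome size constant is an assumption and not a figure from these slides:

```python
# Assumed approximate size of the human genome in base pairs
# (illustrative value, not taken from the slides).
GENOME_SIZE_BP = 3_100_000_000


def percent_of_genome(n_bases, genome_size=GENOME_SIZE_BP):
    """Return the share of the genome covered by n_bases, in percent."""
    return 100.0 * n_bases / genome_size


# ~900 000 SNP positions interrogated by a microarray
microarray_pct = percent_of_genome(900_000)
print(f"Microarray: about {microarray_pct:.3f} % of the genome")
```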




    What are the technologies involved?

        • Illumina (Solexa)
        • ABI SOLiD
        • Ion Proton (2013)




    Illumina (Solexa):
        • cluster generation by bridge amplification
        • sequencing by synthesis




    ABI SOLiD:
        • amplification on magnetic beads
        • ligation + fluorescence detection
        • 2-base color encoding
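
In 2-base color encoding, each color describes a pair of adjacent bases rather than a single base, so a read is a first base followed by a series of colors. A minimal sketch, assuming the usual four-color table (0 for AA/CC/GG/TT, 1 for AC/CA/GT/TG, 2 for AG/GA/CT/TC, 3 for AT/TA/CG/GC), written here as an XOR of 2-bit base codes. The function names are illustrative only:

```python
# 2-bit codes chosen so that XOR of two bases reproduces the color table.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(sequence):
    """Return one color (0-3) per pair of adjacent bases."""
    codes = [BASE_TO_BITS[base] for base in sequence]
    return [left ^ right for left, right in zip(codes, codes[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    current = BASE_TO_BITS[first_base]
    bases = [first_base]
    for color in colors:
        current ^= color  # each color tells how to move to the next base
        bases.append(BITS_TO_BASE[current])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors, decode_colors("A", colors))
```

Because every base is covered by two colors, a true single-base change alters two adjacent colors, whereas a single wrong color is more likely a measurement error.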




    Ion semiconductor sequencing:
        • PCR amplification
        • dNTP incorporation with H+ release
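
Since one kind of dNTP is supplied at a time and every incorporated base releases H+, the signal of each flow is ideally proportional to the number of bases incorporated in that flow. A minimal sketch of such an idealized, noise-free signal; the flow order "TACG" and the function name are assumptions made for illustration:

```python
def ideal_flowgram(read_sequence, flow_order="TACG", n_flows=16):
    """Return the number of bases incorporated in each nucleotide flow.

    read_sequence is the strand being synthesized; flow_order is the
    (assumed) cyclic order in which the dNTPs are supplied.
    """
    signals = []
    pos = 0
    for flow in range(n_flows):
        base = flow_order[flow % len(flow_order)]
        incorporated = 0
        # A homopolymer run is incorporated entirely within one flow.
        while pos < len(read_sequence) and read_sequence[pos] == base:
            incorporated += 1
            pos += 1
        signals.append(incorporated)
    return signals


print(ideal_flowgram("TTACGG", n_flows=4))
```

A run of identical bases produces one larger signal rather than several separate ones, which is why the length of homopolymers has to be inferred from signal intensity.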




whole genome analysis = whole genome sequencing + bioinformatics




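    What kinds of bioinformatics analyses are
    best suited to whole genome data?

        • SNPs (single nucleotide polymorphisms)
        • Indels (insertions and deletions)
        • CNVs (copy number variations)
        • SVs (structural variations)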




Single Nucleotide Polymorphisms

     • What?
     • How?
     • Impact?




SNPs: easy to detect
   1. Map the sequenced reads to the reference sequence
   2. See how the 'consensus sequence' of the mapped reads differs from the reference
   3. A 1-base difference between the consensus sequence
      and the reference = SNP
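
The three steps above can be sketched as follows. This is a simplified illustration, assuming the reads are already mapped without gaps and ignoring base qualities; the function name, the `min_depth` parameter and the toy data are hypothetical, and real SNP callers rely on more elaborate models:

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=2):
    """Return (position, reference_base, consensus_base) for each SNP.

    aligned_reads is a list of (start, sequence) pairs, where start is
    the 0-based position of the read on the reference.
    """
    # Step 1 is assumed done: pile up the mapped reads per position.
    pileup = [Counter() for _ in reference]
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        if sum(counts.values()) < min_depth:
            continue  # too few reads to trust a consensus here
        # Step 2: the consensus is the most frequent base at this position.
        consensus_base, _ = counts.most_common(1)[0]
        # Step 3: a 1-base difference from the reference is reported.
        if consensus_base != reference[pos]:
            snps.append((pos, reference[pos], consensus_base))
    return snps


reference = "ACGTACGT"
reads = [(0, "ACGAAC"), (2, "GAACGT"), (1, "CGAAC")]
print(call_snps(reference, reads))
```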




  SNPs: impact?

        • Public databases:
            • dbSNP (NCBI)
            • OMIM (Johns Hopkins University School of Medicine)
            • SIFT (J. Craig Venter Institute)
            • …

        • 'Manual' annotation




Insertions and Deletions

     • What?
     • How?
     • Impact?




  Indel: insertion or deletion of a DNA sequence of arbitrary length




    Indel detection

    Small indels:
    • Detection of small gaps in the alignment
    • Combination of the gapped alignments based on proximity
    • Filtering (read position, coverage, quality)
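
A minimal sketch of the small-indel steps, assuming each alignment is given as a reference start position plus a SAM-style CIGAR string. Only the coverage filter is shown (as a minimum number of supporting reads); filtering on read position and quality is left out. The function names and the `window` and `min_reads` parameters are hypothetical:

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def gaps_from_cigar(ref_start, cigar):
    """Return (ref_pos, kind, length) for each insertion/deletion gap."""
    ref_pos = ref_start
    gaps = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "ID":
            gaps.append((ref_pos, op, length))
        if op in "MDN=X":  # operations that consume the reference
            ref_pos += length
    return gaps


def call_small_indels(alignments, window=2, min_reads=2):
    """Combine nearby gaps of the same kind and length, then filter."""
    positions_by_type = defaultdict(list)
    for ref_start, cigar in alignments:
        for pos, kind, length in gaps_from_cigar(ref_start, cigar):
            positions_by_type[(kind, length)].append(pos)

    calls = []
    for (kind, length), positions in positions_by_type.items():
        positions.sort()
        cluster = [positions[0]]
        for pos in positions[1:] + [None]:  # None flushes the last cluster
            if pos is not None and pos - cluster[-1] <= window:
                cluster.append(pos)
                continue
            if len(cluster) >= min_reads:  # coverage filter
                calls.append((cluster[0], kind, length, len(cluster)))
            cluster = [pos]
    return sorted(calls)


alignments = [(100, "10M2D10M"), (105, "5M2D15M"), (90, "20M1I5M"), (200, "25M")]
print(call_small_indels(alignments))
```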

    Large indels:
    • Use of the read-pairing information (Illumina and SOLiD only)
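
Read-pairing information helps because the two reads of a pair come from a fragment of roughly known size. If they map much farther apart than expected, a deletion between them is a plausible explanation; if they map much closer, an insertion is. A minimal sketch of that test, in which the function name, the threshold `n_sd` and the example numbers are hypothetical:

```python
def classify_insert_size(observed_span, expected_mean, expected_sd, n_sd=3.0):
    """Flag a read pair whose mapped span deviates from the library size.

    Returns a (label, estimated_size) tuple; estimated_size is the
    absolute deviation from the expected span, or 0 if concordant.
    """
    deviation = observed_span - expected_mean
    if deviation > n_sd * expected_sd:
        # Reads map too far apart: sequence is missing in the sample.
        return ("deletion candidate", deviation)
    if deviation < -n_sd * expected_sd:
        # Reads map too close: the sample carries extra sequence.
        return ("insertion candidate", -deviation)
    return ("concordant", 0)


print(classify_insert_size(900, expected_mean=300, expected_sd=30))
```

A single deviating pair is weak evidence; in practice several pairs pointing to the same location would be required before reporting a large indel.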




  Indels: impact?

        • Public databases:
            • dbSNP for small indels (NCBI)
            • dbVar for large indels (NCBI)

        • 'Manual' annotation




Copy Number Variations

     • What?
     • How?
     • Impact?




CNV:
• variation in the number of copies of a DNA sequence
• size >= 1 kb
• up to several Mb




     CNV detection methods

           • Statistical models based on:
               • Read depth (coverage)
               • Read-pairing information
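
The read-depth idea can be illustrated with a deliberately simple sketch: count reads per fixed-size window and compare each window to the median. The fixed ratio thresholds stand in for the statistical models used by real tools, and all names and parameters here are hypothetical:

```python
from statistics import median


def window_depths(read_starts, region_length, window):
    """Count the reads starting in each window of the region."""
    n_windows = (region_length + window - 1) // window
    counts = [0] * n_windows
    for start in read_starts:
        if 0 <= start < region_length:
            counts[start // window] += 1
    return counts


def call_cnv_windows(counts, gain_ratio=1.5, loss_ratio=0.5):
    """Return (window_index, 'gain' or 'loss', ratio) for outlier windows."""
    baseline = median(counts)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot compute ratios")
    calls = []
    for index, count in enumerate(counts):
        ratio = count / baseline
        if ratio >= gain_ratio:
            calls.append((index, "gain", ratio))
        elif ratio <= loss_ratio:
            calls.append((index, "loss", ratio))
    return calls


print(call_cnv_windows([10, 10, 20, 10, 5, 10]))
```

Adjacent windows with the same call would then be merged into one candidate CNV; that step, as well as corrections such as GC content, is omitted here.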




  CNVs: impact?

        • Public databases:
            • Database of Genomic Variants (The Centre for Applied Genomics, Canada)
            • CNV Project (Sanger Institute)
            • ...

        • 'Manual' annotation




Structural Variations

     • What?
     • How?
     • Impact?




Structural Variations

     • >= 1 kb
     • Inversions
     • Translocations
     • (Large indels)
     • (CNVs)




Inversions: a DNA segment whose orientation is reversed relative to the reference




Translocations: a DNA segment moved to a different location, often on another chromosome




  SVs: detection methods

        • Geometric approach (GASV)
            • Takes read-pairing information into account
            • Compares potential variants to geometric models
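
The sketch below is not GASV and does not implement its geometric models. It only illustrates, under simplifying assumptions, how read-pairing information can point to a structural variant: pairs are assumed to map in opposite orientations on the same chromosome within a maximum span, and any pair breaking one of these expectations is flagged. The `ReadPair` type, the function name and `max_span` are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_sv_pair(pair, max_span=500):
    """Label a read pair by the kind of variant it may support."""
    if pair.chrom1 != pair.chrom2:
        # Mates on different chromosomes
        return "translocation candidate"
    if pair.strand1 == pair.strand2:
        # Mates on the same strand instead of facing each other
        return "inversion candidate"
    if abs(pair.pos2 - pair.pos1) > max_span:
        # Mates much farther apart than the library allows
        return "deletion candidate"
    return "concordant"


pair = ReadPair("chr1", 1000, "+", "chr1", 5000, "+")
print(classify_sv_pair(pair))
```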




  SVs: impact?

        • Public databases:
            • Database of Genomic Variants (The Centre for Applied Genomics, Canada)
            • dbVar (NCBI)
            • ...

        • 'Manual' annotation




                        Questions?
