Comparative genomics
in eukaryotes



  Klaas Vandepoele, PhD
  Klaas.Vandepoele@psb.vib-ugent.be

Professor Ghent University
Comparative & Integrative Genomics
VIB – Ghent University, Belgium


                  http://www.bits.vib.be
Outline

       ▪ Introduction

       ▪ Gene family analysis

       ▪ Genome analysis

       ▪ ConTra: promoter alignment analysis

What is comparative genomics?

       ▪ Because all modern genomes have arisen from
         common ancestral genomes, the relationships
         between genomes can be studied with this fact in
         mind. This commonality means that information gained
         in one organism can have application in other, even
         distantly related, organisms. Comparative genomics
         enables the application of information gained from
         facile model systems to agricultural and medical
         problems. The nature and significance of differences
         between genomes also provide a powerful tool for
         determining the relationship between genotype and
         phenotype through comparative genomics and
         morphological and physiological studies.

Source: http://genomics.ucdavis.edu/what.html

Principles

       ▪ DNA sequences encoding and regulating the
         expression of essential proteins and RNAs will be
         conserved
       ▪ Consequently, the regulatory profiles of genes
         involved in similar processes among related
         species will be conserved
       ▪ Conversely, sequences that encode or control the
         expression of proteins or RNAs responsible for
         differences between species will be divergent

Definition
    “The combination of genomic data and comparative /
    evolutionary biology to address questions of genome
    structure, evolution and function”

Hardison, PLoS Biology 2003

What can we learn from cross-species comparisons?
       ▪ Genome conservation
           ➢ transfer knowledge gained from model
             organisms to non-model organisms

       ▪ Genome variation
           ➢ understand how genomes change over time in
             order to identify evolutionary processes and
             constraints

       ▪ Detection of functional elements
           ➢ Coding elements (e.g. exons)
           ➢ Conserved non-coding sequences / elements

Conservation of gene structure




Homology & sequence similarity

       ▪ Homology = shared ancestry (common evolutionary
         origin)
       ▪ Inferred based on:
           ➢ Sequence similarity
           ➢ Similar protein (multi-)domain
             composition and organization
       ▪ So does sequence similarity mean homology?
           ➢ No, it depends!

"Orthologs, paralogs, and evolutionary genomics", Koonin 2005

Homology & sequence similarity

    Sequence analysis aims at finding important sequence similarities
    that would allow one to infer homology. The latter term is extensively
    used in scientific literature, often without a clear understanding of its
    meaning, which is simply common origin.

    Homologous organs are not necessarily similar (at least the similarity
    may not be obvious); similar organs are not necessarily homologous.

    For some reason, this simple concept tends to get extremely muddled
    when applied to protein and DNA sequences. Phrases like “sequence
    (structural) homology”, “high homology”, “significant homology”,
    or even “35% homology” are as common, even in top scientific
    journals, as they are absurd, considering the definition.

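The distinction above can be made concrete: similarity is a number measured on an alignment, whereas homology is a yes/no inference about common origin. The following is a minimal Python sketch, assuming two sequences that have already been aligned to the same length with "-" as the gap character. The function name `percent_identity` and the toy sequences are illustrative assumptions, not part of the original slides.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0  # columns without a gap in either sequence
    matches = 0   # compared columns with identical residues
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy, hand-aligned protein fragments (hypothetical, for illustration only).
aligned_a = "MKT-AYIAKQR"
aligned_b = "MKTLAYLAKQR"
identity = percent_identity(aligned_a, aligned_b)
print(f"{identity:.1f}% identity")
```

The result is reported as "percent identity", never as "percent homology": the number describes the alignment, and whether the two sequences share a common origin is a separate inference that may draw on this and other evidence.
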
Multiple Sequence Alignments

    [Figure: multiple sequence alignment. Rows are sequences (~taxa);
    columns are positions in the alignment.]

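The rows-and-columns view of an alignment can be sketched in a few lines of Python (3.9+). This is an illustrative example only: it assumes the alignment is given as equal-length strings with "-" for gaps, and it uses a deliberately simple conservation measure (the fraction of non-gap residues in a column that equal the most common residue). The name `column_conservation` and the toy alignment are assumptions, not taken from the slides.

```python
from collections import Counter


def column_conservation(alignment: list[str], gap: str = "-") -> list[float]:
    """Per-column fraction of non-gap residues matching the most common residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    # zip(*alignment) turns rows (sequences) into columns (positions).
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)  # column made up only of gaps
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(residues))
    return scores


# Toy DNA alignment: four sequences (rows), five positions (columns).
toy_alignment = [
    "ATGCA",
    "ATGTA",
    "ATG-A",
    "ACGCA",
]
scores = column_conservation(toy_alignment)
for position, score in enumerate(scores, start=1):
    print(position, round(score, 2))
```

Published conservation scores usually also weight sequences and account for substitution patterns; the simple majority fraction here is only meant to show how columns are read across taxa.
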
Genome-wide sequence retrieval

       ▪ Finding information from whole-genome sequencing projects
         (listed from low to high information value)
           ➢ DNA sequence reads
           ➢ Assembled genomic DNA sequences
           ➢ Annotated genes (RNA genes + protein-encoding genes)
           ➢ Repeats, transposable elements
           ➢ Integrated platform providing both sequence
             data and functional genomics data

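Assembled sequences and annotated genes are commonly distributed as FASTA files, so a first retrieval step is often just reading such records. Below is a minimal sketch using only the Python standard library; the function name `read_fasta`, the record identifiers, and the sequences are hypothetical, and real projects may prefer an established parser.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""  # identifier = first word
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may span several rows
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record FASTA text, held in memory for illustration.
example = io.StringIO(
    ">geneA hypothetical protein\nATGGCC\nTTAA\n>geneB\nATGTAA\n"
)
records = dict(read_fasta(example))
for identifier, sequence in records.items():
    print(identifier, len(sequence))
```
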
Genome databases

        ▪ Species-specific databases
            ➢ SGD
            ➢ TAIR
            ➢ Many others, e.g. WormBase, FlyBase, ...

        ▪ General & integrative repositories
            ➢ EBI Genomes & Integr8 / Ensembl
            ➢ NCBI Entrez Genome
            ➢ UCSC


BITS - Introduction to comparative genomics
