BITS - Comparative genomics on the genome level

This is the third presentation of the BITS training on 'Comparative genomics'.

It reviews the basic concepts of sequence homology on the gene

Thanks to Klaas Vandepoele of the PSB department.

Published in: Technology

  1. Comparative genomics in eukaryotes: Genome analysis. Klaas Vandepoele, PhD, Professor, Ghent University. Comparative & Integrative Genomics, VIB – Ghent University, Belgium
  2. I. Genome conservation & genomic homology
     • Alignment of homologous regions
       • Inter-genomic: aligning genomic sequences from different species
       • Intra-genomic: aligning genomic sequences from the same species
     • Different levels of resolution
       • Comparative mapping (markers)
       • Synteny (~ gene content)
       • Colinearity (gene content + order conservation)
       • DNA-based alignments (base-to-base mapping)
  3. Human – Mouse – Rat (resolution)
  4. Human – Mouse orthologous regions (resolution: comparative mapping). Genome translocations associated with human-mouse speciation. [Figure: human chromosomes compared with mouse chr IV; source: www.ensembl.org]
  5. Human genome browser (resolution: conserved gene content & order). Human chr I versus mouse chr IV: gene loss and insertions in orthologous segments since human-mouse speciation. [Figure tracks: human gene model, EST/cDNA similarities, genome similarities]
  6. Human – Mouse base-to-base mapping (resolution)
     • Functional sequences (e.g. exons) evolve more slowly than non-functional ones (e.g. introns) because natural selection acts against mutations in these regions
     • Consequently, functional elements, both coding and non-coding, are unusually well conserved in orthologous regions
     [Figure legend: blue = coding exons; GT donor and AG acceptor splice sites marked]
  7. DNA substitution rates for different gene/genome regions (source: Molecular Evolution, Li WH)
  8. Multiple species comparisons (gene-based) (sources: Hedges, 2002; PhIGs)
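  9. Genome size variation in the grasses: the use of model systems. [Figure: phylogeny of the BEP clade (rice, 450 Mb; barley, ~5000 Mb) and the PACC clade (sorghum, ~750 Mb; maize, ~2400 Mb), with divergence times of 28, 46 and 55 MYA marked on the tree] (Gaut 2002)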
  10. Grass genomes: a single genetic system? (Gale and Devos, 1998)
  11. Micro-colinearity within the grasses (Bennetzen lab)
  12. Yeast Gene Order Browser (YGOB)
  13. II. Computational detection of genomic homology
     • Synteny ~ conservation of gene content
     • Colinearity ~ conservation of (gene) content & order
       • Macro-colinearity: marker-based
       • Micro-colinearity: DNA-based or gene-based
  14. How to find evidence for gene colinearity?
     Ancestor A:            1 2 3 4 5 6 7 8 9 10 11
     After speciation, S1:  1 2 3 4 5 6 7 8 9 10 11
     After speciation, S2:  1 2 3 4 5 6 7 8 9 10 11
     Over time: gene loss, insertions, rearrangements, translocation, etc.
     Later, S1:  1 3 4 6 7 10 11
     Later, S2:  1 2 4 6 7 8 9 11
     Genes still present in both segments are the retained orthologs (anchor points).
  15. Matrix representation
     S1:  1 3 4 6 7 10 11
     S2:  1 2 4 6 7 8 9 11
     segment S1:  1 - 3 4 - 6 7 X X 10 11
     segment S2:  1 2 - 4 X 6 7 8 9 - 11
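The anchor points of slides 14 and 15 can be recovered with a few lines of Python. This is only a minimal sketch: it assumes that identical integer identifiers in S1 and S2 denote orthologous genes, and the function names `anchor_points` and `presence_rows` are hypothetical helpers, not part of any tool named in the slides.

```python
# Gene orders of the two segments after gene loss (values taken from slide 14).
s1 = [1, 3, 4, 6, 7, 10, 11]
s2 = [1, 2, 4, 6, 7, 8, 9, 11]


def anchor_points(a, b):
    """Return the genes retained in both segments, in the order they occur in `a`."""
    in_b = set(b)
    return [gene for gene in a if gene in in_b]


def presence_rows(a, b):
    """Lay both segments out over the union of gene ids; '-' marks an absent gene."""
    columns = sorted(set(a) | set(b))
    set_a, set_b = set(a), set(b)
    row_a = [str(g) if g in set_a else "-" for g in columns]
    row_b = [str(g) if g in set_b else "-" for g in columns]
    return columns, row_a, row_b


anchors = anchor_points(s1, s2)
columns, row1, row2 = presence_rows(s1, s2)

print("anchors:", anchors)
print("S1:", " ".join(f"{c:>2}" for c in row1))
print("S2:", " ".join(f"{c:>2}" for c in row2))
```

Genes lost in both segments (5 in this example) simply do not appear as a column, which is one difference from the slide's drawing.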
  16. Map-based approach (Vandepoele et al., Genome Research 2002)
     • Represent chromosomes as sorted gene lists
     • Identify all homologous gene pairs between chromosomes (all-against-all BLASTP)
     • Score pairs of homologues in a matrix (axes: Chromosome 1, Chromosome 2)
     Identifying homologous regions = identifying diagonal series of elements in the gene homology matrix (GHM).
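The idea of finding diagonal series in a gene homology matrix can be sketched as follows. This is an illustrative greedy procedure, not the published algorithm of Vandepoele et al.: the homolog pairs are assumed to be precomputed (for instance from a BLASTP run), and `build_ghm`, `diagonal_runs`, `max_gap` and `min_size` are hypothetical names and parameters chosen for the example.

```python
def build_ghm(chrom1, chrom2, homolog_pairs):
    """Return the set of (i, j) cells where gene i of chrom1 is homologous to gene j of chrom2."""
    pairs = {frozenset(p) for p in homolog_pairs}
    return {(i, j)
            for i, g1 in enumerate(chrom1)
            for j, g2 in enumerate(chrom2)
            if frozenset((g1, g2)) in pairs}


def diagonal_runs(cells, direction=1, max_gap=2, min_size=3):
    """Greedily group cells into diagonal series.

    direction=+1 follows the main diagonal (colinear segments),
    direction=-1 follows the anti-diagonal (inverted segments).
    """
    unused = set(cells)
    runs = []
    for start in sorted(cells):
        if start not in unused:
            continue
        run = [start]
        unused.discard(start)
        while True:
            i, j = run[-1]
            # Candidate next cells within max_gap steps along the chosen diagonal.
            candidates = [(di + dj, (i + di, j + direction * dj))
                          for di in range(1, max_gap + 1)
                          for dj in range(1, max_gap + 1)
                          if (i + di, j + direction * dj) in unused]
            if not candidates:
                break
            _, nxt = min(candidates)  # nearest candidate first
            run.append(nxt)
            unused.discard(nxt)
        if len(run) >= min_size:
            runs.append(run)
        else:
            unused.update(run[1:])  # release cells of a rejected short run
    return runs


chrom1 = ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"]
chrom2 = ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"]
homolog_pairs = [("a1", "b1"), ("a2", "b2"), ("a4", "b3"),   # colinear block
                 ("a6", "b8"), ("a7", "b7"), ("a8", "b6")]   # inverted block

ghm = build_ghm(chrom1, chrom2, homolog_pairs)
colinear = diagonal_runs(ghm, direction=1)
inverted = diagonal_runs(ghm, direction=-1)
print("colinear:", colinear)
print("inverted:", inverted)
```

Allowing a gap of up to `max_gap` positions lets a run continue across a lost or inserted gene (here `a3`), which is the situation shown on the anchor-point slides.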
  17. The map-based approach: terminology. [Figure: Gene Homology Matrix (GHM) with axes Chromosome 1 and Chromosome 2; labelled elements: homologous gene, colinear segment, inverted colinear segment, tandem duplication]
  18. Detection of colinear homologous regions. [Figure panels: Human-mouse (HsaC1 vs. MmuC4); Chicken-human (GgaC23 vs. HsaC1)]
  19. Detection of colinear homologous regions. [Figure panels: Human-mouse (HsaC1 vs. MmuC4); Human-tetraodon (HsaC1 vs. TviC1)]
  20. MUMmer, NUCmer, PROmer
  21. And what about synteny? (DeSyRe, Vandepoele et al., unpublished)
     • Application of a 2-dimensional sliding-window approach to score regions with a high density of homologous genes between 2 chromosomes
     Identifying syntenic regions = identifying high homolog-density regions in the gene homology matrix (GHM).
     [Figure: HsaC1 vs. HsaC9, ancient duplication]
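A 2-dimensional sliding window over the gene homology matrix can be sketched as below. The slides do not describe how DeSyRe scores a window, so this example simply counts homologous pairs per window and applies a threshold; `window_density`, `dense_windows`, `size` and `min_hits` are hypothetical names and parameters.

```python
def window_density(cells, n_rows, n_cols, size):
    """Count homologous pairs in every size x size window of the matrix."""
    counts = {}
    for top in range(n_rows - size + 1):
        for left in range(n_cols - size + 1):
            counts[(top, left)] = sum(
                1 for (i, j) in cells
                if top <= i < top + size and left <= j < left + size)
    return counts


def dense_windows(cells, n_rows, n_cols, size, min_hits):
    """Return the top-left corners of windows holding at least min_hits pairs."""
    counts = window_density(cells, n_rows, n_cols, size)
    return sorted(corner for corner, hits in counts.items() if hits >= min_hits)


# Four homologous pairs clustered in one corner without any diagonal order,
# plus one isolated pair elsewhere in a 10 x 10 matrix.
cells = {(0, 2), (1, 0), (2, 3), (3, 1), (8, 9)}
syntenic = dense_windows(cells, n_rows=10, n_cols=10, size=4, min_hits=4)
print("dense windows:", syntenic)
```

The clustered pairs share gene content but not gene order, so a diagonal search would miss them while a density score still flags the region.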
  22. Detection of recent and ancient large-scale duplications. [Figure panels: recent duplication, detected through colinearity (C2 vs. C4); ancient duplication, detected through synteny (HsaC1 vs. HsaC9)]
  23. III. Whole-genome alignments (Margulies & Birney, NRG 2008)
     • Evolutionarily constrained sequences are a good indicator of functional genome regions
     • Basic protocol
       1. Sequence generation
       2. Reconstructing homologous colinearity across related genomes
       3. Multi-sequence alignment
       4. Detection of sequences under purifying selection
  24. Reconstructing homologous colinearity
     • Segmental duplication and other species-specific rearrangements (e.g. inversions, insertions, deletions) interfere with the accurate detection of orthologous genomic regions
  25. Tools
     • Mercator (Ensembl)
       • Coding exons as anchor points
       • Graph of colinearity information
       • Travel through the graph to generate homologous regions
     • Chains-and-nets (UCSC)
       • Reference-based local alignments between different genomes (BLASTZ)
       • Filtering of the highest-scoring chains
       • Net together chains from the same locus
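Selecting a highest-scoring chain of local alignments can be illustrated with a small dynamic-programming sketch. It is a simplification and not the UCSC implementation: it assumes each local alignment is reduced to a single `(ref_pos, query_pos, score)` anchor with a positive score, ignores gap penalties, and requires both coordinates to increase along the chain. The name `best_chain` is hypothetical.

```python
def best_chain(anchors):
    """Return (total_score, chain) for the best colinear chain of anchors.

    anchors: list of (ref_pos, query_pos, score) tuples with positive scores.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return 0, []
    best = [a[2] for a in anchors]   # best chain score ending at each anchor
    prev = [None] * n                # back-pointer to the previous anchor
    for k in range(n):
        for p in range(k):
            colinear = (anchors[p][0] < anchors[k][0]
                        and anchors[p][1] < anchors[k][1])
            if colinear and best[p] + anchors[k][2] > best[k]:
                best[k] = best[p] + anchors[k][2]
                prev[k] = p
    end = max(range(n), key=best.__getitem__)
    chain = []
    k = end
    while k is not None:
        chain.append(anchors[k])
        k = prev[k]
    return best[end], chain[::-1]


anchors = [(10, 12, 5), (20, 25, 4), (30, 18, 9), (40, 45, 6), (50, 60, 3)]
score, chain = best_chain(anchors)
print(score, chain)
```

In this toy input the anchor at `(20, 25)` conflicts with the stronger one at `(30, 18)`, so the best chain skips it.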
  26. Sequence alignment & constraint detection: PhastCons, BinCons, GERP, SiPhy
  27. Whole-genome base-pair alignment
     • Challenges
       • Multi-species alignment
       • Long DNA sequences (reflecting homologous colinear regions)
       • One-to-one mapping (with reference genome)
       • Various levels of sequence divergence
  28. Whole-genome base-pair alignment toolbox
     • MLAGAN
       • CHAOS seeding algorithm (k-mer anchors)
       • Dynamic programming (pairwise)
       • Multiple alignment using progressive strategy
       • Shuffle-LAGAN (incl. rearrangement map); VISTA
     • TBA / MultiZ; UCSC
       • Pairwise BLASTZ alignments (local blocks)
       • Merging and joining blocks using MultiZ
       • Complex ordering of blocks using Threaded Blockset Aligner
     • PECAN (Ensembl)
       • Consistency alignment based on pairwise alignments (incl. outgroup information)
     • MAVID
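The notion of k-mer anchors used for seeding can be shown with a minimal sketch. It only finds exact shared k-mers between two sequences and makes no claim to reproduce how CHAOS itself selects or chains its seeds; `kmer_seeds` and the toy sequences are assumptions made for illustration.

```python
from collections import defaultdict


def kmer_seeds(seq_a, seq_b, k):
    """Return sorted (pos_a, pos_b) pairs where the two sequences share an exact k-mer."""
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)      # index every k-mer of seq_a
    seeds = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], ()):
            seeds.append((i, j))             # same k-mer found in seq_b
    return sorted(seeds)


seq_a = "ACGTTGCA"
seq_b = "TTACGTTG"
seeds = kmer_seeds(seq_a, seq_b, k=4)
print(seeds)
```

Seeds that fall on the same diagonal (constant `pos_b - pos_a`, here 2) are the natural candidates to extend into a local alignment with pairwise dynamic programming.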
  29. From gene to DNA-based colinearity… Pairwise approach: human segment as reference (VISTA, http://genome.lbl.gov/vista)
  30. From gene to DNA-based colinearity…
  31. Input and output files: PipMaker (Frazer et al., 2003)
  32. Conserved Non-coding Sequences or Elements (CNS/CNE). [Figure: VISTA plot with Human/dog, Human/mouse and Mouse/dog comparisons; blue: exons, turquoise: UTR]
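The conservation curve in a plot of this kind is, in essence, percent identity computed in a window that slides along a pairwise alignment. The sketch below shows that calculation under simple assumptions: both input strings are rows of the same alignment with `-` for gaps, gap columns count as mismatches, and the window size is a hypothetical parameter rather than a setting taken from VISTA.

```python
def window_identity(aln_a, aln_b, window):
    """Percent identity for every window of `window` alignment columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where the column holds the same base in both rows, 0 otherwise (gaps included).
    matches = [1 if x == y and x != "-" else 0 for x, y in zip(aln_a, aln_b)]
    return [100.0 * sum(matches[s:s + window]) / window
            for s in range(len(matches) - window + 1)]


aln_human = "ACGTACGTAC"
aln_mouse = "ACGTTCG-AC"
profile = window_identity(aln_human, aln_mouse, window=5)
print(profile)
```

Windows outside annotated exons and UTRs whose identity stays above a chosen cutoff would be the candidate conserved non-coding sequences.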
  33. Exercise
     • Explore the genome organization and conservation of your favorite locus in a set of related species.
       • Plants: http://bioinformatics.psb.ugent.be/plaza/
       • Vertebrates: http://teleost.cs.uoregon.edu/synteny_db/
       • Yeast: http://wolfe.gen.tcd.ie/ygob/