BITS - Comparative genomics on the genome level
 


This is the third presentation of the BITS training on 'Comparative genomics'.

It reviews the basic concepts of sequence homology at the genome level.

Thanks to Klaas Vandepoele of the PSB department.

Usage rights: CC Attribution-ShareAlike License


Presentation Transcript

    • Comparative genomics in eukaryotes: Genome analysis
        • Klaas Vandepoele, PhD, Professor, Ghent University
        • Comparative & Integrative Genomics, VIB – Ghent University, Belgium
    • I. Genome conservation & genomic homology
        • Alignment of homologous regions
            • Inter-genomic: aligning genomic sequences from different species
            • Intra-genomic: aligning genomic sequences from the same species
        • Different levels of resolution
            • Comparative mapping (markers)
            • Synteny (~ gene content)
            • Colinearity (gene content + order conservation)
            • DNA-based alignments (base-to-base mapping)
    • Human – Mouse – Rat [figure: resolution scale]
    • Human – Mouse orthologous regions [figure: comparative mapping, human genome vs. mouse chr IV; resolution scale]
        • Genome translocations associated with human-mouse speciation
        • Source: www.ensembl.org
    • Human genome browser [figure: Human chr I vs. Mouse chr IV; tracks for the human gene model, EST/cDNA similarities and genome similarities; resolution scale]
        • Conserved gene content & order
        • Gene loss and insertions in orthologous segments since human-mouse speciation
    • Human – Mouse base-to-base mapping [figure: resolution scale]
        • Functional sequences (e.g. exons) evolve more slowly than non-functional ones (e.g. introns) because natural selection acts against mutations in these regions.
        • Consequently, functional elements, both coding and non-coding, are unusually well conserved in orthologous regions.
        • Figure legend: blue, coding exons; GT, splice donor; AG, splice acceptor.
    • DNA substitution rates for different gene/genome regions (source: Li WH, Molecular Evolution)
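The slower evolution of functional sequences can be made concrete by estimating substitutions per site between orthologous sequences. The sketch below is a minimal illustration, not the analysis behind the slide's figure: it assumes gap-free, already aligned toy sequences (invented here), computes the p-distance, applies the Jukes-Cantor correction K = -3/4 ln(1 - 4p/3), and converts K to a rate r = K/(2T), the factor 2 reflecting that both lineages diverge for T years. The function names and the 80-million-year divergence time are illustrative assumptions.

```python
import math

def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, gap-free and non-empty")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site, correcting for multiple hits."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def substitution_rate(k: float, t_years: float) -> float:
    """Substitutions per site per year; divide by 2T because both lineages evolve for T years."""
    return k / (2 * t_years)

# Hypothetical toy alignments (20 sites each), NOT real human/mouse data.
EXON = ("ATGGCTGCAAGTTTCGAGCT", "ATGGCTGCTAGTTTCGAACT")
INTRON = ("GTAAGTCTTACGATTCAGGT", "GTAGGACTCACTATACAGAT")
T_YEARS = 80e6  # assumed divergence time, for illustration only

for name, (a, b) in [("exon", EXON), ("intron", INTRON)]:
    p = p_distance(a, b)
    k = jukes_cantor(p)
    print(f"{name}: p={p:.2f}  K={k:.3f}  r={substitution_rate(k, T_YEARS):.2e} per site per year")
```

With these toy sequences the intron segment shows three times as many observed differences as the exon segment, and the gap widens slightly after the multiple-hit correction.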
    • Multiple species comparisons (gene-based) [figure: Hedges, 2002; PhIGs]
    • Genome size variation in the grasses: the use of model systems (Gaut 2002)
        • BEP clade: rice (450 Mb), barley (~5000 Mb)
        • PACC clade: sorghum (~750 Mb), maize (~2400 Mb)
        • Divergence times shown in the figure: 55, 46 and 28 MYA
    • Grass genomes: a single genetic system? (Gale and Devos, 1998)
    • Micro-colinearity within the grasses [figure: Bennetzen lab]
    • Yeast Gene Order Browser (YGOB)
    • II. Computational detection of genomic homology
        • Synteny ~ conservation of gene content
        • Colinearity ~ conservation of (gene) content & order
            • Macro-colinearity: marker-based
            • Micro-colinearity: DNA-based or gene-based
    • How to find evidence for gene colinearity?
        • Ancestor A: genes 1 2 3 4 5 6 7 8 9 10 11
        • Speciation: S1 and S2 both start out as 1 2 3 4 5 6 7 8 9 10 11
        • Over time: gene loss, insertions, rearrangements, translocations, etc.
        • Result: S1 = 1 3 4 6 7 10 11; S2 = 1 2 4 6 7 8 9 11
        • The retained orthologs serve as anchor points
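In the slide's toy example, gene numbers stand for ancestral identity, so identical numbers in S1 and S2 are orthologs. Under that assumption, anchor points are simply the genes retained in both species, and colinearity can be checked by asking whether those anchors appear in the same order in both. A minimal sketch (the function names are hypothetical; with real data, orthology would come from a similarity search rather than shared labels, and genes are assumed to occur once per list):

```python
def anchor_points(s1: list[int], s2: list[int]) -> list[int]:
    """Genes retained in both species (shared labels = orthologs), in S1 order."""
    in_s2 = set(s2)
    return [gene for gene in s1 if gene in in_s2]

def same_order(anchors: list[int], s2: list[int]) -> bool:
    """True if the anchors occur in the same relative order in S2 (colinearity)."""
    positions = [s2.index(gene) for gene in anchors]
    return positions == sorted(positions)

# Gene orders after gene loss and insertions, as on the slide
S1 = [1, 3, 4, 6, 7, 10, 11]
S2 = [1, 2, 4, 6, 7, 8, 9, 11]

anchors = anchor_points(S1, S2)
print(anchors)                  # [1, 4, 6, 7, 11]
print(same_order(anchors, S2))  # True
```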
    • Matrix representation
        • S1 = 1 3 4 6 7 10 11; S2 = 1 2 4 6 7 8 9 11
        • Aligned segments:
            segment S1: 1 - 3 4 - 6 7 X X 10 11
            segment S2: 1 2 - 4 X 6 7 8 9 - 11
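One way to draw a matrix for the same toy gene orders: genes of S1 label the rows, genes of S2 the columns, and a cell is marked when the two genes are orthologous (here, identical labels). The marked cells then trace a near-diagonal, which is the colinear segment. The `*` and `.` symbols and the function names are illustrative choices, not the slide's notation.

```python
def homology_matrix(s1: list[int], s2: list[int]) -> list[list[bool]]:
    """Cell [i][j] is True when gene s1[i] and gene s2[j] are orthologous (same label)."""
    return [[a == b for b in s2] for a in s1]

def render(s1: list[int], s2: list[int], matrix: list[list[bool]]) -> str:
    """Text rendering: S2 genes as columns, S1 genes as rows, '*' for homologous pairs."""
    lines = ["    " + " ".join(f"{g:>2}" for g in s2)]
    for gene, row in zip(s1, matrix):
        cells = " ".join(" *" if hit else " ." for hit in row)
        lines.append(f"{gene:>2}  {cells}")
    return "\n".join(lines)

S1 = [1, 3, 4, 6, 7, 10, 11]
S2 = [1, 2, 4, 6, 7, 8, 9, 11]
m = homology_matrix(S1, S2)
print(render(S1, S2, m))
```

Genes 3 and 10 (present only in S1) and genes 2, 8 and 9 (present only in S2) give empty rows and columns, which is what interrupts an otherwise perfect diagonal.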
    • Map-based approach [figure: Chromosome 1 vs. Chromosome 2]
        • Represent chromosomes as sorted gene lists.
        • Identify all homologous gene pairs between chromosomes (all-against-all BLASTP*).
        • Score pairs of homologues in a matrix.
        • Identifying homologous regions = identifying diagonal series of elements in the gene homology matrix (GHM).
        • Vandepoele et al., Genome Research 2002
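The core idea, finding diagonal series in the GHM, can be sketched as a greedy chaining of homologous gene pairs. This is a simplified illustration, not the published procedure of Vandepoele et al.; it assumes the all-against-all BLASTP step has already produced the homologous pairs, given as (position on chromosome 1, position on chromosome 2) in the sorted gene lists. The `max_gap` and `min_anchors` values and the example pairs are hypothetical.

```python
Pair = tuple[int, int]  # (gene index on chromosome 1, gene index on chromosome 2)

def find_diagonals(pairs: list[Pair], max_gap: int = 3, min_anchors: int = 3) -> list[list[Pair]]:
    """Greedily chain homologous pairs into forward diagonal series in the GHM."""
    ordered = sorted(set(pairs))
    used: set[Pair] = set()
    segments: list[list[Pair]] = []
    for start in ordered:
        if start in used:
            continue
        chain = [start]
        while True:
            i, j = chain[-1]
            # Candidate next anchors: further along BOTH chromosomes, within max_gap genes.
            nxt = [(x, y) for (x, y) in ordered
                   if (x, y) not in used and 0 < x - i <= max_gap and 0 < y - j <= max_gap]
            if not nxt:
                break
            # Take the closest candidate (smallest step along either chromosome).
            chain.append(min(nxt, key=lambda p: (max(p[0] - i, p[1] - j), p[0] - i + p[1] - j)))
        if len(chain) >= min_anchors:
            segments.append(chain)
            used.update(chain)
    return segments

# Hypothetical homologous pairs: one colinear segment plus scattered "noise" hits.
pairs = [(0, 10), (1, 11), (3, 12), (4, 14), (2, 3), (7, 1), (8, 20)]
print(find_diagonals(pairs))  # [[(0, 10), (1, 11), (3, 12), (4, 14)]]
```

The tolerated gaps (`max_gap`) are what let a segment survive the gene losses and insertions described earlier; inverted segments would need the same search with decreasing chromosome-2 positions.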
    • The map-based approach: terminology [figure: gene homology matrix (GHM) of Chromosome 1 vs. Chromosome 2, marking a homologous gene, a colinear segment, an inverted colinear segment and a tandem duplication]
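Several of the features named on this slide can be recognised from the coordinates of homologous pairs alone: in a colinear segment the chromosome-2 positions increase along chromosome 1, in an inverted segment they decrease, and a tandem duplication shows up as one gene hitting several neighbouring genes (a short vertical or horizontal run in the GHM). The helper names, the `max_dist` threshold and the example coordinates below are hypothetical.

```python
from collections import defaultdict

Pair = tuple[int, int]  # (gene index on chromosome 1, gene index on chromosome 2)

def orientation(segment: list[Pair]) -> str:
    """'colinear' if chr2 positions rise along chr1, 'inverted' if they fall, else 'mixed'."""
    js = [j for _, j in sorted(segment)]
    steps = list(zip(js, js[1:]))
    if all(b > a for a, b in steps):
        return "colinear"
    if all(b < a for a, b in steps):
        return "inverted"
    return "mixed"

def tandem_runs(pairs: list[Pair], max_dist: int = 1) -> list[tuple[int, list[int]]]:
    """Chr1 genes matching several chr2 genes that lie within max_dist of each other.
    Swap the coordinates of each pair to look for tandem copies on chromosome 1 instead."""
    hits: dict[int, list[int]] = defaultdict(list)
    for i, j in pairs:
        hits[i].append(j)
    runs = []
    for i, js in sorted(hits.items()):
        js = sorted(js)
        run = [js[0]]
        for j in js[1:]:
            if j - run[-1] <= max_dist:
                run.append(j)
            else:
                if len(run) > 1:
                    runs.append((i, run))
                run = [j]
        if len(run) > 1:
            runs.append((i, run))
    return runs

print(orientation([(0, 0), (1, 1), (2, 3)]))              # colinear
print(orientation([(10, 20), (11, 19), (12, 17)]))        # inverted
print(tandem_runs([(5, 30), (5, 31), (5, 32), (6, 40)]))  # [(5, [30, 31, 32])]
```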
    • Detection of colinear homologous regions [figure: human-mouse (HsaC1 vs. MmuC4) and chicken-human (GgaC23 vs. HsaC1)]
    • Detection of colinear homologous regions [figure: human-mouse (HsaC1 vs. MmuC4) and human-Tetraodon (HsaC1 vs. TviC1)]
    • MUMmer: NUCmer and PROmer
    • And what about synteny? [figure: HsaC1 vs. HsaC9, ancient duplication]
        • Application of a two-dimensional sliding-window approach to score regions with a high density of homologous genes between two chromosomes.
        • Identifying syntenic regions = identifying high homolog-density regions in the gene homology matrix (GHM).
        • DeSyRe, Vandepoele et al., unpublished
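The two-dimensional sliding-window idea can be illustrated by counting homologous pairs inside square windows of the GHM while ignoring gene order within each window. This is only a sketch of the principle and not DeSyRe itself, which is unpublished; in particular, the fixed `min_hits` cutoff stands in for a proper comparison with the density expected by chance, and the window size, step and example coordinates are hypothetical.

```python
Pair = tuple[int, int]  # (gene index on chromosome 1, gene index on chromosome 2)

def dense_windows(pairs: list[Pair], n1: int, n2: int,
                  window: int = 10, step: int = 5, min_hits: int = 4) -> list[tuple[int, int, int]]:
    """Return (start_chr1, start_chr2, count) for window x window boxes of the GHM
    that contain at least min_hits homologous gene pairs, whatever their order."""
    hits = []
    for x0 in range(0, max(n1 - window, 0) + 1, step):
        for y0 in range(0, max(n2 - window, 0) + 1, step):
            count = sum(1 for i, j in pairs
                        if x0 <= i < x0 + window and y0 <= j < y0 + window)
            if count >= min_hits:
                hits.append((x0, y0, count))
    return hits

# Hypothetical GHM: five homologs shared between chr1 genes 20-29 and chr2 genes 0-9,
# in scrambled order (synteny without colinearity), plus two isolated hits.
pairs = [(20, 7), (22, 1), (23, 9), (25, 3), (27, 5), (3, 25), (12, 14)]
print(dense_windows(pairs, n1=30, n2=30))  # [(20, 0, 5)]
```

Because the five shared genes are scrambled, a diagonal search as in the map-based approach would not report them as one colinear segment, while the density window does; this is the distinction the slide draws between colinearity and synteny.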
    • Detection of recent and ancient large-scale duplications [figure: recent duplication (C2 vs. C4), detected by colinearity; ancient duplication (HsaC1 vs. HsaC9), detected by synteny]
    • III. Whole-genome alignments
        • Evolutionarily constrained sequences are a good indicator of functional genome regions.
        • Basic protocol:
            1. Sequence generation
            2. Reconstructing homologous colinearity across related genomes
            3. Multiple sequence alignment
            4. Detection of sequences under purifying selection
        • Margulies & Birney, NRG 2008
    • Reconstructing homologous colinearity
        • Segmental duplication and other species-specific rearrangements (e.g. inversions, insertions, deletions) interfere with the accurate detection of orthologous genomic regions.
    • Tools
        • Mercator (Ensembl)
            • coding exons as anchor points
            • graph of colinearity information
            • traversal of the graph to generate homologous regions
        • Chains-and-nets (UCSC)
            • reference-based local alignments between different genomes (BLASTZ)
            • filtering of the highest-scoring chains
            • netting together chains from the same locus
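The "chain" step can be illustrated with the classic chaining dynamic program: given local alignment blocks between a reference and a query genome (for example from BLASTZ), find the highest-scoring set of blocks that do not overlap and are ordered consistently on both genomes. This is a simplified stand-in, not the UCSC chaining program, which among other things penalises gaps between blocks; the `Block` fields and the example coordinates are hypothetical, and the netting step is not shown.

```python
from typing import List, NamedTuple, Tuple

class Block(NamedTuple):
    """A gap-free local alignment block (hypothetical fields)."""
    t_start: int  # start on the reference (target) genome, t_start < t_end
    t_end: int
    q_start: int  # start on the query genome, q_start < q_end
    q_end: int
    score: float

def best_chain(blocks: List[Block]) -> Tuple[float, List[Block]]:
    """Highest-scoring chain of non-overlapping blocks kept in the same order on both
    genomes. O(n^2) dynamic programming; gap penalties are ignored in this sketch."""
    if not blocks:
        return 0.0, []
    blocks = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [b.score for b in blocks]  # best chain score ending at each block
    prev = [-1] * len(blocks)         # back-pointer to the previous block in that chain
    for k, b in enumerate(blocks):
        for m in range(k):
            a = blocks[m]
            if a.t_end <= b.t_start and a.q_end <= b.q_start and best[m] + b.score > best[k]:
                best[k] = best[m] + b.score
                prev[k] = m
    k = max(range(len(blocks)), key=best.__getitem__)
    total, chain = best[k], []
    while k != -1:
        chain.append(blocks[k])
        k = prev[k]
    return total, chain[::-1]

# Hypothetical blocks: three colinear hits plus one from elsewhere in the query genome.
blocks = [
    Block(0, 100, 1000, 1100, 90),
    Block(150, 300, 1150, 1290, 140),
    Block(320, 400, 500, 580, 70),
    Block(450, 600, 1350, 1500, 120),
]
total, chain = best_chain(blocks)
print(total, [b.score for b in chain])  # 350 [90, 140, 120]
```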
    • Sequence alignment & constraint detection: PhastCons, BinCons, GERP, SiPhy
    • Whole-genome base-pair alignment
        • Challenges
            • multi-species alignment
            • long DNA sequences (reflecting homologous colinear regions)
            • one-to-one mapping (with a reference genome)
            • various levels of sequence divergence
    • Whole-genome base-pair alignment toolbox
        • MLAGAN
            • CHAOS seeding algorithm (k-mer anchors)
            • Dynamic programming (pairwise)
            • Multiple alignment using a progressive strategy
            • Shuffle-LAGAN (incl. rearrangement map); VISTA
        • TBA / MultiZ; UCSC
            • Pairwise BLASTZ alignments (local blocks)
            • Merging and joining blocks using MultiZ
            • Complex ordering of blocks using the Threaded Blockset Aligner
        • PECAN (Ensembl)
            • Consistency alignment based on pairwise alignments (incl. outgroup information)
        • MAVID
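MLAGAN starts from short word matches (k-mer anchors) found by CHAOS before running dynamic programming between them. The sketch below shows only the simplest form of such seeding, exact shared k-mers grouped by diagonal; CHAOS's own seeding and chaining are more elaborate, so treat the function names, `k` and the example sequences as illustrative assumptions.

```python
from collections import defaultdict

def kmer_seeds(seq_a: str, seq_b: str, k: int = 4) -> list[tuple[int, int]]:
    """All (pos_a, pos_b) where seq_a and seq_b share an identical k-mer."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)
    seeds = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], []):
            seeds.append((i, j))
    return seeds

def by_diagonal(seeds: list[tuple[int, int]]) -> dict[int, list[tuple[int, int]]]:
    """Group seeds by diagonal (pos_a - pos_b); runs on one diagonal suggest an ungapped match."""
    groups: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for i, j in seeds:
        groups[i - j].append((i, j))
    return dict(groups)

a = "ACGTTGCAAGGT"  # hypothetical toy sequences
b = "TTACGTTGCATT"
seeds = kmer_seeds(a, b)
print(seeds)               # [(0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]
print(by_diagonal(seeds))  # {-2: [(0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]}
```

In an anchored aligner, such seed runs fix parts of the alignment so that the expensive dynamic programming only has to fill in the regions between anchors.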
    • From gene to DNA-based colinearity… Pairwise approach: human segment as reference [figure: VISTA, http://genome.lbl.gov/vista]
    • From gene to DNA-based colinearity…
    • Input and output files [figure: PipMaker; Frazer et al., 2003]
    • Conserved Non-coding Sequences or Elements (CNS/CNE) [figure: VISTA plot of human/dog, human/mouse and mouse/dog comparisons; blue: exons, turquoise: UTRs]
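A VISTA-style conservation profile is essentially percent identity computed in a sliding window along a pairwise alignment, with conserved elements called where identity stays above a cutoff. The sketch assumes an already computed gapped alignment and measures windows in alignment columns rather than reference coordinates; the 70% over 100 bp defaults are a commonly used VISTA setting assumed here, and the toy example uses a 20-column window to stay short. Function names and sequences are hypothetical.

```python
def window_identity(ref_aln: str, other_aln: str, window: int = 100) -> list[tuple[int, float]]:
    """(start column, percent identity) for every window of `window` alignment columns."""
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    # Prefix sums over columns: 1 for an identical, non-gap column, 0 otherwise.
    prefix = [0]
    for a, b in zip(ref_aln, other_aln):
        prefix.append(prefix[-1] + (1 if a == b and a != "-" else 0))
    return [(s, 100.0 * (prefix[s + window] - prefix[s]) / window)
            for s in range(len(ref_aln) - window + 1)]

def conserved_regions(profile: list[tuple[int, float]], window: int = 100,
                      min_identity: float = 70.0) -> list[tuple[int, int]]:
    """Merge overlapping windows with identity >= min_identity into [start, end) regions."""
    regions: list[list[int]] = []
    for start, pct in profile:
        if pct < min_identity:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [(s, e) for s, e in regions]

# Toy alignment: 100 diverged columns, a 100-column identical block, 100 diverged columns.
ref = "A" * 300
other = "C" * 100 + "A" * 100 + "C" * 100
profile = window_identity(ref, other, window=20)
print(conserved_regions(profile, window=20))  # [(94, 206)]
```

The called region (columns 94 to 206) is wider than the truly identical block (columns 100 to 200), because windows that only partly overlap the element still pass the cutoff; this smoothing is worth keeping in mind when reading the edges of conserved peaks.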
    • Exercise: explore the genome organization and conservation of your favorite locus in a set of related species.
        • Plants: http://bioinformatics.psb.ugent.be/plaza/
        • Vertebrates: http://teleost.cs.uoregon.edu/synteny_db/
        • Yeast: http://wolfe.gen.tcd.ie/ygob/