SyMAP Master's Thesis Presentation
My master's thesis on SyMAP, a synteny mapping and analysis program.

  • Hello. I am Austin Shoemaker and I’m going to be presenting SyMAP, a synteny mapping and analysis program.

Presentation Transcript

  • SyMAP Synteny Mapping and Analysis Program Austin Shoemaker
  • SyMAP Team
    • Dr. Cari Soderlund
    • Dr. Will Nelson
    • Austin Shoemaker
      • Interactive SyMAP views
      • Sytry
        • Testing environment for synteny finding algorithms
      • Worked with the team on:
        • The synteny finding algorithm
        • MySQL database schema
  • Background
    • Comparative Genomics
    • Physical Map
    • Computing Synteny
    • Properties of FPC to Genome Synteny
  • Comparative Genomics
    • Compare genomes of different species
    • Knowledge of one helps understand the other
      • Gene Function
        • Organism $O_1$ has a gene $G_1$
        • Organism $O_2$ has a gene $G_2$ with a sequence similar to $G_1$
        • $G_1$ and $G_2$ may have similar functions
      • Evolutionary History
        • Genome rearrangements
  • Genome Rearrangements
    [Table: rearrangement types (inversion, duplication, insertion, deletion), each with a scenario and its result; for example, inversion turns A BCD E into A DCB E]
  • Whole-Genome Duplication
    • mya (million years ago)
    [Figure: timeline with the labels: last common ancestor of rice and maize; diverged 50-70 mya; 70 mya duplication; 11 mya duplication]
  • Synteny
    • At least two pairs of genes with similar structure and function on the same chromosome
      • Order does not need to be conserved
    • Often found using sequenced genomes
    • We use a physical map and a genomic sequence
    [Figure: genes c, d, e, f, g shown on both Genome A and Genome B]
  • Physical Map
    • Expensive to sequence large genomes
    • A physical map provides partial ordering of pieces of DNA and pieces of genes
  • FPC Map
    • FingerPrinted Contigs
        • Soderlund et al. 1997
    • Type of physical map
    • Made up of clones
      • Snippets of DNA
      • We use BAC clones
        • Bacterial artificial chromosome clones
      • Stored in clone libraries
  • Making a BAC Clone Library
    • Take thousands of copies of a genome
    • Cut it up into overlapping pieces (~150,000 base pairs)
      • Restriction enzymes
        • Proteins that cut at specific DNA sequences
      • Partial digestion
        • Restriction enzymes not allowed to cut at all possible locations so that the clones overlap
  • Clones
    • Each clone is stored in a well on a microtiter plate
    • Do not know the order of the clones, or where each clone is on the chromosome
  • Clone Fingerprinting
    • Clone fingerprints are found to gather more information on a clone
    • Fully digest a clone using restriction enzymes
    • If two clones share many fragments, they may overlap
  • Clone Fingerprinting
    • Fragments are run on a gel
      • Shorter fragments migrate faster
      • Measure migration rate
    • False positives and false negatives
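The overlap test described above can be sketched as counting the fragments two fingerprints share, allowing for measurement error in fragment sizes. This is a minimal illustration, not FPC's actual scoring; the function name `shared_fragments` and the `tolerance` default are assumptions made for the example.

```python
def shared_fragments(fp_a, fp_b, tolerance=7):
    """Count fragments of fp_a that pair with a distinct fragment of fp_b.

    fp_a, fp_b: lists of fragment sizes (or migration values) for two clones.
    tolerance:  hypothetical maximum difference for two fragments to be
                treated as the same band.
    """
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = shared = 0
    # Two-pointer walk over both sorted fingerprints.
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1          # bands match: consume both
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1               # a[i] has no partner in b
        else:
            j += 1               # b[j] has no partner in a
    return shared


clone_1 = [100, 250, 400, 900]
clone_2 = [103, 255, 600, 905, 1200]
print(shared_fragments(clone_1, clone_2))
```

A high count only suggests overlap: bands that match by chance are false positives, and bands missed by the measurement are false negatives.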
  • FPC
    • Assembles fingerprinted clones into contigs
      • Contig -> contiguous overlapping clones
    • Assembles into many contigs instead of one large contig
      • Unclonable regions
      • Uneven distribution
  • Markers
    • Markers are pieces of DNA
      • ~ 300 base pairs
    • Hybridization
      • A marker hybridizes to a clone when the clone contains the marker
  • BESs
    • Expensive to sequence entire clones
    • BAC End Sequences
      • BESs are sequences from the ends of BAC clones
      • ~800 base pairs
      • Do not know which end the sequence comes from
      • There are errors in the sequence
  • Anchors
    • Locations of two genomes found to be similar through a comparison of DNA sequences
    • We use marker sequences and BESs searched against a known genome sequence
      • Maize has an FPC map with markers and BESs
      • The rice genome is sequenced
    GGCCGTGGTGCTCTTTGCAATGGG
    GGCTGTGGTGCTCTTCGCAATGGG
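As a small illustration of sequence similarity, the 48 bases shown on the slide can be read as two aligned 24-base sequences and compared position by position. The helper `percent_identity` is a hypothetical name for this sketch; a real anchor search would use a sequence alignment tool rather than an ungapped comparison.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences agree."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# The two sequences from the slide, with the spaces removed.
top = "GGCCGTGGTGCTCTTTGCAATGGG"
bottom = "GGCTGTGGTGCTCTTCGCAATGGG"
print(round(percent_identity(top, bottom), 1))
```

The two sequences differ at 2 of 24 positions, so a similarity search would report them as a strong match.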
  • Component Summary
  • Finding Chains
  • Key Synteny Finding Algorithms
    • Vandepoele et al. (2002) – ADHoRe
      • Variable gap size
      • Coefficient of determination to determine the quality of a synteny block
    • Haas et al. (2004) – DAGchainer
      • Directed acyclic graph
      • Dynamic programming
      • Gap penalty
  • Other Synteny Finding Algorithms
    • Key characteristics for us:
      • Dynamic programming
        • Ordering the anchors to form a DAG
      • Gap penalty
      • Variable gap size
    • Not appropriate for finding synteny using an FPC map
      • Do not consider the error conditions that arise
  • FPC to Genome Synteny
    • Properties associated with FPC
      • FPC maps do not cover the entire genome
      • False-positive and false-negative marker hybridizations
      • FPC coordinates are approximate
      • Which end of the parent clone a BES belongs to is unknown
  • FPC Synteny Properties
    [Figure: anchors between Genome A (FPC map) and Genome B (sequenced genome)]
  • Noise
  • SyMAP Algorithm
    • Anchor $(a_k, b_l)$
      • $a_k$ is the location on the FPC map of genome $G_A$
      • $b_l$ is the location on the genomic sequence of $G_B$
    • Directed Acyclic Graph
      • $E = \{(u, v) : |a_k - a_i| \le M_A \text{ and } 0 \le b_l - b_j \le M_B\}$
        • where $u = (a_i, b_j)$, $v = (a_k, b_l)$ are anchors
      • Allows edges decreasing along G A
        • Catch off-diagonal anchors
        • Some inversions
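The edge rule above can be sketched as follows. This is an illustration under assumptions rather than SyMAP's code: anchors are `(a, b)` tuples, `max_gap_a` and `max_gap_b` stand for the two gap limits, and ties in the genome coordinate are broken by sort order so that the graph stays acyclic.

```python
def build_edges(anchors, max_gap_a, max_gap_b):
    """Return edges (u, v) as index pairs into `anchors`.

    An edge u -> v requires 0 <= b_v - b_u <= max_gap_b and
    |a_v - a_u| <= max_gap_a, so v may lie on either side of u along
    the FPC map (genome A) but never behind it on the sequence (genome B).
    """
    # Sorting by b (then a) gives a topological order; edges only go forward.
    order = sorted(range(len(anchors)),
                   key=lambda i: (anchors[i][1], anchors[i][0]))
    edges = []
    for pos, u in enumerate(order):
        a_u, b_u = anchors[u]
        for v in order[pos + 1:]:
            a_v, b_v = anchors[v]
            if b_v - b_u > max_gap_b:
                break                      # all later anchors are farther still
            if abs(a_v - a_u) <= max_gap_a:
                edges.append((u, v))
    return edges


anchors = [(10, 100), (12, 110), (8, 120), (50, 125), (11, 400)]
print(build_edges(anchors, max_gap_a=5, max_gap_b=50))
```

In this example the edge from `(12, 110)` to `(8, 120)` decreases along genome A, which is the case the slide allows in order to catch off-diagonal anchors.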
  • SyMAP Algorithm
    • Manhattan distance function with scaling
      • $D(u, v) = |a_k - a_i| / t_A + |b_l - b_j| / t_B$
      • Average distance between anchors may be different
    • Dynamic Programming
      • $\mathrm{Node}(v) = 1 + \max\bigl(0, \max_{u \in P(v)} (\mathrm{Node}(u) - D(u, v))\bigr)$
        • $P(v)$ is the set of predecessors $u$ of $v$, i.e. those with $(u, v) \in E$
      • 1 is the score given to an individual anchor
      • Plus the maximum path score for a previous node
      • Penalized by the distance between the nodes
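The recurrence above can be sketched as a single pass over the anchors in genome order. This is a minimal sketch under assumptions: anchors are `(a, b)` tuples, `max_gap_a`, `max_gap_b`, `t_a` and `t_b` stand for the gap limits and scaling factors, and the function names and back-pointer traceback are choices made for the example.

```python
def score_anchors(anchors, max_gap_a, max_gap_b, t_a, t_b):
    """Score every anchor with Node(v) = 1 + max(0, max_u(Node(u) - D(u, v)))."""
    n = len(anchors)
    order = sorted(range(n), key=lambda i: (anchors[i][1], anchors[i][0]))
    score = [1.0] * n
    back = [None] * n                      # best predecessor of each anchor
    for pos, v in enumerate(order):
        a_v, b_v = anchors[v]
        best, best_u = 0.0, None           # 0.0 means "start a new chain"
        for q in range(pos - 1, -1, -1):   # walk back through earlier anchors
            u = order[q]
            a_u, b_u = anchors[u]
            if b_v - b_u > max_gap_b:
                break
            if abs(a_v - a_u) > max_gap_a:
                continue
            # Scaled Manhattan distance between the two anchors.
            dist = abs(a_v - a_u) / t_a + abs(b_v - b_u) / t_b
            if score[u] - dist > best:
                best, best_u = score[u] - dist, u
        score[v] = 1.0 + best
        back[v] = best_u
    return score, back


def best_chain(anchors, max_gap_a, max_gap_b, t_a, t_b):
    """Trace back from the highest-scoring anchor; returns (indices, score)."""
    if not anchors:
        return [], 0.0
    score, back = score_anchors(anchors, max_gap_a, max_gap_b, t_a, t_b)
    v = max(range(len(anchors)), key=score.__getitem__)
    top, chain = score[v], []
    while v is not None:
        chain.append(v)
        v = back[v]
    return chain[::-1], top


anchors = [(10, 100), (12, 110), (8, 120), (50, 125), (11, 400)]
print(best_chain(anchors, max_gap_a=5, max_gap_b=50, t_a=10, t_b=100))
```

Each anchor contributes a score of 1, and a chain is extended only when the predecessor's score exceeds the distance penalty, so isolated noise anchors keep a score of 1.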
  • SyMAP Algorithm
    • Chains must satisfy constraints
        • Number of anchors
        • Strength of line
          • Pearson correlation coefficient
      • Required to be more precisely linear the closer they are to the minimal number of anchors
      • Exception for small and dense chains
        • Lower correlation due to errors in the assignment of BES ends or clone ordering within a contig
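The constraints above can be sketched as a filter that asks for a stronger correlation from chains with few anchors. Every threshold below (`min_anchors`, `r_at_min`, `r_floor`, `relax_per_anchor`) is a hypothetical value chosen for illustration, and the exception for small, dense chains is not modeled.

```python
import math


def pearson(points):
    """Pearson correlation coefficient of a list of (a, b) anchor positions."""
    n = len(points)
    mean_a = sum(a for a, _ in points) / n
    mean_b = sum(b for _, b in points) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in points)
    var_a = sum((a - mean_a) ** 2 for a, _ in points)
    var_b = sum((b - mean_b) ** 2 for _, b in points)
    if var_a == 0 or var_b == 0:
        return 0.0                         # degenerate chain: no linear trend
    return cov / math.sqrt(var_a * var_b)


def accept_chain(points, min_anchors=4, r_at_min=0.98,
                 r_floor=0.8, relax_per_anchor=0.02):
    """Accept a chain if it has enough anchors and is linear enough.

    The required |r| starts at r_at_min for the smallest allowed chain and
    relaxes by relax_per_anchor for each extra anchor, down to r_floor.
    """
    n = len(points)
    if n < min_anchors:
        return False
    required = max(r_floor, r_at_min - relax_per_anchor * (n - min_anchors))
    # abs() lets inverted (negatively oriented) chains pass as well.
    return abs(pearson(points)) >= required


print(accept_chain([(1, 10), (2, 20), (3, 30), (4, 40)]))
print(accept_chain([(1, 1), (2, 5), (3, 2), (4, 6)]))
```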
  • Sytry
    • Tool for testing synteny finding algorithms
    • Allows for modifying the parameters of an algorithm and rerunning
    • Results are shown as a dot plot
      • Results must be confirmed visually as correct
      • Correct is what looks right to the user
  • Automated Parameter Setting
    • Difficult to set parameters (e.g., $t_A$ and $t_B$)
      • Effects of changes can be unclear
      • Dependent on average distance between anchors and noise
        • Optimal values vary between regions
    • Have the algorithm set the gap parameters
      • Attempt to optimize $t_x$ for each chain
  • Sub-Chains
    • Overall orientation of a synteny chain may not be accurate for sub-chains
  • Sub-Chain Finder
    • Use only anchors that are part of a chain
    • Define the distance between two anchors as the number of chain anchors that fall between them
    • A significant gap signals the start of a possible inversion
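One way to read the steps above is in terms of ranks: walk the chain in FPC order and count how many chain anchors lie between consecutive anchors on the sequenced genome. The sketch below follows that reading; `split_sub_chains`, `orientation` and the `max_between` threshold are assumptions for illustration, not the thesis implementation.

```python
def split_sub_chains(chain, max_between=1):
    """Split a chain of (a, b) anchors wherever a significant rank gap occurs."""
    if not chain:
        return []
    by_a = sorted(chain)                   # order along the FPC map
    by_b = sorted(range(len(by_a)), key=lambda i: (by_a[i][1], by_a[i][0]))
    rank_b = [0] * len(by_a)
    for rank, i in enumerate(by_b):
        rank_b[i] = rank                   # position of each anchor along genome B
    sub_chains, current = [], [by_a[0]]
    for i in range(1, len(by_a)):
        # Number of chain anchors lying between the two on genome B.
        between = abs(rank_b[i] - rank_b[i - 1]) - 1
        if between > max_between:
            sub_chains.append(current)     # significant gap: close this sub-chain
            current = []
        current.append(by_a[i])
    sub_chains.append(current)
    return sub_chains


def orientation(sub_chain):
    """+1 if genome B positions increase along the sub-chain, -1 if they decrease."""
    if len(sub_chain) < 2:
        return 0
    return 1 if sub_chain[-1][1] >= sub_chain[0][1] else -1


chain = [(1, 10), (2, 20), (3, 30), (4, 70), (5, 60),
         (6, 50), (7, 40), (8, 80), (9, 90)]
for sub in split_sub_chains(chain):
    print(orientation(sub), sub)
```

In the example the middle four anchors form an inverted segment: the rank gaps at its two boundaries split the chain into three sub-chains, and the middle one gets a negative orientation.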
  • Sub-Chains
    • Evolutionary history
      • e.g., total number of inversions
    • Assigning an accurate orientation to all anchors in a chain
      • Beneficial for fixing the clone end assignment of BES
  • BES Clone End Assignments
    • BESs are arbitrarily assigned to clone ends
      • Algorithm takes this into account
      • However, the synteny can still look distorted when viewed
    • Orientation can be used to correct BES assignments
  • BES Clone End Assignments
    • positive orientation -> lines should not cross
    [Figure: BES clone ends labeled A and B connected to positions 1-8, positive orientation]
  • BES Clone End Assignments
    • negative orientation -> lines should cross
    [Figure: BES clone ends labeled A and B connected to positions 1-8, negative orientation]
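The crossing rule can be sketched for a single clone whose two BESs both hit the sequenced genome. The `CloneHits` record and `fix_end_assignment` function are hypothetical names for this illustration; the sketch assumes the chain orientation is already known (+1 or -1).

```python
from dataclasses import dataclass


@dataclass
class CloneHits:
    name: str
    left_hit: int    # genome position of the BES currently assigned to the left end
    right_hit: int   # genome position of the BES currently assigned to the right end


def fix_end_assignment(clone, orientation):
    """Swap the BES end assignment if it contradicts the chain orientation.

    The lines from the left and right clone ends cross exactly when the
    left end's hit lies to the right of the right end's hit. Positive
    orientation means the lines should not cross; negative means they should.
    """
    crossed = clone.left_hit > clone.right_hit
    should_cross = orientation < 0
    if crossed != should_cross:
        return CloneHits(clone.name, clone.right_hit, clone.left_hit)
    return clone


print(fix_end_assignment(CloneHits("clone_1", 500, 100), orientation=+1))
print(fix_end_assignment(CloneHits("clone_2", 100, 500), orientation=-1))
```

Both example clones have their ends swapped: the first crosses inside a positively oriented chain, and the second fails to cross inside a negatively oriented one.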
  • SyMAP Views
    • Accessible through a web browser
    • Static views
      • All synteny blocks ↔ sequenced chromosomes
      • Synteny blocks ↔ sequenced chromosome
    • Interactive views
      • Dot plot view
        • Genome to genome
        • Chromosome to chromosome
      • Alignment view
        • FPC ↔ sequenced chromosome
        • FPC ↔ FPC
        • FPC ↔ sequenced chromosome ↔ FPC
      • Close-up view
        • FPC ↔ sequenced chromosome
  • All Blocks ↔ Sequenced Chromosomes
  • Blocks ↔ Sequenced Chromosome
  • Genome ↔ Genome Dot Plot
  • Chromosome ↔ Chromosome Dot Plot
  • Block ↔ Sequenced Chromosome
  • Subset Flipped
  • Contig ↔ Sequenced Chromosome
  • Filters and Controls
  • FPC ↔ Sequenced Chromosome ↔ FPC
  • FPC ↔ FPC
  • Close-up of Gene
  • SyMAP Implementation
    • Caching is needed:
      • Downloads large amounts of data from remote database
      • History feature
        • Navigating back and forth between the same views
    • Soft References
      • Remain alive as long as memory is available
    • Data objects
      • Hold data in a compact form
      • Converted to view objects when needed
  • Results
    • www.agcol.arizona.edu/symap
      • Maize and sorghum aligned to rice
      • Maize FPC aligned to sorghum FPC
    • Used in editing the maize FPC maps based on their alignment to rice (Wei et al., in preparation)
    • Alignment of maize to rice chromosome 3
      • Buell et al. (2005)
    • Used in OMAP project
      • Aligning 12 species of rice to the sequenced genome of rice (Wing et al., in preparation)
  • Acknowledgements
    • Thesis Committee
      • Dr. Cari Soderlund, thesis advisor
      • Dr. Peter Downey
      • Dr. Kobus Barnard
    • This work is funded in part by NSF DBI #0115903
    • www.agcol.arizona.edu/symap