SyMAP Master's Thesis Presentation

My master's thesis on SyMAP, a synteny mapping and analysis program.

Published in: Technology
  • Hello. I am Austin Shoemaker and I’m going to be presenting SyMAP, a synteny mapping and analysis program.
  • Transcript

    • 1. SyMAP: Synteny Mapping and Analysis Program, Austin Shoemaker
    • 2. SyMAP Team
      • Dr. Cari Soderlund
      • Dr. Will Nelson
      • Austin Shoemaker
        • Interactive SyMAP views
        • Sytry
          • Testing environment for synteny finding algorithms
        • Worked with the team on:
          • The synteny finding algorithm
          • MySQL database schema
    • 3. Background
      • Comparative Genomics
      • Physical Map
      • Computing Synteny
      • Properties of FPC to Genome Synteny
    • 4. Comparative Genomics
      • Compare genomes of different species
      • Knowledge of one helps understand the other
        • Gene Function
          • Organism $O_1$ has a gene $G_1$
          • Organism $O_2$ has a gene $G_2$ with a sequence similar to $G_1$
          • $G_1$ and $G_2$ may have similar functions
        • Evolutionary History
          • Genome rearrangements
    • 5. Genome Rearrangements
      [Table: rearrangement, scenario, and result for inversion, duplication, insertion, and deletion]
    • 6. Whole-Genome Duplication
      • mya (million years ago)
      [Figure: timeline from the last common ancestor of rice and maize, labeled "diverged 50-70 mya", "70 mya duplication", and "11 mya duplication"]
    • 7. Synteny
      • At least two pairs of genes with similar structure and function on the same chromosome
        • Order does not need to be conserved
      • Often found using sequenced genomes
      • We use a physical map and a genomic sequence
      [Figure: genes c, d, e, f, g on Genome A matched to genes c, d, e, f, g on Genome B]
    • 8. Physical Map
      • Expensive to sequence large genomes
      • A physical map provides partial ordering of pieces of DNA and pieces of genes
    • 9. FPC Map
      • FingerPrinted Contigs
          • Soderlund et al. 1997
      • Type of physical map
      • Made up of clones
        • Snippets of DNA
        • We use BAC clones
          • Bacterial artificial chromosome clones
        • Stored in clone libraries
    • 10. Making a BAC Clone Library
      • Take thousands of copies of a genome
      • Cut it up into overlapping pieces (~150,000 base pairs)
        • Restriction enzymes
          • Proteins that cut at specific DNA sequences
        • Partial digestion
          • Restriction enzymes not allowed to cut at all possible locations so that the clones overlap
    • 11. Clones
      • Each clone is stored in a well on a microtiter plate
      • Do not know the order of the clones, or where each clone is on the chromosome
    • 12. Clone Fingerprinting
      • Clones are fingerprinted to gather more information about each clone
      • Fully digest a clone using restriction enzymes
      • If two clones share many fragments, they may overlap
    • 13. Clone Fingerprinting
      • Fragments are run on a gel
        • Shorter fragments migrate faster
        • Measure migration rate
      • False positives and false negatives
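The overlap test described on the fingerprinting slides can be sketched as counting fragments that two clones share within a measurement tolerance. This is only an illustration: the function name `shared_bands`, the tolerance value, and the band values are hypothetical and do not come from FPC itself.

```python
def shared_bands(fp_a, fp_b, tolerance):
    """Count bands of fp_a that pair with a distinct band of fp_b within tolerance."""
    a = sorted(fp_a)
    b = sorted(fp_b)
    i = j = shared = 0
    # Walk both sorted band lists in step, pairing each band at most once.
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical migration measurements for two clones.
clone_1 = [120, 340, 560, 900, 1500]
clone_2 = [122, 338, 700, 905, 2100]
print(shared_bands(clone_1, clone_2, tolerance=7))
```

A tolerance is needed because migration rates are measured, not exact, which is one source of the false positives and false negatives noted above.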
    • 14. FPC
      • Assembles fingerprinted clones into contigs
        • Contig -> contiguous overlapping clones
      • Assembles into many contigs instead of one large contig
        • Unclonable regions
        • Uneven distribution
    • 15. Markers
      • Markers are pieces of DNA
        • ~ 300 base pairs
      • Hybridization
        • A marker hybridizes to a clone when the clone contains the marker
    • 16. BESs
      • Expensive to sequence entire clones
      • BAC End Sequences
        • BESs are sequences from the ends of BAC clones
        • ~800 base pairs
        • Do not know which end the sequence comes from
        • There are errors in the sequence
    • 17. Anchors
      • Locations of two genomes found to be similar through a comparison of DNA sequences
      • We use marker sequences and BESs searched against a known genome sequence
        • Maize has an FPC map with markers and BESs
        • The rice genome is sequenced
      GGCCGTGGTGCTCTTTGCAATGGG
      GGCTGTGGTGCTCTTCGCAATGGG
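As a small illustration of what "similar" means for an anchor, the two 24-base sequences shown on this slide can be compared position by position. This is a sketch only: a real anchor search aligns markers and BESs against the whole genome sequence, and the function names here are hypothetical.

```python
def mismatch_positions(seq_a, seq_b):
    """Return 1-based positions where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [pos for pos, (x, y) in enumerate(zip(seq_a, seq_b), start=1) if x != y]


def identity(seq_a, seq_b):
    """Fraction of positions at which the two sequences agree."""
    return 1.0 - len(mismatch_positions(seq_a, seq_b)) / len(seq_a)


seq_1 = "GGCCGTGGTGCTCTTTGCAATGGG"
seq_2 = "GGCTGTGGTGCTCTTCGCAATGGG"
print(mismatch_positions(seq_1, seq_2))  # [4, 16]
print(round(identity(seq_1, seq_2), 3))
```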
    • 18. Component Summary
    • 19. Finding Chains
    • 20. Key Synteny Finding Algorithms
      • Vandepoele et al. (2002) – ADHoRe
        • Variable gap size
        • Coefficient of determination to determine the quality of a synteny block
      • Haas et al. (2004) – DAGchainer
        • Directed acyclic graph
        • Dynamic programming
        • Gap penalty
    • 21. Other Synteny Finding Algorithms
      • Key characteristics for us:
        • Dynamic programming
          • Ordering the anchors to form a DAG
        • Gap penalty
        • Variable gap size
      • Not appropriate for finding synteny using an FPC map
        • Do not consider the error conditions that arise
    • 22. FPC to Genome Synteny
      • Properties associated with FPC
        • FPC maps do not cover the entire genome
        • False-positive and false-negative marker hybridizations
        • FPC coordinates are approximate
        • Which end of the parent clone a BES belongs to is unknown
    • 23. FPC Synteny Properties
      [Figure: anchors 1-9 and a-c, marked x, o, and #, plotted between Genome A (FPC map) and Genome B (sequenced genome)]
    • 24. Noise
    • 25. SyMAP Algorithm
      • Anchor $(a_k, b_l)$
        • $a_k$ is the location on the FPC map of genome $G_A$
        • $b_l$ is the location on the genomic sequence of $G_B$
      • Directed Acyclic Graph
        • $E = \{(u, v) : |a_k - a_i| \le M_A \text{ and } 0 \le b_l - b_j \le M_B\}$
          • where $u = (a_i, b_j)$ and $v = (a_k, b_l)$ are anchors
        • Allows edges decreasing along G A
          • Catch off-diagonal anchors
          • Some inversions
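The edge rule on this slide can be sketched as follows. This is an illustration under stated assumptions, not the SyMAP implementation: the name `build_edges` and the sample values are hypothetical, and anchors are sorted by their sequence position so that edges only run forward along $G_B$, which keeps the graph acyclic even when two anchors share a position.

```python
def build_edges(anchors, max_a, max_b):
    """Return edges (u, v) between anchor indices.

    anchors: list of (a, b) pairs, a on the FPC map, b on the sequence.
    An edge needs |a_k - a_i| <= max_a and 0 <= b_l - b_j <= max_b.
    """
    order = sorted(range(len(anchors)), key=lambda n: anchors[n][1])
    edges = []
    for pos, u in enumerate(order):
        a_i, b_j = anchors[u]
        for v in order[pos + 1:]:
            a_k, b_l = anchors[v]
            if b_l - b_j > max_b:
                break  # sorted by b, so every later anchor is farther still
            if abs(a_k - a_i) <= max_a:
                edges.append((u, v))
    return edges


# Hypothetical anchors; the edge (1, 2) decreases along the FPC axis.
anchors = [(10, 100), (12, 110), (11, 125), (40, 130)]
print(build_edges(anchors, max_a=5, max_b=30))  # [(0, 1), (0, 2), (1, 2)]
```

Because only the absolute difference is bounded on the FPC axis, an edge may step backward along $G_A$, which is what lets a chain pick up off-diagonal anchors.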
    • 26. SyMAP Algorithm
      • Manhattan distance function with scaling
        • $D(u, v) = |a_k - a_i| / t_A + |b_l - b_j| / t_B$
        • Average distance between anchors may be different
      • Dynamic Programming
        • $\mathrm{Node}(v) = 1 + \max\bigl(0, \max_{u \in P(v)} (\mathrm{Node}(u) - D(u, v))\bigr)$
          • $P(v)$ is the set of edges $(u, v) \in E$
        • 1 is the score given to an individual anchor
        • Plus the maximum path score for a previous node
        • Penalized by the distance between the nodes
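The recurrence above can be sketched as a single pass over the anchors. The sketch makes several assumptions that are not on the slides: anchors are indexed in topological order (every edge (u, v) has u < v), the scaled distance is used without any rounding, and the names `best_chain`, `t_a`, and `t_b` plus all sample values are hypothetical.

```python
def best_chain(anchors, edges, t_a, t_b):
    """Score every anchor and return (scores, indices of the best chain)."""

    def distance(u, v):
        # Scaled Manhattan distance between anchors u and v.
        a_i, b_j = anchors[u]
        a_k, b_l = anchors[v]
        return abs(a_k - a_i) / t_a + abs(b_l - b_j) / t_b

    predecessors = [[] for _ in anchors]
    for u, v in edges:
        predecessors[v].append(u)

    score = [1.0] * len(anchors)
    back = [None] * len(anchors)
    for v in range(len(anchors)):  # assumes u < v for every edge (u, v)
        best = 0.0
        for u in predecessors[v]:
            candidate = score[u] - distance(u, v)
            if candidate > best:
                best, back[v] = candidate, u
        score[v] = 1.0 + best  # 1 for the anchor itself plus the best path

    # Trace back from the highest-scoring anchor.
    node = max(range(len(anchors)), key=lambda v: score[v])
    chain = []
    while node is not None:
        chain.append(node)
        node = back[node]
    return score, chain[::-1]


anchors = [(10, 100), (12, 116), (11, 132), (40, 140)]
edges = [(0, 1), (0, 2), (1, 2)]
scores, chain = best_chain(anchors, edges, t_a=8, t_b=64)
print(scores)  # [1.0, 1.5, 2.125, 1.0]
print(chain)   # [0, 1, 2]
```

The inner `max` with 0 means a distant predecessor is never worse than starting a new chain at the current anchor.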
    • 27. SyMAP Algorithm
      • Chains must satisfy constraints
          • Number of anchors
          • Strength of line
            • Pearson correlation coefficient
        • Required to be more precisely linear the closer they are to the minimal number of anchors
        • Exception for small and dense chains
          • Lower correlation due to errors in the assignment of BES ends or clone ordering within a contig
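One way to picture the chain constraints is a Pearson correlation test whose threshold relaxes as a chain gains anchors. Every number and name below is a hypothetical placeholder (`min_anchors`, `strict_r`, `relaxed_r`, `relaxed_at`), and the exception for small, dense chains is not modeled; the slides do not give SyMAP's actual thresholds.

```python
import math


def pearson(points):
    """Pearson correlation coefficient of a list of (a, b) anchor positions."""
    n = len(points)
    mean_a = sum(a for a, _ in points) / n
    mean_b = sum(b for _, b in points) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in points)
    var_a = sum((a - mean_a) ** 2 for a, _ in points)
    var_b = sum((b - mean_b) ** 2 for _, b in points)
    if var_a == 0 or var_b == 0:
        return 0.0  # a vertical or horizontal run has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def accept_chain(points, min_anchors=4, strict_r=0.98, relaxed_r=0.8, relaxed_at=20):
    """Hypothetical rule: required |r| falls linearly from strict_r to relaxed_r."""
    n = len(points)
    if n < min_anchors:
        return False
    fraction = min(1.0, (n - min_anchors) / (relaxed_at - min_anchors))
    required = strict_r - fraction * (strict_r - relaxed_r)
    return abs(pearson(points)) >= required


line = [(1, 2), (2, 4), (3, 6), (4, 8)]
noisy = [(1, 2), (2, 8), (3, 1), (4, 8)]
print(accept_chain(line), accept_chain(noisy), accept_chain(line[:3]))
```

The absolute value of the coefficient is used so that inverted chains, which slope downward, are judged by the same rule.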
    • 28. Sytry
      • Tool for testing synteny finding algorithms
      • Allows for modifying the parameters of an algorithm and rerunning
      • Results are shown as a dot plot
        • Results need to be visually confirmed as correct
        • "Correct" means what looks right to the user
    • 29. Automated Parameter Setting
      • Difficult to set parameters (e.g., $t_A$ and $t_B$)
        • Effects of changes can be unclear
        • Dependent on average distance between anchors and noise
          • Optimal values vary between regions
      • Have the algorithm set the gap parameters
        • Attempt to optimize $t_x$ for each chain
    • 30. Sub-Chains
      • Overall orientation of a synteny chain may not be accurate for sub-chains
    • 31. Sub-Chain Finder
      • Use only anchors that are part of a chain
      • Define distance between anchors in terms of the number of anchors that fall between the anchors
      • A significant gap signals the start of a possible inversion
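One possible reading of the sub-chain finder is sketched below: walk the chain's anchors in sequence order, measure how many chain anchors lie between consecutive anchors along the FPC axis, and start a new sub-chain at a large jump. The function names, the `max_gap` threshold, and the sample chain are hypothetical, and the anchors are assumed to be distinct.

```python
def split_sub_chains(chain, max_gap=2):
    """Split a chain of distinct (a, b) anchors at large rank jumps along a."""
    if not chain:
        return []
    by_b = sorted(chain, key=lambda p: p[1])
    rank_a = {p: r for r, p in enumerate(sorted(chain))}  # rank along the FPC axis
    sub_chains = [[by_b[0]]]
    for prev, cur in zip(by_b, by_b[1:]):
        between = abs(rank_a[cur] - rank_a[prev]) - 1  # anchors falling between
        if between > max_gap:
            sub_chains.append([cur])  # significant gap: possible inversion
        else:
            sub_chains[-1].append(cur)
    return sub_chains


def orientation(sub_chain):
    """+1 if a increases with b across the sub-chain, otherwise -1."""
    return 1 if sub_chain[-1][0] >= sub_chain[0][0] else -1


# Hypothetical chain with an inverted middle segment.
chain = [(1, 10), (2, 20), (3, 30), (8, 40), (7, 50),
         (6, 60), (5, 70), (4, 80), (9, 90), (10, 100)]
parts = split_sub_chains(chain)
print([len(p) for p in parts])         # [3, 5, 2]
print([orientation(p) for p in parts]) # [1, -1, 1]
```

Counting anchors instead of using raw coordinates makes the gap test independent of how densely anchors are spaced in a given region.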
    • 32. Sub-Chains
      • Evolutionary history
        • e.g., total number of inversions
      • Assigning an accurate orientation to all anchors in a chain
        • Beneficial for fixing the clone end assignment of BES
    • 33. BES Clone End Assignments
      • BESs are arbitrarily assigned to clone ends
        • Algorithm takes this into account
        • However, the synteny can look distorted when viewed
      • Orientation can be used to correct BES assignments
    • 34. BES Clone End Assignments
      • positive orientation -> lines should not cross
      [Figure: anchors 1-8 drawn for clone ends A and B in both assignments (A B and B A)]
    • 35. BES Clone End Assignments
      • negative orientation -> lines should cross
      [Figure: anchors 1-8 in reverse order, drawn for clone ends A and B in both assignments (A B and B A)]
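The crossing rule from the last two slides can be sketched for a single clone with two BES hits on the sequenced genome. The function names and positions are hypothetical; the sketch only shows how a known orientation decides which hit belongs to which clone end.

```python
def lines_cross(left_hit, right_hit):
    """True if lines from the left and right clone ends cross on the genome."""
    return left_hit > right_hit


def assign_bes_ends(hit_x, hit_y, orientation):
    """Return (left_end_hit, right_end_hit) consistent with the orientation.

    orientation > 0: lines should not cross, so the left end gets the lower hit.
    orientation < 0: lines should cross, so the left end gets the higher hit.
    """
    low, high = sorted((hit_x, hit_y))
    if orientation > 0:
        return low, high
    return high, low


# An arbitrary initial assignment that crosses inside a positive chain.
left, right = 5200, 5000
print(lines_cross(left, right))                    # True
print(assign_bes_ends(left, right, orientation=1))  # (5000, 5200)
print(assign_bes_ends(left, right, orientation=-1)) # (5200, 5000)
```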
    • 36. SyMAP Views
      • Accessible through a web browser
      • Static views
        • All synteny blocks ↔ sequenced chromosomes
        • Synteny blocks ↔ sequenced chromosome
      • Interactive views
        • Dot plot view
          • Genome to genome
          • Chromosome to chromosome
        • Alignment view
          • FPC ↔ sequenced chromosome
          • FPC ↔ FPC
          • FPC ↔ sequenced chromosome ↔ FPC
        • Close-up view
          • FPC ↔ sequenced chromosome
    • 37. All Blocks ↔ Sequenced Chromosomes
    • 38. Blocks ↔ Sequenced Chromosome
    • 39. Genome ↔ Genome Dot Plot
    • 40. Chromosome ↔ Chromosome Dot Plot
    • 41. Block ↔ Sequenced Chromosome
    • 42. Subset Flipped
    • 43. Contig ↔ Sequenced Chromosome
    • 44. Filters and Controls
    • 45. FPC ↔ Sequenced Chromosome ↔ FPC
    • 46. FPC ↔ FPC
    • 47. Close-up of Gene
    • 48. SyMAP Implementation
      • Caching is needed:
        • Downloads large amounts of data from remote database
        • History feature
          • Navigating back and forth between the same views
      • Soft References
        • Remain alive as long as the memory is available
      • Data objects
        • Hold data in a compact form
        • Converted to view objects when needed
    • 49. Results
      • www.agcol.arizona.edu/symap
        • Maize and sorghum aligned to rice
        • Maize FPC aligned to sorghum FPC
      • Used in editing the maize FPC maps based on its alignment to rice (Wei et al., in preparation)
      • Alignment of maize to rice chromosome 3
        • Buell et al. (2005)
      • Used in OMAP project
        • Aligning 12 species of rice to the sequenced genome of rice (Wing et al., in preparation)
    • 50. Acknowledgements
      • Thesis Committee
        • Dr. Cari Soderlund, thesis advisor
        • Dr. Peter Downey
        • Dr. Kobus Bernard
      • This work is funded in part by NSF DBI #0115903
      • www.agcol.arizona.edu/symap
