
SyMAP Master's Thesis Presentation

My master's thesis on SyMAP, a synteny mapping and analysis program.

  • Hello. I am Austin Shoemaker and I’m going to be presenting SyMAP, a synteny mapping and analysis program.

    1. SyMAP: Synteny Mapping and Analysis Program. Austin Shoemaker
    2. SyMAP Team <ul><li>Dr. Cari Soderlund</li><li>Dr. Will Nelson</li><li>Austin Shoemaker<ul><li>Interactive SyMAP views</li><li>Sytry<ul><li>Testing environment for synteny finding algorithms</li></ul></li><li>Worked with the team on:<ul><li>The synteny finding algorithm</li><li>MySQL database schema</li></ul></li></ul></li></ul>
    3. Background <ul><li>Comparative Genomics</li><li>Physical Map</li><li>Computing Synteny</li><li>Properties of FPC to Genome Synteny</li></ul>
    4. Comparative Genomics <ul><li>Compare genomes of different species</li><li>Knowledge of one helps understand the other<ul><li>Gene Function<ul><li>Organism \(O_1\) has a gene \(G_1\)</li><li>Organism \(O_2\) has a gene \(G_2\) with a sequence similar to \(G_1\)</li><li>\(G_1\) and \(G_2\) may have similar functions</li></ul></li><li>Evolutionary History<ul><li>Genome rearrangements</li></ul></li></ul></li></ul>
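    5. Genome Rearrangements <table><tr><th>Rearrangement</th><th>Scenario</th><th>Result</th></tr><tr><td>Inversion</td><td>A BCD E</td><td>A DCB E</td></tr><tr><td>Duplication</td><td>A B CD</td><td>A B B CD</td></tr><tr><td>Duplication</td><td>A B C</td><td>A B C B, A BB C</td></tr><tr><td>Duplication</td><td>AB</td><td>AB AB</td></tr><tr><td>Insertion</td><td>AB</td><td>A C B</td></tr><tr><td>Deletion</td><td>A B C</td><td>AC</td></tr></table>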
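The four rearrangement types can be illustrated on strings, treating each letter as one gene or segment. This is only a sketch: the function names are made up for illustration, and a biological inversion also flips the strand of each element, which is ignored here.

```python
def invert(seq, start, end):
    """Reverse the order of the segment seq[start:end]."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]


def delete(seq, start, end):
    """Remove the segment seq[start:end]."""
    return seq[:start] + seq[end:]


def insert(seq, pos, segment):
    """Insert a new segment before index pos."""
    return seq[:pos] + segment + seq[pos:]


def duplicate(seq, start, end, pos):
    """Copy the segment seq[start:end] and insert the copy before index pos."""
    return seq[:pos] + seq[start:end] + seq[pos:]


if __name__ == "__main__":
    print(invert("ABCDE", 1, 4))       # ADCBE
    print(delete("ABC", 1, 2))         # AC
    print(insert("AB", 1, "C"))        # ACB
    print(duplicate("ABCD", 1, 2, 2))  # ABBCD
    print(duplicate("ABC", 1, 2, 3))   # ABCB
```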
    6. Whole-Genome Duplication <ul><li>mya: million years ago</li><li>[Figure: rice and maize descend from a last common ancestor and diverged 50-70 mya; duplications are marked at 70 mya and 11 mya]</li></ul>
    7. Synteny <ul><li>At least two pairs of genes with similar structure and function on the same chromosome<ul><li>Order does not need to be conserved</li></ul></li><li>Often found using sequenced genomes</li><li>We use a physical map and a genomic sequence</li><li>[Figure: genes c, d, e, f, g on Genome A matched to genes c, d, e, f, g on Genome B]</li></ul>
    8. Physical Map <ul><li>Expensive to sequence large genomes</li><li>A physical map provides partial ordering of pieces of DNA and pieces of genes</li></ul>
    9. FPC Map <ul><li>FingerPrinted Contigs<ul><li>Soderlund et al. 1997</li></ul></li><li>Type of physical map</li><li>Made up of clones<ul><li>Snippets of DNA</li><li>We use BAC clones<ul><li>Bacterial artificial chromosome clones</li></ul></li><li>Stored in clone libraries</li></ul></li></ul>
    10. Making a BAC Clone Library <ul><li>Take thousands of copies of a genome</li><li>Cut them up into overlapping pieces (~150,000 base pairs)<ul><li>Restriction enzymes<ul><li>Proteins that cut at specific DNA sequences</li></ul></li><li>Partial digestion<ul><li>The restriction enzymes are not allowed to cut at every possible site, so the resulting clones overlap</li></ul></li></ul></li></ul>
    11. Clones <ul><li>Each clone is stored in a well on a microtiter plate</li><li>The order of the clones, and where each clone lies on the chromosome, is not known</li></ul>
    12. Clone Fingerprinting <ul><li>A fingerprint is generated for each clone to gather more information about it</li><li>Fully digest a clone using restriction enzymes</li><li>If two clones share many fragments, they may overlap</li></ul>
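The idea that shared fragments suggest overlap can be sketched as a band-counting function. This is an illustration only, not FPC's actual scoring: the band values, the `tolerance` parameter, and the function name are assumptions made for the example.

```python
def shared_bands(fingerprint_a, fingerprint_b, tolerance=2):
    """Count bands in fingerprint_a that match a band in fingerprint_b.

    Two bands match when their measured values differ by at most
    `tolerance`. Each band in fingerprint_b can be matched only once.
    """
    remaining = sorted(fingerprint_b)
    shared = 0
    for band in sorted(fingerprint_a):
        for index, other in enumerate(remaining):
            if abs(band - other) <= tolerance:
                shared += 1
                del remaining[index]  # this band is now used up
                break
    return shared


if __name__ == "__main__":
    # Hypothetical fingerprints (band positions in arbitrary gel units).
    clone_1 = [120, 250, 400, 610, 900]
    clone_2 = [119, 251, 405, 610, 1300]
    clone_3 = [50, 300, 700]
    print(shared_bands(clone_1, clone_2))  # 3 shared bands: possible overlap
    print(shared_bands(clone_1, clone_3))  # 0 shared bands: no evidence of overlap
```

The tolerance reflects that band measurements are inexact, which is one source of the false positives and false negatives in fingerprint comparison.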
    13. Clone Fingerprinting <ul><li>Fragments are run on a gel<ul><li>Shorter fragments migrate faster</li><li>Measure migration rate</li></ul></li><li>False positives and false negatives</li></ul>
    14. FPC <ul><li>Assembles fingerprinted clones into contigs<ul><li>Contig: contiguous overlapping clones</li></ul></li><li>Assembles into many contigs instead of one large contig<ul><li>Unclonable regions</li><li>Uneven distribution</li></ul></li></ul>
    15. Markers <ul><li>Markers are pieces of DNA<ul><li>~300 base pairs</li></ul></li><li>Hybridization<ul><li>A marker hybridizes to a clone when the clone contains the marker</li></ul></li></ul>
    16. BESs <ul><li>Expensive to sequence entire clones</li><li>BAC End Sequences<ul><li>BESs are sequences from the ends of BAC clones</li><li>~800 base pairs</li><li>It is not known which end of the clone a sequence comes from</li><li>The sequences contain errors</li></ul></li></ul>
    17. Anchors <ul><li>Locations in two genomes found to be similar through a comparison of DNA sequences</li><li>We use marker sequences and BESs searched against a known genome sequence<ul><li>Maize has an FPC map with markers and BESs</li><li>The rice genome is sequenced</li></ul></li><li>[Figure: two aligned 24-base sequences that differ at two positions: GGCCGTGGTGCTCTTTGCAATGGG and GGCTGTGGTGCTCTTCGCAATGGG]</li></ul>
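A minimal sketch of what "similar" means for the two sequences shown on this slide: compare them position by position and report the mismatches. The helper names are illustrative, and the comparison is ungapped; real sequence search tools also handle gaps and scoring.

```python
def mismatch_positions(seq_a, seq_b):
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]


def identity(seq_a, seq_b):
    """Fraction of positions at which the two sequences agree."""
    return 1 - len(mismatch_positions(seq_a, seq_b)) / len(seq_a)


if __name__ == "__main__":
    # The two sequences from the slide, written without spaces.
    top = "GGCCGTGGTGCTCTTTGCAATGGG"
    bottom = "GGCTGTGGTGCTCTTCGCAATGGG"
    print(mismatch_positions(top, bottom))  # [3, 15]
    print(f"{identity(top, bottom):.1%}")   # 91.7%
```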
    18. Component Summary
    19. Finding Chains
    20. Key Synteny Finding Algorithms <ul><li>Vandepoele et al. (2002): ADHoRe<ul><li>Variable gap size</li><li>Coefficient of determination to determine the quality of a synteny block</li></ul></li><li>Haas et al. (2004): DAGchainer<ul><li>Directed acyclic graph</li><li>Dynamic programming</li><li>Gap penalty</li></ul></li></ul>
    21. Other Synteny Finding Algorithms <ul><li>Key characteristics for us:<ul><li>Dynamic programming<ul><li>Ordering the anchors to form a DAG</li></ul></li><li>Gap penalty</li><li>Variable gap size</li></ul></li><li>Not appropriate for finding synteny using an FPC map<ul><li>They do not consider the error conditions that arise</li></ul></li></ul>
    22. FPC to Genome Synteny <ul><li>Properties associated with FPC<ul><li>FPC maps do not cover the entire genome</li><li>False-positive and false-negative marker hybridizations</li><li>FPC coordinates are approximate</li><li>Which end of the parent clone a BES belongs to is unknown</li></ul></li></ul>
    23. FPC Synteny Properties [Figure: anchors labeled 1-9 and a-c on Genome A (FPC map) aligned to Genome B (sequenced genome)]
    24. Noise
    25. SyMAP Algorithm <ul><li>Anchor \((a_k, b_l)\)<ul><li>\(a_k\) is the location on the FPC map of genome \(G_A\)</li><li>\(b_l\) is the location on the genomic sequence of \(G_B\)</li></ul></li><li>Directed Acyclic Graph<ul><li>\(E = \{(u, v) : |a_k - a_i| \le M_A \text{ and } 0 \le b_l - b_j \le M_B\}\)<ul><li>where \(u = (a_i, b_j)\), \(v = (a_k, b_l)\) are anchors</li></ul></li><li>Allows edges decreasing along \(G_A\)<ul><li>Catch off-diagonal anchors</li><li>Some inversions</li></ul></li></ul></li></ul>
    26. SyMAP Algorithm <ul><li>Manhattan distance function with scaling<ul><li>\(D(u, v) = |a_k - a_i| / t_A + |b_l - b_j| / t_B\), where \(u = (a_i, b_j)\) and \(v = (a_k, b_l)\)</li><li>The average distance between anchors may differ between the two genomes, hence the separate scales \(t_A\) and \(t_B\)</li></ul></li><li>Dynamic Programming<ul><li>\(\mathrm{Node}(v) = 1 + \max\bigl(0, \max_{u \in P(v)} (\mathrm{Node}(u) - D(u, v))\bigr)\)<ul><li>\(P(v)\) is the set of anchors \(u\) with an edge \((u, v) \in E\)</li></ul></li><li>1 is the score given to an individual anchor</li><li>Plus the maximum path score for a previous node</li><li>Penalized by the distance between the nodes</li></ul></li></ul>
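The edge rule, the scaled Manhattan distance, and the Node(v) recurrence from the two SyMAP Algorithm slides can be sketched together as follows. This is a sketch under stated assumptions, not the SyMAP implementation: the function names, the quadratic predecessor scan, the tie-breaking order for anchors with equal b, and the example parameter values are all chosen for illustration.

```python
def distance(u, v, t_a, t_b):
    """Scaled Manhattan distance D(u, v) between anchors u=(a_i, b_j), v=(a_k, b_l)."""
    return abs(v[0] - u[0]) / t_a + abs(v[1] - u[1]) / t_b


def has_edge(u, v, m_a, m_b):
    """Edge rule: |a_k - a_i| <= M_A and 0 <= b_l - b_j <= M_B."""
    return abs(v[0] - u[0]) <= m_a and 0 <= v[1] - u[1] <= m_b


def best_chain(anchors, m_a, m_b, t_a, t_b):
    """Return (chain, score) for the highest-scoring path of anchors."""
    if not anchors:
        return [], 0.0
    # Assumption: process anchors by increasing b, breaking ties by a, and
    # only allow earlier anchors as predecessors so the graph stays acyclic.
    order = sorted(range(len(anchors)), key=lambda i: (anchors[i][1], anchors[i][0]))
    node = {}  # Node(v) score per anchor index
    back = {}  # best predecessor per anchor index (None = chain start)
    for pos, vi in enumerate(order):
        best_gain, best_prev = 0.0, None
        for ui in order[:pos]:
            if has_edge(anchors[ui], anchors[vi], m_a, m_b):
                gain = node[ui] - distance(anchors[ui], anchors[vi], t_a, t_b)
                if gain > best_gain:
                    best_gain, best_prev = gain, ui
        node[vi] = 1.0 + best_gain  # 1 for the anchor itself plus the best path
        back[vi] = best_prev
    end = max(node, key=node.get)
    score = node[end]
    chain = []
    while end is not None:  # walk the back pointers to recover the path
        chain.append(anchors[end])
        end = back[end]
    chain.reverse()
    return chain, score


if __name__ == "__main__":
    # Three collinear anchors plus one far-away noise anchor at (50, 2).
    anchors = [(1, 1), (2, 2), (50, 2), (3, 3)]
    chain, score = best_chain(anchors, m_a=5, m_b=5, t_a=4, t_b=4)
    print(chain, score)  # [(1, 1), (2, 2), (3, 3)] 2.0
```

Because the edge rule uses an absolute value along genome A, a predecessor may lie at a larger FPC coordinate than its successor, which is what lets a chain absorb off-diagonal anchors and some inversions.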
    27. SyMAP Algorithm <ul><li>Chains must satisfy constraints<ul><li>Number of anchors</li><li>Strength of line<ul><li>Pearson correlation coefficient</li></ul></li><li>Required to be more precisely linear the closer they are to the minimal number of anchors</li><li>Exception for small and dense chains<ul><li>Lower correlation due to errors in the assignment of BES ends or clone ordering within a contig</li></ul></li></ul></li></ul>
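A sketch of how such a chain filter might look, combining a minimum anchor count with a Pearson correlation requirement that relaxes as chains get longer. All threshold values and the linear relaxation rule are hypothetical placeholders; the slide does not give the actual values, and the exception for small, dense chains is not modeled.

```python
import math


def pearson(points):
    """Pearson correlation coefficient of a list of (a, b) anchor positions."""
    n = len(points)
    mean_a = sum(a for a, _ in points) / n
    mean_b = sum(b for _, b in points) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in points)
    var_a = sum((a - mean_a) ** 2 for a, _ in points)
    var_b = sum((b - mean_b) ** 2 for _, b in points)
    if var_a == 0 or var_b == 0:
        return 0.0  # a vertical or horizontal line has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def required_correlation(n_anchors, min_anchors=4, strict=0.98, relaxed=0.80, step=0.02):
    """Hypothetical rule: the strictest requirement applies at the minimum size."""
    return max(relaxed, strict - step * (n_anchors - min_anchors))


def accept_chain(chain, min_anchors=4):
    """Keep a chain only if it has enough anchors and is linear enough."""
    if len(chain) < min_anchors:
        return False
    # abs() so that inverted (negatively sloped) chains are also accepted.
    return abs(pearson(chain)) >= required_correlation(len(chain), min_anchors)


if __name__ == "__main__":
    print(accept_chain([(1, 1), (2, 2), (3, 3), (4, 4)]))  # True
    print(accept_chain([(1, 4), (2, 3), (3, 2), (4, 1)]))  # True (inverted)
    print(accept_chain([(1, 1), (2, 5), (3, 1), (4, 5)]))  # False (scattered)
    print(accept_chain([(1, 1), (2, 2), (3, 3)]))          # False (too few)
```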
    28. Sytry <ul><li>Tool for testing synteny finding algorithms</li><li>Allows for modifying the parameters of an algorithm and rerunning</li><li>Results are shown as a dot plot<ul><li>Results need to be confirmed visually</li><li>"Correct" means what looks right to the user</li></ul></li></ul>
    29. Automated Parameter Setting <ul><li>Difficult to set parameters (e.g., \(t_A\) and \(t_B\))<ul><li>Effects of changes can be unclear</li><li>Dependent on average distance between anchors and noise<ul><li>Optimal values vary between regions</li></ul></li></ul></li><li>Have the algorithm set the gap parameters<ul><li>Attempt to optimize \(t_x\) for each chain</li></ul></li></ul>
    30. Sub-Chains <ul><li>Overall orientation of a synteny chain may not be accurate for sub-chains</li></ul>
    31. Sub-Chain Finder <ul><li>Use only anchors that are part of a chain</li><li>Define the distance between two anchors as the number of anchors that fall between them</li><li>A significant gap signals the start of a possible inversion</li></ul>
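One possible reading of these three bullets, sketched in code: walk the chain in order of sequence position, measure how many chain anchors lie between consecutive anchors along the FPC map, and start a new sub-chain when that count is large. The `max_between` threshold, the function names, and the orientation rule are assumptions for illustration, not the thesis's actual procedure.

```python
def split_sub_chains(chain, max_between=1):
    """Split a chain of unique (a, b) anchors into sub-chains.

    Anchors are visited by increasing b. The distance between consecutive
    anchors is the number of chain anchors lying between them along a.
    """
    if not chain:
        return []
    by_b = sorted(chain, key=lambda p: p[1])
    rank_a = {p: r for r, p in enumerate(sorted(chain))}  # rank along genome A
    sub_chains = [[by_b[0]]]
    for prev, cur in zip(by_b, by_b[1:]):
        between = abs(rank_a[cur] - rank_a[prev]) - 1
        if between > max_between:
            sub_chains.append([])  # significant gap: start a new sub-chain
        sub_chains[-1].append(cur)
    return sub_chains


def orientation(sub_chain):
    """+1 if a increases with b across the sub-chain, -1 if it decreases."""
    return 1 if sub_chain[-1][0] >= sub_chain[0][0] else -1


if __name__ == "__main__":
    # A chain whose middle three anchors are inverted.
    chain = [(1, 1), (2, 2), (3, 3), (6, 4), (5, 5), (4, 6), (7, 7), (8, 8)]
    for sub in split_sub_chains(chain):
        print(orientation(sub), sub)
    # 1 [(1, 1), (2, 2), (3, 3)]
    # -1 [(6, 4), (5, 5), (4, 6)]
    # 1 [(7, 7), (8, 8)]
```

Counting intervening anchors instead of using raw coordinates makes the gap measure independent of how densely a region is covered.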
    32. Sub-Chains <ul><li>Evolutionary history<ul><li>e.g., total number of inversions</li></ul></li><li>Assigning an accurate orientation to all anchors in a chain<ul><li>Beneficial for fixing the clone end assignment of BESs</li></ul></li></ul>
    33. BES Clone End Assignments <ul><li>BESs are arbitrarily assigned to clone ends<ul><li>The algorithm takes this into account</li><li>However, the displayed synteny can still look distorted</li></ul></li><li>Orientation can be used to correct BES assignments</li></ul>
    34. BES Clone End Assignments <ul><li>Positive orientation: lines should not cross</li><li>[Figure: anchors 1-8 with BES clone ends labeled A and B]</li></ul>
    35. BES Clone End Assignments <ul><li>Negative orientation: lines should cross</li><li>[Figure: anchors 1-8 with BES clone ends labeled A and B]</li></ul>
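The crossing rule on the last two slides can be sketched for a single clone whose two BESs both hit the sequenced genome. The coordinate model (a clone's left and right ends on the FPC map, one sequence hit per BES) and the function names are assumptions made for this example.

```python
def assign_bes_ends(left, right, hit_1, hit_2, orientation):
    """Pair a clone's two BES hits with its ends according to chain orientation.

    left, right  -- FPC coordinates of the clone ends (left < right)
    hit_1, hit_2 -- sequence positions of the clone's two BES anchors
    orientation  -- +1 for a positively oriented chain, -1 for a negative one
    Returns ((left, b_for_left_end), (right, b_for_right_end)).
    """
    low, high = sorted((hit_1, hit_2))
    if orientation >= 0:
        return (left, low), (right, high)  # positive: lines must not cross
    return (left, high), (right, low)      # negative: lines must cross


def lines_cross(end_pair):
    """True if the two lines from clone ends to sequence positions cross."""
    (a1, b1), (a2, b2) = end_pair
    return (a1 - a2) * (b1 - b2) < 0


if __name__ == "__main__":
    positive = assign_bes_ends(10, 20, 500, 300, orientation=+1)
    negative = assign_bes_ends(10, 20, 500, 300, orientation=-1)
    print(positive, lines_cross(positive))  # ((10, 300), (20, 500)) False
    print(negative, lines_cross(negative))  # ((10, 500), (20, 300)) True
```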
    36. SyMAP Views <ul><li>Accessible through a web browser</li><li>Static views<ul><li>All synteny blocks ↔ sequenced chromosomes</li><li>Synteny blocks ↔ sequenced chromosome</li></ul></li><li>Interactive views<ul><li>Dot plot view<ul><li>Genome to genome</li><li>Chromosome to chromosome</li></ul></li><li>Alignment view<ul><li>FPC ↔ sequenced chromosome</li><li>FPC ↔ FPC</li><li>FPC ↔ sequenced chromosome ↔ FPC</li></ul></li><li>Close-up view<ul><li>FPC ↔ sequenced chromosome</li></ul></li></ul></li></ul>
    37. All Blocks ↔ Sequenced Chromosomes
    38. Blocks ↔ Sequenced Chromosome
    39. Genome ↔ Genome Dot Plot
    40. Chromosome ↔ Chromosome Dot Plot
    41. Block ↔ Sequenced Chromosome
    42. Subset Flipped
    43. Contig ↔ Sequenced Chromosome
    44. Filters and Controls
    45. FPC ↔ Sequenced Chromosome ↔ FPC
    46. FPC ↔ FPC
    47. Close-up of Gene
    48. SyMAP Implementation <ul><li>Caching is needed:<ul><li>Downloads large amounts of data from remote database</li><li>History feature<ul><li>Navigating back and forth between the same views</li></ul></li></ul></li><li>Soft References<ul><li>Remain alive as long as the memory is available</li></ul></li><li>Data objects<ul><li>Hold data in a compact form</li><li>Converted to view objects when needed</li></ul></li></ul>
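The caching idea can be sketched as a reference-based cache of compact data objects that are converted to view objects on demand. Note the substitution: Python has no direct equivalent of a soft reference, so this sketch uses `weakref.WeakValueDictionary`, whose entries disappear as soon as nothing else holds the object, instead of staying alive until memory runs short. The class names, fields, and loader are hypothetical.

```python
import weakref


class ContigData:
    """Hypothetical compact data object: just the values needed to draw a contig."""

    def __init__(self, name, clone_names):
        self.name = name
        self.clone_names = tuple(clone_names)

    def to_view(self):
        """Convert to a (hypothetical) view object only when it is displayed."""
        return {"title": f"Contig {self.name}", "rows": list(self.clone_names)}


class DataCache:
    """Cache that never keeps an object alive on its own."""

    def __init__(self, loader):
        self._loader = loader  # e.g. a function that queries the remote database
        self._entries = weakref.WeakValueDictionary()

    def get(self, key):
        data = self._entries.get(key)
        if data is None:  # not cached, or already reclaimed
            data = self._loader(key)
            self._entries[key] = data
        return data


if __name__ == "__main__":
    downloads = []

    def load_from_database(key):
        downloads.append(key)  # stand-in for an expensive remote download
        return ContigData(key, ["clone_a", "clone_b"])

    cache = DataCache(load_from_database)
    first = cache.get("ctg1")
    again = cache.get("ctg1")  # served from the cache while `first` is alive
    print(first is again, downloads)
    print(first.to_view())
```

With a history feature, navigating back to a recent view would hit the cache as long as the data object is still referenced somewhere.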
    49. Results <ul><li>www.agcol.arizona.edu/symap<ul><li>Maize and sorghum aligned to rice</li><li>Maize FPC aligned to sorghum FPC</li></ul></li><li>Used in editing the maize FPC maps based on their alignment to rice (Wei et al., in preparation)</li><li>Alignment of maize to rice chromosome 3<ul><li>Buell et al. (2005)</li></ul></li><li>Used in the OMAP project<ul><li>Aligning 12 species of rice to the sequenced genome of rice (Wing et al., in preparation)</li></ul></li></ul>
    50. Acknowledgements <ul><li>Thesis Committee<ul><li>Dr. Cari Soderlund, thesis advisor</li><li>Dr. Peter Downey</li><li>Dr. Kobus Bernard</li></ul></li><li>This work is funded in part by NSF DBI #0115903</li><li>www.agcol.arizona.edu/symap</li></ul>