Tree distance algorithm

  1. Workshop on tree distance, by Hector Franco (francoph at tcd dot ie), Trinity College Dublin
  2. Summary
     - Levenshtein algorithm & sub-string matching
     - The tree edit distance
     - Basic tree concepts
     - Maps
     - Tai maps
     - Tree distance
     - Zhang-Shasha algorithm & alignment
  4. String distance metrics: Levenshtein
     - Edit-distance metrics
       - They are the precursor of tree distance.
       - The distance is the cost of the cheapest sequence of edit operations that transforms s into t.
       - The simplest set of operations:
         - copy (map) a character from s over to t (cost 0);
         - delete a character in s (cost 1);
         - insert a character in t (cost 1);
         - substitute one character for another (cost 1).
       - This is the "Levenshtein distance".
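  5. Levenshtein distance: example
     - distance("William Cohen", "Willliam Cohon") = 2

     | S (domain) | W | I | L | L | - | I | A | M | _ | C | O | H | E | N |
     |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
     | T (range) | W | I | L | L | L | I | A | M | _ | C | O | H | O | N |
     | op | C | C | C | C | I | C | C | C | C | C | C | C | S | C |
     | cost | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 |

     (op: C = copy, I = insert, S = substitute; the cost is cumulative; "_" is the space character.)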
  6. Computing Levenshtein distance
     D(i,j) is the cost of the best alignment of s_1..s_i with t_1..t_j:

     $$D(i,j) = \min\begin{cases} D(i-1,j-1) + d(s_i,t_j) & \text{copy/substitute}\\ D(i-1,j) + 1 & \text{delete } s_i\\ D(i,j-1) + 1 & \text{insert } t_j \end{cases}$$

     To simplify, d(a,b) = 0 if a = b and 1 otherwise; also D(i,0) = i (i deletions) and D(0,j) = j (j insertions).
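The recurrence above can be sketched directly in Python. This is a minimal illustration assuming the unit costs of the slide; the function name `levenshtein` is our own choice, not something from the workshop.

```python
def levenshtein(s: str, t: str) -> int:
    """D[i][j] = cost of the best alignment of s[:i] with t[:j], unit costs."""
    D = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        D[i][0] = i  # delete the first i characters of s
    for j in range(len(t) + 1):
        D[0][j] = j  # insert the first j characters of t
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            d = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + d,  # copy / substitute
                          D[i - 1][j] + 1,      # delete s_i
                          D[i][j - 1] + 1)      # insert t_j
    return D[len(s)][len(t)]


print(levenshtein("William Cohen", "Willliam Cohon"))  # 2
print(levenshtein("MCCOHN", "COHEN"))                  # 3
```

The second call returns the bottom-right cell of the MCCOHN/COHEN table on the next slide.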
  7. Computing Levenshtein distance - 3
     $D(i,j) = \min\{D(i-1,j-1) + d(s_i,t_j)$ (copy/substitute), $D(i-1,j) + 1$ (delete), $D(i,j-1) + 1$ (insert)$\}$

     Row 0 and column 0 (yellow on the slide) are initialised in increasing order:

         for (int x = 0; x <= size(target); x++) D(0,x) = x;
         for (int x = 0; x <= size(source); x++) D(x,0) = x;

     d(s_i,t_j) is the cost of changing the letter s_i into the letter t_j: 0 if the letters are the same, 1 if they differ.

     | D(s,t) | - | C | O | H | E | N |
     |---|---|---|---|---|---|---|
     | - | 0 | 1 | 2 | 3 | 4 | 5 |
     | M | 1 | 1 | 2 | 3 | 4 | 5 |
     | C | 2 | 1 | 2 | 3 | 4 | 5 |
     | C | 3 | 2 | 2 | 3 | 4 | 5 |
     | O | 4 | 3 | 2 | 3 | 4 | 5 |
     | H | 5 | 4 | 3 | 2 | 3 | 4 |
     | N | 6 | 5 | 4 | 3 | 3 | 3 |

     The distance between MCCOHN and COHEN is D(6,5) = 3.
  8. Computing Levenshtein distance - 3
     With the same recurrence and the same table as the previous slide (distance 3), two optimal alignments of MCCOHN with COHEN, each of total cost 3:

     | T1 | T2 | Cost |
     |---|---|---|
     | M | - | 1 |
     | C | - | 1 |
     | C | C | 0 |
     | O | O | 0 |
     | H | H | 0 |
     | - | E | 1 |
     | N | N | 0 |

     | T1 | T2 | Cost |
     |---|---|---|
     | M | - | 1 |
     | C | C | 0 |
     | C | - | 1 |
     | O | O | 0 |
     | H | H | 0 |
     | - | E | 1 |
     | N | N | 0 |
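One way to recover alignments like the two above is to trace back through the table from the bottom-right cell. The sketch below assumes the same unit costs; the tie-breaking order (diagonal first, then delete, then insert) is our own choice and decides which of the equally cheap alignments is returned. The name `align` is hypothetical.

```python
def align(s: str, t: str):
    """Return (distance, pairs); each pair is (char of s or '-', char of t or '-')."""
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + d, D[i - 1][j] + 1, D[i][j - 1] + 1)

    # Walk back from D[m][n], repeating whichever step produced each value.
    pairs = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (0 if s[i - 1] == t[j - 1] else 1):
            pairs.append((s[i - 1], t[j - 1]))  # copy / substitute
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            pairs.append((s[i - 1], "-"))       # delete s_i
            i -= 1
        else:
            pairs.append(("-", t[j - 1]))       # insert t_j
            j -= 1
    pairs.reverse()
    return D[m][n], pairs


print(align("MCCOHN", "COHEN"))
```

With this tie-breaking order the call yields distance 3 and the first alignment of the slide (M and the first C deleted, E inserted).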
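  9. Practice: compute distance("GOOD", "GOD"). Try it yourself first; the solution table is:

     |   | - | G | O | O | D |
     |---|---|---|---|---|---|
     | - | 0 | 1 | 2 | 3 | 4 |
     | G | 1 | 0 | 1 | 2 | 3 |
     | O | 2 | 1 | 0 | 1 | 2 |
     | D | 3 | 2 | 1 | 1 | 1 |

     The distance is 1.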
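  10. Sub-string matching
     - What is the substring of T that best matches S?
     - Skipping the characters of T before the substring costs 0 (row 0 is all zeros); then look for the minimum value in the last row.
     - Example with S = "nier" and T = "univers":

     |   | - | u | n | i | v | e | r | s |
     |---|---|---|---|---|---|---|---|---|
     | - | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
     | n | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
     | i | 2 | 2 | 1 | 0 | 1 | 2 | 2 | 2 |
     | e | 3 | 3 | 2 | 1 | 1 | 1 | 2 | 3 |
     | r | 4 | 4 | 3 | 2 | 2 | 2 | 1 | 2 |

     - The minimum of the last row is 1, under "r": the best matching substring is "niver" (cost 1).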
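A sketch of the sub-string variant, assuming unit costs: the only change from plain Levenshtein is that row 0 stays all zeros, and the answer is the minimum of the last row. The traceback that recovers where the match starts, and the name `best_substring`, are our own additions for illustration.

```python
def best_substring(pattern: str, text: str):
    """Return (cost, substring of text) that best matches pattern under unit edit costs."""
    m, n = len(pattern), len(text)
    D = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 stays 0: skipping a prefix of text is free
    for i in range(1, m + 1):
        D[i][0] = i
        for j in range(1, n + 1):
            d = 0 if pattern[i - 1] == text[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + d, D[i - 1][j] + 1, D[i][j - 1] + 1)

    end = min(range(n + 1), key=lambda j: D[m][j])  # minimum of the last row
    # Trace back until the whole pattern is consumed; j is then where the match starts.
    i, j = m, end
    while i > 0:
        if j > 0 and D[i][j] == D[i - 1][j - 1] + (0 if pattern[i - 1] == text[j - 1] else 1):
            i, j = i - 1, j - 1
        elif D[i][j] == D[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return D[m][end], text[j:end]


print(best_substring("nier", "univers"))  # (1, 'niver')
```

The result matches the table above: the minimum 1 sits under "r", and tracing back from it spans "niver".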
  12. Edit distance between trees
     - Tai (1979) introduced a criterion for matching nodes between two tree representations, and Zhang and Shasha (1989) developed an algorithm that finds an optimal (minimum-cost) matching for a given pair of trees.
     - The algorithm works on ordered trees: the left-to-right order of the children of each node matters, and so do the ancestor relationships between nodes.
     - Three operations are allowed to transform one tree into another: deleting a node, inserting a node, and replacing (changing) a node.
     - When a node n is deleted, its children are attached to the parent of n, in the position n occupied. An insertion is the opposite: the new node can adopt existing nodes as its own children. A change only affects the label of the node, without altering the shape of the tree.
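To make the three operations concrete, here is a small sketch on a mutable ordered tree. The `Node` class and the function names are hypothetical, and this sketch only handles non-root nodes, because a node is addressed through its parent and its position among the parent's children.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def delete(parent: Node, index: int) -> None:
    """Delete parent.children[index]; its children take its place, in the same order."""
    node = parent.children[index]
    parent.children[index:index + 1] = node.children


def insert(parent: Node, index: int, count: int, label: str) -> None:
    """Insert a new child at position index that adopts `count` consecutive siblings."""
    adopted = parent.children[index:index + count]
    parent.children[index:index + count] = [Node(label, adopted)]


def relabel(node: Node, label: str) -> None:
    """Change only the label; the shape of the tree is untouched."""
    node.label = label


def show(node: Node) -> str:
    """Bracket notation, e.g. a(b(c,d),e)."""
    if not node.children:
        return node.label
    return node.label + "(" + ",".join(show(c) for c in node.children) + ")"


root = Node("a", [Node("b", [Node("c"), Node("d")]), Node("e")])
delete(root, 0)            # a(c,d,e)
insert(root, 0, 2, "b")    # a(b(c,d),e): insertion undoes the deletion
print(show(root))
```

Deleting `b` and then inserting a node `b` that adopts the first two children restores the original tree, which illustrates that insertion is the inverse of deletion.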
  14. Post-order traversal of trees
     - To traverse a non-empty binary tree in post-order, perform the following operations recursively at each node:
       1. Traverse the left subtree.
       2. Traverse the right subtree.
       3. Visit the node.
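The same idea extends to ordered trees with any number of children: traverse the subtrees from left to right, then visit the node. A minimal sketch, assuming a hypothetical `(label, [children])` tuple encoding of trees:

```python
from typing import List, Tuple

# Hypothetical encoding: a tree is (label, [children from left to right]).
Tree = Tuple[str, list]


def postorder(tree: Tree) -> List[str]:
    """Return the labels of the tree in post-order."""
    label, children = tree
    order: List[str] = []
    for child in children:          # traverse every subtree, left to right
        order.extend(postorder(child))
    order.append(label)             # then visit the node itself
    return order


# For a binary tree this is exactly: left subtree, right subtree, node.
print(postorder(("+", [("1", []), ("2", [])])))  # ['1', '2', '+']
```

The post-order position of a node is the number the Zhang-Shasha algorithm uses to identify it.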
  15. Practice yourself: [figure: a tree with five nodes, numbered 1 to 5]
  16. Ancestors: [figure]
  17. Leftmost descendant: the function l(x) gives the leftmost leaf descendant of node x (for a leaf, l(x) = x).
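  18. Key roots: x is a key root if it is the root of the tree or it has a left sibling; equivalently, no node y with a larger post-order number has the same leftmost leaf, l(y) = l(x).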
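Both definitions can be computed in one post-order pass. The sketch below assumes the same hypothetical `(label, [children])` encoding and uses the characterisation that a key root is the highest node among those sharing a leftmost leaf. With node 1 as the first node in post-order, it reproduces the Left1 and LR_keyroots1 arrays of the worked sample later in the workshop.

```python
from typing import List, Tuple

Tree = Tuple[str, list]  # hypothetical encoding: (label, [children left to right])


def leftmost_and_keyroots(tree: Tree) -> Tuple[List[int], List[int]]:
    """Number nodes 1..n in post-order; return (left, keyroots), with left[k] = l(k)."""
    left: List[int] = [0]  # index 0 is a placeholder, like the slides' "NaN" column

    def visit(node: Tree) -> int:
        _, children = node
        leaves = [visit(child) for child in children]
        # a leaf is its own leftmost leaf (its number is len(left) at this point);
        # an inner node inherits the leftmost leaf of its first child
        left.append(leaves[0] if leaves else len(left))
        return left[-1]

    visit(tree)
    highest = {}
    for k in range(1, len(left)):
        highest[left[k]] = k  # keep the highest node for each leftmost leaf
    return left, sorted(highest.values())


t1 = ("B", [("B", [("B", [("A", [])]), ("A", []), ("A", [])])])
print(leftmost_and_keyroots(t1))  # ([0, 1, 1, 3, 4, 1, 1], [3, 4, 6])
```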
  20. Mappings
     [Figure: a mapping from a domain tree to a range tree. Legend: deleted nodes appear only in the domain; inserted nodes appear only in the range; mapped pairs are either changed or exact matches.]
  21. Mappings
     Transformation = {(1,2), (2,∅), (3,1), (4,4), (∅,3)}, where ∅ is the empty node (domain node first, range node second).
     Note: this is NOT a Tai map.
  22. The sets
     - TR: transformations
     - M: map
     - C: change, cost = 1
     - EX: exact match, cost = 0
     - I: insertion, cost = 1
     - D: deletion, cost = 1
     - TR = EX + C + I + D
     - M = EX + C
  23. Mappings
     - Sets:
       - C, relabelled pairs (a,b): {(1,2)}
       - EX, exact matches (c,c): {(3,1), (4,4)}
       - I, insertions (∅,b): {(∅,3)}
       - D, deletions (a,∅): {(2,∅)}
     - TR = {(1,2), (2,∅), (3,1), (4,4), (∅,3)}
  24. Mappings, more formally
     - Let V(T) denote the set of nodes of a tree T.
     - S is the source tree.
     - T is the target tree.
     - A mapping M is a set of node pairs, M ⊆ V(S) × V(T).
  26. Tai mapping: a restricted map
     For any two pairs (a,b) and (c,d) in M:
     1. One-to-one: a = c iff b = d.
     2. Sibling order preserved (siblings do not change order): a < c iff b < d.
     3. Ancestor order preserved (no new ancestor relationships are created): anc(a,c) iff anc(b,d).
     Example of a Tai mapping: M = {(c,f), (a,e)} [figure: the two trees, with nodes labelled a to f].
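The three conditions can be checked pair by pair. This sketch assumes nodes are numbered in post-order and that `left[x]` holds l(x), as in the Zhang-Shasha preprocessing, so that a is a proper ancestor of c exactly when l(a) ≤ c < a. The function names are hypothetical, and the example arrays are those of the worked sample later in the workshop.

```python
from itertools import combinations
from typing import List, Set, Tuple


def is_ancestor(left: List[int], a: int, c: int) -> bool:
    """With post-order numbering the subtree of a is exactly the nodes l(a)..a."""
    return left[a] <= c < a


def is_tai_mapping(pairs: Set[Tuple[int, int]], left1: List[int], left2: List[int]) -> bool:
    """Check the three Tai conditions for every two pairs (a, b), (c, d) of the mapping."""
    for (a, b), (c, d) in combinations(sorted(pairs), 2):
        if (a == c) != (b == d):                       # 1. one-to-one
            return False
        if (a < c) != (b < d):                         # 2. order preserved
            return False
        if (is_ancestor(left1, a, c) != is_ancestor(left2, b, d)
                or is_ancestor(left1, c, a) != is_ancestor(left2, d, b)):  # 3. ancestry
            return False
    return True


left1 = [0, 1, 1, 3, 4, 1, 1]  # t1 = 6(5(2(1), 3, 4))
left2 = [0, 1, 1, 3, 3, 5, 1]  # t2 = 6(2(1), 4(3), 5)
print(is_tai_mapping({(1, 1), (2, 2), (3, 3), (4, 5), (6, 6)}, left1, left2))  # True
print(is_tai_mapping({(1, 3), (2, 5)}, left1, left2))  # False: 2 is above 1, but 5 is not above 3
```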
  27. Practice: [figure: candidate mappings between pairs of trees, annotated "Ancestry!", "multiple!", "Sibling order" and "Possible Tai mapping"]
  29. Definition: tree distance
     - There can be many possible Tai mappings between two trees, and at least one of them has the smallest cost.
     - "The tree distance is the cost of the least expensive Tai mapping."
     - Cost = |I| + |D| + |C|
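A sketch of the cost of one given mapping under the unit costs above: nodes are identified by number, and each tree's labels are given as a dict. The labels in the example (A, B, C) are stand-ins we chose for the colours of the worked sample later in the workshop, inferred from its cost table c. The function name is hypothetical.

```python
from typing import Dict, Set, Tuple


def mapping_cost(pairs: Set[Tuple[int, int]],
                 labels1: Dict[int, str], labels2: Dict[int, str]) -> int:
    """Cost = |I| + |D| + |C| with unit costs; exact matches (EX) cost 0."""
    mapped1 = {a for a, _ in pairs}
    mapped2 = {b for _, b in pairs}
    deletions = len(labels1.keys() - mapped1)    # D: nodes of t1 left out of the mapping
    insertions = len(labels2.keys() - mapped2)   # I: nodes of t2 left out of the mapping
    changes = sum(1 for a, b in pairs if labels1[a] != labels2[b])  # C: relabelled pairs
    return insertions + deletions + changes


labels1 = dict(enumerate("ABAABB", start=1))  # t1 nodes 1..6 in post-order
labels2 = dict(enumerate("ABABAC", start=1))  # t2 nodes 1..6 in post-order
print(mapping_cost({(1, 1), (2, 2), (3, 3), (4, 5), (6, 6)}, labels1, labels2))  # 3
```

Here t1 node 5 is deleted, t2 node 4 is inserted and the roots are relabelled, so the cost is 3. The tree distance is the minimum of this quantity over all Tai mappings.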
  30. Practice: prove the "triangle inequality"
     Prove: $\gamma(M_3) \le \gamma(M_1) + \gamma(M_2)$, where $M_3 = M_1 * M_2$ (the composition of $M_1$ and $M_2$).
     Clue: $\gamma(M_0) = |C_0| + |I_0| + |D_0|$.
     Sets: TR: transformations; M: map; C: change, cost = 1; EX: exact match, cost = 0; I: insertion, cost = 1; D: deletion, cost = 1.
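  31. Practice
     Every combination of an operation in M1 with an operation in M2 gives, in M3, the same or a lower cost than the two operations separately:

     | M1 set | Cost | M2 set | Cost | M3 set | Cost | Difference |
     |---|---|---|---|---|---|---|
     | EX | 0 | EX | 0 | EX | 0 | 0 |
     | EX | 0 | C | 1 | C | 1 | 0 |
     | EX | 0 | D | 1 | D | 1 | 0 |
     | C | 1 | EX | 0 | C | 1 | 0 |
     | C | 1 | C | 1 | C/EX | 1/0 | -1/-2 |
     | C | 1 | D | 1 | D | 1 | -1 |
     | I | 1 | EX | 0 | I | 1 | 0 |
     | I | 1 | C | 1 | I | 1 | -1 |
     | I | 1 | D | 1 | None | 0 | -2 |
     | D | 1 | None | 0 | D | 1 | 0 |
     | D | 1 | I | 1 | C/EX | 1/0 | -1/-2 |
     | None | 0 | I | 1 | I | 1 | 0 |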
  33. Zhang-Shasha algorithm
     - Finds the tree distance (the cost of the least costly Tai mapping).
     - Important concepts:
       - tree table -> tscore
       - forest table -> fscore
  34. Forest distance
     - Forest distance is written F(), tree distance T().
     - The cost (distance) function is F when its arguments are forests and T when they are trees.
     [Figure: examples labelled "tree" and "forest".]
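  35. Tree distance
     For trees T_1 and T_2 with roots v and w (T_1 - v is the forest left after removing the root v, ∅ is the empty node, and γ is the cost of a single operation):

     $$T(T_1,T_2) = \min\begin{cases} F(T_1 - v,\; T_2) + \gamma(v \to \emptyset) & \text{delete}\\ F(T_1,\; T_2 - w) + \gamma(\emptyset \to w) & \text{add}\\ F(T_1 - v,\; T_2 - w) + \gamma(v \to w) & \text{change} \end{cases}$$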
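  36. Forest distance
     For forests F_1 and F_2 whose rightmost trees T(v) and T(w) have roots v and w:

     $$F(F_1,F_2) = \min\begin{cases} F(F_1 - v,\; F_2) + \gamma(v \to \emptyset) & \text{delete}\\ F(F_1,\; F_2 - w) + \gamma(\emptyset \to w) & \text{add}\\ F(F_1 - T(v),\; F_2 - T(w)) + T(T(v), T(w)) \end{cases}$$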
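The two recurrences can be transcribed almost literally into a memoised recursion over forests. This is a sketch for illustration only (far slower than Zhang-Shasha on large trees), assuming unit costs and a hypothetical tuple encoding. It folds the delete/add cases of T(T(v), T(w)) into the first two options, a standard simplification: those cases are never cheaper than deleting v or adding w at the forest level.

```python
from functools import lru_cache


def node(label, *children):
    """Hypothetical encoding: a tree is (label, children); a forest is a tuple of trees."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_dist(f1: tuple, f2: tuple) -> int:
    """Unit-cost edit distance between two ordered forests, working on the rightmost trees."""
    if not f1 and not f2:
        return 0
    if not f1:                      # only insertions remain
        _, kids2 = f2[-1]
        return forest_dist(f1, f2[:-1] + kids2) + 1
    if not f2:                      # only deletions remain
        _, kids1 = f1[-1]
        return forest_dist(f1[:-1] + kids1, f2) + 1
    (v, kids1), (w, kids2) = f1[-1], f2[-1]
    return min(
        forest_dist(f1[:-1] + kids1, f2) + 1,      # delete root v (its children stay)
        forest_dist(f1, f2[:-1] + kids2) + 1,      # add root w
        forest_dist(f1[:-1], f2[:-1])              # F(F1 - T(v), F2 - T(w))
        + forest_dist(kids1, kids2) + (0 if v == w else 1),  # + change v into w
    )


def tree_dist(t1, t2) -> int:
    return forest_dist((t1,), (t2,))


t1 = node("B", node("B", node("B", node("A")), node("A"), node("A")))
t2 = node("C", node("B", node("A")), node("B", node("A")), node("A"))
print(tree_dist(t1, t2))  # 3
```

The trees are those of the worked sample later in the workshop, with labels A, B, C standing in for the slide colours.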
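  37. Why is this not allowed? (Clue: check the Tai mapping restrictions.)

     $$F(F_1,F_2) = \min\begin{cases} F(F_1 - v,\; F_2) + \gamma(v \to \emptyset) & \text{delete}\\ F(F_1,\; F_2 - w) + \gamma(\emptyset \to w) & \text{add}\\ F(F_1 - v,\; F_2 - w) + \gamma(v \to w) & \text{change} \end{cases}$$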
  38. [Figure: the "change" case of the recurrence.]
  39. Algorithm: pseudocode
     - Preprocessing(): get the leftmost leaf l() of every node and the key roots of each tree, in increasing order.
     - Main loop:

        for s := 1 to |keyroots(t1)|
          for t := 1 to |keyroots(t2)|
            i = keyroots(t1)[s];
            j = keyroots(t2)[t];
            Treedist(i, j);
        return tdist[|t1|][|t2|];
  40. Algorithm: 2 (prepare the forest table)

        Treedist(pos1, pos2) {
          bound1 = pos1 - left1[pos1] + 2;    // subtree size + 1
          bound2 = pos2 - left2[pos2] + 2;
          fdist = new int[bound1][bound2];
          fdist[0][0] = 0;
          for (i = 1; i < bound1; i++)        // delete the nodes of subtree pos1
            fdist[i][0] = fdist[i-1][0] + c[left1[pos1] + i - 1][0];
          for (j = 1; j < bound2; j++)        // insert the nodes of subtree pos2
            fdist[0][j] = fdist[0][j-1] + c[0][left2[pos2] + j - 1];
  41. Algorithm: 3

          for (k = left1[pos1], i = 1; k <= pos1; k++, i++)
            for (l = left2[pos2], j = 1; l <= pos2; l++, j++)
              if (left1[k] == left1[pos1] && left2[l] == left2[pos2]) {
                // both prefixes are trees: compute a tree distance
                fdist[i][j] = MIN(fdist[i-1][j] + c[k][0],       // delete k
                                  fdist[i][j-1] + c[0][l],       // insert l
                                  fdist[i-1][j-1] + c[k][l]);    // change k into l
                tdist[k][l] = fdist[i][j];
  42. Algorithm: 4

              } else {
                // at least one prefix is a forest: reuse the stored tree distance
                m = left1[k] - left1[pos1];
                n = left2[l] - left2[pos2];
                fdist[i][j] = MIN(fdist[i-1][j] + c[k][0],
                                  fdist[i][j-1] + c[0][l],
                                  fdist[m][n] + tdist[k][l]);
              }
        }
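A Python transcription of the pseudocode above, as a sketch rather than the author's code. It assumes unit insert/delete costs, a pluggable change cost, and a hypothetical `(label, [children])` tree encoding; `annotate` plays the role of Preprocessing().

```python
from typing import Callable, List, Tuple

Tree = Tuple[str, list]  # hypothetical encoding: (label, [children left to right])


def annotate(tree: Tree) -> Tuple[List[str], List[int]]:
    """Post-order numbering 1..n: labels[k] and left[k] = l(k); index 0 is a placeholder."""
    labels: List[str] = [""]
    left: List[int] = [0]

    def visit(node: Tree) -> int:
        label, children = node
        leaves = [visit(child) for child in children]
        labels.append(label)
        left.append(leaves[0] if leaves else len(left))
        return left[-1]

    visit(tree)
    return labels, left


def keyroots(left: List[int]) -> List[int]:
    """For each distinct leftmost leaf keep the highest node using it; return them ascending."""
    highest = {}
    for k in range(1, len(left)):
        highest[left[k]] = k
    return sorted(highest.values())


def zhang_shasha(t1: Tree, t2: Tree,
                 change: Callable[[str, str], int] = lambda a, b: 0 if a == b else 1) -> int:
    lab1, left1 = annotate(t1)
    lab2, left2 = annotate(t2)
    n1, n2 = len(lab1) - 1, len(lab2) - 1
    tdist = [[0] * (n2 + 1) for _ in range(n1 + 1)]  # permanent tree distance table

    def treedist(pos1: int, pos2: int) -> None:
        bound1 = pos1 - left1[pos1] + 2
        bound2 = pos2 - left2[pos2] + 2
        fdist = [[0] * bound2 for _ in range(bound1)]  # temporary forest distance table
        for i in range(1, bound1):
            fdist[i][0] = fdist[i - 1][0] + 1  # delete nodes of t1
        for j in range(1, bound2):
            fdist[0][j] = fdist[0][j - 1] + 1  # insert nodes of t2
        for i, k in enumerate(range(left1[pos1], pos1 + 1), start=1):
            for j, l in enumerate(range(left2[pos2], pos2 + 1), start=1):
                delete = fdist[i - 1][j] + 1
                insert = fdist[i][j - 1] + 1
                if left1[k] == left1[pos1] and left2[l] == left2[pos2]:
                    # both prefixes are whole subtrees: diagonal step, store the tree distance
                    fdist[i][j] = min(delete, insert,
                                      fdist[i - 1][j - 1] + change(lab1[k], lab2[l]))
                    tdist[k][l] = fdist[i][j]
                else:
                    # forests: reuse the tree distance of k and l from an earlier key-root pair
                    m = left1[k] - left1[pos1]
                    n = left2[l] - left2[pos2]
                    fdist[i][j] = min(delete, insert, fdist[m][n] + tdist[k][l])

    for x in keyroots(left1):
        for y in keyroots(left2):
            treedist(x, y)
    return tdist[n1][n2]


t1 = ("B", [("B", [("B", [("A", [])]), ("A", []), ("A", [])])])
t2 = ("C", [("B", [("A", [])]), ("B", [("A", [])]), ("A", [])])
print(zhang_shasha(t1, t2))  # 3
```

On the sample trees of the next slides, with labels A, B, C standing in for the colours, it returns 3, the value of tdist[6][6] in the final tree distance table.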
  43. Sample
     - There is a permanent tree distance table and a dynamic (temporary) forest distance table.
     - Let's follow the algorithm to solve this problem.
     - Colours denote different labels.
     [Figure: trees t1 = 6(5(2(1), 3, 4)) and t2 = 6(2(1), 4(3), 5), nodes numbered in post-order, as implied by the Left arrays on the next slide.]
  44. Sample

     | Position | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
     |---|---|---|---|---|---|---|---|
     | Left1 | NaN | 1 | 1 | 3 | 4 | 1 | 1 |
     | Left2 | NaN | 1 | 1 | 3 | 3 | 5 | 1 |
     | LR_keyroots1 | NaN | 0 | 0 | 1 | 1 | 0 | 1 |
     | LR_keyroots2 | NaN | 0 | 0 | 0 | 1 | 1 | 1 |
  45. Most left: same table as slide 44; the Left1 and Left2 rows give the leftmost leaf l(x) of every node of t1 and t2.
  46. Key roots: same table as slide 44; a 1 in the LR_keyroots rows marks a key root (nodes 3, 4 and 6 in t1; nodes 4, 5 and 6 in t2).
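  47. Atomic cost: c
     - c[n][m] measures the cost of changing the label of node n (in t1) into the label of node m (in t2).
     - The first column and the first row give the cost of deleting (t1) or inserting (t2) a node with that label.

     | c | - | 1 | 2 | 3 | 4 | 5 | 6 |
     |---|---|---|---|---|---|---|---|
     | - | - | 1 | 1 | 1 | 1 | 1 | 1 |
     | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
     | 2 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
     | 3 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
     | 4 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
     | 5 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
     | 6 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |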
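The table c can be generated from the node labels. In this sketch the labels are strings with one letter per node in post-order, and the letters A, B, C are our own stand-ins for the slide's colours: t1 = "ABAABB" and t2 = "ABABAC" reproduce the table above (row k is a node of t1, column l a node of t2). The function name is hypothetical.

```python
from typing import List


def cost_matrix(labels1: str, labels2: str) -> List[List[int]]:
    """c[k][0]: delete node k of t1; c[0][l]: insert node l of t2; c[k][l]: change k into l."""
    n1, n2 = len(labels1), len(labels2)
    c = [[0] * (n2 + 1) for _ in range(n1 + 1)]  # c[0][0] is unused (the "-" corner)
    for k in range(1, n1 + 1):
        c[k][0] = 1
    for l in range(1, n2 + 1):
        c[0][l] = 1
    for k in range(1, n1 + 1):
        for l in range(1, n2 + 1):
            c[k][l] = 0 if labels1[k - 1] == labels2[l - 1] else 1
    return c


for row in cost_matrix("ABAABB", "ABABAC"):
    print(row)
```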
  48. Tree distance table
     - The first row and column are unused; they only simplify the code.
     - The other cells would normally start at 0, but we write N (NaN) to show that only the cells that receive a value are used.
     - The matrix is 6 × 6, one row and one column per node.
     - Each position x stands for the subtree T[l(x)..x].
     - At the start every cell is N.
  49. Treedist(3,4)
     - Calculations:
       - bound1 = 3 - 3 + 2 = 2
       - bound2 = 4 - 3 + 2 = 3
       - fdist = new int[2][3]
     - Prepare the forest distance table: row ∅ = 0 1 2, and fdist[1][0] = 1.
     - The tree distance table is still all N.
  50. Treedist(3,4) step 2
     - Calculations: k = 3, i = 1; l = 3, j = 1.
     - Is it a tree? Yes.
     - Set the value in the forest distance table and in the tree distance table, then l++, j++.

     $$T(3,3) = \min\{F(\emptyset, 3) + \gamma(3 \to \emptyset) = 2,\; F(3, \emptyset) + \gamma(\emptyset \to 3) = 2,\; F(\emptyset, \emptyset) + \gamma(3 \to 3) = 0\} = 0$$

     - It is a tree, so we can look at the diagonal, and we must copy the value into the tree table.
     - Forest table: row ∅ = 0 1 2; row 3 = 1 0. Tree table: tdist[3][3] = 0.
  51. Treedist(3,4) step 3
     - Calculations: k = 3, i = 1; l = 4, j = 2.
     - Is it a tree? Yes.
     - Set the value in the forest distance table and in the tree distance table, then l++, j++.

     $$T(3,4) = \min\{F(\emptyset, 3..4) + \gamma(3 \to \emptyset) = 3,\; F(3, 3) + \gamma(\emptyset \to 4) = 1,\; F(\emptyset, 3) + \gamma(3 \to 4) = 2\} = 1$$

     - Forest table: row ∅ = 0 1 2; row 3 = 1 0 1. Tree table: tdist[3][3] = 0, tdist[3][4] = 1.
  52. Treedist(3,5) step 1
     - Calculations:
       - bound1 = 3 - 3 + 2 = 2
       - bound2 = 5 - 5 + 2 = 2
       - fdist = new int[2][2]
     - Prepare the forest distance table.
     - Tree table so far: tdist[3][3] = 0, tdist[3][4] = 1.
  53. Treedist(3,5) step 2
     - Calculations: k = 3, i = 1; l = 5, j = 1.
     - Is it a tree? Yes.

     $$T(3,5) = \min\{F(\emptyset, 5) + \gamma(3 \to \emptyset) = 2,\; F(3, \emptyset) + \gamma(\emptyset \to 5) = 2,\; F(\emptyset, \emptyset) + \gamma(3 \to 5) = 0\} = 0$$

     - Forest table: row ∅ = 0 1; row 3 = 1 0. Tree table: tdist[3][3] = 0, tdist[3][4] = 1, tdist[3][5] = 0.
  54. Treedist(3,6) step 1
     - Calculations:
       - bound1 = 3 - 3 + 2 = 2
       - bound2 = 6 - 1 + 2 = 7
       - fdist = new int[2][7]
     - Prepare the forest distance table: row ∅ = 0 1 2 3 4 5 6; row 3 starts with 1.
     - Tree table so far: tdist[3][3] = 0, tdist[3][4] = 1, tdist[3][5] = 0.
  55. Treedist(3,6) step 2
     - Calculations: k = 3, i = 1; l = 1, j = 1.
     - Is it a tree? Yes.

     $$T(3,1) = \min\{F(\emptyset, 1) + \gamma(3 \to \emptyset) = 2,\; F(3, \emptyset) + \gamma(\emptyset \to 1) = 2,\; F(\emptyset, \emptyset) + \gamma(3 \to 1) = 0\} = 0$$

     - Forest table: row ∅ = 0 1 2 3 4 5 6; row 3 = 1 0. Tree table: tdist[3][1] = 0.
  56. Treedist(3,6) step 3
     - Calculations: k = 3, i = 1; l = 2, j = 2.
     - Is it a tree? Yes.

     $$T(3,2) = \min\{F(\emptyset, 1..2) + \gamma(3 \to \emptyset) = 3,\; F(3, 1) + \gamma(\emptyset \to 2) = 1,\; F(\emptyset, 1) + \gamma(3 \to 2) = 2\} = 1$$

     - Forest table: row 3 = 1 0 1. Tree table: tdist[3][1] = 0, tdist[3][2] = 1.
  57. Treedist(3,6) step 4
     - Calculations: k = 3, i = 1; l = 3, j = 3.
     - Is it a tree? No.
     - m = 0, n = 2

     $$F(3, 1..3) = \min\{F(\emptyset, 1..3) + \gamma(3 \to \emptyset) = 4,\; F(3, 1..2) + \gamma(\emptyset \to 3) = 2,\; F(\emptyset, 1..2) + T(3,3) = 2\} = 2$$

     - It is NOT a tree, so we can NOT look at the diagonal; we use fdist[m][n] + tdist[k][l] instead.
     - Forest table: row 3 = 1 0 1 2.
  58. Treedist(3,6) step 5
     - Calculations: k = 3, i = 1; l = 4, j = 4.
     - Is it a tree? No.
     - m = 0, n = 2 (n is the size of the forest of t2 nodes 1..2 to the left of the subtree of node 4).

     $$F(3, 1..4) = \min\{F(\emptyset, 1..4) + \gamma(3 \to \emptyset) = 5,\; F(3, 1..3) + \gamma(\emptyset \to 4) = 3,\; F(\emptyset, 1..2) + T(3,4) = 3\} = 3$$

     - Forest table: row 3 = 1 0 1 2 3.
  62. Key roots
     - Tree distance (3,4) -> starts at [3,3] (the leftmost leaves): F(3..3, 3..3), F(3..3, 3..4)
     - Tree distance (4,4) -> starts at [4,3]: F(4..4, 3..3), F(4..4, 3..4)
     - Tree distance (6,4) -> starts at [1,3]: F(1..k, 3..3) and F(1..k, 3..4) for k = 1..6
     [Figure: t1 and t2 with their key roots marked.]
  63. Key roots
     - Tree distance (4,4) -> starts at [4,3] (the leftmost leaves): F(4..4, 3..3), F(4..4, 3..4)
     [Figure: t1 and t2 with their key roots marked.]
  64. Key roots
     - Tree distance (6,4) -> starts at [1,3] (the leftmost leaves):
       - F(1..1, 3..4): tree
       - F(1..2, 3..4): tree
       - F(1..3, 3..4): forest
       - F(1..4, 3..4): forest
       - F(1..5, 3..4): tree
       - F(1..6, 3..4): tree
     - For the forest cases we need the tree distances of the subtrees rooted at t1 nodes 3 and 4, and these happen to be key roots, so their distances are already in the tree table.
     - Each value follows the forest-distance recurrence: delete the rightmost root, add the rightmost root, or match the two rightmost trees.
  65. Practice
     - Calculate the tree distance between these two trees and observe the similarity with the Levenshtein algorithm (there is only one key root).
     - [Figure: "Good" and "God" drawn as trees with one node per letter.]
     - Forest table = tree table = Levenshtein table.
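  66. Tree distance: complexity
     - Time: $O(|T_1| \cdot |T_2| \cdot \min(\mathrm{depth}(T_1), \mathrm{leaves}(T_1)) \cdot \min(\mathrm{depth}(T_2), \mathrm{leaves}(T_2)))$
     - Space: $O(|T_1| \cdot |T_2|)$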
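  67. Calculate the alignment
     The algorithm can be extended to obtain the alignment; all the tables are needed. These tables correspond to the same example as before.

     Tree distance table (rows: t1 nodes, columns: t2 nodes):

     | tdist | 1 | 2 | 3 | 4 | 5 | 6 |
     |---|---|---|---|---|---|---|
     | 1 | 0 | 1 | 0 | 1 | 0 | 5 |
     | 2 | 1 | 0 | 1 | 0 | 1 | 4 |
     | 3 | 0 | 1 | 0 | 1 | 0 | 5 |
     | 4 | 0 | 1 | 0 | 1 | 0 | 5 |
     | 5 | 4 | 3 | 4 | 3 | 4 | 2 |
     | 6 | 5 | 4 | 5 | 4 | 5 | 3 |

     Forest distance table for Treedist(6,6) (6 × 6 plus the empty row and column):

     | fdist | - | 1 | 2 | 3 | 4 | 5 | 6 |
     |---|---|---|---|---|---|---|---|
     | - | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
     | 1 | 1 | 0 | 1 | 2 | 3 | 4 | 5 |
     | 2 | 2 | 1 | 0 | 1 | 2 | 3 | 4 |
     | 3 | 3 | 2 | 1 | 0 | 1 | 2 | 3 |
     | 4 | 4 | 3 | 2 | 1 | 2 | 1 | 2 |
     | 5 | 5 | 4 | 3 | 2 | 3 | 2 | 2 |
     | 6 | 6 | 5 | 4 | 3 | 4 | 3 | 3 |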
  73. Calculate the alignment: (6,6), (4,5), (2,2), (1,1), read from the tree distance table and the 6 × 6 forest distance table of slide 67.
  74. Thanks for your attention
