Top-k Approach For Compact
   Storage Structure


Guided by: Dr. Radha Senthilkumar, Assistant Professor, Department of IT
By: S. Meenakshi (2011611009), M.Tech I.T
Problem Definition
• Evaluating the tree edit distance for large XML trees is
  difficult.
• The best known tree edit distance algorithms have cubic run time
  and quadratic space complexity, so they do not scale.
• A core problem is to prune subtrees efficiently.
Literature Survey cont…
• Nikolaus Augsten, Denilson Barbosa, Michael M. Böhlen, and Themis
  Palpanas, “Efficient Top-k Approximate Subtree Matching in Small
  Memory,” IEEE Transactions on Knowledge and Data Engineering,
  vol. 22, no. 8, August 2011.

• Finds the top-k approximate matches of a small query tree Q within a
  large document tree.
• Uses a prefix ring buffer to prune subtrees efficiently.
• TASM is portable because it relies on the postorder queue structure,
  which can be implemented by any XML processing system that allows an
  efficient postorder traversal of trees.
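
A minimal sketch of how a postorder queue could be produced from an XML document, assuming each queue entry is a (label, subtree size) pair and that only element tags count as labels. The function name postorder_queue and the use of Python's xml.etree.ElementTree are illustrative choices, not part of the cited paper.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def postorder_queue(root: ET.Element) -> Iterator[Tuple[str, int]]:
    """Yield (label, subtree_size) pairs in postorder, without recursion.

    Assumption: the label is the element tag; text and attributes are ignored.
    """
    # Each stack entry is [node, iterator over its children, size so far].
    stack = [[root, iter(root), 1]]
    while stack:
        node, children, size = stack[-1]
        child = next(children, None)
        if child is not None:
            # Descend into the next unvisited child first.
            stack.append([child, iter(child), 1])
            continue
        # All children are done: emit the node and add its size to its parent.
        stack.pop()
        if stack:
            stack[-1][2] += size
        yield node.tag, size


doc = ET.fromstring("<a><b><c/></b><d/></a>")
print(list(postorder_queue(doc)))
```

Because a node is emitted only after all of its descendants, a consumer can read the document in a single pass and see every subtree's size as soon as its root arrives.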
Literature Survey cont…
• Jiaheng Lu, Pierre Senellart, Chunbin Lin, Xiaoyong Du, Shan
  Wang, and Xinxing Chen, “Optimal top-k generation of attribute
  combinations based on ranked lists,” Proc. ACM SIGMOD Int’l
  Conf. on Management of Data, pp. 1-12, May 2012.

• Introduces a novel top-k query type, called top-k,m queries.
• The input is a set of groups; each group contains a set of
  attributes, each of which is associated with a ranked list of tuples.
• All lists are ranked in decreasing order of the scores of tuples. The
  goal is the top-k combinations of attributes, ranked according to
  their corresponding top-m tuples with matching IDs.
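
The query semantics described above can be illustrated with a brute-force sketch. It is not the optimal algorithm of the cited paper: it enumerates every combination instead of exploiting the ranked order of the lists. The function name top_k_m, the dict-based input layout, and the use of a sum as the aggregate score are all assumptions made for illustration.

```python
import heapq
from itertools import product


def top_k_m(groups, k, m):
    """Brute-force top-k,m: groups is a list of {attribute: {tuple_id: score}}.

    Assumption: a tuple's score in a combination is the sum of its scores in
    the chosen lists, and a combination's score is the sum of its top-m tuples.
    """
    scored = []
    # A combination picks exactly one attribute from every group.
    for combo in product(*(g.items() for g in groups)):
        names = tuple(name for name, _ in combo)
        lists = [scores for _, scores in combo]
        # Only tuple IDs present in every chosen list can be matched.
        shared_ids = set(lists[0]).intersection(*lists[1:])
        tuple_scores = [sum(lst[i] for lst in lists) for i in shared_ids]
        best_m = heapq.nlargest(m, tuple_scores)
        scored.append((sum(best_m), names))
    return heapq.nlargest(k, scored)


groups = [
    {"A1": {1: 9, 2: 8, 3: 1}, "A2": {1: 5, 2: 4}},
    {"B1": {1: 7, 2: 6, 3: 9}, "B2": {2: 3}},
]
print(top_k_m(groups, k=2, m=2))
```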
Literature Survey cont…
• K.-C. Tai, “The Tree-to-Tree Correction Problem,” J. ACM, vol. 26,
  no. 3, pp. 422-433, 1979.


• Generalizes the string-to-string correction problem, which is to
  determine the distance between two strings as measured by the
  minimum-cost sequence of edit operations needed to transform one
  string into the other. That string distance can be computed in time
  O(m · n), where m and n are the lengths of the two given strings.
• For trees, three edit operations are considered: changing one node
  of a tree into another node, deleting one node from a tree, or
  inserting a node into a tree.
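
The O(m · n) bound for the string case can be seen in the standard dynamic-programming recurrence. The sketch below assumes unit cost for each of the three operations (change, delete, insert); the function name edit_distance is illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """String edit distance with unit costs, in O(len(a) * len(b)) time."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # change ca into cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so the memory used is linear in the length of the second string.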
Objective
• To implement the concept of dominating queries
  using the Top-k Approximate Subtree Matching
  (TASM) approach.
• To evaluate the performance of dominating
  queries in the compact storage structure.
Dominating Queries
• The number of results is controllable.
• The result is scaling invariant.
• No user-defined ranking function is required.
• Each point is assigned an intuitive score which determines
  its rank.
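
A minimal sketch of these properties, assuming the usual definition in which a point's score is the number of other points it dominates, and assuming that smaller values are better in every dimension. The names dominates and top_k_dominating are illustrative, and the quadratic scan is chosen for clarity, not efficiency.

```python
def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in one."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def top_k_dominating(points, k):
    """Return the k (score, point) pairs with the highest domination score."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    # Stable sort: points with equal scores keep their input order.
    scored.sort(key=lambda s: -s[0])
    return scored[:k]


points = [(1, 1), (2, 2), (3, 1), (1, 3), (3, 3)]
print(top_k_dominating(points, k=2))
```

The parameter k controls the result size, no ranking function is supplied by the user, and multiplying any dimension by a positive constant leaves the scores unchanged because only comparisons within a dimension are used.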

TASM (Top-k Approximate Subtree Matching):
• The problem of ranking the k best approximate matches of
  a small query tree in a large document tree.
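
The ranking step alone can be sketched with a bounded heap, assuming the distance between the query and each candidate subtree has already been computed elsewhere. The function name k_best_matches and the (distance, subtree_id) input format are hypothetical; this is not the TASM algorithm itself, only the bookkeeping that keeps the k best candidates seen so far.

```python
import heapq


def k_best_matches(candidates, k):
    """Keep the k candidates with the smallest distance from a stream.

    candidates: iterable of (distance, subtree_id) pairs (assumed input).
    """
    if k <= 0:
        return []
    heap = []  # max-heap on distance (stored negated), never more than k items
    for distance, subtree_id in candidates:
        if len(heap) < k:
            heapq.heappush(heap, (-distance, subtree_id))
        elif -distance > heap[0][0]:
            # Strictly better than the current worst match: replace it.
            heapq.heapreplace(heap, (-distance, subtree_id))
    return sorted((-d, s) for d, s in heap)


stream = [(5, "n1"), (2, "n2"), (7, "n3"), (1, "n4"), (3, "n5")]
print(k_best_matches(stream, k=3))
```

The heap never holds more than k entries, so the ranking memory does not grow with the size of the document.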
References
• Nikolaus Augsten, Denilson Barbosa, Michael M. Böhlen, and Themis
  Palpanas, “Efficient Top-k Approximate Subtree Matching in Small
  Memory,” IEEE Transactions on Knowledge and Data Engineering,
  vol. 22, no. 8, August 2011.
• Jiaheng Lu, Pierre Senellart, Chunbin Lin, Xiaoyong Du, Shan Wang,
  and Xinxing Chen, “Optimal top-k generation of attribute
  combinations based on ranked lists,” Proc. ACM SIGMOD Int’l Conf.
  on Management of Data, pp. 1-12, May 2012.
• N. Augsten, M.H. Böhlen, C.E. Dyreson, and J. Gamper,
  “Approximate Joins for Data-Centric XML,” Proc. IEEE 24th
  Int’l Conf. Data Eng. (ICDE), pp. 814-823, 2008.
• K.-C. Tai, “The Tree-to-Tree Correction Problem,” J. ACM, vol. 26,
  no. 3, pp. 422-433, 1979.
Timeline Chart
PHASE     REVIEW I            REVIEW II                REVIEW III

PHASE I   Learning to work    Implement the concept    Evaluate the dominating
          with TASM           of dominating queries    queries in compact
          (July)              (August-September)       storage structure
                                                       (October and November)
Thank You
