1. Top-k Approach For Compact Storage Structure
Guided By: Dr. Radha Senthilkumar, Assistant Professor, Department of IT
By: S. Meenakshi, 2011611009, M.Tech I.T
2. Problem Definition
Evaluating the tree edit distance for large XML trees is
difficult.
The best known tree edit distance algorithms have cubic run time and
quadratic space complexity, so they do not scale.
A core problem is to efficiently prune subtrees.
3. Literature Survey cont…
“Efficient Top-k Approximate Subtree Matching in Small Memory,”
Nikolaus Augsten, Denilson Barbosa, Michael H. Böhlen, and
Themis Palpanas, IEEE Transactions on Knowledge and Data
Engineering, vol. 23, no. 8, August 2011.
TASM finds the top-k approximate matches of a small query tree Q within a
large document tree.
It uses a prefix ring buffer that allows subtrees to be pruned efficiently.
TASM is portable because it relies on the postorder queue structure
which can be implemented by any XML processing system that allows an
efficient postorder traversal of trees.
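The postorder queue mentioned above can be sketched as a list of (label, subtree size) pairs emitted in postorder. The snippet below is only an illustration under stated assumptions: it reads element nodes with Python's standard `xml.etree.ElementTree`, ignores text and attributes, and uses a hypothetical size threshold `tau` to keep small subtrees. It does not implement the prefix ring buffer itself.

```python
import xml.etree.ElementTree as ET


def postorder_queue(root):
    """Return the tree as a list of (label, subtree_size) pairs in postorder."""
    queue = []

    def visit(node):
        size = 1  # count the node itself
        for child in node:
            size += visit(child)
        queue.append((node.tag, size))  # a node is emitted after its children
        return size

    visit(root)
    return queue


def candidate_subtrees(queue, tau):
    """Yield (start, end) index ranges of subtrees with at most tau nodes.

    tau is a hypothetical size threshold chosen for illustration.
    In postorder, the subtree rooted at position `end` occupies the
    `size` consecutive positions that end at `end`.
    """
    for end, (_label, size) in enumerate(queue):
        if size <= tau:
            yield (end - size + 1, end)


doc = ET.fromstring("<a><b><c/><d/></b><e/></a>")
queue = postorder_queue(doc)
small = list(candidate_subtrees(queue, tau=3))
print(queue)  # [('c', 1), ('d', 1), ('b', 3), ('e', 1), ('a', 5)]
print(small)  # [(0, 0), (1, 1), (0, 2), (3, 3)]
```

Because each subtree is a contiguous run in the queue, a subtree that is too large can be skipped by looking at its size alone, which is what makes pruning cheap in this representation.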
4. Literature Survey cont…
Jiaheng Lu, Pierre Senellart, Chunbin Lin, Xiaoyong Du, Shan
Wang, and Xinxing Chen, “Optimal Top-k Generation of Attribute
Combinations Based on Ranked Lists,” Proc. ACM SIGMOD Int’l
Conf. on Management of Data, pp. 1-12, May 2012.
• A novel top-k query type, called top-k,m queries.
• Suppose we are given a set of groups and each group contains a set of
attributes, each of which is associated with a ranked list of tuples.
• All lists are ranked in decreasing order of the scores of tuples. We
want the top-k combinations of attributes according to the
corresponding top-m tuples with matching IDs.
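To make the top-k,m query concrete, here is a brute-force sketch. It assumes that a combination takes one attribute from each group, that a tuple matches when its ID appears in every chosen list, and that scores are aggregated by summation. The function name `top_k_m` and the sample data are hypothetical, and the exhaustive enumeration is for illustration only; it is not the optimal algorithm of the cited paper.

```python
from itertools import product
import heapq


def top_k_m(groups, k, m):
    """Rank attribute combinations by the summed scores of their top-m matches.

    groups: list of dicts, each mapping attribute name -> {tuple_id: score}.
    Summation as the aggregate is an assumption made for this sketch.
    """
    results = []
    for combo in product(*(group.items() for group in groups)):
        names = tuple(name for name, _ in combo)
        lists = [scores for _, scores in combo]
        # Tuple IDs that occur in every list of this combination.
        shared_ids = set(lists[0]).intersection(*lists[1:])
        match_scores = [sum(lst[i] for lst in lists) for i in shared_ids]
        top_m = heapq.nlargest(m, match_scores)
        results.append((sum(top_m), names))
    return heapq.nlargest(k, results)


groups = [
    {"A1": {1: 9, 2: 8, 3: 1}, "A2": {1: 5, 2: 4}},
    {"B1": {1: 7, 2: 6}, "B2": {2: 9, 3: 9}},
]
print(top_k_m(groups, k=2, m=2))
# [(30, ('A1', 'B1')), (27, ('A1', 'B2'))]
```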
5. Literature Survey cont…
K.-C. Tai, “The Tree-to-Tree Correction Problem,” J. ACM, vol. 26, no. 3,
pp. 422-433, 1979.
• Generalizes the string-to-string correction problem, which is to
determine the distance between two strings as measured by the minimum
cost sequence of edit operations needed to transform one string into
the other.
• Three edit operations on trees: changing one node of a tree into
another node, deleting one node from a tree, or inserting a node into
a tree.
• For strings, the distance can be computed in time O(m · n), where m
and n are the lengths of the two given strings.
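The O(m · n) bound for strings comes from a dynamic program over all pairs of prefixes (commonly known as the Wagner-Fischer algorithm). The sketch below assumes unit cost for each of the three operations (change, delete, insert); other cost models would only change the three terms inside `min`.

```python
def edit_distance(a, b):
    """Minimum number of unit-cost edits that transform string a into string b."""
    m, n = len(a), len(b)
    # prev[j] is the distance between the current prefix of a and b[:j].
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            change = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,           # delete a[i-1]
                curr[j - 1] + 1,       # insert b[j-1]
                prev[j - 1] + change,  # change a[i-1] into b[j-1]
            )
        prev = curr
    return prev[n]


print(edit_distance("kitten", "sitting"))  # 3
```

The two nested loops visit each of the m · n prefix pairs once, and only two rows of the table are kept in memory at a time.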
6. Objective
To implement the concept of dominating queries
using the Top-k Approximate Subtree Matching (TASM)
approach.
To evaluate the performance of dominating
queries in the compact storage structure.
7. Dominating Queries
The number of results is controllable.
The result is scaling invariant.
No user-defined ranking function is required.
Each point is assigned an intuitive score which determines
its rank.
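A minimal sketch of these properties, assuming the common definition in which a point's score is the number of other points it dominates, and assuming that smaller values are better in every dimension. The function names and sample points are hypothetical.

```python
def dominates(p, q):
    """True if p is at least as good as q everywhere and strictly better somewhere.

    Assumes smaller values are better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def top_k_dominating(points, k):
    """Return the k points with the highest dominance score as (score, point)."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    scored.sort(key=lambda item: -item[0])  # stable: ties keep input order
    return scored[:k]


points = [(1, 1), (2, 3), (3, 2), (4, 4), (1, 5)]
print(top_k_dominating(points, k=2))  # [(4, (1, 1)), (1, (2, 3))]
```

The parameter k controls the number of results. The score relies only on comparisons within each dimension, so multiplying any dimension by a positive constant leaves the ranking unchanged, and no ranking function has to be supplied.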
TASM:
• The problem of ranking the k best approximate matches of
a small query tree in a large document tree.
8. References
“Efficient Top-k Approximate Subtree Matching in Small Memory,”
Nikolaus Augsten, Denilson Barbosa, Michael H. Böhlen, and
Themis Palpanas, IEEE Transactions on Knowledge and Data
Engineering, vol. 23, no. 8, August 2011.
Jiaheng Lu, Pierre Senellart, Chunbin Lin, Xiaoyong Du, Shan Wang,
and Xinxing Chen, “Optimal Top-k Generation of Attribute
Combinations Based on Ranked Lists,” Proc. ACM SIGMOD Int’l Conf.
on Management of Data, pp. 1-12, May 2012.
N. Augsten, M.H. Böhlen, C.E. Dyreson, and J. Gamper,
“Approximate Joins for Data-Centric XML,” Proc. IEEE 24th
Int’l Conf. Data Eng. (ICDE), pp. 814-823, 2008.
K.-C. Tai, “The Tree-to-Tree Correction Problem,” J. ACM, vol. 26,
no. 3, pp. 422-433, 1979.
9. Timeline Chart
PHASE | REVIEW I | REVIEW II | REVIEW III
PHASE I | Learning to work with TASM (July) | Implement the concept of dominating queries (August-September) | Evaluate the dominating queries in compact storage structure (October and November)