2011611009

Top-k Approach For Compact
Storage Structure

Guided By,
Dr. Radha Senthilkumar
Assistant Professor
Department of IT

By,
S.Meenakshi
2011611009
M.Tech I.T

Problem Definition
 Evaluating the tree edit distance for large XML trees is difficult.
 The best known algorithms have cubic runtime and quadratic space complexity, and therefore do not scale.
 A core problem is to efficiently prune subtrees.
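To make the measure concrete, here is a minimal sketch of the tree edit distance with unit costs (rename, delete and insert each cost 1, which is an assumption). It uses the textbook recursive decomposition on the rightmost roots of two forests, with memoization. It is not one of the cubic-time algorithms referred to above and is only practical for tiny trees; the tuple encoding of trees and all function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical encoding: a tree is (label, children), where children is a
# tuple of trees; a forest is a tuple of trees.


def forest_size(forest):
    """Total number of nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two forests (rename, delete, insert)."""
    if not f or not g:
        # Only deletions from f or insertions from g remain.
        return forest_size(f) + forest_size(g)
    (label_v, kids_v), (label_w, kids_w) = f[-1], g[-1]
    return min(
        # Delete the rightmost root of f; its children move up into f.
        forest_distance(f[:-1] + kids_v, g) + 1,
        # Insert the rightmost root of g (symmetric to a deletion).
        forest_distance(f, g[:-1] + kids_w) + 1,
        # Map the two rightmost roots onto each other (rename if labels differ).
        forest_distance(kids_v, kids_w)
        + forest_distance(f[:-1], g[:-1])
        + (label_v != label_w),
    )


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


doc = ("a", (("b", ()), ("c", ())))
query = ("a", (("b", ()),))
print(tree_edit_distance(doc, query))  # 1: delete node c
```

Every subproblem is a pair of forests, which is why the exact algorithms focus on limiting how many such pairs must be evaluated.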

Literature Survey cont…
 “Efficient Top-k Approximate Subtree Matching in Small Memory,” Nikolaus Augsten, Denilson Barbosa, Michael M. Böhlen, and Themis Palpanas, IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 8, August 2011.

 Ranks the top-k approximate matches of a small query tree Q within a large document tree.
 Uses a prefix ring buffer that allows subtrees to be pruned efficiently.
 TASM is portable because it relies only on a postorder queue, which can be implemented by any XML processing system that supports an efficient postorder traversal of trees.

Literature Survey cont…
 Jiaheng Lu, Pierre Senellart, Chunbin Lin, Xiaoyong Du, Shan Wang, and Xinxing Chen, “Optimal Top-k Generation of Attribute Combinations Based on Ranked Lists,” Proc. ACM SIGMOD Int’l Conf. on Management of Data, pp. 1-12, May 2012.

• Introduces a novel top-k query type, called top-k,m queries.
• Suppose we are given a set of groups and each group contains a set of
attributes, each of which is associated with a ranked list of tuples.
• All lists are ranked in decreasing order of the scores of tuples. We
want the top-k combinations of attributes according to the
corresponding top-m tuples with matching IDs.
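A brute-force baseline helps pin down what a top-k,m query asks for. The sketch below assumes that each attribute's ranked list is a dict from tuple ID to score, that a combination picks one attribute per group, that a matching tuple is scored by summing its scores across the chosen lists, and that a combination's score is the sum of its top-m matching tuples; combinations with fewer than m matches are skipped. These aggregation choices are assumptions, and the enumeration is exponential in the number of groups, unlike the algorithms proposed in the paper.

```python
import heapq
from itertools import product


def top_k_m(groups, k, m):
    """Brute-force top-k,m. groups: list of {attribute: {tuple_id: score}}."""
    scored = []
    for combo in product(*(list(g) for g in groups)):
        lists = [groups[i][attr] for i, attr in enumerate(combo)]
        # Tuple IDs that appear in every chosen list ("matching IDs").
        common = set(lists[0]).intersection(*lists[1:])
        # Assumed aggregation: a matching tuple scores the sum of its list scores.
        joined = sorted((sum(lst[t] for lst in lists) for t in common), reverse=True)
        if len(joined) < m:
            continue  # assumption: a combination needs at least m matches
        scored.append((sum(joined[:m]), combo))
    return heapq.nlargest(k, scored)


groups = [
    {"a1": {1: 0.9, 2: 0.5}, "a2": {1: 0.4, 3: 0.8}},
    {"b1": {1: 0.7, 2: 0.6}, "b2": {3: 0.9, 1: 0.1}},
]
print([combo for _, combo in top_k_m(groups, k=2, m=1)])  # [('a2', 'b2'), ('a1', 'b1')]
```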

Literature Survey cont…
 K.-C. Tai, “The Tree-to-Tree Correction Problem,” J. ACM, vol. 26, no. 3, pp. 422-433, 1979.

• The string-to-string correction problem is to determine the distance between two strings, measured by the minimum-cost sequence of edit operations needed to transform one string into the other. Tai extends this problem from strings to trees.
 The three tree edit operations are changing one node of a tree into another node, deleting one node from a tree, and inserting a node into a tree.
 For strings, Wagner and Fischer presented an algorithm that computes the distance between two strings in O(m*n) time, where m and n are the lengths of the two given strings.
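The O(m*n) string algorithm is the classic dynamic program over prefixes. A minimal unit-cost sketch (equal costs for all three operations are an assumption) that keeps only two rows of the table:

```python
def edit_distance(s, t):
    """Unit-cost string edit distance (substitute, delete, insert) in O(m*n) time."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))  # distance from "" to each prefix of t
    for i in range(1, m + 1):
        cur = [i] + [0] * n  # distance from s[:i] to ""
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # delete s[i-1]
                cur[j - 1] + 1,      # insert t[j-1]
                prev[j - 1] + cost,  # substitute (or match for free)
            )
        prev = cur
    return prev[n]


print(edit_distance("kitten", "sitting"))  # 3
```

The tree version replaces "prefix of a string" with "forest", which is where the extra cost of tree edit distance comes from.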

Objective
 To implement dominating queries using the Top-k Approximate Subtree Matching (TASM) approach.
 To evaluate the performance of dominating
queries in the compact storage structure.

Dominating Queries
 The number of results is controllable.
 The result is scaling-invariant.
 No user-defined ranking function is required.
 Each point is assigned an intuitive score which determines
its rank.
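These properties match the usual top-k dominating query, in which a point's score is the number of other points it dominates. As a sketch under that assumption, and assuming smaller values are better in every dimension, a brute-force O(n^2 d) version looks like this; the function names are hypothetical.

```python
import heapq


def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly better in one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def top_k_dominating(points, k):
    """Return the k points that dominate the most other points, as (score, point)."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    return heapq.nlargest(k, scored, key=lambda item: item[0])


points = [(1, 2), (2, 1), (3, 3), (2, 4), (4, 4)]
print(top_k_dominating(points, 2))  # [(3, (1, 2)), (3, (2, 1))]
```

Multiplying any coordinate by a positive constant does not change which points dominate which, so the scores and the ranking stay the same; this is the scaling invariance listed above, and no weights or ranking function have to be chosen.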

TASM:
• The problem of ranking the k best approximate matches of a small query tree in a large document tree, using the tree edit distance as the similarity measure between subtrees.
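A naive way to answer a TASM query is to compute the tree edit distance between the query and every subtree of the document, keeping the k closest in a bounded max-heap. The sketch below does exactly that with unit costs; it ignores size-based pruning and is only meant to show the ranking semantics. The tree encoding, unit costs, the tie-breaking rule (earlier subtrees in postorder win ties) and all names are assumptions.

```python
import heapq
from functools import lru_cache

# Hypothetical encoding: a tree is (label, children) with children a tuple of trees.


def _size(forest):
    return sum(1 + _size(kids) for _, kids in forest)


@lru_cache(maxsize=None)
def _forest_dist(f, g):
    """Unit-cost forest edit distance via rightmost-root decomposition."""
    if not f or not g:
        return _size(f) + _size(g)
    (lv, kv), (lw, kw) = f[-1], g[-1]
    return min(
        _forest_dist(f[:-1] + kv, g) + 1,  # delete rightmost root of f
        _forest_dist(f, g[:-1] + kw) + 1,  # insert rightmost root of g
        _forest_dist(kv, kw) + _forest_dist(f[:-1], g[:-1]) + (lv != lw),  # rename/match
    )


def subtrees(tree):
    """Yield every subtree of the document in postorder."""
    for child in tree[1]:
        yield from subtrees(child)
    yield tree


def naive_tasm(query, document, k):
    """Return the k subtrees closest to query as (distance, subtree) pairs."""
    heap = []  # entries (-distance, -position, subtree); heap[0] is the worst kept match
    for pos, sub in enumerate(subtrees(document)):
        entry = (-_forest_dist((query,), (sub,)), -pos, sub)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        else:
            heapq.heappushpop(heap, entry)  # drop the worst of the k + 1 candidates
    ranked = sorted(heap, key=lambda e: (-e[0], -e[1]))  # by distance, then position
    return [(-neg_dist, sub) for neg_dist, _, sub in ranked]


query = ("b", (("c", ()),))
doc = ("a", (("b", (("c", ()),)), ("b", (("d", ()),)), ("e", ())))
for dist, sub in naive_tasm(query, doc, 2):
    print(dist, sub)
```

Only k entries are ever held in the heap, but every subtree is still compared against the query; the pruning discussed in the literature survey is what removes that cost.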

References
 N. Augsten, D. Barbosa, M.M. Böhlen, and T. Palpanas, “Efficient Top-k Approximate Subtree Matching in Small Memory,” IEEE Trans. Knowledge and Data Eng., vol. 22, no. 8, Aug. 2011.
 J. Lu, P. Senellart, C. Lin, X. Du, S. Wang, and X. Chen, “Optimal Top-k Generation of Attribute Combinations Based on Ranked Lists,” Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 1-12, May 2012.
 N. Augsten, M.H. Böhlen, C.E. Dyreson, and J. Gamper, “Approximate Joins for Data-Centric XML,” Proc. IEEE 24th Int’l Conf. Data Eng. (ICDE), pp. 814-823, 2008.
 K.-C. Tai, “The Tree-to-Tree Correction Problem,” J. ACM, vol. 26, no. 3, pp. 422-433, 1979.

Timeline Chart
PHASE I
 Review I (July): Learning to work with TASM
 Review II (August to September): Implement the concept of dominating queries
 Review III (October and November): Evaluate the dominating queries in the compact storage structure
