Efficient and Effective Duplicate Detection in Hierarchical Data

Abstract:
Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, like XML data. In this paper, we present a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements, but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy, capable of significant gains over the unoptimized version of the algorithm, is presented. Through experiments, we show that our algorithm is able to achieve high precision and recall scores in several datasets. XMLDup is also able to outperform another state-of-the-art duplicate detection solution, both in terms of efficiency and of effectiveness. Finally, we also study how important the structure of elements is in the duplicate detection process. We observe that, not only structure can clearly influence the outcome, but also that, by ensuring a structure that is adequate to the characteristics of the data, we can actually improve the quality of the results.

Software and hardware requirements

Hardware Required:
System : Pentium IV
Hard Disk : 80 GB
RAM : 512 MB

Software Required:
O/S : Windows XP
Language : Visual C#

www.nanocdac.com www.nsrcnano.com branches: Hyderabad, Nagpur
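
The abstract describes two ideas: scoring a pair of XML elements from both their values and their nested structure, and pruning the evaluation once the pair can no longer be a duplicate. The Python sketch below illustrates that general idea only. It is not the XMLDup implementation. The names `text_similarity` and `duplicate_score`, the use of nested dictionaries in place of XML elements, the product rule for combining scores, and the choice of `difflib` as the string measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for whatever string measure is used."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def duplicate_score(a: dict, b: dict, threshold: float) -> float:
    """Score two records by multiplying per-field similarities.

    Each record maps a field name to a string value or to a nested dict
    (a child element). Every factor is at most 1, so the running product
    is an upper bound on the final score. As soon as that bound drops
    below `threshold`, the remaining fields are skipped and 0.0 is returned.
    """
    score = 1.0
    for key in sorted(set(a) | set(b)):
        if key not in a or key not in b:
            continue  # assumption: fields missing on one side are ignored
        va, vb = a[key], b[key]
        if isinstance(va, dict) and isinstance(vb, dict):
            # The child must reach threshold / score for the parent
            # to remain at or above the threshold.
            factor = duplicate_score(va, vb, threshold / score)
        elif isinstance(va, str) and isinstance(vb, str):
            factor = text_similarity(va, vb)
        else:
            factor = 0.0  # a value on one side, a child element on the other
        score *= factor
        if score < threshold or score == 0.0:
            return 0.0  # pruned: this pair cannot reach the threshold
    return score
```

Calling `duplicate_score(movie_a, movie_b, 0.5)` on two nested dictionaries returns a value in [0, 1], or 0.0 if the pair was pruned. A pair would then be reported as duplicates when the returned score is at or above the chosen threshold.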