Refactoring Metadata:
Talk at DAMA/Metadata 2004 (2-6 May 2004, Los Angeles)

Transcript

  • 1. Refactoring Metadata: Finding architectural compatibility through structural comparisons
      Baden Hughes
      Department of Computer Science and Software Engineering
      The University of Melbourne
      May 2-6, 2004 Copyright © 2004 Baden Hughes
  • 2. Agenda
      • Motivation for Refactoring Metadata
      • Setting the Context
      • Identifying Points of Comparison
      • Goals for Structural Comparison
      • Methods for Structural Comparison
      • Refactoring in Practice
      • Principles for Robust Instances
      • Conclusion
  • 3. Motivations for Refactoring Metadata
      • The need to address the problem of metadata volatility is conceptually juxtaposed with the motivation for metadata creation
      • XML technologies have become pervasive within the metadata domain
      • Different communities = different standards = different degrees of maturity in metadata adoption, resulting in metadata being (highly?) variable even within an organization
      • Systematically determining similarity and difference is the key to effective refactoring of metadata
      • Automatically determining similarity and difference is the key to efficient refactoring of metadata
  • 4. Setting the Context
      • XML-based metadata from natural language engineering and digital libraries
      • Wide variety of
          – traditions of metadata development
          – technologies for metadata implementation
          – objects described by metadata
          – granularity of metadata descriptions
      • Motivated by interoperability analysis
      • Seeking to leverage processes not dissimilar to database schema comparisons
  • 5. Identifying Points of Comparison
      • Robust instances require both syntactic and semantic analysis
      • Points of comparison
          – XML document instances
          – DTDs
          – Schemata
          – Namespaces
          – RDF instances
          – Ontologies
      • Different methods are likely to be required for each kind of input
  • 6. Goals of Structural Comparison
      • While validation of XML-based metadata does contribute to the quality of metadata, it does not necessarily assist in determining architectural compatibility
      • Systematic, iterative evaluation of metadata architectures can contribute to the maturity of XML-based metadata
      • Quantifying the degree of syntactic and semantic similarity is an important first step in the refactoring process; it may in fact demonstrate whether refactoring is viable
  • 7. Methods for Structural Comparison
      • Different methods for structural comparison depending on the input
          – XML documents: trees
          – DTDs: regexps and feature structures
          – XML Namespaces: feature structures
          – XML Schemas: regexps and graph matching
          – RDF instances: graph matching
          – Ontologies: feature structures and graph matching
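The pairing of inputs and methods on this slide can be written down as a lookup table. The sketch below only transcribes the slide; the key names, the method labels and the `methods_for` helper are illustrative assumptions, not part of the talk.

```python
# Lookup table transcribed from the slide. Key names and method labels are
# hypothetical identifiers chosen for this sketch.
METHODS_BY_INPUT = {
    "xml_document": ("tree matching",),
    "dtd": ("regexp matching", "feature structure matching"),
    "xml_namespace": ("feature structure matching",),
    "xml_schema": ("regexp matching", "graph matching"),
    "rdf_instance": ("graph matching",),
    "ontology": ("feature structure matching", "graph matching"),
}


def methods_for(input_kind):
    """Return the comparison methods the slide lists for one kind of input."""
    try:
        return METHODS_BY_INPUT[input_kind]
    except KeyError:
        raise ValueError(f"no comparison method listed for {input_kind!r}") from None


print(methods_for("dtd"))
```

A table like this makes the dispatch explicit: a comparison tool would first classify its two inputs and then run only the methods listed for that kind.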
  • 8. Tree Matching
      • An XML document is commonly conceived of as a tree structure
      • Tree matching is a widely used IE/IR technique for structured data, and is applicable to XML-based metadata
      • Tree matching is largely derived from pattern matching, and is largely independent of syntactic or semantic constraints
      • While tree matching can provide basic information about the similarity of two documents, for architectural compatibility a deeper analysis is required
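As a rough illustration of the "basic information about similarity" that tree matching can give, the sketch below compares two XML documents by the set of root-to-element tag paths they contain. The slides do not name a specific algorithm, so the Jaccard measure over paths, the function names and the sample records are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-element tag paths found in an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def path_similarity(xml_a, xml_b):
    """Jaccard similarity of the two documents' tag-path sets (0.0 to 1.0)."""
    a, b = tag_paths(xml_a), tag_paths(xml_b)
    # The union is never empty because every document has a root element.
    return len(a & b) / len(a | b)


# Hypothetical metadata records used only for illustration.
doc_a = "<record><title>t</title><creator>c</creator></record>"
doc_b = "<record><title>t</title><date>d</date></record>"
print(path_similarity(doc_a, doc_b))  # 0.5: two shared paths out of four
```

This measure ignores element order, repetition, attributes and text content, which is one reason the slide calls for deeper analysis before judging architectural compatibility.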
  • 9. Regular Expression Matching
      • DTD syntax is derived from regular expressions
      • Well-known evaluation methodologies for regexps are applicable to DTDs
      • In contrast to pure syntactic comparison, regexp matching allows the discovery of the legal constituents of syntactic structures
      • Regexp evaluation is highly efficient even on large metadata collections, and is widely implemented in common programming languages
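Because a DTD element content model uses the same operators as a regular expression (sequence, `|`, `?`, `*`, `+`), it can be translated into one and tested against a sequence of child element names. The following is a minimal sketch under stated assumptions: it handles element content only (not `#PCDATA` mixed content, `EMPTY` or `ANY`), and the function names and sample model are hypothetical.

```python
import re


def content_model_to_regex(model):
    """Translate a DTD element content model into a compiled Python regex.

    Each element name becomes a group matching that name plus one space,
    so a child sequence can be matched as a space-terminated string.
    """
    parts = []
    for token in re.findall(r"[A-Za-z_:][\w.:\-]*|[()|,?*+]", model):
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue  # a DTD sequence is plain concatenation in a regex
        elif token in ")|?*+":
            parts.append(token)
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def allows(model, children):
    """Check whether a sequence of child element names satisfies the model."""
    subject = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(subject) is not None


# Hypothetical content model for a metadata record.
model = "(title, creator+, date?)"
print(allows(model, ["title", "creator", "creator"]))  # True
print(allows(model, ["title", "date"]))                # False: creator is required
```

Running the same child sequences against the content models of two DTDs gives a simple way to probe which constituents both consider legal.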
  • 10. Feature Structure Matching
      • Typed feature structures are widely used for deriving controlled vocabularies
          – XML attribute instances can typically be reduced to typed feature structures for comparison
      • Evaluation of the semantic content of feature structures is well grounded in formal logic
      • Feature structure comparisons can also reveal syntactic constraints expressed as dimensionality of feature matrices
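The standard operation for comparing feature structures is unification: two structures are compatible if their features can be merged without a value clash. The sketch below represents feature structures as nested Python dicts and deliberately omits the type hierarchy that typed feature structures would carry; that simplification, the `unify` name and the sample attribute values are assumptions of this example.

```python
def unify(fs_a, fs_b):
    """Unify two feature structures given as nested dicts.

    Returns the merged structure, or None if any feature has clashing values.
    Untyped simplification: atomic values unify only when they are equal.
    """
    result = dict(fs_a)
    for feature, value_b in fs_b.items():
        if feature not in result:
            result[feature] = value_b
            continue
        value_a = result[feature]
        if isinstance(value_a, dict) and isinstance(value_b, dict):
            merged = unify(value_a, value_b)
            if merged is None:
                return None
            result[feature] = merged
        elif value_a != value_b:
            return None
    return result


# Hypothetical attribute sets reduced to feature structures.
attrs_a = {"lang": "en", "type": {"scheme": "dcterms"}}
attrs_b = {"type": {"scheme": "dcterms", "name": "Text"}}
print(unify(attrs_a, attrs_b))
print(unify({"lang": "en"}, {"lang": "fr"}))  # None: the values clash
```

A successful unification indicates that two attribute sets could be described by one shared structure, while a failure pinpoints the feature on which the two metadata formats disagree.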
  • 11. Graph Matching
      • Rich XML representations such as RDF can be construed as a series of arcs and nodes, allowing the adoption of graph theory techniques for the determination of isomorphism
      • Finding the minimum and maximum common subgraphs is a technique which can be used to determine architectural compatibility in the syntactic domain
      • Graph matching is primarily syntactic, although it can also be applied to semantic analysis on sources such as ontologies
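To make the isomorphism test concrete, the sketch below treats an RDF-like graph as a set of (subject, predicate, object) triples and searches for a renaming of nodes that turns one graph into the other. It is a brute-force illustration that tries every node permutation, so it is only practical for very small graphs; the function names and sample triples are assumptions, and predicate labels are required to match exactly.

```python
from itertools import permutations


def nodes_of(edges):
    """Return the sorted list of nodes used as subject or object."""
    return sorted({n for s, _, o in edges for n in (s, o)})


def find_isomorphism(edges_a, edges_b):
    """Return a node mapping that turns edges_a into edges_b, or None.

    Edges are (subject, predicate, object) triples. Node names are treated
    as arbitrary labels; predicates must be identical. Factorial time.
    """
    edges_a, edges_b = set(edges_a), set(edges_b)
    nodes_a, nodes_b = nodes_of(edges_a), nodes_of(edges_b)
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return None
    for candidate in permutations(nodes_b):
        mapping = dict(zip(nodes_a, candidate))
        if {(mapping[s], p, mapping[o]) for s, p, o in edges_a} == edges_b:
            return mapping
    return None


# Hypothetical graphs: same shape, different node names.
g1 = [("a", "creator", "b"), ("a", "title", "c")]
g2 = [("x", "creator", "y"), ("x", "title", "z")]
g3 = [("x", "creator", "y"), ("y", "title", "z")]
print(find_isomorphism(g1, g2))  # {'a': 'x', 'b': 'y', 'c': 'z'}
print(find_isomorphism(g1, g3))  # None: the title arc hangs off a different node
```

Common-subgraph search generalizes this idea by looking for the largest part of each graph for which such a mapping exists, and needs a more careful algorithm than the exhaustive search shown here.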
  • 12. Refactoring in Practice
      • XML Documents
      • DTDs
      • Namespaces
      • Schemata
      • RDF instances
      • Ontologies
      • See http://www.cs.mu.oz.au/~badenh/projects/metadata-comparison for demo materials
  • 13. Principles for Robust Instances
      • Both syntactic and semantic analysis are required
      • Initiate comparisons at the highest level, and proceed downwards: higher-level incompatibilities are more complex to resolve
      • Quantifying the degree of similarity is extremely important, as it directly affects the complexity of refactoring processes
      • Accurately identified commonalities at both syntactic and semantic levels can be leveraged efficiently
  • 14. Conclusion
      • Adopting and permuting a range of techniques for structural comparison from a variety of other disciplines can lead to efficient methods for metadata structural analysis and consequently refactoring
      • Large-scale metadata management requires an automated approach to both syntactic and semantic evaluation in order to contribute to ROI
  • 15. Acknowledgements
      • National Science Foundation Grant Number 9910603 (International Standards in Language Engineering)
      • National Science Foundation Grant Number 0317826 (Querying Linguistic Databases)