If We're Not There Yet, How Far Do We Have To Go? Web Metadata at The University of Melbourne


Paper at DC-ANZ 2005 (May 2005, Melbourne)


  1. 1. If we’re not there yet, how far do we have to go? A review of web metadata at The University of Melbourne
     • Eve Young, Metadata Coordinator, Information Acquisition and Organisation Section, Information Division
     • Baden Hughes, Research Fellow, Department of Computer Science and Software Engineering
     • The University of Melbourne
     • Young & Hughes, DC-ANZ 2005
  2. 2. Overview
     • Background
        • Web publishing policies circa 1999, 2001
        • Research projects
     • Towards standardization
        • Dublin Core
        • UniMelb administrative metadata
     • Broad scale compliance analysis
        • UniMelb web environment
        • DC Metadata on the UniMelb Web
        • UniMelb Metadata on the UniMelb Web
     • Reflections and challenges for the future
  3. 3. Before Metadata on UniM site
     • Existing standard (1999) not widely adopted
     • 9 metadata tags: expiryDate, maintainer, authoriser, author, description, keywords, lastModified, distribution, contentType
     • Operational and implementation issues
        • Difficulty finding information
        • Suspected non-compliance
     • Investigate and analyze
        • Manual research
  4. 4. Expiry Tag Analysis
     • Expiry tag functionality is important
     • Analysis into non-compliance (608 pages)
     • Only 27% of the pages audited were compliant
     • Of the remaining pages reviewed, 441 had no date, or NA as the value
  5. 5. A to Z Index: Compliance Audit
     • Audit of metadata on 78 web pages
     • Highest compliance 84.6% (content type)
     • Lowest compliance 11.5% (expiry date)
     • More unknown than known maintainers
     • Default value tags had a high degree of compliance
     • Page-specific tags (keywords) had the lowest
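
The audit percentages above are simple ratios over the 78 audited pages. The following is a minimal sketch of that calculation. The slide does not give raw per-tag counts, so the counts 66 and 9 are assumptions chosen only because they reproduce the quoted figures; the function name is hypothetical.

```python
def compliance_rate(compliant_pages, audited_pages):
    """Return the percentage of audited pages that carry a valid tag."""
    if audited_pages <= 0:
        raise ValueError("audited_pages must be positive")
    return 100.0 * compliant_pages / audited_pages


AUDITED = 78  # pages in the A to Z index audit (from the slide)

# Assumed per-tag counts: not stated on the slide, picked only because
# they are consistent with the percentages it quotes.
counts = {"contentType": 66, "expiryDate": 9}

for tag, compliant in counts.items():
    print(f"{tag}: {compliance_rate(compliant, AUDITED):.1f}%")
```

Run as written, this prints 84.6% for contentType and 11.5% for expiryDate, matching the highest and lowest figures reported in the audit.
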
  6. 6. Metadata Working Group
     • Advise on implementation of a uniform approach to the creation of metadata
     • Membership drew on expertise from across the university: academics, IT, web, metadata, and library
     • Reviewed metadata standards: DC, IMS, AGLS
     • Reviewed metadata use in large information-rich organizations, e.g. Aust Govt, UK Government, UNSW libraries
  7. 7. UniMelb metadata standard
     • 19 elements (meta tags) to describe and manage a resource
     • 2003 revised standard endorsed by Information Strategy Committee
     • Requirement on all University of Melbourne web pages
  8. 8. Why Dublin Core (besides this being a DC-ANZ conference)?
     • ISO 15836
     • 15 elements, simple
     • International consensus
     • Well supported
     • Offers semantic interoperability
     • Extensible
     • Easy to implement in our environment
  9. 9. University of Melbourne DC Metadata Elements
     • DC.Title
     • DC.Rights
     • DC.Creator
     • DC.Date
     • DC.Subject
     • DC.Date.Modified
     • DC.Description
     • DC.Language
     • DC.Publisher
     • DC.Format
     • DC.Contributor
     • DC.Identifier
  10. 10. University of Melbourne Administrative Metadata Elements
     • UM.Creator.Email
     • UM.Authoriser.Name
     • UM.Authoriser.Title
     • UM.Maintainer.Name
     • UM.Maintainer.Department
     • UM.Maintainer.Email
     • UM.Date.ReviewDue
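
As a rough illustration of how the 12 DC and 7 UM elements might be written out as simple HTML meta tags, here is a minimal sketch. The element names follow the two slides above. The `name`/`content` attribute layout is the common convention for Dublin Core in HTML and is assumed here rather than taken from the UniMelb guidelines; the function name and the sample values are hypothetical.

```python
from html import escape

DC_ELEMENTS = [
    "DC.Title", "DC.Rights", "DC.Creator", "DC.Date", "DC.Subject",
    "DC.Date.Modified", "DC.Description", "DC.Language", "DC.Publisher",
    "DC.Format", "DC.Contributor", "DC.Identifier",
]
UM_ELEMENTS = [
    "UM.Creator.Email", "UM.Authoriser.Name", "UM.Authoriser.Title",
    "UM.Maintainer.Name", "UM.Maintainer.Department",
    "UM.Maintainer.Email", "UM.Date.ReviewDue",
]


def render_meta_tags(record):
    """Render a {element name: value} mapping as HTML meta tags.

    Raises KeyError for names outside the 19 listed elements.
    """
    known = set(DC_ELEMENTS) | set(UM_ELEMENTS)
    lines = []
    for name, value in record.items():
        if name not in known:
            raise KeyError(f"not a listed metadata element: {name}")
        # escape() also escapes quotes, so values are safe inside attributes.
        lines.append(f'<meta name="{escape(name)}" content="{escape(value)}">')
    return "\n".join(lines)


# Hypothetical sample values, for illustration only.
print(render_meta_tags({
    "DC.Title": "Example page",
    "UM.Maintainer.Name": "A. Person",
    "UM.Date.ReviewDue": "2006-01-31",
}))
```

The two lists together contain 19 names, which agrees with the element count given for the 2003 revised standard.
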
  11. 11. Broad Scale Compliance Analysis
     • Full crawl of the University of Melbourne web presence in March 2005
     • Crawled with the Internet Archive's Heritrix suite (an open-source, extensible, web-scale, archival-quality web crawler)
     • A total of 57Gb of data was retrieved from www.unimelb.edu.au and its associated subdomains over a period of 146 hours
     • 1.4 million documents were retrieved
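
The crawl figures above imply an average retrieval rate and an average document size. The following is a back-of-envelope sketch of that arithmetic; it assumes that "57Gb" means 57 decimal gigabytes and that the crawler ran continuously for the full 146 hours, neither of which the slide confirms.

```python
TOTAL_BYTES = 57 * 10**9   # assumption: "57Gb" read as 57 decimal gigabytes
DOCUMENTS = 1_400_000      # documents retrieved (from the slide)
HOURS = 146                # crawl duration (from the slide)

docs_per_second = DOCUMENTS / (HOURS * 3600)
mean_doc_kb = TOTAL_BYTES / DOCUMENTS / 1000

print(f"average rate: {docs_per_second:.2f} documents per second")
print(f"average size: {mean_doc_kb:.1f} kB per document")
```

Under those assumptions the crawl averaged roughly 2.66 documents per second and about 40.7 kB per document.
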
  12. 12. The UniMelb Web Environment
     • [Figure: Format Demographics of UniMelb Web. Chart of crawled documents by MIME type: text/html, image/jpeg, image/gif, application/pdf, text/plain, application/msword, application/msexcel, application/mspowerpoint, application/postscript, others]
  13. 13. Observations
     • HTML is no longer the dominant format
        • UniMelb’s metadata creation processes are primarily oriented towards creating Dublin Core-extended metadata as simple HTML meta tags
        • Pure HTML content is in fact no longer the dominant format
     • Web-accessibility of “non-native” document types
        • Many MIME types are not addressed by the UniMelb guidelines for metadata creation, although they do offer some potential for restricted metadata inclusion
        • Emerging document types such as XML and RDF do not easily allow for the embedding of metadata internal to the resource
     • The emergence of dynamic documents
        • Analysis of the “All Other” categories shows that many (~38%) of these documents are dynamic, generated server side on demand by PHP, ASP, JSP etc.
        • No thought is currently given to the inclusion of metadata in automatically generated documents of this type
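
A format breakdown of this kind can be computed by tallying the MIME type of each crawled document. The sketch below assumes the crawl is available as (URL, MIME type) pairs, which is a hypothetical input format and not the Heritrix log format. The dynamic-page check is a crude URL suffix heuristic, not the method used in the study; the sample records and function names are also hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# Server-side technologies named on the slide, matched here by URL suffix.
DYNAMIC_SUFFIXES = (".php", ".asp", ".jsp")


def format_demographics(records):
    """Return each MIME type's share of the records, as percentages."""
    counts = Counter(mime for _url, mime in records)
    total = sum(counts.values())
    return {mime: 100.0 * n / total for mime, n in counts.most_common()}


def is_dynamic(url):
    """Guess whether a URL points at a server-generated page."""
    return urlparse(url).path.lower().endswith(DYNAMIC_SUFFIXES)


# Hypothetical sample records, for illustration only.
records = [
    ("http://www.example.edu/index.html", "text/html"),
    ("http://www.example.edu/logo.jpg", "image/jpeg"),
    ("http://www.example.edu/search.php?q=metadata", "text/html"),
    ("http://www.example.edu/report.pdf", "application/pdf"),
]

print(format_demographics(records))
print(sum(is_dynamic(url) for url, _mime in records), "dynamic page(s)")
```

Parsing the URL first means a query string such as `?q=metadata` does not hide the `.php` suffix from the check.
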
  14. 14. DC Metadata on the UniMelb Web
     • [Figure: Usage of DC Elements. Bar chart of % coverage (axis 0 to 90) by DC metadata element: DC.Subject, DC.Publisher, DC.Language, DC.Contributor, DC.DateModified, DC.Identifier, DC.Format, DC.Title, DC.Description, DC.Rights, DC.Creator, DC.Date, and the overall average. Series: % HTML pages with metadata in <HEAD>; % HTML pages with metadata in <BODY>; total % HTML pages containing metadata in either <HEAD> or <BODY>]
  15. 15. Observations
     • Alignment with broad Dublin Core norms
        • These figures are generally in line with findings from broad-scale Dublin Core-oriented metadata communities
        • OAI (Ward, 2003)
        • OLAC (Hughes, 2004)
  16. 16. UM Metadata on the UniMelb Web
     • [Figure: Usage of UM Elements. Bar chart of % coverage (axis 0 to 80) by UM metadata element, with an overall average; the individual axis labels are not recoverable from the extraction. Series: % HTML pages with metadata in <HEAD>; % HTML pages with metadata in <BODY>; total % HTML pages containing metadata in either <HEAD> or <BODY>]
  17. 17. Observations
     • Differences between core Dublin Core and institutional metadata
        • Institutional metadata is more regularly contributed, despite the automatic creation of some DC by content creation applications
     • Correlation with manual inspection statistics
        • These experiments suggest that the trends detected in earlier focused studies such as Zajacek (2002a, 2002b) are valid
     • Differences between metadata included in <HEAD> vs <BODY> elements
        • For institutional metadata, there is a strong tendency to include metadata in the <BODY> element, where it is immediately visible on the page, rather than in the <HEAD> element, which may reflect the emphasis of the training materials
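
Separating metadata found in <HEAD> from metadata found in <BODY> can be done with a small event-driven parser. This is a minimal sketch using Python's standard `html.parser`. It assumes that metadata in both places is expressed as `meta` elements whose names start with `DC.` or `UM.`, which the slides do not confirm (body metadata may be visible text instead); the class name and sample page are hypothetical.

```python
from html.parser import HTMLParser


class MetaLocator(HTMLParser):
    """Record which DC.* / UM.* meta elements occur in <head> and <body>."""

    def __init__(self):
        super().__init__()
        self.section = None  # "head", "body", or None when outside both
        self.found = {"head": set(), "body": set()}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "meta" and self.section:
            name = dict(attrs).get("name") or ""
            if name.upper().startswith(("DC.", "UM.")):
                self.found[self.section].add(name)

    def handle_endtag(self, tag):
        if tag == self.section:
            self.section = None


# Hypothetical sample page, for illustration only.
page = """<html><head><meta name="DC.Title" content="Example"></head>
<body><meta name="UM.Maintainer.Name" content="A. Person"><p>Text</p></body>
</html>"""

locator = MetaLocator()
locator.feed(page)
print(locator.found)
```

Counting pages with a non-empty `head` set, a non-empty `body` set, or either would give the three series plotted in the coverage charts.
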
  18. 18. Reflections and Challenges 1
     • % coverage of HTML sources is relatively low, but it does account for a large number of documents (650K total in this survey)
     • Many documents are non-compliant for identifiable reasons, e.g. exclusion of metadata in template-based pages such as those within the learning management system
     • External search engines no longer use meta tag information but perform full text indexing (see Richardson, 2004)
     • Benefit to general web searchers of institutional metadata creation is almost zero
     • May still retain currency for other administrative purposes, e.g. the authorization of web content publication
     • Need to distinguish the institution's need for web content management, and how metadata facilitates this goal, from the web search experience in general, and to decouple the two
  19. 19. Reflections and Challenges 2
     • Potential impact of institution-wide Content Management System
        • Existing metadata standards failed to address distributed content creation (or underestimated the pervasive effect of “publish to web” type technologies to all staff)
        • Opportunity to increase compliance with new generation tools and practices
     • Revisiting motivation for web metadata: search assistance or administrative processes?
        • Changes to work practices required for web publishing authorisation
        • “Compliance audit” service for run time verification of metadata compliance, with a “watermarking” service which automatically gives an imprimatur to compliant pages in the absence of manual inspection
        • Requires the formalisation of University of Melbourne metadata as a true Dublin Core application profile with an associated formal schema, and the creation of controlled vocabularies for extensions
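
One way to picture the proposed “compliance audit” service is a function that takes the metadata found on a page and reports what is missing. The sketch below is an illustration only, not a description of any service the authors built. The list of required elements is passed in because the slides do not say which elements are mandatory, and treating UM.Date.ReviewDue as an ISO 8601 date is an assumption.

```python
from datetime import date

# Values treated as "no value"; "NA" appears in the expiry tag analysis.
PLACEHOLDERS = {"", "na", "n/a"}


def audit_page(metadata, required, today):
    """Return a list of problems; an empty list means the page passes."""
    problems = []
    for name in required:
        value = (metadata.get(name) or "").strip()
        if value.lower() in PLACEHOLDERS:
            problems.append(f"{name}: missing or placeholder value")

    review_due = (metadata.get("UM.Date.ReviewDue") or "").strip()
    if review_due.lower() not in PLACEHOLDERS:
        try:
            # Assumption: review dates are written as YYYY-MM-DD.
            if date.fromisoformat(review_due) < today:
                problems.append("UM.Date.ReviewDue: review date has passed")
        except ValueError:
            problems.append("UM.Date.ReviewDue: not an ISO 8601 date")
    return problems


# Hypothetical page metadata and required list, for illustration only.
page = {
    "DC.Title": "Example page",
    "UM.Maintainer.Name": "NA",
    "UM.Date.ReviewDue": "2005-01-31",
}
problems = audit_page(page, ["DC.Title", "UM.Maintainer.Name"], date(2005, 5, 1))
print(problems)
```

A “watermarking” step could then mark a page as compliant only when the returned list is empty.
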
  20. 20. Reflections and Challenges 3
     • Large number of pages which will be updated only at irregular intervals
        • Substantially increasing the coverage of institutional metadata in the short to medium term may require the deployment of an automated metadata creation service such as DCdot (Powell, 2000) or an augmentation service such as OLACdot (Hughes, 2005)
        • Early experiments with DCdot show significant promise, but need to be more carefully evaluated in light of recent research in the area (Greenberg, 2005)
     • Training of critical importance
        • Significant effort was invested in training key personnel and in propagating the institutional standards and training notes online, but only a small number of face-to-face classes have been held
  21. 21. Conclusion
     • UniMelb was identified as one of the leading universities with regard to metadata implementation (Ivanova, 2004)
     • Empirical evidence suggests that The University of Melbourne still faces significant challenges
     • Compliance in the age of moving standards: over a 2 year period the evolution of external standards, web content creation tools, and web content demography is significant
     • A strong basis for institutional metadata was formed by the adoption of Dublin Core, but the disparate content creation environment and the rapidly changing composition of web content have induced a less than satisfactory application of these standards
     • Automated metadata creation and assessment, forming a significant component of future work, may address this problem in part
  22. 22. Questions / Comments
     • http://eprints.unimelb.edu.au/archive/00000983
     • Eve Young, Metadata Coordinator, Information Acquisition and Organisation Section, Information Division, e.young@unimelb.edu.au
     • Baden Hughes, Research Fellow, Department of Computer Science and Software Engineering, badenh@cs.mu.oz.au