The paper trail: steps towards a reference model for the metadata ecology



The paper trail: steps towards a reference model for the metadata ecology. Presentation at the CoLIS5 workshop, given with Jane Barton.
Archived presentation from June 2005.
Please note: this presentation is currently all rights reserved until I contact the other author.


  • Introduction to exercise. The first thing you do is look for your work or for people you know: a new paper, book or search tool. Looking for… [citation]. Indulge the example: Jane's paper. Highlight a simple object with a simple purpose for its metadata: discovery and citation. A single-version e-print is, in the wider community represented here, an easy object in itself. It is not a learning object (LO), image, video, museum object, dataset or anything else. The desirable metadata is geared towards resource discovery and, to some extent, citation, and that's it. Key to diagrams: four types of entity are represented in our discovery environment, as you'd expect of the variety of entities in the wider environment. These are places where a digital copy of the paper sits, harvesters of metadata, metadata creators, and automatic web tools.
  • Which object do they link to? Note that the paper is open access (OA) on the DC conference site. Explain the contents of the slide: what are all these things? Explain why Erpanet appears twice. Examine the links and comment. Visibility in Google? OAI harvesters aren't pointing to the ‘real’ object.
  • Metadata transfer. Note on zniff. There are no metadata connections in catalogues or resource lists, and no machine-to-machine (m2m) relationship. Where there is m2m, inspection suggests little human intervention. Note the unexpected isolation of Worldcat. The obvious observation is that there are far fewer connections, all OAI-based, with no apparent importing into repositories or catalogues.
  • Created by the author; created within 15 feet of the author.
  • Note that this is a very simple example; lifecycles and associated workflows would be more complex outside the e-print situation. Further, note that metadata for e-prints are likely to serve an obvious purpose (resource discovery/management) and will not occur much outside this domain. Lifecycle: note that it includes specification of the repository, workflow design and implementation. Lifecycles are also straightforward. Most likely there is a single iteration, so the lifecycle is the ingest workflow; otherwise metadata is fairly static. These four lifecycles provide a snapshot of the metadata creation processes involved in creating the paper trail we've just seen.
  • [Check metadata set for different repositories.] No revisions. Time taken by redoing this. Metadata referred back to by repository editors if there are questions. Intellectual decision about the content of the object.
  • Set-up of Worldcat. Choice of controlled vocabularies etc. Periodic updating of these tools and metadata review. Choice to use AACR2: resource types and semantics follow from that. Located electronic copy. Examples of authority files and controlled vocabularies used: name authority (LCNAF), LCSH, resource type. Export, though unused, is key. Subject classification process. This should produce one of the ‘best’ records. Exporting: native MARC/XML/DC/OAI potential; link usage and other through Yahoo. Should be able to import…
  • Another resource list example is needed, as this is both the most frequent and the thinnest. Title (often as a link) and an abstract in some form are often the only metadata.
  • Harvester set-up: choices of what to store and harvest, software and set-up choices. What changes in the harvester repository? Transformations: subject, date normalisation, URL of source.
  • A lot of metadata activity: the investment is not trivial. So the paper is found in all these places; what is the return on the effort expended? If someone else were looking for the paper, what would they need to know? Citation needs [list from later]. Assessment made on what metadata is found/visible.
  • Based on visible metadata.
  • Scores not finished. What to comment on: HAIRST, Worldcat, zniff, Metalis. Other odd points: pages, conference location/dates, publication editor, subject terms (variety), PURL.
  • DELOS workshop breakout session on developing a reference model for repositories; note the focus on individual repositories. Noted that ‘repository’ is an accepted term for certain sorts of collections against which services can be offered, and that there is a choice of technical solutions for delivering repository services. Discussed repositories as social constructs with many roles. Considered it would be useful to build up a list of services that would be offered under each of these roles, and to identify which are 'common services' and which are specific to a particular sort of repository. Johns Hopkins U. is currently working on "A Technology Analysis of Repositories and Services", funded by the Mellon Foundation. Their work covers three strands: gathering and writing repository use cases and scenarios; mapping functionality to various repository interfaces; and looking at the attributes of a repository interface layer. Presentations at CNI and DLF. Currently inviting use cases and scenarios for repositories.
  • Use the Heatherbank example. A physical exhibition was digitised and repurposed in two ways: as a virtual museum exhibit based on a collection of digital library objects, and as an interconnected set of learning objects. Two object lifecycles were developed in parallel to produce an extended object lifecycle. Related object lifecycles arise from other uses of the exhibition content; e.g. some photographs have previously been used as illustrations in books and journal articles. Individual digital library objects or learning objects may subsequently be reused or repurposed elsewhere.
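The harvester notes list transformations a harvester may apply to incoming records: subject, date normalisation and URL of source. The Python sketch below illustrates only the date-normalisation step. It assumes the harvester wants dc:date values in W3CDTF-style forms (YYYY, YYYY-MM or YYYY-MM-DD). Several choices here are hypothetical and not taken from any harvester discussed in the presentation:

- the accepted input formats;
- the day-first reading of DD/MM/YYYY;
- the `harvested_from` provenance field.

```python
from datetime import datetime
from typing import Optional

# Hypothetical input formats a harvester might meet in dc:date values, each
# paired with the output precision to keep. Reading "DD/MM/YYYY" day-first is
# an assumption (UK convention); a US-style source would need "%m/%d/%Y".
_FORMATS = [
    ("%Y-%m-%d", "%Y-%m-%d"),
    ("%d/%m/%Y", "%Y-%m-%d"),
    ("%d %B %Y", "%Y-%m-%d"),
    ("%Y-%m", "%Y-%m"),
    ("%B %Y", "%Y-%m"),
    ("%Y", "%Y"),
]


def normalise_date(raw: str) -> Optional[str]:
    """Return a W3CDTF-style date (YYYY, YYYY-MM or YYYY-MM-DD), or None if unrecognised."""
    value = raw.strip()
    for in_fmt, out_fmt in _FORMATS:
        try:
            parsed = datetime.strptime(value, in_fmt)
        except ValueError:
            continue  # not this format; try the next one
        return parsed.strftime(out_fmt)
    return None


def transform_record(record: dict, source_base_url: str) -> dict:
    """Normalise dc:date values and note where the record was harvested from."""
    out = dict(record)  # shallow copy so the harvested original is left untouched
    # Keep the raw value when it cannot be parsed, so nothing is silently lost.
    out["date"] = [normalise_date(d) or d for d in record.get("date", [])]
    out["harvested_from"] = source_base_url  # hypothetical provenance field
    return out
```

Unrecognised values, such as free-text dates, are passed through unchanged rather than dropped, so a later review step can still see them. Whether to keep or discard such values is a policy decision for each harvester.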

    1. The paper trail: steps towards a reference model for the metadata ecology. R. John Robertson & Jane Barton, Centre for Digital Library Research, University of Strathclyde, UK
    2. Overview
        • The paper trail:
            • tracking an object and its metadata
            • views of the object’s metadata lifecycle
            • analysis of metadata quality
        • Modelling the metadata ecology:
            • metadata lifecycle, extended lifecycle & ecology
            • a continuum of reference models
            • components of the metadata ecology model
            • existing models & frameworks
            • applications of the model
    3. Tracking an object & its metadata
        • Introduction to exercise
            • The first thing you do
            • Looking for Barton, J., Currier, S. and Hey, J. M. N. (2003) Building quality assurance into metadata creation: an analysis based on the learning objects and e-prints communities of practice. In Sutton, S., Greenberg, J. and Tennis, J., Eds. Proceedings 2003 Dublin Core Conference: Supporting Communities of Discourse and Practice - Metadata Research and Applications, Seattle, Washington (USA), 39-48.
            • Highlight simple object, simple purpose of metadata
        • Key to diagrams
            • Example nodes: E-Prints Soton, E-Prints UK, Worldcat, zniff
            • Legend: copy of paper and metadata; OAI harvested record; automatically created metadata; manually created metadata
    4. Tracking the object. Diagram nodes: DC2003/DCMI, E-lis, Strathprints, E-Prints Soton, E-Prints UK, Arc (ODU), Oaister, Metalis, HAIRST, Citebase, Worldcat, CDLR pubs, Erpanet, Stephen’s Web, zniff, Tardis list, CIS pubs, Erpanet, other resource lists
    5. Tracking the object’s metadata. Diagram nodes: DC2003/DCMI, E-lis, Strathprints, E-Prints Soton, E-Prints UK, Arc (ODU), Oaister, Metalis, HAIRST, Citebase, Worldcat, CDLR pubs, Erpanet, Stephen’s Web, zniff, Tardis list, CIS pubs, Erpanet, other resource lists
    6. Tracking the object’s metadata. Diagram nodes: DC2003/DCMI, E-lis, Strathprints, E-Prints Soton, E-Prints UK, Arc (ODU), Oaister, Metalis, HAIRST, Citebase, Worldcat, CDLR pubs, Erpanet, Stephen’s Web, zniff, Tardis list, CIS pubs, Erpanet, other resource lists
    7. Four metadata lifecycles for the object
        • Zoom in to look at metadata activity in sections of previous diagram.
        • What do these relationships imply for the metadata lifecycles associated with this paper?
        • What does this look like in terms of metadata workflows?
            • Author deposit
            • Worldcat
            • Resource list
            • Harvester
    8. Author metadata lifecycle for the object. Diagram nodes: DC2003, Strathprints, CDLR pubs, CIS pubs, Erpanet. Element sets shown: Title, Authors, Abstract, Publisher, Date, Url, Pages; Title, Authors, Publisher, Date, Url, bibtex. Processes: review process (twice).
    9. Worldcat metadata lifecycle for the object. Diagram nodes: E-Prints Soton, Worldcat. Elements: Title, Authors, Publisher, Date, Url, Subject. Formats: DC, MARC. Tools: Yahoo search, authority files, controlled vocabulary tools.
    10. Resource list metadata lifecycle for the object. Diagram nodes: DC2003, Erpanet, Stephen’s Web, Tardis, zniff, other resource lists. Element sets shown: Title, Authors, Date, Url; Title, Url.
    11. Harvester metadata lifecycle for the object. Diagram nodes: DC2003, E-lis, Strathprints, E-Prints Soton, E-Prints UK, Oaister, Metalis, HAIRST, Citebase, Erpanet. Processes: GET (eight times), review process.
    12. Analysis of metadata quality
        • The return on the metadata investment in this paper
            • What metadata do we look for when searching for a doc?
                • Author
                • Title
                • Date
                • URL
            • Searching for a citation
    13. Analysis of discovery metadata

        Element               | DC2003 | E-Prints Soton | E-lis | Strathprints | E-Prints UK | Metalis
        Title                 | y      | y              | y     | y            | y           | y
        Author                | y      | y              | y     | n            | y           | y
        Date                  | y      | y              | p     | n            | n           | p
        Conference paper URL  | y      | p              | y     | n            | n           | n

        Element               | HAIRST | Worldcat | CDLR | zniff | Stephen's Web | Erpanet
        Title                 | y      | y        | y    | n     | y             | y
        Author                | p      | y        | y    | n     | n             | n
        Date                  | n      | y        | y    | n     | y             | y
        Conference paper URL  | n      | n        | y    | y     | y             | y
    14. Analysis of citation completeness

        Repository      | DC2003 | E-Prints Soton | E-lis | Strathprints | E-Prints UK | Metalis | Tardis list
        Citation score  | 6      | 5              | 9     | 2            | 2           | 2       | 6

        Repository      | HAIRST | Worldcat | CDLR | zniff | Stephen's Web | Erpanet | Resource list
        Citation score  | 1      | 5        | 6    | 2     | 5             | 4       | 2 (typically)

        Citation elements: Author, Title, Date, Conference, Conference date/location, Publisher, Editors, Publication place, Pages, URL
    15. Reflecting on this paper trail
        • Duplication of effort
        • Confusion rather than good diversity
        • There are points in the system capable of metadata exchange or augmentation, but this is not happening; neither are tools in use.
        • The possibility of joining lifecycles up, and so addressing these issues, depends on being able to locate and understand relevant sections of the ecology; this in turn depends on a model for this ecology.
    16. Defining the metadata ecology
        • The ‘metadata ecology’ captures all metadata activity associated with a single object:
            • the object’s metadata lifecycle at any given point in the system
            • extended metadata lifecycles for the object integrated across several points in the system
            • the relationships between all metadata lifecycles associated with the object throughout the system
    17. Illustrating the metadata ecology: metadata lifecycle. Diagram nodes: Strathprints, HAIRST, author/depositor. Processes: review process, GET.
    18. Illustrating the metadata ecology: extended metadata lifecycle. Diagram nodes: Strathprints, CDLR pubs, CIS pubs, HAIRST. Processes: review process, GET.
    19. Illustrating the metadata ecology: metadata relationships. Diagram nodes: Strathprints, CDLR pubs, CIS pubs, HAIRST, E-lis, Metalis. Processes: review process (twice), GET (twice), metadata relationship.
    20. Reference models
        • The ‘metadata ecology’ is part of a continuum of reference models for the distributed information environment at various levels of granularity:
            • ecology of repositories
            • object ecology
            • metadata ecology
    21. Ecology of repositories
        • provides a typology of repositories and associated services
        • models the relationships between them and between their domains
        • requires an understanding of the purpose(s) of repositories locally and in the wider community, as well as their technical profiles and interactions
    22. Object ecology
        • profiles objects within repositories
        • maps their movement, transformation and adaptation within individual repositories and in the wider environment
        • goes beyond object lifecycle to include extended object lifecycle and associated relationships
        • requires resolution of persistent object identification and digital rights issues
        • possible parallels with the learning object economy or the scholarly publishing model
    23. Metadata ecology
        • profiles metadata within repositories
        • maps the movement, augmentation and enhancement of metadata in the wider system
        • distinguishes between local metadata requirements and those of the wider system
        • enables clusters of similar repositories to be identified and relationships established
        • includes metadata activity resulting from these relationships, formal or informal
    24. Components of the ecology model
    25. Existing models & frameworks
        • Existing models that relate to (parts of) the reference models:
            • the E-Learning Framework
            • McLean & Blinco’s cosmic view
            • the JISC Information Environment
            • CORDRA
            • the work of Gonçalves et al
            • CIDOC Conceptual Reference Model (CRM)
            • FRBR data model
    26. The E-Learning Framework (ELF)
        • A common approach to service oriented architectures for education via:
            • a definitional model of service components
            • standards & tools to support their interoperability
        • Addresses a specific domain & provides a typology of functions within that domain
        • (The E-Learning Framework.)
    27. McLean & Blinco’s cosmic view
        • A service domain typology of repositories:
            • more comprehensive than ELF but less detailed
            • highlights potential for cross-domain approach
            • identifies need for better articulation of context & methodologies to deal with complex contextual issues
        • (McLean, N. The ecology of repository services: a cosmic view. ECDL, 2004.)
    28. The JISC Information Environment
        • Provides convenient access to a comprehensive collection of scholarly & educational materials
        • can be viewed as a specific implementation of ELF
        • provides a superstructure to inform & co-ordinate technical infrastructure development
        • focuses on technical solutions to support structural & syntactical interoperability
        • taking a lead in addressing unresolved issues in the object lifecycle
        • (JISC. Strategic activities: Information Environment. 2004.)
    29. CORDRA
        • Enables access to a wide range of learning object repositories through federated searching:
            • high common denominator for participating repositories
            • creates a community of repositories behind an interoperability boundary
            • assumes federation as the method of interaction, with metadata integration rather than interoperability
        • (Kraan, W. & Mason, J. Issues in federating repositories: a report on the first International CORDRA Workshop. D-Lib Magazine, 11(3), 2005.)
    30. Gonçalves et al’s 5S
        • Complex formal taxonomy of repositories:
            • comprehensively catalogues repositories from five perspectives
            • engages with all three reference models but does not engage with interactions & offers only a static view
        • (Gonçalves, M.A. et al. Streams, structures, spaces, scenarios, societies (5S): a formal model for digital libraries. ACM Transactions on Information Systems, 22(2), 2004.)
    31. Existing models & frameworks
        • In general, existing models:
            • address structural & syntactic interactions to a degree but do not address semantic interactions
            • provide voices, vocabularies & grammar for repositories
            • could usefully be extended to profile not only what repositories do but how they might interact with each other
    32. Moving forward…
        • Development and exploitation of the metadata ecology requires:
            • a standard way of profiling repositories at repository, object and metadata level
            • clear articulation of metadata requirements, in terms of structure, semantics and syntax, and of associated metadata workflows
            • integration with registries of repositories, standards, application profiles and vocabularies
    33. Potential applications
        • The metadata ecology enables repositories to optimise metadata workflow and quality by:
            • exploiting known metadata sources via intelligent import or harvesting
            • exploiting formal metadata relationships between repositories via negotiation and establishment of minimum standards
            • providing a framework for assessing the cost/benefit of, e.g., implementing metadata elements or participating in consortia
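The closing slides call for three things: profiling repositories at metadata level, exploiting known metadata sources, and locating relevant sections of the ecology. The Python sketch below is one hypothetical way to represent a single object's metadata ecology:

- each node holds the record visible at that point in the system;
- relationships are (source, target, kind) triples, such as an OAI GET;
- `upstream` traces where a node's metadata comes from;
- `augmentation_candidates` lists other nodes that hold citation elements the node lacks.

The element list follows the citation completeness slide. The class and method names, the node names and the simple present/absent treatment of elements are assumptions for illustration. They are not the authors' model.

```python
from dataclasses import dataclass, field

# Citation elements from the citation completeness slide. Treating each one as
# simply present or absent is an assumption (the slides also mark partial matches).
CITATION_ELEMENTS = (
    "author", "title", "date", "conference", "conference_date_location",
    "publisher", "editors", "publication_place", "pages", "url",
)


@dataclass
class MetadataEcology:
    """Hypothetical model of all metadata activity around one object."""
    records: dict = field(default_factory=dict)        # node name -> {element: value}
    relationships: list = field(default_factory=list)  # (source, target, kind) triples

    def add_record(self, node: str, record: dict) -> None:
        self.records[node] = record

    def relate(self, source: str, target: str, kind: str) -> None:
        # kind might be "GET" for an OAI harvest or "manual" for re-keyed metadata
        self.relationships.append((source, target, kind))

    def upstream(self, node: str) -> set:
        """Every node whose metadata reaches `node` through a chain of relationships."""
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            for source, target, _kind in self.relationships:
                if target == current and source not in seen:
                    seen.add(source)
                    stack.append(source)
        seen.discard(node)  # ignore cycles that lead back to the starting node
        return seen

    def present(self, node: str) -> set:
        """Citation elements with a non-empty value in the node's visible record."""
        record = self.records.get(node, {})
        return {e for e in CITATION_ELEMENTS if record.get(e)}

    def augmentation_candidates(self, node: str) -> list:
        """Other nodes holding citation elements `node` lacks, richest first."""
        have = self.present(node)
        candidates = []
        for other in self.records:
            if other == node:
                continue
            extra = self.present(other) - have
            if extra:
                candidates.append((other, sorted(extra)))
        # Most missing elements first; ties broken alphabetically by node name.
        candidates.sort(key=lambda pair: (-len(pair[1]), pair[0]))
        return candidates
```

For example, suppose `repository_a` is related to `harvester_b` by a single `GET` relationship. Then `upstream("harvester_b")` returns `{"repository_a"}`, and `augmentation_candidates("harvester_b")` ranks the other nodes by how many of its missing citation elements each could supply. That is the kind of exchange or augmentation point the reflection slide notes is not currently being exploited.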