Open for Business - Open Archives, OpenURL, RSS and the Dublin Core



A presentation at UKSG 2004, Manchester.

Open for Business - Open Archives, OpenURL, RSS and the Dublin Core Presentation Transcript

  • 1. Open for Business: Open Archives, OpenURL, RSS and the Dublin Core
    Andy Powell, UKOLN, University of Bath [email_address]
    UKSG 2004, Manchester
    UKOLN (a centre of expertise in digital information management) is supported by:
  • 2. Contents
    • context – metasearching and open ‘context sensitive’ linking
    • bluffer’s guides to…
      • Dublin Core
      • OAI Protocol for Metadata Harvesting
      • RSS
      • OpenURL
    • discussion about the benefits, problems and issues of using these standards in the publishing ‘business’ environment…
  • 3. Things to note…
    • this is a briefing session about technologies…
    • …but it is not intended to be overly technical
    • you should leave with an understanding of what the key technologies are – but not necessarily be expert in them!
  • 4. Important
    • this is a briefing session…
    • … please feel free to ask questions as we go through!
  • 5. Context: metasearching and context sensitive linking
  • 6. The ‘problem’…
    • end-user often has access to a large number of heterogeneous collections - full-text, A&I, images, video, data, etc. (e.g. through JISC licensing agreements)
    • however, experience of these collections is less than optimal:
      • end-users not aware of available content
      • end-user has to interact with (search or browse) multiple different Web sites to work across range of content
      • content ‘discovery’ services not joined-up with delivery services
  • 7. Or, to put it another way…
    • from perspective of ‘data consumer’
      • need to interact with multiple collections of stuff - bibliographic, full-text, data, image, video, etc.
      • delivered thru multiple Web sites
      • few cross-collection discovery services (with exception of big search engines like Google, but still some issues with use of Google – e.g. the ‘invisible Web’, the lack of metadata, keywords with multiple meanings, etc.)
    • from perspective of ‘data provider’
      • few agreed mechanisms for disclosing availability of content
  • 8. A solution…
    • an ‘information environment’
    • framework of machine-oriented services allowing the end-user to
      • discover, access, use, publish resources across a range of content providers
      • move away from lots of stand-alone Web sites...
    • content providers expose metadata for
      • searching, harvesting, alerting
    • develop end-user services and tools that bring stuff together…
    • …based on open ‘standards’
  • 9. End-user services and tools
    • tend to focus on library portal (metasearch) tools (e.g. Encompass, MetaLib or ZPortal)
    • but, there will be lots of user-focused services and tools…
      • subject portals developed within academia
      • reading list and other tools in VLE (e.g. externally hosted by Sentient Discover)
      • commercial ‘portals’ (ISI Web of Knowledge, ingenta, Bb Resource Center, etc.)
      • SFX service component (or other OpenURL resolver)
      • personal desktop reference manager (e.g. Endnote)
  • 10. Link resolvers
    • ‘discovery’ is only part of the problem…
    • in the case of books, journals, journal articles, end-user wants access to the most appropriate copy
    • need to join up discovery services with access/delivery services (local library OPAC, ingentaJournals, Amazon, etc.)
    • need localised view of available services
    • linking services that provide access to the most appropriate copy
      • user and institutional preferences, cost, access rights, location, etc.
  • 11. A shared problem space
    • the problems outlined here are shared across sectors and communities
      • student or researcher looking for information from variety of bibliographic sources
      • lecturer searching for e-learning resources from multiple learning object repositories
      • researcher working across multiple data-sets and compute servers on the Grid
      • a GP searching the National electronic Library for Health
      • school child searching BBC, museum and library Web sites for homework project
      • someone searching across multiple e-government sites
      • even someone looking to buy or sell a second-hand car…
  • 12. Technologies
    • require global, standards-based, cross-domain solutions…
    • cross-searching
      • Z39.50 (Bath Profile, a profile of Z39.50)
      • SRW (Search and Retrieve Web service, a Web services implementation of Z39.50)
    • harvesting
      • OAI-PMH - Open Archives Initiative Protocol for Metadata Harvesting
    • alerting
      • RSS - RDF/Rich Site Summary
    • linking
      • OpenURL
    … and cross-domain metadata
  • 13. Bluffer’s Guide to… Dublin Core
  • 14. Bluffer’s guide to DC
    • DC short for Dublin Core
    • simple metadata standard, supporting ‘cross-domain’ resource discovery
    • original focus on Web resources but that is no longer the case – e.g. usage to describe physical artefacts in museums
    • current usage across wide range of sectors – academic, e-government, museums, libraries, business, semantic Web
  • 15. Bluffer’s Guide to DC
    • ‘simple DC’ provides 15 elements (metadata properties)
    • multiple encoding syntaxes including HTML <meta> tags, XML and RDF/XML (XML schemas are available)
    dc:title, dc:creator, dc:subject, dc:description, dc:publisher, dc:contributor, dc:date, dc:type, dc:format, dc:identifier, dc:source, dc:language, dc:relation, dc:coverage, dc:rights
  • 16. Bluffer’s Guide to DC
    • relatively slow programme of adding new terms to ‘qualified DC’
      • new elements (e.g. dcterms:audience)
      • element refinements (e.g. dcterms:dateCopyrighted)
      • encoding schemes (e.g. dcterms:LCSH and dcterms:W3CDTF)
      • 48 elements and 17 encoding schemes
  • 17. Bluffer’s Guide to DC
    • DC can be embedded into HTML pages but almost none of the big search engines will use it! Why? Lack of trust…
      • meta-spam
      • meta-crap
      • however, embedding DC in HTML may be worthwhile if your own site search engine uses it
    • however, simple DC forms baseline metadata format for the OAI protocol…
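    As a sketch of what embedding looks like in practice, the following Python snippet renders a record of simple DC elements as HTML <meta> tags (the helper name and record values are illustrative, not from a real page):

    ```python
    from html import escape

    def dc_meta_tags(record):
        """Render a dict of simple DC elements as HTML <meta> tags,
        with a <link> declaring the DC element set schema."""
        tags = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />']
        for element, value in record.items():
            tags.append('<meta name="DC.%s" content="%s" />'
                        % (element, escape(value, quote=True)))
        return "\n".join(tags)

    # illustrative record using three of the 15 simple DC elements
    record = {
        "title": "Open for Business",
        "creator": "Andy Powell",
        "date": "2004",
    }
    print(dc_meta_tags(record))
    ```

    A site's own search engine can index these tags even though the big public search engines ignore them.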
  • 18. Bluffer’s Guide to OAI Protocol for Metadata Harvesting
  • 19. OAI roots
    • the roots of OAI lie in the development of eprint archives…
      • arXiv, CogPrints, NACA (NASA), RePEc, NDLTD, NCSTRL
    • each offered Web interface for deposit of articles and for end-user searches
    • difficult for end-users to work across archives without having to learn multiple different interfaces
    • recognised need for single search interface to all archives
      • Universal Pre-print Service (UPS)
  • 20. Searching vs. harvesting
    • two possible approaches to building a single search interface to multiple eprint archives…
      • cross-searching multiple archives based on protocol like Z39.50
      • harvesting metadata into one or more ‘central’ services – bulk move data to the user-interface
    • US digital library experience in this area indicated that cross-searching not preferred approach
      • distributed searching of N nodes viable, but only for small values of N
  • 21. Harvesting requirements
    • in order that harvesting approach can work there need to be agreements about…
      • transport protocols – HTTP vs. FTP vs. …
      • metadata formats – DC vs. MARC vs. …
      • quality assurance – mandatory elements, mechanisms for naming of people, subjects, etc., handling duplicated records, best-practice
      • intellectual property and usage rights – who can do what with the records
    • work in this area resulted in the “Santa Fe Convention”
  • 22. Development of OAI-PMH
    • a two-year metamorphosis through various names
      • Santa Fe Convention, OAI-PMH versions 1.0, 1.1…
      • OAI Protocol for Metadata Harvesting 2.0
    • development steered by international technical committee
    • inter-version stability helped developer confidence
    • move from focus on eprints to more generic protocol
      • move from OAI-specific metadata schema to mandatory support for DC
  • 23. Bluffer’s guide to OAI
    • OAI-PMH short for Open Archives Initiative Protocol for Metadata Harvesting
    • a low-cost mechanism for harvesting metadata records
      • from ‘data providers’ to ‘service providers’
    • allows ‘service provider’ to say ‘give me some or all of your metadata records’
      • where ‘some’ is based on date-stamps, sets, metadata formats
    • eprint heritage but widely deployed
      • images, museum artefacts, learning objects, …
  • 24. Bluffer’s guide to OAI
    • based on HTTP and XML
      • simple, Web-friendly, fast deployment
    • OAI-PMH is not a search protocol
      • but it can underpin search-based services based on Z39.50 or SRW or SOAP or…
    • OAI-PMH carries only metadata
      • content (e.g. full-text or image) made available separately – typically at URL in metadata
    • mandates simple DC as record format
      • but extensible to any XML format – IMS metadata, IEEE LOM, ONIX, MARC, METS, MPEG-21, etc.
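    The ‘give me some or all of your metadata records’ request is just an HTTP GET carrying a verb and some selection arguments. A minimal sketch in Python, assuming a hypothetical repository base URL (the selective-harvest arguments metadataPrefix, set and from are real OAI-PMH v2.0 parameters):

    ```python
    from urllib.parse import urlencode

    def oai_request(base_url, verb, **args):
        """Build an OAI-PMH v2.0 request URL; requests are plain HTTP GETs."""
        return base_url + "?" + urlencode(dict(verb=verb, **args))

    # hypothetical repository base URL
    base = "http://eprints.example.org/oai2"

    # selective harvest: 'some' records chosen by metadata format, set and date-stamp
    # ('from' is a Python keyword, so it is passed via a dict)
    url = oai_request(base, "ListRecords",
                      metadataPrefix="oai_dc",
                      set="physics",
                      **{"from": "2004-01-01"})
    print(url)
    ```

    Fetching that URL returns an XML document of metadata records; the harvester repeats the request with a resumption token until the repository has nothing more to send.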
  • 25. Bluffer’s guide to OAI
    • metadata and ‘content’ often made freely available – but not a requirement
      • OAI-PMH can be used between closed groups
      • or, can make metadata available but restrict access to content in some way
    • underlying HTTP protocol provides
      • access control – e.g. HTTP BASIC
      • compression mechanisms (for improving performance of harvesters)
      • could, in theory, also provide encryption if required
  • 26. Bluffer’s Guide to… RSS
  • 27. Bluffer’s guide to RSS
    • simple XML application for sharing (syndicating) ‘news’ feeds on the Web
    • RDF Site Summary or Rich Site Summary (depending on who you ask)
    • ‘news’ can be interpreted quite loosely, e.g. new items added to database
    • uses ‘channel’ and ‘item’ terminology
    • a ‘channel’ is an XML document that is made available on a Web-site – to update the channel, simply update the XML
  • 28. Bluffer’s guide to RSS
    • each ‘item’ has simple metadata (title, description) and URL link to resource (news story or whatever)
    • RSS also provides channel branding (logo, etc.)
    • three versions currently in use: 0.9, 1.0 and 2.0; 1.0 is based on RDF and is more flexible (but slightly more complex). (Also worth noting Atom, an attempt to resolve some of the tensions in RSS)
    • no single registry of all channels yet
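    The ‘channel’ and ‘item’ structure can be sketched as a small RSS 2.0 document built with Python’s standard XML library (the feed title, link and item here are invented for illustration; to update the channel, regenerate and republish the XML):

    ```python
    import xml.etree.ElementTree as ET

    def make_channel(title, link, items):
        """Build a minimal RSS 2.0 feed: one <channel> containing <item>s,
        each with simple metadata (title, description) and a URL link."""
        rss = ET.Element("rss", version="2.0")
        channel = ET.SubElement(rss, "channel")
        ET.SubElement(channel, "title").text = title
        ET.SubElement(channel, "link").text = link
        for item_title, item_link, description in items:
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = item_title
            ET.SubElement(item, "link").text = item_link
            ET.SubElement(item, "description").text = description
        return ET.tostring(rss, encoding="unicode")

    # hypothetical feed: 'news' interpreted loosely as new items added to a database
    feed = make_channel("New records", "http://example.org/news",
                        [("Item one", "http://example.org/1", "First new record")])
    print(feed)
    ```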
  • 29. Bluffer’s guide to RSS
    • fairly widespread usage, e.g. channels available from the BBC, Microsoft, Apple, … as well as from several academic sites and services (RDN, LTSN, …)
    • easy to use within ‘portals’ (e.g. uPortal)
    • lots of software and toolkits available – open source and commercial
  • 30. Bluffer’s Guide to… OpenURLs
  • 31. OpenURL roots
    • the context
      • distributed information environment (e.g. the JISC IE)
      • multiple A&I and other discovery services
      • rapidly growing e-journal collection
      • need to interlink available resources
    • the problem
      • links controlled by external info services
      • links not sensitive to user’s context (appropriate copy problem)
      • links dependent on vendor agreements
      • links don’t cover complete collection
    a library perspective?
  • 32. The problem
    • the context
      • distributed information environment (e.g. the JISC IE)
      • multiple A&I and other discovery services
      • rapidly growing e-journal collection
      • need to interlink available resources
    • the REAL problem
      • libraries have no say in linking
      • libraries losing core part of ‘organising information’ task
      • expensive collection not used optimally
      • users not well served
    a library perspective?
  • 33. The solution…
    • do NOT hardwire a link to a single service on the referenced item (e.g. a link from an A&I service to the corresponding full-text)
    • BUT rather
      • provide a link that transports metadata about the referenced item
      • to another service that is better placed to provide service links
    OpenURL OpenURL resolver (link server)
  • 34. Non-OpenURL linking: [diagram] the link source (e.g. an A&I service) resolves metadata into a link (typically a URL) to the referenced work at a fixed link destination (e.g. a document delivery service)
  • 35. OpenURL linking: [diagram] the link source (e.g. an A&I service) provides an OpenURL that transports metadata and identifiers to an OpenURL resolver, which performs user-specific, context-sensitive resolution into links to multiple link destinations (e.g. document delivery services)
  • 36. Example 1
    • journal article
    • from Web of Science to ingenta Journals
  • 37. button indicating OpenURL ‘link’ is available
  • 38. OpenURL resolver offering context-sensitive links, including link to ingenta
  • 39.  
  • 40. also links to other services such as Google search for related information
  • 41.  
  • 42. Example 2
    • book
    • from University of Bath OPAC to Amazon
  • 43. button indicating OpenURL ‘link’ is available
  • 44. OpenURL resolver offering context-sensitive links, including link to Amazon
  • 45.  
  • 46. also links to other services such as Google search for related information
  • 47.  
  • 48. Summary… [diagram] OpenURL sources (ISI Web of Science, University of Bath OPAC) link via an OpenURL resolver to OpenURL targets (ingenta, Google, Amazon)
  • 49. Summary (2)
    • OpenURL source
      • a service that embeds OpenURLs into its user-interface in order to enable linking to most appropriate copy
    • OpenURL resolver
      • a service that links to appropriate copy(ies) and other value added services based on metadata in OpenURL
    • OpenURL target
      • a service that can be linked to from an OpenURL resolver using metadata in OpenURL
  • 50. Bluffer’s guide to OpenURLs
    • standard for linking ‘discovery’ services to ‘delivery’ services
    • supports linking from OpenURL ‘source’ to OpenURL ‘target’ via OpenURL ‘resolver’
    [diagram] end-user → source (e.g. Web of Science) → resolver → target (e.g. ingenta), passing an OpenURL query such as:
      • ?genre=article&atitle=Information%20gateways:%20collaboration%20on%20content&title=Online%20Information%20Review&issn=1468-4527&volume=24&spage=40&epage=45&artnum=1&aulast=Heery&aufirst=Rachel
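    The query string above can be assembled programmatically. A minimal Python sketch, assuming a hypothetical institutional resolver base URL (note that urlencode encodes spaces as ‘+’ rather than ‘%20’, which is an equivalent form):

    ```python
    from urllib.parse import urlencode

    def openurl_01(resolver, **metadata):
        """Assemble a version 0.1 OpenURL: the user's preferred resolver
        base URL plus metadata describing the referenced item."""
        return resolver + "?" + urlencode(metadata)

    # hypothetical institutional resolver base URL; metadata from slide 50's example
    link = openurl_01("http://resolver.example.ac.uk/openurl",
                      genre="article",
                      atitle="Information gateways: collaboration on content",
                      title="Online Information Review",
                      issn="1468-4527", volume="24",
                      spage="40", epage="45",
                      aulast="Heery", aufirst="Rachel")
    print(link)
    ```

    The source service only has to know the user's resolver; the resolver decides which targets (appropriate copies and other services) to offer.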
  • 51. Bluffer’s guide to OpenURLs
    • the OpenURL is a URL that carries metadata from the ‘source’ service to the user’s preferred resolver
    • resolver typically offered by institution
    • currently deployed OpenURLs are often version 0.1 - focus on bibliographic resources (books and journal articles)
    • version 1.0 (the standard) – more generic and extensible, e.g. could carry metadata about learning objects or research data
  • 52. Bluffer’s guide to OpenURLs
    • ‘sources’ need to maintain knowledge about end-user’s preferred resolver
    • resolvers and targets need to share knowledge about ‘link-to’ syntaxes
    • most library automation vendors will either have (or be developing) an OpenURL resolver solution for their customers
    • some open-source solutions also available – but expect to work quite hard with these
  • 53. Discussion…
  • 54. Summary
  • 55. Summary
    • protocols presented here fill space between ‘information providers’ and other services (‘portals’, VLEs, etc.)
      • allow integration of remote information resources more seamlessly
      • allow separation of ‘discovery’ and ‘content delivery’
      • enable user-focused, context-sensitive linking
      • can be viewed as ways of getting users to your site
    • but… there are some issues to beware of
  • 56. What can you do?
    • consider exposing metadata about your content for harvesting (or searching)
    • consider making ‘alerting’ channels available
    • consider supporting use of OpenURLs for linking to appropriate-copy
    • consider how your content will be used in e-learning context
    • consider how external services ‘link to’ your resources (i.e. support persistent deep linking to your content)