Re-usable metadata, re-usable content

A short presentation about re-usable content (and metadata), given to a JISC Digitisation Programme meeting.

Published in: Education, Technology

  1. 1. UKOLN is supported by: Re-usable metadata, re-usable content Paul Walk Technical Manager [email_address] A centre of expertise in digital information management www.ukoln.ac.uk
  2. 2. harvesting, searching, syndicating <ul><li>options for metadata and content: </li></ul><ul><li>the lines can be blurred </li></ul><ul><ul><li>search engines also harvest! </li></ul></ul><ul><li>your metadata may be my content </li></ul>metadata content harvestable searchable ✓ ✓ syndicable ✓
  3. 3. being harvestable (1) <ul><li>Open Archives Initiative </li></ul><ul><ul><li>OAI-PMH </li></ul></ul><ul><ul><li>repositories </li></ul></ul><ul><ul><li>OAI-ORE </li></ul></ul><ul><li>aggregators: </li></ul><ul><li>Intute Institutional Repository Search </li></ul><ul><ul><li>currently harvesting eprints metadata records from 88 institutions </li></ul></ul><ul><ul><li>planning to explore the harvesting of metadata for: </li></ul></ul><ul><ul><ul><li>images </li></ul></ul></ul><ul><ul><ul><li>learning objects </li></ul></ul></ul><ul><ul><ul><li>other media..... </li></ul></ul></ul><ul><li>MLA’s Discover Service </li></ul><ul><ul><li>your content is of interest to other domains </li></ul></ul>
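
As a rough illustration of what "being harvestable" over OAI-PMH involves, the sketch below builds a `ListRecords` request and pulls record identifiers out of a response. The base URL `http://repository.example.org/oai`, the function names, and the sample response are all hypothetical; only the `verb` and `metadataPrefix` parameters and the XML namespace come from the OAI-PMH 2.0 protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc"):
    """Return the URL a harvester would request to list all records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def record_identifiers(response_xml):
    """Extract the identifier from each record header in a response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hypothetical, heavily trimmed response used only for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repository.example.org:1</identifier></header></record>
    <record><header><identifier>oai:repository.example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("http://repository.example.org/oai"))
    print(record_identifiers(SAMPLE_RESPONSE))
```

A real harvester would also follow `resumptionToken` elements to page through large result sets; that is left out here to keep the sketch short.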
  4. 4. being harvestable (2) <ul><li>what is your metadata record actually going to point to? </li></ul><ul><ul><li>more than one item of content? </li></ul></ul><ul><ul><li>a ‘jumping off’ page? </li></ul></ul><ul><ul><li>is this consistent? </li></ul></ul><ul><li>what metadata format are you going to use? </li></ul><ul><ul><li>is it commonly supported? </li></ul></ul><ul><ul><li>are you using it correctly? (you’d be surprised.....) </li></ul></ul><ul><li>where/how is your metadata going to be used? </li></ul><ul><ul><li>this is necessarily out of your control! </li></ul></ul>
  5. 5. being searchable (1) <ul><li>exposing your content to search engines </li></ul><ul><li>search engine optimisation (SEO) </li></ul><ul><ul><li>make it easy for the search engines </li></ul></ul><ul><ul><li>have content people want </li></ul></ul><ul><ul><li>make it eminently linkable </li></ul></ul><ul><li>Google is your friend! </li></ul><ul><ul><li>SiteMaps - describe your content in ways Google can understand </li></ul></ul><ul><ul><li>OAI-PMH interface can be treated as a SiteMap </li></ul></ul>
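
To make the SiteMaps point concrete, here is a minimal sketch that writes a sitemap in the sitemaps.org 0.9 format from a list of item URLs. The function name `build_sitemap` and the example URLs and dates are assumptions made for illustration, not part of the presentation.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap document from (url, last_modified_or_None) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # lastmod is optional; W3C date format such as YYYY-MM-DD.
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical digitised items.
    print(build_sitemap([
        ("http://www.example.org/collections/pictures/image_0001.jpg", "2008-02-11"),
        ("http://www.example.org/collections/pictures/image_0002.jpg", None),
    ]))
```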
  6. 6. being searchable (2) <ul><li>Z39.50 </li></ul><ul><ul><li>from the library domain </li></ul></ul><ul><ul><li>allows the target to participate in a cross search </li></ul></ul><ul><ul><li>very mature, very widely deployed </li></ul></ul><ul><ul><li>not a web protocol </li></ul></ul><ul><li>SRU </li></ul><ul><ul><li>web-ified Z39.50 </li></ul></ul><ul><ul><li>ReSTful </li></ul></ul><ul><ul><li>Common Query Language (CQL) </li></ul></ul><ul><li>SRW </li></ul><ul><ul><li>as above, but for heavier SOA/Web Services use </li></ul></ul><ul><li>OpenSearch </li></ul><ul><ul><li>piggyback on RSS/Atom </li></ul></ul>
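
The "web-ified Z39.50" idea behind SRU can be sketched as an ordinary HTTP GET whose query string carries a CQL query. The endpoint `http://sru.example.org/search`, the function name, and the index name `dc.title` are assumptions for illustration; `operation`, `version`, `query` and `maximumRecords` are standard SRU `searchRetrieve` parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sru_url(endpoint, cql_query, maximum_records=10, version="1.1"):
    """Return a searchRetrieve URL for an SRU endpoint."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,  # CQL; urlencode takes care of escaping
        "maximumRecords": str(maximum_records),
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_sru_url("http://sru.example.org/search", 'dc.title = "magna carta"')
    print(url)
    # Decoding the query string recovers the original CQL expression.
    print(parse_qs(urlsplit(url).query)["query"][0])
```

Because the whole search is expressed in the URL, the same request can be bookmarked, linked to, or issued by a third-party portal.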
  7. 7. being searchable (3) <ul><li>search portals </li></ul><ul><li>community portals </li></ul><ul><li>institutional portals/VLEs </li></ul>
  8. 8. be syndicable, enable re-use by 3rd parties <ul><li>consider RSS (and the Atom syndication format) </li></ul><ul><ul><li>in some ways the lingua franca of Web 2.0 </li></ul></ul><ul><ul><li>machine and human friendly </li></ul></ul><ul><ul><li>surprising how much content lends itself to this structure </li></ul></ul><ul><li>RSS 2.0 can also ‘enclose’ binary data </li></ul><ul><ul><li>syndicating podcasts </li></ul></ul><ul><li>“the coolest use of your data will be thought of by someone else” </li></ul><ul><li>be mashup friendly: </li></ul><ul><ul><li>addressable content </li></ul></ul><ul><ul><li>cool URLs </li></ul></ul><ul><ul><li>simple formats </li></ul></ul><ul><ul><li>aspire to APIs that need no documentation! </li></ul></ul>
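
As a sketch of how RSS 2.0 can "enclose" binary data, the code below builds a one-channel feed whose items each carry an `enclosure` element with the `url`, `length` and `type` attributes that RSS 2.0 requires. The function name, feed title and file URLs are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Build an RSS 2.0 feed; each item is a dict with
    title, link, media_url, media_bytes and media_type keys."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # The enclosure points at the binary file; it does not embed it.
        ET.SubElement(
            item,
            "enclosure",
            url=entry["media_url"],
            length=str(entry["media_bytes"]),
            type=entry["media_type"],
        )
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_feed(
        "Digitised sound archive",
        "http://www.example.org/sounds/",
        "Newly digitised recordings",
        [{
            "title": "Recording 1",
            "link": "http://www.example.org/sounds/1",
            "media_url": "http://www.example.org/sounds/1.mp3",
            "media_bytes": 1048576,
            "media_type": "audio/mpeg",
        }],
    ))
```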
  9. 9. human and machine interfaces (1) <ul><li>they’re completely different....right? </li></ul><ul><li>well, not necessarily </li></ul><ul><ul><li>RSS! </li></ul></ul><ul><ul><li>OAI-PMH with a CSS stylesheet referenced from the XML </li></ul></ul>
  10. 10. human and machine interfaces (2) <ul><li>‘screen-scraping’ is back in fashion </li></ul><ul><li>plain old semantic HTML (POSH) </li></ul><ul><li>linked data (the semantic web with a small ‘s’) </li></ul><ul><li>the web of data is imminent! </li></ul>
  11. 11. future design: taking a REST from service provision <ul><li>the resource-oriented architecture </li></ul><ul><li>ReST: </li></ul><ul><ul><li>resources with cool URLs </li></ul></ul><ul><ul><li>4 HTTP verbs: GET, PUT, POST and DELETE </li></ul></ul><ul><ul><li>CRUD for the Web (create, retrieve, update, delete) </li></ul></ul><ul><li>make everything addressable with URLs </li></ul><ul><li>be cool! </li></ul><ul><ul><li>make the URLs persistent </li></ul></ul><ul><ul><li>make them human-parsable </li></ul></ul><ul><ul><li>e.g. </li></ul></ul><ul><ul><ul><li>http://www.myserver.com/gallery/collections/pictures/image_0001.jpg </li></ul></ul></ul><ul><ul><li>is better than: </li></ul></ul><ul><ul><ul><li>http://www.myserver.com/gallery.php?collection_id=7&item_id=0001 </li></ul></ul></ul>
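
The "CRUD for the Web" idea on this slide can be sketched as a small lookup from CRUD operations to HTTP verbs, plus a helper that assembles a path-style "cool URL". The pairing of create with POST and update with PUT is a common convention rather than something the slide specifies, and the names `CRUD_TO_HTTP`, `resource_url` and `request_line` are hypothetical.

```python
from urllib.parse import quote

# One common convention for mapping CRUD operations onto HTTP verbs.
CRUD_TO_HTTP = {
    "create": "POST",
    "retrieve": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def resource_url(base, *segments):
    """Join path segments into a human-parsable, path-style URL."""
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{base.rstrip('/')}/{path}"


def request_line(operation, url):
    """Describe the HTTP request that would perform a CRUD operation."""
    return f"{CRUD_TO_HTTP[operation]} {url}"


if __name__ == "__main__":
    url = resource_url("http://www.myserver.com",
                       "gallery", "collections", "pictures", "image_0001.jpg")
    for operation in CRUD_TO_HTTP:
        print(request_line(operation, url))
```

The same URL identifies the resource for every operation; only the verb changes, which is what keeps the address stable and linkable.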
  12. 12. my suggestions <ul><li>use web protocols </li></ul><ul><li>make content addressable - and persistently so </li></ul><ul><li>reduce barriers to third parties developing other (competing!?) UIs </li></ul><ul><ul><li>are our UIs really just ‘gateways’ to information (implying that there is a wall around that information)? </li></ul></ul><ul><li>make the machine APIs the heart of our services </li></ul><ul><ul><li>a good design principle is to use the machine API as the API used by our own user interfaces </li></ul></ul><ul><ul><li>we just can’t know for sure all the ways in which our information services might be used </li></ul></ul>
  13. 13. acknowledgements <ul><li>in preparation for this presentation, I blogged about giving it and asked my readers: </li></ul><ul><ul><li>“Aside from the obvious stuff like OAI-PMH, Google, RSS, what should I be talking about? Persistent identifiers? Cool URLs? Any other suggestions?” </li></ul></ul><ul><li>6 responses - all containing great suggestions which I have incorporated into this presentation, from the following people: </li></ul><ul><ul><li>Jim Downing, Owen Stephens, Ian Ibbotson, Pete Johnston, Mike Ellis </li></ul></ul><ul><li>thanks!! </li></ul><ul><li>you can read all of the comments, and find links/addresses for these people on my blog at: </li></ul><ul><ul><li>http://blog.paulwalk.net/2008/02/11/making-digitised-content-available-for-searching-and-harvesting/ </li></ul></ul>
  14. 14. comments <ul><li>Ian Ibbotson said: </li></ul><ul><ul><li>It’s very hard to engineer a consistent search user interface when half the metadata refers to the actual digital artefact, and half to a front page. It’s useful to have both links, as you can then negotiate with providers if they feel you need to go through a front page for stats and marketing.... </li></ul></ul><ul><li>Pete Johnston said: </li></ul><ul><ul><li>a shift away from the “repository” towards the “collection” or “collections” (which I think is the consequence of a more “resource-oriented view”) </li></ul></ul><ul><li>Owen Stephens said: </li></ul><ul><ul><li>Integration of resources into the wider web - e.g. LoC experiment with Flickr to expose content. Many projects in this area create a new silo of material that is hidden from the wider web [...] reusable metadata as well as objects. </li></ul></ul><ul><li>Jim Downing said: </li></ul><ul><ul><li>....making the content reusable (not a hard sell in eLearning?). Recent use of RDF and Atom in a cultural setting: Asemantics BBC aggregator </li></ul></ul><ul><li>Mike Ellis said: </li></ul><ul><ul><li>....RSS, and possibly “programmable” RSS (for example, surfacing search results by adding query parameters to the feed address, etc).... </li></ul></ul>
  15. 15. questions?
