Does metadata matter?

or… should we be interested in metadata and, if so, why?

I’m going to try to deliver 130 slides in 30 minutes

then you can ask questions (yes… I really did say “130 slides”)

MARC – Machine-Readable Cataloguing

still the predominant metadata standard (in the library world)

a distributed search standard called… Z39.50 so that multiple library catalogues can be searched from one place

AACR2 currently being replaced by… RDA (more generic – i.e. not just books!)

Z39.50 supplemented by SRW and SRU (Web-friendly variants)
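To give a flavour of what "Web-friendly" means here, an SRU search is just an HTTP GET with the query carried in the URL. The sketch below builds such a URL. The parameter names (`operation`, `version`, `query`, `maximumRecords`) come from SRU 1.1; the endpoint `http://sru.example.org/catalogue` and the helper name `build_sru_url` are made up for illustration, and a real server may index different fields.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sru_url(base_url, cql_query, max_records=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,  # CQL, the query language used by SRU
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; substitute the address of a real SRU server.
url = build_sru_url("http://sru.example.org/catalogue", 'dc.title = "metadata"')
print(url)

# Round-trip check: the query survives URL encoding intact.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0])
```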

none of which need bother you…

metadata tends to get more complicated the longer you think about it

but recognised that finding stuff was going to start getting difficult

people (mainly librarians) began trying to catalogue it by hand

http://www.intute.ac.uk/

AltaVista (first major Web search engine – circa 1995)

people began to realise that the metadata they embedded into Web pages might be important

hang on… did I just say “metadata embedded into Web pages”?

<html>
  <head>
    <title>A web page</title>
    <meta name="keywords" content="some, key, words" />
    <meta name="description" content="a summary" />
  </head>
  <body>
    …
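For the curious, this is roughly how a crawler might read embedded metadata out of a page like the one above. It is a minimal sketch using only Python's standard `html.parser`; the class name `MetaExtractor` and the sample page are illustrative, and real indexers do a great deal more.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and any <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            # Self-closing <meta ... /> tags also arrive here.
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


PAGE = """<html><head>
<title>A web page</title>
<meta name="keywords" content="some, key, words" />
<meta name="description" content="a summary" />
</head><body></body></html>"""

parser = MetaExtractor()
parser.feed(PAGE)
print(parser.title)
print(parser.meta)
```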

and the rest, as they say, is history

Google takes note of links between pages (Google PageRank)
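The idea behind link-based ranking can be sketched in a few lines. This is the textbook power-iteration form of PageRank on a toy four-page graph, not Google's production system; the damping factor of 0.85, the iteration count and the page names are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy Web: three pages link to "c", and "c" links back to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Note that nothing in the calculation looks at what a page says about itself. The score comes entirely from who links to whom, which is much harder to fake than a keywords tag.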

but places less emphasis on embedded metadata

metaspam <meta name="keywords" content="coca cola" />

metacrap <title>put your title here</title>

despite that, work continued on embedded metadata most notably in the form of…

Dublin Core: initially 15 metadata elements (a.k.a. properties, a.k.a. attribute/value pairs)

contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, type
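As a rough illustration of how these elements end up in a Web page, the sketch below turns a small record into `<meta>` tags using the common `DC.` prefix convention, with a `schema.DC` link pointing at the Dublin Core element namespace. The function name `dc_meta_tags` and the sample record values are invented for the example.

```python
from html import escape

# The 15 original Dublin Core elements.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dc_meta_tags(record):
    """Render a dict of Dublin Core element/value pairs as HTML <meta> tags."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />']
    for element, value in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError("not a Dublin Core element: %s" % element)
        lines.append(
            '<meta name="DC.%s" content="%s" />' % (element, escape(value, quote=True))
        )
    return "\n".join(lines)


# Illustrative values only.
record = {
    "title": "A web page",
    "description": "a summary",
    "subject": "some, key, words",
    "language": "en",
}
tags = dc_meta_tags(record)
print(tags)
```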

intention was to improve indexing by search engines

“metaspam” and “metacrap”

and so, by and large, search engines still ignore embedded metadata

despite that, there has been fairly widespread adoption in policy terms

particularly in e-Government (e.g. UK eGMS)

but also in other areas – education, health, environmental agencies, libraries, cultural heritage sector, …

growth of rules around metadata content (i.e. cataloguing rules) (everyone’s rules are different)

and growth in use of additional elements for particular communities (and everyone’s additions are different)

such usage documented in the form of “application profiles”

Dublin Core Metadata Initiative coordinating “standards” body (note again: growing complexity over time)

the W3C developed the Resource Description Framework (RDF) (circa 1999)

the standard for the “Semantic Web”

Tim Berners-Lee’s vision for a machine-readable Web of data

allowing software to navigate and reason about Web content automatically
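RDF boils everything down to subject/predicate/object statements ("triples"), which is what lets software follow its nose from one thing to the next. The sketch below holds three triples in plain Python, prints them in N-Triples syntax and then navigates from a page to the name of its creator. The `example.org` URIs and the person "Alice" are made up; the Dublin Core and FOAF namespace URIs are the real ones. A real application would more likely use an RDF library than hand-rolled tuples.

```python
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"


def uri(value):
    """Write a URI in N-Triples form."""
    return "<%s>" % value


def literal(text):
    """Write a plain string literal (only backslashes and quotes are escaped)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


PAGE = uri("http://example.org/page")            # hypothetical resource
ALICE = uri("http://example.org/people/alice")   # hypothetical person

TRIPLES = [
    (PAGE, uri(DC + "title"), literal("A web page")),
    (PAGE, uri(DC + "creator"), ALICE),
    (ALICE, uri(FOAF + "name"), literal("Alice")),
]


def to_ntriples(triples):
    """Serialise triples, one statement per line."""
    return "\n".join("%s %s %s ." % triple for triple in triples)


def objects(triples, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Navigate: page -> creator -> creator's name.
creators = objects(TRIPLES, PAGE, uri(DC + "creator"))
names = [n for c in creators for n in objects(TRIPLES, c, uri(FOAF + "name"))]

print(to_ntriples(TRIPLES))
print(names)
```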

RDF, RDFS, OWL, FOAF, … (but also Microformats, RSS, …)

elearning community was busy developing its own standards

IEEE LOM (Learning Object Metadata)

same as DC …but different! (different elements, different syntax)

URI (Uniform Resource Identifier) is the identifier system of the Web

but some issues… e.g. around persistence

it turns out that ‘http’ URIs (a.k.a. URLs) are the worst kind of Web identifier…

(not everyone agrees with that!)

but arXiv was not the only repository

recognised need for aggregating metadata from different repositories into a single place so that it could be searched

OAI-PMH (a protocol for metadata harvesting)
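In practice a harvester sends a request such as `ListRecords` with `metadataPrefix=oai_dc` and gets back XML wrapping simple Dublin Core records. The sketch below builds the request URL and parses a canned response, so it runs without a network connection. The repository address, the record identifier and the sample title and creator are invented; the verb, parameter and XML namespaces are those defined by OAI-PMH 2.0. Resumption tokens (paging) and error responses are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL for an OAI-PMH ListRecords request."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Pull identifier, titles and creators out of a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS
            ),
            "title": [e.text for e in record.findall(".//dc:title", NS)],
            "creator": [e.text for e in record.findall(".//dc:creator", NS)],
        })
    return records


# Canned response standing in for what a (hypothetical) repository would return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:creator>A. Researcher</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://repository.example.org/oai")
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```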

harvesting metadata into repository search engines, of which OAIster is the best known (but it isn’t really used much)

and the major search engines like Google don’t support the OAI-PMH

because it isn’t mainstream enough (and because of “metaspam” and “metacrap”)

political agenda around institutional repositories

in order to store, manage and disclose…

institutional assets: research papers, learning objects, research data

exposing metadata about these things using the OAI-PMH to search services

which services? err… OAIster :-(

what kind of metadata? typically DC

or a variant of DC because simple DC is really too simple to be very useful

…DC variants tend to be more complex and therefore metadata can’t be created very easily by ordinary researchers

metadata in sites like the Science Museum is mostly locked away within the site

can expect growing pressure to expose it on the Web for others to “mash up”

in HE, we are operating in an “institutional repository” political environment

we (should?) have an interest in repositories of research publications and research data, particularly research data

metadata comes to the fore in scenarios where content is non-textual (e.g. data) and where required information can’t easily be derived from textual content (e.g. author name)

OK I lied… …it was 128 slides

unless you count these last two

Does metadata matter?
