Does metadata matter?
or… should we be interested in metadata and, if so, why?
I’m going to try to deliver 130 slides in 30 minutes
then you can ask questions (yes… I really did say “130 slides”)
non-technical
metadata jargon
first, some history…
 
 
metadata is…
machine-readable
descriptive
for the purposes of…
resource discovery
resource management
delivery / access control
use / re-use
long term preservation
 
MARC - Machine-Readable Cataloguing
 
 
still the predominant metadata standard (in the library world)
 
a distributed search standard called… Z39.50 so that multiple library catalogues can be searched from one place
AACR2 currently being replaced by… RDA (more generic – i.e. not just books!)
Z39.50 supplemented by SRW and SRU (Web-friendly variants)
FRBR
none of which need bother you…
other than to note that…
metadata tends to get more complicated the longer you think about it
1994
 
 
only a few tens of thousands of Web pages
but recognised that finding stuff was going to start getting difficult
people (mainly librarians) began trying to catalogue it by hand
 
 
http://www.intute.ac.uk/
meanwhile…
AltaVista (first major Web search engine – circa 1995)
people began to realise that the metadata they embedded into Web pages might be important
hang on… did I just say “metadata embedded into Web pages”?
<html> <head> <title>A web page</title> </head> <body> … </body> </html>
<html> <head> <title>A web page</title> <meta name="keywords" content="some, key, words" /> <meta name="description" content="a summary" /> </head> <body> …
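To make "embedded metadata" concrete, here is a minimal sketch of how an indexer might read those two meta elements back out of a page. It uses only Python's standard library; the names MetaExtractor and extract_meta are invented for this illustration and do not describe how any real search engine works.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects name/content pairs from <meta> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here, because the
        # default handle_startendtag() delegates to handle_starttag().
        if tag == "meta":
            fields = dict(attrs)
            if "name" in fields and "content" in fields:
                self.meta[fields["name"].lower()] = fields["content"]


def extract_meta(html_text):
    """Return a dict of the metadata embedded in an HTML page."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.meta


# The example page from the slide, completed so that it is well-formed.
PAGE = """<html> <head> <title>A web page</title>
<meta name="keywords" content="some, key, words" />
<meta name="description" content="a summary" />
</head> <body> </body> </html>"""

meta = extract_meta(PAGE)
print(meta)
```

Note that nothing here checks whether the keywords are true of the page, which is exactly the weakness the next few slides turn on.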
birth of the SEO industry
then came Google
and the rest, as they say, is history
Google takes note of links between pages (Google PageRank)
but places less emphasis on embedded metadata
metaspam: <meta name="keywords" content="coca cola" />
metacrap: <title>put your title here</title>
despite that, work continued on embedded metadata most notably in the form of…
Dublin Core (circa 1995)
initially 15 metadata elements a.k.a. properties a.k.a. attribute/value pairs
contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, type
embedded into Web pages
or encoded using XML
intention was to improve indexing by search engines
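For the curious, a small sketch of what "encoded using XML" can look like and how software might read it. The namespace URI is the one used for the 15 Dublin Core elements; the record wrapper element, the sample values and the read_dc helper are all made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

# A hypothetical record: the <record> wrapper and every value are invented.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Does metadata matter?</dc:title>
  <dc:creator>A. N. Author</dc:creator>
  <dc:subject>metadata</dc:subject>
  <dc:subject>repositories</dc:subject>
</record>"""


def read_dc(xml_text):
    """Return {element name: [values]} for the Dublin Core children of the root."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    found = {}
    for child in root:
        if child.tag.startswith(prefix):
            name = child.tag[len(prefix):]
            # Elements are repeatable, so keep a list of values per element.
            found.setdefault(name, []).append((child.text or "").strip())
    return found


dc = read_dc(RECORD)
print(dc)
```

Every element is optional and repeatable, which is why the sketch keeps a list of values for each name.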
but people forgot about…
“metaspam” and “metacrap”
the search engines didn’t!
and so, by and large, search engines still ignore embedded metadata
despite that, there has been fairly widespread adoption in policy terms
particularly in e-Government (e.g. UK eGMS)
but also in other areas – education, health, environmental agencies, libraries, cultural heritage sector, …
growth of rules around metadata content (i.e. cataloguing rules) (everyone’s rules are different)
and growth in use of additional elements for particular communities (and everyone’s additions are different)
such usage documented in the form of “application profiles”
Dublin Core Metadata Initiative: the coordinating “standards” body (note again: growing complexity over time)
meanwhile…
the W3C developed the Resource Description Framework (RDF) (circa 1999)
the standard for the “Semantic Web”
Tim Berners-Lee’s vision for a machine-readable Web of data
allowing software to navigate and reason about Web content automatically
a Web of “Linked Data”
RDF, RDFS, OWL, FOAF, … (but also Microformats, RSS, …)
meanwhile…
the e-learning community was busy developing its own standards
IEEE LOM (Learning Object Metadata)
same as DC …but different! (different elements, different syntax)
<cough />
a brief aside
identifiers are important
URI (Uniform Resource Identifier) is the identifier system of the Web
but some issues… e.g. around persistence
to cut a long story short
it turns out that ‘http’ URIs (a.k.a. URLs) are the worst kind of Web identifier…
…apart from all the others
(not everyone agrees with that!)
<breath />
repositories
 
 
but arXiv was not the only repository
recognised need for aggregating metadata from different repositories into a single place so that it could be searched
OAI-PMH (a protocol for metadata harvesting)
harvesting metadata into repository search engines, of which OAIster is the best known (but it isn’t really used much)
and the major search engines like Google don’t support the OAI-PMH
because it isn’t mainstream enough (and because of “metaspam” and “metacrap”)
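As an aside, an OAI-PMH harvest is just an HTTP request with a handful of query parameters. The sketch below builds a ListRecords request asking for simple Dublin Core (the oai_dc metadata prefix). The repository address and the list_records_url helper are hypothetical, and nothing is actually fetched.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    if set_spec:
        # Restrict the harvest to one set defined by the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, used only to show the shape of the URL.
url = list_records_url("https://repository.example.org/oai", from_date="2008-01-01")
print(url)
```

A harvester would fetch that URL, parse the XML response, and repeat the request with the resumption token the repository returns until the list is exhausted.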
and so to 2008…
political agenda around institutional repositories
in order to store, manage and disclose…
institutional assets: research papers, learning objects, research data
exposing metadata about these things using the OAI-PMH to search services
which services? err… OAIster :-(
what kind of metadata? typically DC
or a variant of DC because simple DC is really too simple to be very useful
but unfortunately…
…DC variants tend to be more complex and therefore metadata can’t be created very easily by ordinary researchers
but that’s another story!
ok…
why are we interested?
are we interested?
metadata is everywhere
 
 
 
 
 
 
metadata in sites like the Science Museum is mostly locked away within the site
can expect growing pressure to expose it on the Web for others to “mash up”
in HE, we are operating in an “institutional repository” political environment
we (should?) have an interest in repositories of research publications and research data, particularly research data
metadata comes to the fore in scenarios where content is non-textual (e.g. data) and where required information can’t easily be derived from textual content (e.g. author name)
thank you
OK I lied… …it was 128 slides
unless you count these last two
