Does metadata matter?

A lunchtime seminar for Eduserv staff.

  1. Does metadata matter?
  2. or… should we be interested in metadata and, if so, why?
  3. I’m going to try to deliver 130 slides in 30 minutes
  4. then you can ask questions (yes… I really did say “130 slides”)
  5. non-technical
  6. metadata jargon
  7. first, some history…
  10. metadata is…
  11. machine-readable
  12. descriptive
  13. for the purposes of…
  14. resource discovery
  15. resource management
  16. delivery / access control
  17. use / re-use
  18. long-term preservation
  20. MARC - Machine-Readable Cataloguing
  23. still the predominant metadata standard (in the library world)
  25. a distributed search standard called… Z39.50, so that multiple library catalogues can be searched from one place
  26. AACR2 currently being replaced by… RDA (more generic – i.e. not just books!)
  27. Z39.50 supplemented by SRW and SRU (Web-friendly variants)
  28. FRBR
  29. none of which need bother you…
  30. other than to note that…
  31. metadata tends to get more complicated the longer you think about it
  32. 1994
  35. a few tens of thousands of pages
  36. but it was already recognised that finding stuff was going to start getting difficult
  37. people (mainly librarians) began trying to catalogue it by hand
  41. meanwhile…
  42. AltaVista (first major Web search engine – circa 1995)
  43. people began to realise that the metadata they embedded into Web pages might be important
  44. hang on… did I just say “metadata embedded into Web pages”?
  45. <html>
      <head>
      <title>A web page</title>
      </head>
      <body>
      …
      </body>
      </html>
  46. <html>
      <head>
      <title>A web page</title>
      <meta name="keywords" content="some, key, words" />
      <meta name="description" content="a summary" />
      </head>
      <body>
      …
      </body>
      </html>
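
To make "metadata embedded into Web pages" concrete, here is a minimal sketch of how an indexer might pull the title, keywords and description out of a page like the one on the slide. It uses only Python's standard `html.parser` module; the class name `MetaExtractor` and the function `extract_metadata` are made up for this illustration and are not taken from any real search engine.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# The page from the slide, with plain ASCII quotes.
PAGE = """<html>
<head>
<title>A web page</title>
<meta name="keywords" content="some, key, words" />
<meta name="description" content="a summary" />
</head>
<body>...</body>
</html>"""


def extract_metadata(html_text):
    """Return the embedded metadata of a page as a plain dictionary."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return {"title": parser.title.strip(), **parser.meta}


if __name__ == "__main__":
    print(extract_metadata(PAGE))
```

Note that the extractor simply trusts whatever the page author typed, which is why this kind of metadata was so easy to abuse.
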
  47. birth of the SEO industry
  48. then came Google
  49. and the rest, as they say, is history
  50. Google takes note of links between pages (Google PageRank)
  51. but places less emphasis on embedded metadata
  52. metaspam: <meta name="keywords" content="coca cola" />
  53. metacrap: <title>put your title here</title>
  54. despite that, work continued on embedded metadata, most notably in the form of…
  55. Dublin Core (circa 1995)
  56. initially 15 metadata elements (a.k.a. properties, a.k.a. attribute/value pairs)
  57. contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, type
  58. embedded into Web pages
  59. or encoded using XML
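
As a rough illustration of the two encodings just mentioned, the sketch below writes the same small Dublin Core description once as HTML `<meta>` tags (using the `DC.` name prefix and a `schema.DC` link, which is the usual DCMI convention) and once as XML in the Dublin Core element namespace. The function names, the `<metadata>` wrapper element and the sample values for `type` and `language` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace of the 15 simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Illustrative record: the title is this talk's; the rest are sample values.
RECORD = {"title": "Does metadata matter?", "type": "Text", "language": "en"}


def _check(record):
    """Reject anything that is not one of the 15 simple DC elements."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError("not simple Dublin Core elements: %s" % sorted(unknown))


def dc_to_html_meta(record):
    """Render the record as <meta> tags for the <head> of a Web page."""
    _check(record)
    lines = ['<link rel="schema.DC" href="%s" />' % DC_NS]
    for element, value in record.items():
        lines.append('<meta name="DC.%s" content="%s" />'
                     % (element, escape(value, quote=True)))
    return "\n".join(lines)


def dc_to_xml(record):
    """Render the record as XML inside a hypothetical <metadata> wrapper."""
    _check(record)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper name is an assumption
    for element, value in record.items():
        ET.SubElement(root, "{%s}%s" % (DC_NS, element)).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dc_to_html_meta(RECORD))
    print(dc_to_xml(RECORD))
```

Either way it is the same attribute/value pairs; only the syntax changes.
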
  60. intention was to improve indexing by search engines
  61. but people forgot about…
  62. “metaspam” and “metacrap”
  63. the search engines didn’t!
  64. and so, by and large, search engines still ignore embedded metadata
  65. despite that, there has been fairly widespread adoption in policy terms
  66. particularly in e-Government (e.g. UK eGMS)
  67. but also in other areas – education, health, environmental agencies, libraries, cultural heritage sector, …
  68. growth of rules around metadata content (i.e. cataloguing rules) (everyone’s rules are different)
  69. and growth in use of additional elements for particular communities (and everyone’s additions are different)
  70. such usage documented in the form of “application profiles”
  71. Dublin Core Metadata Initiative: coordinating “standards” body (note again: growing complexity over time)
  72. meanwhile…
  73. the W3C developed the Resource Description Framework (RDF) (circa 1999)
  74. the standard for the “Semantic Web”
  75. Tim Berners-Lee’s vision for a machine-readable Web of data
  76. allowing software to navigate and reason about Web content automatically
  77. a Web of “Linked Data”
  78. RDF, RDFS, OWL, FOAF, … (but also Microformats, RSS, …)
  79. meanwhile…
  80. the e-learning community was busy developing its own standards
  81. IEEE LOM (Learning Object Metadata)
  82. same as DC …but different! (different elements, different syntax)
  83. <cough />
  84. a brief aside
  85. identifiers are important
  86. URI (Uniform Resource Identifier) is the identifier system of the Web
  87. but some issues… e.g. around persistence
  88. to cut a long story short
  89. it turns out that ‘http’ URIs (a.k.a. URLs) are the worst kind of Web identifier…
  90. …apart from all the others
  91. (not everyone agrees with that!)
  92. <breath />
  93. repositories
  96. but arXiv was not the only repository
  97. recognised need for aggregating metadata from different repositories into a single place so that it could be searched
  98. OAI-PMH (a protocol for metadata harvesting)
  99. harvesting metadata into repository search engines, of which OAIster is the best known (but it isn’t really used much)
  100. and the major search engines like Google don’t support the OAI-PMH
  101. because it isn’t mainstream enough (and because of “metaspam” and “metacrap”)
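
For the curious, harvesting with the OAI-PMH boils down to fetching a URL and reading the XML that comes back. The sketch below builds a `ListRecords` request asking for simple Dublin Core (`oai_dc`) and pulls identifiers and titles out of a response. The repository address and the hand-written, heavily trimmed sample response are assumptions for illustration; no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    if resumption_token:
        # A resumption token continues a previous list and is sent on its own.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Hand-written, trimmed example of what a repository might send back.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvested_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [
        (
            record.findtext("oai:header/oai:identifier", namespaces=NS),
            record.findtext(".//dc:title", namespaces=NS),
        )
        for record in root.iterfind(".//oai:record", NS)
    ]


if __name__ == "__main__":
    # repository.example.org is a placeholder, not a real repository.
    print(list_records_url("https://repository.example.org/oai"))
    print(harvested_titles(SAMPLE_RESPONSE))
```

A harvester repeats the request with each resumption token it is given until the repository stops sending one, then indexes whatever it has collected.
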
  102. and so to 2008…
  103. political agenda around institutional repositories
  104. in order to store, manage and disclose…
  105. institutional assets, research papers, learning objects, research data
  106. exposing metadata about these things using the OAI-PMH to search services
  107. which services? err… OAIster :-(
  108. what kind of metadata? typically DC
  109. or a variant of DC, because simple DC is really too simple to be very useful
  110. but unfortunately…
  111. …DC variants tend to be more complex and therefore metadata can’t be created very easily by ordinary researchers
  112. but that’s another story!
  113. ok…
  114. why are we interested?
  115. are we interested?
  116. metadata is everywhere
  123. metadata in sites like the Science Museum is mostly locked away within the site
  124. can expect growing pressure to expose it on the Web for others to “mash up”
  125. in HE, we are operating in an “institutional repository” political environment
  126. we (should?) have an interest in repositories of research publications and research data, particularly research data
  127. metadata comes to the fore in scenarios where content is non-textual (e.g. data) and where required information can’t easily be derived from textual content (e.g. author name)
  128. thank you
  129. OK I lied… it was 128 slides
  130. unless you count these last two