Does metadata matter?



A lunchtime seminar for Eduserv staff.

Published in: Education, Technology


  1. Does metadata matter?
  2. <ul><li>or… </li></ul><ul><li>should we be interested in metadata and, if so, why? </li></ul>
  3. <ul><li>I’m going to try to deliver 130 slides in 30 minutes </li></ul>
  4. <ul><li>then you can ask questions </li></ul><ul><li>(yes… I really did say “130 slides”) </li></ul>
  5. <ul><li>non-technical </li></ul>
  6. <ul><li>metadata jargon </li></ul>
  7. <ul><li>first, some history… </li></ul>
  8.  
  9.  
  10. <ul><li>metadata is… </li></ul>
  11. <ul><li>machine-readable </li></ul>
  12. <ul><li>descriptive </li></ul>
  13. <ul><li>for the purposes of… </li></ul>
  14. <ul><li>resource discovery </li></ul>
  15. <ul><li>resource management </li></ul>
  16. <ul><li>delivery / access control </li></ul>
  17. <ul><li>use / re-use </li></ul>
  18. <ul><li>long term preservation </li></ul>
  19.  
  20. <ul><li>MARC - Machine-Readable Catalogue </li></ul>
  21.  
  22.  
  23. <ul><li>still the predominant </li></ul><ul><li>metadata standard </li></ul><ul><li>(in the library world) </li></ul>
  24.  
  25. <ul><li>a distributed search standard called… </li></ul><ul><li>Z39.50 </li></ul><ul><li>so that multiple library catalogues can be searched from one place </li></ul>
  26. <ul><li>AACR2 currently being replaced by… </li></ul><ul><li>RDA </li></ul><ul><li>(more generic – i.e. not just books!) </li></ul>
  27. <ul><li>Z39.50 supplemented by SRW and SRU </li></ul><ul><li>(Web-friendly variants) </li></ul>
  28. <ul><li>FRBR </li></ul>
  29. <ul><li>none of which need bother you… </li></ul>
  30. <ul><li>other than to note that… </li></ul>
  31. <ul><li>metadata tends to get more complicated the longer you think about it </li></ul>
  32. <ul><li>1994 </li></ul>
  33.  
  34.  
  35. <ul><li>a few 10s of 1000s of pages </li></ul>
  36. <ul><li>but recognised that finding stuff was going to start getting difficult </li></ul>
  37. <ul><li>people (mainly librarians) began trying to catalogue it by hand </li></ul>
  38.  
  39.  
  40. http://www.intute.ac.uk/
  41. <ul><li>meanwhile… </li></ul>
  42. <ul><li>AltaVista </li></ul><ul><li>(one of the first major Web search engines – circa 1995) </li></ul>
  43. <ul><li>people began to realise that the metadata they embedded into Web pages might be important </li></ul>
  44. <ul><li>hang on… </li></ul><ul><li>did I just say “metadata embedded into Web pages”? </li></ul>
  45. <ul><li><html> </li></ul><ul><li><head> </li></ul><ul><li><title>A web page</title> </li></ul><ul><li></head> </li></ul><ul><li><body> </li></ul><ul><li>… </li></ul><ul><li></body> </li></ul><ul><li></html> </li></ul>
  46. <ul><li><html> </li></ul><ul><li><head> </li></ul><ul><li><title>A web page</title> </li></ul><ul><li><meta name="keywords" content="some, key, words" /> </li></ul><ul><li><meta name="description" content="a summary" /> </li></ul><ul><li></head> </li></ul><ul><li><body> </li></ul><ul><li>… </li></ul>
  47. <ul><li>birth of the SEO industry </li></ul>
  48. <ul><li>then came Google </li></ul>
  49. <ul><li>and the rest, as they say, is history </li></ul>
  50. <ul><li>Google takes note of links between pages </li></ul><ul><li>Google PageRank </li></ul>
  51. <ul><li>but places less emphasis on embedded metadata </li></ul>
  52. <ul><li>metaspam </li></ul><ul><li><meta name="keywords" content="coca cola" /> </li></ul>
  53. <ul><li>metacrap </li></ul><ul><li><title>put your title here</title> </li></ul>
  54. <ul><li>despite that, work continued on embedded metadata </li></ul><ul><li>most notably in the form of… </li></ul>
  55. <ul><li>Dublin Core </li></ul><ul><li>(circa 1995) </li></ul>
  56. <ul><li>initially 15 metadata elements </li></ul><ul><li>a.k.a. properties </li></ul><ul><li>a.k.a. attribute/value pairs </li></ul>
  57. <ul><li>contributor </li></ul><ul><li>coverage </li></ul><ul><li>creator </li></ul><ul><li>date </li></ul><ul><li>description </li></ul><ul><li>format </li></ul><ul><li>identifier </li></ul><ul><li>language </li></ul><ul><li>publisher </li></ul><ul><li>relation </li></ul><ul><li>rights </li></ul><ul><li>source </li></ul><ul><li>subject </li></ul><ul><li>title </li></ul><ul><li>type </li></ul>
  58. <ul><li>embedded into Web pages </li></ul>
  59. <ul><li>or encoded using XML </li></ul>
  60. <ul><li>intention was to improve indexing by search engines </li></ul>
  61. <ul><li>but people forgot about… </li></ul>
  62. <ul><li>“metaspam” and “metacrap” </li></ul>
  63. <ul><li>the search engines didn’t! </li></ul>
  64. <ul><li>and so, by and large, </li></ul><ul><li>search engines still ignore embedded metadata </li></ul>
  65. <ul><li>despite that, there has been fairly widespread adoption in policy terms </li></ul>
  66. <ul><li>particularly in e-Government </li></ul><ul><li>(e.g. UK eGMS) </li></ul>
  67. <ul><li>but also in other areas – education, health, environmental agencies, libraries, cultural heritage sector, … </li></ul>
  68. <ul><li>growth of rules around metadata content </li></ul><ul><li>(i.e. cataloguing rules) </li></ul><ul><li>(everyone’s rules are different) </li></ul>
  69. <ul><li>and growth in use of additional elements for particular communities </li></ul><ul><li>(and everyone’s additions are different) </li></ul>
  70. <ul><li>such usage documented in the form of “application profiles” </li></ul>
  71. <ul><li>Dublin Core Metadata Initiative </li></ul><ul><li>coordinating “standards” body </li></ul><ul><li>(note again: growing complexity over time) </li></ul>
  72. <ul><li>meanwhile… </li></ul>
  73. <ul><li>the W3C developed the Resource Description Framework (RDF) </li></ul><ul><li>(circa 1999) </li></ul>
  74. <ul><li>the standard for the “Semantic Web” </li></ul>
  75. <ul><li>Tim Berners-Lee’s vision for a machine-readable Web of data </li></ul>
  76. <ul><li>allowing software to navigate and reason about Web content automatically </li></ul>
  77. <ul><li>a Web of “Linked Data” </li></ul>
  78. <ul><li>RDF, RDFS, OWL, FOAF, … </li></ul><ul><li>(but also Microformats, RSS, …) </li></ul>
  79. <ul><li>meanwhile… </li></ul>
  80. <ul><li>the e-learning community was busy developing its own standards </li></ul>
  81. <ul><li>IEEE LOM </li></ul><ul><li>(Learning Object Metadata) </li></ul>
  82. <ul><li>same as DC </li></ul><ul><li>…but different! </li></ul><ul><li>(different elements, different syntax) </li></ul>
  83. <ul><li><cough /> </li></ul>
  84. <ul><li>a brief aside </li></ul>
  85. <ul><li>identifiers are important </li></ul>
  86. <ul><li>URI (Uniform Resource Identifier) is the identifier system of the Web </li></ul>
  87. <ul><li>but some issues… </li></ul><ul><li>e.g. around persistence </li></ul>
  88. <ul><li>to cut a long story short </li></ul>
  89. <ul><li>it turns out that ‘http’ URIs (a.k.a. URLs) are the worst kind of Web identifier… </li></ul>
  90. <ul><li>…apart from all the others </li></ul>
  91. <ul><li>(not everyone agrees with that!) </li></ul>
  92. <ul><li><breath /> </li></ul>
  93. <ul><li>repositories </li></ul>
  94.  
  95.  
  96. <ul><li>but arXiv was not the only repository </li></ul>
  97. <ul><li>recognised need for aggregating metadata from different repositories into a single place so that it could be searched </li></ul>
  98. <ul><li>OAI-PMH </li></ul><ul><li>(a protocol for metadata harvesting) </li></ul>
  99. <ul><li>harvesting metadata into repository search engines </li></ul><ul><li>of which OAIster is best known </li></ul><ul><li>(but it isn’t really used much) </li></ul>
  100. <ul><li>and the major search engines like Google don’t support the OAI-PMH </li></ul>
  101. <ul><li>because it isn’t mainstream enough </li></ul><ul><li>(and because of “metaspam” and “metacrap”) </li></ul>
  102. <ul><li>and so to 2008… </li></ul>
  103. <ul><li>political agenda around institutional repositories </li></ul>
  104. <ul><li>in order to store, manage and disclose… </li></ul>
  105. <ul><li>institutional assets </li></ul><ul><li>research papers </li></ul><ul><li>learning objects </li></ul><ul><li>research data </li></ul>
  106. <ul><li>exposing metadata about these things using the OAI-PMH to search services </li></ul>
  107. <ul><li>which services? </li></ul><ul><li>err… OAIster :-( </li></ul>
  108. <ul><li>what kind of metadata? </li></ul><ul><li>typically DC </li></ul>
  109. <ul><li>or a variant of DC </li></ul><ul><li>because simple DC is really too simple to be very useful </li></ul>
  110. <ul><li>but unfortunately… </li></ul>
  111. <ul><li>…DC variants tend to be more complex </li></ul><ul><li>and therefore metadata can’t be created very easily by ordinary researchers </li></ul>
  112. <ul><li>but that’s another story! </li></ul>
  113. <ul><li>ok… </li></ul>
  114. <ul><li>why are we interested? </li></ul>
  115. <ul><li>are we interested? </li></ul>
  116. <ul><li>metadata is everywhere </li></ul>
  117.  
  118.  
  119.  
  120.  
  121.  
  122.  
  123. <ul><li>metadata in sites like the Science Museum is mostly locked away within the site </li></ul>
  124. <ul><li>can expect growing pressure to expose it on the Web for others to “mash up” </li></ul>
  125. <ul><li>in HE, we are operating in an “institutional repository” political environment </li></ul>
  126. <ul><li>we (should?) have an interest in repositories of research publications and research data </li></ul><ul><li>particularly research data </li></ul>
  127. <ul><li>metadata comes to the fore in scenarios where content is non-textual (e.g. data) and where required information can’t easily be derived from textual content (e.g. author name) </li></ul>
  128. <ul><li>thank you </li></ul>
  129. <ul><li>OK I lied… </li></ul><ul><li>…it was 128 slides </li></ul>
  130. <ul><li>unless you count these last two </li></ul>
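
Slides 45, 46 and 58 describe metadata embedded in the head of a Web page. As a rough illustration of how an indexer might read it back out, here is a minimal Python sketch using only the standard library. The sample page reuses the markup from slide 46; the `DC.creator` line and the author name are added here purely for illustration, following the DCMI convention of prefixing element names with `DC.` in `<meta>` tags. The names `MetaExtractor` and `extract_metadata` are hypothetical.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            # Only keep tags that carry both a name and a content value.
            if name and content is not None:
                self.meta[name] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html_text):
    """Return (title, {meta name: content}) for one HTML document."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.meta


# Sample page: slide 46 markup plus one illustrative Dublin Core element.
PAGE = """<html>
<head>
<title>A web page</title>
<meta name="keywords" content="some, key, words" />
<meta name="description" content="a summary" />
<meta name="DC.creator" content="A. N. Author" />
</head>
<body>...</body>
</html>"""

if __name__ == "__main__":
    title, meta = extract_metadata(PAGE)
    print(title)
    for name, content in meta.items():
        print(name, "=", content)
```

Nothing in the sketch checks whether the values are truthful, which is exactly the “metaspam” and “metacrap” problem from slides 52 and 53: the page author can put anything in these fields.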

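Slides 97 to 99 and 106 to 108 describe harvesting simple Dublin Core metadata from repositories using the OAI-PMH. A minimal Python sketch of the harvester's side might look like the following. The endpoint `https://repository.example.org/oai`, the record identifier and the sample record are all made up for illustration, and the sample response is trimmed to the parts the sketch reads (a real response carries further elements). `ListRecords`, `metadataPrefix`, `from` and `oai_dc` are part of the protocol; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint

# XML namespaces used by OAI-PMH responses and by simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL, optionally limited by a start date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hand-written, trimmed stand-in for what a repository might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:creator>A. N. Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest_titles(xml_text):
    """Return (identifier, title) pairs for each record in a response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        results.append((identifier, title))
    return results


if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2008-01-01"))
    print(harvest_titles(SAMPLE_RESPONSE))
```

The sketch parses a canned response rather than fetching one, so it runs without network access. A real harvester would also need to follow resumption tokens when a repository splits its records across several responses.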