Does metadata matter?
A lunchtime seminar for Eduserv staff.

Published in: Education, Technology

  1. Does metadata matter?
  2. or… should we be interested in metadata and, if so, why?
  3. I’m going to try to deliver 130 slides in 30 minutes
  4. then you can ask questions (yes… I really did say “130 slides”)
  5. non-technical
  6. metadata jargon
  7. first, some history…
  10. metadata is…
  11. machine-readable
  12. descriptive
  13. for the purposes of…
  14. resource discovery
  15. resource management
  16. delivery / access control
  17. use / re-use
  18. long-term preservation
  20. MARC - Machine-Readable Catalogue
  23. still the predominant metadata standard (in the library world)
  25. a distributed search standard called… Z39.50, so that multiple library catalogues can be searched from one place
  26. AACR2 currently being replaced by… RDA (more generic – i.e. not just books!)
  27. Z39.50 supplemented by SRW and SRU (Web-friendly variants)
  28. FRBR
  29. none of which need bother you…
  30. other than to note that…
  31. metadata tends to get more complicated the longer you think about it
  32. 1994
  35. a few 10s of 1000s of pages
  36. but recognised that finding stuff was going to start getting difficult
  37. people (mainly librarians) began trying to catalogue it by hand
  40. http://www.intute.ac.uk/
  41. meanwhile…
  42. AltaVista (first major Web search engine – circa 1995)
  43. people began to realise that the metadata they embedded into Web pages might be important
  44. hang on… did I just say “metadata embedded into Web pages”?
  45. <html> <head> <title>A web page</title> </head> <body> … </body> </html>
  46. <html> <head> <title>A web page</title> <meta name="keywords" content="some, key, words" /> <meta name="description" content="a summary" /> </head> <body> … </body> </html>
  47. birth of the SEO industry
  48. then came Google
  49. and the rest, as they say, is history
  50. Google takes note of links between pages (Google PageRank)
  51. but places less emphasis on embedded metadata
  52. metaspam: <meta name="keywords" content="coca cola" />
  53. metacrap: <title>put your title here</title>
  54. despite that, work continued on embedded metadata, most notably in the form of…
  55. Dublin Core (circa 1995)
  56. initially 15 metadata elements, a.k.a. properties, a.k.a. attribute/value pairs
  57. contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, type
  58. embedded into Web pages
  59. or encoded using XML
  60. intention was to improve indexing by search engines
  61. but people forgot about…
  62. “metaspam” and “metacrap”
  63. the search engines didn’t!
  64. and so, by and large, search engines still ignore embedded metadata
  65. despite that, there has been fairly widespread adoption in policy terms
  66. particularly in e-Government (e.g. UK eGMS)
  67. but also in other areas – education, health, environmental agencies, libraries, cultural heritage sector, …
  68. growth of rules around metadata content (i.e. cataloguing rules) (everyone’s rules are different)
  69. and growth in use of additional elements for particular communities (and everyone’s additions are different)
  70. such usage documented in the form of “application profiles”
  71. Dublin Core Metadata Initiative: coordinating “standards” body (note again: growing complexity over time)
  72. meanwhile…
  73. the W3C developed the Resource Description Framework (RDF) (circa 1999)
  74. the standard for the “Semantic Web”
  75. Tim Berners-Lee’s vision for a machine-readable Web of data
  76. allowing software to navigate and reason about Web content automatically
  77. a Web of “Linked Data”
  78. RDF, RDFS, OWL, FOAF, … (but also Microformats, RSS, …)
  79. meanwhile…
  80. e-learning community was busy developing its own standards
  81. IEEE LOM (Learning Object Metadata)
  82. same as DC… but different! (different elements, different syntax)
  83. <cough />
  84. a brief aside
  85. identifiers are important
  86. URI (Uniform Resource Identifier) is the identifier system of the Web
  87. but some issues… e.g. around persistence
  88. to cut a long story short
  89. it turns out that ‘http’ URIs (a.k.a. URLs) are the worst kind of Web identifier…
  90. …apart from all the others
  91. (not everyone agrees with that!)
  92. <breath />
  93. repositories
  96. but arXiv was not the only repository
  97. recognised need for aggregating metadata from different repositories into a single place so that it could be searched
  98. OAI-PMH (a protocol for metadata harvesting)
  99. harvesting metadata into repository search engines, of which OAIster is best known (but it isn’t really used much)
  100. and the major search engines like Google don’t support the OAI-PMH
  101. because it isn’t mainstream enough (and because of “metaspam” and “metacrap”)
  102. and so to 2008…
  103. political agenda around institutional repositories
  104. in order to store, manage and disclose…
  105. institutional assets, research papers, learning objects, research data
  106. exposing metadata about these things using the OAI-PMH to search services
  107. which services? err… OAIster :-(
  108. what kind of metadata? typically DC
  109. or a variant of DC, because simple DC is really too simple to be very useful
  110. but unfortunately…
  111. …DC variants tend to be more complex, and therefore metadata can’t be created very easily by ordinary researchers
  112. but that’s another story!
  113. ok…
  114. why are we interested?
  115. are we interested?
  116. metadata is everywhere
  123. metadata in sites like the Science Museum is mostly locked away within the site
  124. can expect growing pressure to expose it on the Web for others to “mash up”
  125. in HE, we are operating in an “institutional repository” political environment
  126. we (should?) have an interest in repositories of research publications and research data, particularly research data
  127. metadata comes to the fore in scenarios where content is non-textual (e.g. data) and where required information can’t easily be derived from textual content (e.g. author name)
  128. thank you
  129. OK I lied… it was 128 slides
  130. unless you count these last two
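
Slides 46 and 58 talk about metadata embedded in the head of a Web page. As a rough illustration of how an indexer might read it back out, here is a minimal sketch using Python's standard `html.parser` module. The class name `MetaExtractor`, the helper `extract_metadata`, and the sample page are assumptions made up for this example; in particular the `DC.creator` line and its value are hypothetical and only show one common way Dublin Core elements are written into `<meta>` tags.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attrs.get("name")
            if name:
                # Lower-case the name so "DC.creator" and "dc.creator" match.
                self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html):
    """Return (title, meta_dict) for an HTML document given as a string."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.meta


# Sample page modelled on slide 46; the DC.creator line is a hypothetical addition.
SAMPLE_PAGE = """
<html>
<head>
<title>A web page</title>
<meta name="keywords" content="some, key, words" />
<meta name="description" content="a summary" />
<meta name="DC.creator" content="A. N. Author" />
</head>
<body>...</body>
</html>
"""

if __name__ == "__main__":
    title, meta = extract_metadata(SAMPLE_PAGE)
    print(title)
    for name, content in meta.items():
        print(name, "=", content)
```

Note that nothing here checks whether the embedded values are true, which is exactly the opening that "metaspam" and "metacrap" exploit.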
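
Slides 98 and 106 mention exposing repository metadata through the OAI-PMH. A harvest is just an HTTP request with a few query parameters, so a minimal sketch of building one is shown below. The `verb=ListRecords` and `metadataPrefix=oai_dc` parameters come from the protocol itself (`oai_dc` being simple Dublin Core); the function name `list_records_url` and the base URL `http://repository.example.org/oai` are placeholders invented for this example, not a real service.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository.

    base_url is the repository's OAI-PMH endpoint (hypothetical here);
    metadata_prefix selects the record format, oai_dc being simple Dublin Core;
    set_spec optionally restricts the harvest to one set in the repository.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder endpoint; substitute a real repository's OAI-PMH base URL.
    print(list_records_url("http://repository.example.org/oai"))
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned resumption token until the repository reports no more records; that loop is left out to keep the sketch short.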