Metadata and ontologies


Slides from the Introduction and Theoretical Foundations of New Media course of the Interactive Media and Knowledge Environments master program (Tallinn University).

Published in: Education, Technology

  1. 1. Introduction and Theoretical Foundations of New Media<br />Metadata and Ontologies<br />
  2. 2. Contents<br />Metadata<br />Ontologies<br />Folksonomies<br />The semantic web<br />The internet of things<br />David Lamas, TLU, 2011<br />
  3. 3. Metadata<br />
  4. 4. Metadata<br />So, why is metadata relevant?<br />Or… why should we care about metadata?<br />
  5. 5. Metadata<br />As a concept, it is not new<br />Metadata has long been used for managing document collections such as the ones kept by libraries<br />But the term itself was only coined in 1968<br />By Philip Bagley, a pioneer of computerized document retrieval<br />
  6. 6. Metadata<br />Literally a set of data that describes and gives information about other data; in our context, metadata is:<br />Machine readable<br />Descriptive<br />For the purposes of resource…<br />Discovery<br />Management<br />Delivery<br />Access control<br />Use<br />Re-use<br />Long term preservation<br />
  7. 7. Metadata<br />Or in other words, metadata allows for the description of the…<br />Definition<br />Structure; and<br />Administration<br />of selected resources, with all contents in context, to ease further use of the resource<br />
  8. 8. MARC<br />Or… Machine-Readable Cataloging<br />Is still the main metadata standard in the library world, although it is not a full cataloguing scheme being <br />
  9. 9. UDC, AACR2 and RDA<br />Universal Decimal Classification<br />A multilingual classification scheme for all fields of knowledge<br />Available at…<br />Anglo-American Cataloguing Rules<br />For use in the construction of catalogues<br />Available at…<br />Resource Description and Access<br />Available at…<br />
  10. 10. Z39.50, SRW and SRU<br />Z39.50<br />Is a client–server protocol for searching and retrieving information, widely used in library environments<br />Search/Retrieve Web Service<br />An intended standard web-based text-searching interface<br />Search/Retrieval via URL<br />A standard XML-focused search protocol for Internet search queries, which uses the Contextual Query Language<br />
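To make the SRU idea concrete, here is a minimal Python sketch of how a search expressed in the Contextual Query Language might be packed into a URL. The endpoint `http://sru.example.org/catalogue` is hypothetical, and the `dc.title` and `dc.creator` index names assume the server supports the Dublin Core context set; only the `operation`, `version`, `query` and `maximumRecords` parameter names come from SRU itself.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint, used only for illustration
BASE_URL = "http://sru.example.org/catalogue"


def build_sru_url(cql_query, max_records=10, base_url=BASE_URL):
    """Build an SRU searchRetrieve URL that carries a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,                  # the CQL expression, URL-encoded below
        "maximumRecords": str(max_records),  # cap on the number of records returned
    }
    return base_url + "?" + urlencode(params)


# Assumed index names (dc.title, dc.creator); a real server may expose others
url = build_sru_url('dc.title = "metadata" and dc.creator = "bagley"')
print(url)
```

The whole search lives in the URL, which is what distinguishes SRU from the older session-based Z39.50 exchange.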
  11. 11. But…<br />This should not bother you other than to note that…<br />Metadata tends to get more complicated the longer you think about it<br />
  12. 12. As for the web…<br />It was recognized early on that finding what you need was going to get difficult<br />We’re talking about the mid-nineties, when the web’s size was referred to in terms of tens of thousands<br />Users, mainly information science specialists, began trying to catalogue it by hand<br />Do you remember Yahoo’s earlier versions?<br />
  13. 13. As for the web…<br />The first search engines appeared and authors began to realize that the metadata they embedded into web pages might be important<br /><html><br /><head><br /><title>A web page</title><br /><meta name="keywords" content="some, key, words" /><br /><meta name="description" content="a summary" /><br /></head><br /><body><br />…<br />
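As an illustration of how an early search engine might have read this embedded metadata, the following Python sketch pulls the `keywords` and `description` values out of a page like the one above. The `MetaExtractor` class and the sample page are assumptions made for this example; only the standard library `html.parser` module is used.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


# Sample page modelled on the slide's example
PAGE = """<html><head><title>A web page</title>
<meta name="keywords" content="some, key, words" />
<meta name="description" content="a summary" />
</head><body>...</body></html>"""

extractor = MetaExtractor()
extractor.feed(PAGE)
keywords = [word.strip() for word in extractor.meta.get("keywords", "").split(",")]
print(keywords, extractor.meta.get("description"))
```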
  14. 14. As for the web…<br />Then came Google<br />And metadata lost some relevance, as Google’s PageRank algorithm takes note of links between pages but places less emphasis on embedded metadata to avoid…<br />Metaspam<br /><meta name="description" content="a summary" /><br />Metacrap<br /><title>put your title here</title><br />
  15. 15. Dublin Core<br />Despite the initial drawbacks, work continued on embedded metadata, and the Dublin Core was and still is one of the main players with its 15 elements…<br />Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights<br />…embedded into web pages or encoded using XML<br />The initial intention was to improve indexing by search engines<br />But whereas its promoters forgot about metaspam and metacrap, the search engines didn’t<br />And so the main search engines still ignore embedded metadata<br />
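A minimal sketch, in Python, of how the 15 elements could be embedded into a web page. It assumes the common `DC.`-prefixed `<meta>` naming convention with a `schema.DC` link to the element set namespace; the `dublin_core_meta` helper is hypothetical and the sample record simply reuses details of this slide deck.

```python
from html import escape

# The 15 Dublin Core elements, in the order listed on the slide
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_meta(record):
    """Render a dict of Dublin Core values as HTML <link>/<meta> tags."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError("not Dublin Core elements: %s" % sorted(unknown))
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />']
    for element in DC_ELEMENTS:
        if element in record:
            value = escape(record[element], quote=True)
            lines.append('<meta name="DC.%s" content="%s" />' % (element, value))
    return "\n".join(lines)


# Sample record (illustrative values)
html_head = dublin_core_meta({
    "title": "Metadata and ontologies",
    "creator": "David Lamas",
    "date": "2011",
    "language": "en",
})
print(html_head)
```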
  16. 16. Dublin Core<br />
  17. 17. Metadata<br />Remarkably, there has been fairly widespread adoption of metadata principles, especially in policy terms, namely in government<br />(look into an interesting example)<br />And in:<br />Education<br />Health<br />Cultural heritage<br />Environmental agencies, and…<br />Libraries, of course<br />
  18. 18. Metadata<br />This resulted in the…<br />Growth of metadata cataloguing rules<br />(although every community has its own rules)<br />Growth in use of additional elements for particular communities<br />(and again, every community’s additions are different)<br />Adoption of application profiles to document the distinct cataloguing rules and additions<br />Institution of the Dublin Core Metadata Initiative as<br />an organization engaged in the development of interoperable metadata standards that support a broad range of purposes and business models<br />
  19. 19. Metadata<br />But the Dublin Core isn’t alone, far from it<br />Many other standards were and are being developed such as these, just to name two:<br />RDF (Resource Description Framework)<br />LOM (Learning Object Metadata)<br />
  20. 20. Resource Description Framework<br />Developed by the W3C, the Resource Description Framework (RDF) is the envisioned standard for the semantic web<br />Its goal is to allow software to automatically navigate and reason about web content, thus enabling…<br />A web of (linked) data<br />
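To give a feel for what RDF data looks like, here is a small Python sketch that holds two statements as (subject, predicate, object) triples and prints them in the line-based N-Triples syntax. The `URI` marker class, the helper names and the `http://example.org/...` resource address are assumptions for illustration; the predicates reuse the Dublin Core element namespace.

```python
class URI(str):
    """Marks a string as a URI reference rather than a plain literal."""


DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical address for the described resource
DOC = URI("http://example.org/slides/metadata-and-ontologies")

triples = [
    (DOC, URI(DC + "title"), "Metadata and ontologies"),
    (DOC, URI(DC + "creator"), "David Lamas"),
]


def _term(value):
    """Serialize one triple component as an N-Triples term."""
    if isinstance(value, URI):
        return "<%s>" % value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def to_ntriples(statements):
    """Return one 'subject predicate object .' line per triple."""
    return "\n".join(
        "%s %s %s ." % (_term(s), _term(p), _term(o)) for s, p, o in statements
    )


ntriples = to_ntriples(triples)
print(ntriples)
```

Because every statement stands alone, triples from different sources about the same URI can simply be merged, which is the basis of a web of linked data.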
  21. 21. Resource Description Framework<br />
  22. 22. Learning Object Metadata<br />Learning Object Metadata is a data model<br />Usually encoded in XML, it is used to describe learning objects and similar digital resources used to support learning.<br />
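As a rough illustration of the XML encoding, the Python sketch below builds a very small LOM-style record with the standard library. It covers only a simplified subset of the `general` category (title and description as language-tagged strings); the namespace and element names follow the IEEE LOM XML binding as best understood here and should be checked against the specification, and the `lom_record` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the IEEE LOM XML binding
LOM_NS = "http://ltsc.ieee.org/xsd/LOM"
ET.register_namespace("", LOM_NS)  # serialize without a prefix


def lom_record(title, description, language="en"):
    """Build a minimal <lom> element with a <general> category."""
    lom = ET.Element("{%s}lom" % LOM_NS)
    general = ET.SubElement(lom, "{%s}general" % LOM_NS)
    for tag, text in (("title", title), ("description", description)):
        field = ET.SubElement(general, "{%s}%s" % (LOM_NS, tag))
        # Each human-readable value is a language-tagged string
        string = ET.SubElement(field, "{%s}string" % LOM_NS)
        string.set("language", language)
        string.text = text
    return lom


record = lom_record(
    "Metadata and ontologies",
    "Slides on metadata, ontologies and folksonomies",
)
print(ET.tostring(record, encoding="unicode"))
```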
  23. 23. Learning Object Metadata<br />
  24. 24. Metadata<br />As said in the beginning…<br />Metadata tends to get more complicated the longer we think about it<br />The lack of coherence and cohesion in current metadata efforts, both within standards and within communities, is a good example<br />And that is why we will next look into ontologies<br />So… do we care about metadata?<br />Why are we interested?<br />
  25. 25. Metadata<br />I guess the answer is yes, we care.<br />And yes, we are interested, because metadata is everywhere<br />Sometimes it is explicitly available,<br />Other times it is hidden or not so readily available, but anyway…<br />It would be foolish not to make use of it<br />
  26. 26. Metadata<br />Further, there is increasing pressure to expose metadata on the web for others to mash up, and this is especially true today in settings such as…<br />Education;<br />Research; and<br />Government<br />And finally, metadata becomes paramount in scenarios where<br />content is data; or<br />the required information cannot easily be derived from content<br />
  27. 27. Ontologies<br />
  28. 28. Ontologies<br />One way of dealing with the lack of coherence and cohesion in current metadata efforts, both within standards and within communities, is to evolve to an ontology-based metadata approach<br />But what does this mean?<br />
  29. 29. Ontologies<br />An ontology is a logical theory which gives an explicit partial account of a conceptualization<br />An intensional semantic structure which encodes the implicit rules constraining the structure of a piece of reality<br />In this light, the aim of an ontology is to define which primitives, provided with their associated semantics, are necessary for knowledge representation in a given context<br />Thomas R. Gruber (1993). Toward principles for the design of ontologies used for knowledge sharing. Originally in N. Guarino and R. Poli (Eds.), International Workshop on Formal Ontology, Padova, Italy. Revised August 1993. Published in International Journal of Human-Computer Studies, Volume 43, Issues 5-6, Nov./Dec. 1995, pages 907-928, special issue on the role of formal ontology in information technology.<br />
  30. 30. Ontologies<br />Ontologies are usually characterized by their…<br />Coverage<br />The extent to which the primitives mobilized by the perceived usage scenarios are covered by the ontology<br />Specificity<br />The extent to which ontological primitives are precisely identified<br />Granularity<br />The extent to which primitives are precisely and formally defined<br />Formality<br />The extent to which primitives are described in a formal language<br />
  31. 31. Ontologies<br />And ontologies are not… taxonomies<br />But a taxonomy might be perceived as a specific case of an ontology<br />A taxonomy is a particular classification arranged in a hierarchical structure<br />Typically it is organized by supertype/subtype relationships, also called generalization/specialization relationships<br />
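The supertype/subtype structure of a taxonomy can be sketched in a few lines of Python. The animal terms below are made-up sample data; the point is that the only relationship a taxonomy records is "is a kind of", which can be followed upwards through the hierarchy.

```python
# Each term maps to its single supertype (illustrative sample data)
SUPERTYPE = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}


def ancestors(term, supertype=SUPERTYPE):
    """Return the chain of supertypes from the nearest to the most general."""
    chain = []
    while term in supertype:
        term = supertype[term]
        chain.append(term)
    return chain


def is_a(term, candidate, supertype=SUPERTYPE):
    """True if candidate is the term itself or one of its generalizations."""
    return candidate == term or candidate in ancestors(term, supertype)


print(ancestors("dog"))
print(is_a("sparrow", "animal"), is_a("dog", "bird"))
```

An ontology goes further by allowing other kinds of relationships and constraints between terms, not just this one hierarchy.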
  32. 32. Why ontologies?<br />Pipe<br />
  33. 33. Why ontologies?<br />Pipe<br />
  34. 34. Why ontologies?<br />Pipe<br />
  35. 35. Why ontologies?<br />In short, we interpret, machines don’t<br />As such, an effort must be undertaken in order to support adequate usage of digital resources<br />So, what’s missing?<br />Among others…<br />The possibility to share a common understanding of the structure of information within a specific domain<br />The possibility to reuse domain knowledge<br />The possibility to make domain assumptions explicit<br />The possibility to analyze domain knowledge<br />
  36. 36. Ontologies and the web<br />It is estimated that by 2010…<br />70% of public web pages will have some level of metadata, but only<br />20% will use more extensive semantic web approaches such as ontology-based metadata<br />But why should we care?<br />
  37. 37. Ontologies and the web<br />An emerging ontological approach is OWL or…<br />Web Ontology Language<br />A vocabulary extension of the Resource Description Framework, which adds more vocabulary for describing characteristics of properties and classes or relations between classes<br />
  38. 38. Web Ontology Language<br />OWL enables ontology-based information sharing and manipulation together with RDF and XML<br />In reverse order…<br />XML allows users to add arbitrary structure to their documents but says nothing about what such structures mean<br />RDF enables expression of meaning over XML (and other) structures<br />Using subject, predicate (verb) and object triples<br />OWL enables machines to comprehend semantic documents and data<br />
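What "comprehend" can mean in practice is that software derives statements nobody wrote down. The Python sketch below imitates one such rule, the `owl:inverseOf` property characteristic: if two properties are declared inverses, every statement using one implies a statement using the other. The `ex:` names and the `infer_inverses` function are hypothetical, and a real OWL reasoner applies many more rules than this one.

```python
OWL_INVERSE_OF = "owl:inverseOf"

# Sample data with hypothetical ex: names, written as compact strings
triples = {
    ("ex:teaches", OWL_INVERSE_OF, "ex:isTaughtBy"),
    ("ex:Alice", "ex:teaches", "ex:NewMediaCourse"),
}


def infer_inverses(statements):
    """Return the statements plus those implied by owl:inverseOf declarations."""
    inverse = {}
    for s, p, o in statements:
        if p == OWL_INVERSE_OF:
            # The relationship is symmetric: each property inverts the other
            inverse[s] = o
            inverse[o] = s
    inferred = set(statements)
    for s, p, o in statements:
        if p in inverse:
            inferred.add((o, inverse[p], s))
    return inferred


closure = infer_inverses(triples)
for statement in sorted(closure):
    print(statement)
```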
  39. 39. Web Ontology Language<br />
  40. 40. Ontologies<br />That said, and while addressing some of the weaknesses of current metadata efforts, present-day ontologies still largely depend on explicit human intervention to be useful<br />And that is why we will next look into folksonomies<br />
  41. 41. Folksonomies<br />
  42. 42. Folksonomies<br />Are mainly a bottom-up social classification system<br />A way to organize and share contents by tagging resources<br />Synonyms are…<br />Ethno-classification; and<br />Collaborative tagging<br />
  43. 43. Folksonomies<br />Folksonomies are created by users and have…<br />No structure<br />No fixed vocabulary<br />No explicit relationships between terms, and<br />No authority<br />
  44. 44. Folksonomies<br />Folksonomies also are…<br />Distributed, and<br />Collaboratively built and maintained<br />You can tag items owned by others<br />You can get instant feedback<br />All items for the same tag<br />All tags for the same item<br />You can adapt your tags to the group norm<br />But you are never forced<br />
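The two kinds of instant feedback listed above, all items for a tag and all tags for an item, map naturally onto a pair of indexes. The following Python sketch is one way to hold them in memory; the `Folksonomy` class, user names, items and tags are all invented for the example.

```python
class Folksonomy:
    """Tiny in-memory store of (user, item, tag) assignments."""

    def __init__(self):
        self.assignments = set()
        self._items_by_tag = {}
        self._tags_by_item = {}

    def tag(self, user, item, tag):
        # Any user may tag any item, including items owned by others;
        # there is no fixed vocabulary and no authority checking the tag
        self.assignments.add((user, item, tag))
        self._items_by_tag.setdefault(tag, set()).add(item)
        self._tags_by_item.setdefault(item, set()).add(tag)

    def items_for(self, tag):
        """All items that anyone has labelled with this tag."""
        return set(self._items_by_tag.get(tag, ()))

    def tags_for(self, item):
        """All tags that anyone has attached to this item."""
        return set(self._tags_by_item.get(item, ()))


folk = Folksonomy()
folk.tag("ana", "photo-1", "tallinn")
folk.tag("ben", "photo-1", "oldtown")
folk.tag("ben", "photo-2", "tallinn")
print(folk.items_for("tallinn"), folk.tags_for("photo-1"))
```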
  45. 45. Folksonomies<br />Some of their apparent benefits are…<br />Being cheap and easy to build and use<br />Being able to adapt very quickly to changes and users’ needs<br />Scaling well<br />Fostering serendipity<br />Semantic browsing instead of searching<br />Lowering cooperation barriers<br />
  46. 46. Folksonomies<br />But they have limits such as…<br />Semantic ambiguity<br />Polysemy, synonymy, cardinality and the use of acronyms<br />Syntax free<br />Spaces and multiple words are used without rules<br />Language<br />Different languages can be used for the same tag<br />Being eventually shortsighted<br />Failing to depict the general overview<br />Lack of (or minimal) structure<br />No explicit relationships between otherwise related tags<br />
  47. 47. Folksonomies and ontologies<br />Folksonomies<br />Domains: large corpus, informal categories, unstable entities, unclear edges<br />Participants: naïve cataloguers, no authority, uncoordinated users, amateur users, critical mass needed<br />Ontologies<br />Domains: small corpus, formal categories, stable entities, restricted entities, clear edges<br />Participants: expert cataloguers, authoritative sources of judgment, coordinated users, expert users<br />
  48. 48. Folksonomies and ontologies<br />How do we choose?<br />Folksonomies are useful when all that is needed is the ability to link items to topics<br />Ontologies are useful when what is needed is to formally define meaning<br />But… do we need to choose?<br />Not really; at least that is what current research is exploring<br />
  49. 49. Folksonomies and ontologies<br />Research directions include<br />The combination of the folksonomy and ontology approaches into a hybrid system where the most consensual constructs would last while others would be forgotten or redefined<br />An approach that combines the ease and adaptability of a folksonomy with the formality and semantic richness of an ontology<br />Quantitative tag analysis and qualitative use analysis in current online social networking services<br />To understand whether tag usage converges or not<br />To understand how a folksonomy is formed<br />To… any ideas?<br />
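One simple, purely illustrative way to look at tag convergence quantitatively is to measure what share of all tag assignments on a resource goes to its most popular tags, and to watch how that share changes as more users tag it. The measure, the `top_tag_share` function and the sample history in this Python sketch are assumptions, not a method taken from the research mentioned above.

```python
from collections import Counter


def top_tag_share(tags, k=1):
    """Fraction of tag assignments covered by the k most frequent tags."""
    if not tags:
        return 0.0
    counts = Counter(tags)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(tags)


# Tags applied to one resource, in the order users added them (sample data)
history = ["code", "python", "snake", "python",
           "python", "python", "python", "python"]

early = top_tag_share(history[:4])   # after the first four assignments
overall = top_tag_share(history)     # after all eight assignments
print(early, overall)
```

A share that rises over time suggests users are settling on a common tag, while a flat or falling share suggests the vocabulary is still diverging.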
  50. 50. Semantic web<br />
  51. 51. Semantic Web<br />The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help<br />One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well-defined meanings (in at least some terms) for its columns, the structure of the data is not evident to a robot browsing the web<br />Leaving aside the artificial intelligence problem of training machines to behave like people, the Semantic Web approach instead develops languages for expressing information in a machine-processable form.<br />
  52. 52. Internet of things<br />
  53. 53. The internet of things<br />The internet of things might be described as a self-configuring wireless network of sensors whose purpose would be to interconnect all things<br />The concept is attributed to the former Auto-ID Center, founded in 1999 and based at the time at MIT<br />An alternative view focuses instead on making all things addressable by the existing naming protocols<br />In the current vision, objects themselves do not interact, but they may now be referred to by other agents, such as centralized servers acting for their human users<br />
  54. 54. Metadata and Ontologies recap<br />Metadata<br />Ontologies<br />Folksonomies<br />The semantic web<br />The internet of things<br />