Classification Systems

A quick and opinionated presentation on four major classification systems: Folksonomies, Thesauri, Taxonomies and Ontologies.

Published in: Education, Technology

  1. Classification Systems
     Alberto Simões, ambs@ilch.uminho.pt
     March 14th, 2012
  2. Classification Systems
     - Humans tend to organize; "disorganization is a kind of organization";
     - This organization is usually done by classification;
     - Classification can be as simple as tagging an object: "this is the pile of important documents, that of the non-important ones";
     - Classification is used everywhere!
  3. Where are classification systems used?
     - Internet Social Networks (tagging);
     - Libraries (universal decimal classification);
     - Medicine (illness classification);
     - Chemistry (periodic table);
     - Geography (geographic taxonomy);
     - Biology (protein classification);
     - Linguistics (language classification).
  4. Classification Systems Classes
     - Classification systems can also be classified;
     - One way to classify classification systems is by their ability to include properties and relations between the classified objects;
     - We will discuss four types of classification systems: Folksonomies, Taxonomies, Thesauri, Ontologies.
  5. Class Task (items labelled A to F; the items themselves are not in this transcript)
  6. Folksonomies
  7. Folksonomies
     A folksonomy is a system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content; this practice is also known as collaborative tagging, social classification, social indexing, and social tagging. Folksonomy, a term coined by Thomas Vander Wal, is a portmanteau of folk and taxonomy.
     Folksonomy (Wikipedia, 2012)
  8. Folksonomies: How they work
     - The other classification techniques we will see put someone, or some group, in charge of creating the structure of the classification system (an authority);
     - This group of people sees the world from a specific point of view, which may or may not be shared by others;
     - Folksonomies solve this problem: power to the people;
     - Instead of partitioning the world according to one view, they let users present facets of objects;
     - Users assign keywords (or tags, or labels) to objects (individuals);
     - These keywords can be searched and indexed, and mathematical models can be applied to this data.
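The tagging workflow described on this slide can be sketched in a few lines of Python. This is only a minimal illustration: the users, objects, and tags are invented, and `search` is a hypothetical helper, not part of any real tagging service.

```python
from collections import Counter, defaultdict

# Hypothetical tagging data: (user, object, tag) triples.
taggings = [
    ("ana", "photo1", "lisbon"),
    ("ana", "photo1", "river"),
    ("rui", "photo1", "lisbon"),
    ("rui", "photo2", "porto"),
    ("eva", "photo2", "river"),
]

# Inverted index: tag -> set of objects carrying that tag.
index = defaultdict(set)
for _user, obj, tag in taggings:
    index[tag].add(obj)

# Tag frequencies: the raw material for statistical models over the tags.
tag_counts = Counter(tag for _user, _obj, tag in taggings)


def search(tag):
    """Return the objects tagged with `tag`, sorted for stable output."""
    return sorted(index.get(tag, set()))
```

Note that there is no structure among the tags themselves: the vocabulary is a flat set of strings chosen freely by the users.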
  9. Folksonomies
     An empirical analysis of the complex dynamics of tagging systems, published in 2007, has shown that consensus around stable distributions and shared vocabularies does emerge, even in the absence of a central controlled vocabulary. For content to be searchable, it should be categorized and grouped. While this was believed to require commonly agreed on sets of content describing tags (much like keywords of a journal article), recent research has found that, in large folksonomies, common structures also emerge on the level of categorizations. Accordingly, it is possible to devise mathematical models that allow for translating from personal tag vocabularies (personomies) to the vocabulary shared by most users.
     Folksonomy (Wikipedia, 2012)
  10. Folksonomies: example
      Top categories in the PT Wikipedia (those whose names contain no spaces), as count and category:
      375 Sociologia, 383 Ponerinae, 395 Afro-brasileiros, 404 Drilliidae,
      413 Filosofia, 415 Coleophoridae, 424 Psicologia, 428 Terebridae,
      445 Clathurellinae, 445 Digimons, 445 Teuto-brasileiros, 451 Apiaceae,
      483 Asteroides, 486 Luso-brasileiros, 492 Acaena, 526 Rubiaceae,
      537 Dolichoderinae, 730 Agonoxenidae, 735 Acalypha, 753 Mangeliinae,
      762 Crambidae, 787 Poaceae, 808 Coletâneas, 824 Theraphosidae,
      854 Myrmicinae, 962 Fabaceae, 974 Formicidae, 1065 Agrostis,
      1096 Formicinae, 1177 Aloe, 1328 Conus, 1338 Ítalo-brasileiros,
      1395 Asteraceae, 1433 Coleophora, 1514 Arctiidae, 1516 Alchemilla,
      1689 Turridae, 1879 Camponotus, 2163 Acer, 2744 Acacia
  11. Folksonomies: Pros and Cons
      Pros:
      - they do not require expert cataloguers, authoritative sources or expert users;
      - they can match users' real needs and language (inclusive: they include everyone's words and vocabulary);
      - controlled vocabularies, by contrast, are not practically or economically extensible;
      - a low-investment bridge between personal classification and shared classification;
      - easy to use, and quick for classifying large quantities of individuals;
      - not all the limitations of folksonomies are defects :-)
  12. Folksonomies: Pros and Cons
      Cons:
      - by itself, the vocabulary is flat (there is no structure, just terms);
      - not usable for small collections or collections with few users (the significance of statistical methods depends on population size);
      - without some help from technology, vocabularies become inexact or ambiguous;
      - they have a very low findability quotient: they are great for serendipity and browsing, but not aimed at a targeted approach or search.
  13. Taxonomies
  14. Taxonomies
      Taxonomy is the science of identifying and naming species, and arranging them into a classification. The field of taxonomy, sometimes referred to as "biological taxonomy", revolves around the description and use of taxonomic units, known as taxa. A resulting taxonomy is a particular classification, arranged in a hierarchical structure or classification scheme.
      Taxonomy (Wikipedia, 2012)
  15. Taxonomies
      taxonomy [tækˈsɒnəmɪ] n. (Life Sciences & Allied Applications / Biology)
      - the branch of biology concerned with the classification of organisms into groups based on similarities of structure, origin, etc.
      - the practice of arranging organisms in this way.
      - the science or practice of classification.
      [from French taxonomie, from Greek taxis order + -nomy]
      Collins English Dictionary – Complete and Unabridged © HarperCollins Publishers 1991, 1994, 1998, 2000, 2003
  16. Taxonomies: How they work
      - Used to partition the world into disjoint classes or groups;
      - Each group is, in turn, partitioned into sub-classes or sub-groups;
      - And sub-classes are partitioned, and…
      - Individuals are classified in one leaf category (a classification is a path in the tree).
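The idea that a classification is a path in the tree can be made concrete with a small sketch. The class names below are a made-up toy taxonomy, and `classification_path` is a hypothetical helper; the only assumption carried over from the slide is that every class has exactly one parent.

```python
# Hypothetical toy taxonomy: each class has exactly one parent (None for the root).
parent = {
    "Animals": None,
    "Mammals": "Animals",
    "Birds": "Animals",
    "Felines": "Mammals",
    "Canines": "Mammals",
}


def classification_path(leaf):
    """Walk from a leaf class up to the root and return the path root-first."""
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = parent[node]
    return list(reversed(path))
```

Because the parent mapping stores a single value per class, an individual filed under "Felines" cannot also be filed under "Birds" without breaking the tree.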
  17. Taxonomies: The typical example (figure not reproduced in this transcript)
  18. Taxonomies: The example you use everyday
      Main index (top level) of the Universal Decimal Classification:
      0 Generalities (now Science and knowledge. Organization. Computer Science. Information. Documentation. Librarianship. Institutions. Publications)
      1 Philosophy. Psychology
      2 Religion. Theology
      3 Social Sciences
      4 Vacant
      5 Mathematics and natural sciences
      6 Applied sciences. Medicine. Technology
      7 The arts. Recreation. Entertainment. Sport
      8 Language. Linguistics. Literature
      9 Geography. Biography. History
  19. Taxonomies: The example you use everyday
      8 Language. Linguistics. Literature
        80 General questions […] linguistics and literature. Philology
        81 Linguistics and languages
          81-11 Schools and trends in linguistics
          81-13 Methodology of linguistics. Methods and means
          811 Languages
            811.1/.2 Indo-European Languages
            811.3 Dead languages of unknown affiliation. Caucasian languages
            811.4 Afro-Asiatic, Nilo-Saharan, Congo-Kordofanian, Khoisan languages
            811.5 Ural-Altaic, Palaeo-Siberian, Eskimo-Aleut, Dravidian and Sino-Tibetan languages. Japanese. Korean…
            811.6 Austro-Asiatic languages. Austronesian languages
            811.7 Indo-Pacific (non-Austronesian) languages. Australian languages
            811.8 American indigenous languages
            811.9 Artificial languages
        82 Literature
  20. Taxonomies: Class Task
      0 Science and knowledge. Organization. Computer Science. Information…
      1 Philosophy. Psychology
      2 Religion. Theology
      3 Social Sciences
      5 Mathematics and natural sciences
      6 Applied sciences. Medicine. Technology
      7 The arts. Recreation. Entertainment. Sport
      8 Language. Linguistics. Literature
      9 Geography. Biography. History
      Prolog Programming for Artificial Intelligence, Prof Ivan Bratko
  21. Taxonomies: Class Task
      5 Mathematics, Natural Sciences
        51 Mathematics
          519 (no name, virtual class)
            519.6 Computational mathematics. Numerical Analysis
      Prolog Programming for Artificial Intelligence, Prof Ivan Bratko [SDUM]
  22. Taxonomies: Class Task
      0 Science and knowledge. Organization. Computer Science. Information…
        00 Prolegomena. Fundamentals of knowledge and culture. Propaedeutics
          004 Computer science and technology. Computing. Data processing
            004.4 Software
              004.42 Computer programming. Computer programs
      Prolog Programming for Artificial Intelligence, Prof Ivan Bratko [UA]
  23. Taxonomies: Class Task
      0 Science and knowledge. Organization. Computer Science. Information…
        00 Prolegomena. Fundamentals of knowledge and culture. Propaedeutics
          004 Computer science and technology. Computing. Data processing
            004.4 Software
              004.43 Computer Languages
      Prolog Programming for Artificial Intelligence, Prof Ivan Bratko [IPP]
  24. Taxonomies: Class Task
      0 Science and knowledge. Organization. Computer Science. Information…
        00 Prolegomena. Fundamentals of knowledge and culture. Propaedeutics
          004 Computer science and technology. Computing. Data processing
            004.8 Artificial intelligence
      Prolog Programming for Artificial Intelligence, Prof Ivan Bratko [UAlg]
  25. Taxonomies: Pros and Cons
      Pros:
      - the rigid tree makes it easy to process;
      - suitable for some areas (like the classification of life);
      - the hierarchy helps when searching for terms (abstraction).
      Cons:
      - the rigid tree makes it difficult to classify (different people classify objects differently);
      - the structure is defined by some authority group (for example, the UDC Consortium);
      - it forces the subdivision of the world (categories have a single parent);
      - as a workaround, people classify in more than one category (so the rigid tree Pro becomes a Con).
  26. Thesauri
  27. Thesauri
      A thesaurus is a reference work that lists words grouped together according to similarity of meaning (containing synonyms and sometimes antonyms), in contrast to a dictionary, which contains definitions and pronunciations.
      Thesauri (Wikipedia, 2012)
  28. Thesauri
      In Information Science, Library Science, and Information Technology, specialized thesauri are designed for information retrieval. They are a type of controlled vocabulary, for indexing or tagging purposes. Such a thesaurus can be used as the basis of an index for online material. […] Unlike a literary thesaurus, these specialized thesauri typically focus on one discipline, subject or field of study.
      Thesauri (Wikipedia, 2012)
  29. Thesauri: How they work!
      Thesauri for information retrieval are typically constructed by information specialists, and have their own unique vocabulary defining different kinds of terms and relationships.
      Terms are the basic semantic units for conveying concepts. They are usually single-word nouns, since nouns are the most concrete part of speech. […] When a term is ambiguous, a "scope note" can be added to ensure consistency, and give direction on how to interpret the term.
      "Term relationships" are links between terms. These relationships can be divided into three types: hierarchical, equivalency or associative.
      Thesauri (Wikipedia, 2012)
  30. Thesauri: How they work!
      Hierarchical relationships are used to indicate terms which are narrower and broader in scope. A "Broader Term" (BT) or hyperonym is a more general term. Reciprocally, a "Narrower Term" (NT) or hyponym is a more specific term. BT and NT are reciprocals; a broader term necessarily implies at least one other term which is narrower. BT and NT are used to indicate class relationships, as well as part-whole relationships (meronyms and holonyms).
      Thesauri (Wikipedia, 2012)
  31. Thesauri: How they work!
      Example of a thesaurus with hierarchical relations.
      Feline
        NT Cat
        NT Panther
      Cat
        BT Feline
      Panther
        BT Feline
        NT Pink Panther
      Pink Panther
        BT Panther
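Since BT and NT are reciprocals, every NT link in the example should be mirrored by a BT link in the other direction. The following sketch encodes the slide's four entries and checks that property. The data layout and the helper names `missing_reciprocals` and `broader_chain` are illustrative assumptions, not part of any thesaurus standard or tool.

```python
# The slide's entries, as term -> list of (relation, target) pairs.
thesaurus = {
    "Feline": [("NT", "Cat"), ("NT", "Panther")],
    "Cat": [("BT", "Feline")],
    "Panther": [("BT", "Feline"), ("NT", "Pink Panther")],
    "Pink Panther": [("BT", "Panther")],
}

# BT and NT are each other's inverse.
INVERSE = {"BT": "NT", "NT": "BT"}


def missing_reciprocals(entries):
    """List (term, relation, target) links whose inverse link is absent."""
    missing = []
    for term, links in entries.items():
        for relation, target in links:
            inverse = (INVERSE[relation], term)
            if inverse not in entries.get(target, []):
                missing.append((term, relation, target))
    return missing


def broader_chain(term, entries):
    """Follow BT links upward; assumes at most one BT per term and no cycles."""
    chain = []
    while True:
        broader = [t for rel, t in entries.get(term, []) if rel == "BT"]
        if not broader:
            return chain
        term = broader[0]
        chain.append(term)
```

For the entries above, `missing_reciprocals(thesaurus)` finds nothing to report, and following BT links from "Pink Panther" climbs through "Panther" to "Feline".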
  32. Thesauri: How they work!
      The equivalency relationship is used primarily to connect synonyms and near-synonyms. "Use" (USE) and "Used For" (UF) indicators are used when an authorized term is to be used for another, unauthorized, term. Unauthorized terms are often called "entry vocabulary", "entry points", "lead-in terms", or "non-preferred terms", pointing to the authorized term (also referred to as the "preferred term" or "descriptor") that has been chosen to stand for the concept.
      Thesauri (Wikipedia, 2012)
  33. Thesauri: How they work!
      Example of a thesaurus with equivalency relations.
      Parliament
        USE European Parliament
      Parliament of Europe
        USE European Parliament
      European Parliament
        UF Parliament
        UF Parliament of Europe
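The USE/UF pair above amounts to a lookup from entry terms to a preferred term, plus its inverse. A minimal sketch, assuming the USE links are stored as a plain dictionary (the function names `preferred` and `used_for` are hypothetical):

```python
# USE links from the slide: non-preferred term -> preferred term.
use = {
    "Parliament": "European Parliament",
    "Parliament of Europe": "European Parliament",
}


def preferred(term):
    """Map an entry term to its preferred term; other terms map to themselves."""
    return use.get(term, term)


def used_for(descriptor):
    """Derive the UF list of a preferred term by inverting the USE links."""
    return sorted(t for t, target in use.items() if target == descriptor)
```

Storing only the USE direction and deriving UF keeps the two directions from drifting apart, which is one way to respect the reciprocity the slide describes.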
  34. Thesauri: How they work!
      Associative relationships are used to connect two related terms whose relationship is neither hierarchical nor equivalent. This relationship is described by the indicator "Related Term" (RT). Associative relationships should be applied with caution, since excessive use of RTs will reduce specificity in searches. Consider the following: if the typical user is searching with term "A", would they also want resources tagged with term "B"? If the answer is no, then an associative relationship should not be established.
      Thesauri (Wikipedia, 2012)
  35. Thesauri: How they work!
      Example of a thesaurus with associative relations.
      Douro
        BT River
        RT Porto
      Porto
        BT Portugal
        RT Gaia
      Portugal
        NT Porto
        NT Gaia
      River
        NT Douro
      City
        NT Gaia
        NT Porto
      Gaia
        BT Portugal
      Note: RT is not symmetrical: a RT b ⇏ b RT a.
  36. Thesauri: a simple example
      Extract of Food Safety relationships in AGROVOC (diagram not reproduced in this transcript; its labels include Quality, Asia, Food Safety, Contamination, China and Food Contamination, linked by BT, NT, RT and USE relations)
  37. Thesauri: Pros and Cons
      Pros:
      - more flexible than taxonomies (they do not require a tree and work as a graph);
      - they have other types of relationship besides simple hierarchy;
      - there is an ISO standard that documents their correct use;
      - the standard defines mathematical properties for the relationships.
      Cons:
      - the standardized types of relationship are rather limited (the same relation covers hyperonyms and meronyms; the non-hierarchical relation is too vague: "related");
      - no support for relationships with non-terms (features).
  38. Ontologies
  39. Ontologies
      Ontology is the philosophical study of the nature of being, existence, or reality as such, as well as the basic categories of being and their relations. Traditionally listed as a part of the major branch of philosophy known as metaphysics, ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences.
      Ontology (Wikipedia, 2012)
  40. Ontologies
      In computer science and information science, an ontology formally represents knowledge as a set of concepts within a domain, and the relationships between those concepts. It can be used to reason about the entities within that domain and may be used to describe the domain.
      Ontology: information science (Wikipedia, 2012)
  41. Ontologies
      Contemporary ontologies share many structural similarities, regardless of the language in which they are expressed. Most ontologies describe individuals (instances), classes (concepts), attributes, and relations.
      Ontology: information science (Wikipedia, 2012)
  42. Ontologies
      Individuals are the instances or objects (the basic or "ground level" objects).
      Ontology: information science (Wikipedia, 2012)
      Unlike any of the other classification systems, ontologies explicitly include the individuals (the objects being classified) in the structure.
  44. Ontologies
      Classes are sets, collections, concepts, […] or kinds of things.
      Ontology: information science (Wikipedia, 2012)
      Classes are the concepts used in thesauri and taxonomies. They can be super-classes that include sub-classes, or they can include only individuals (low-level classes; leaves, if we were talking about taxonomies).
  46. Ontologies
      Attributes are aspects, properties, features, characteristics, or parameters that objects (and classes) can have.
      Ontology: information science (Wikipedia, 2012)
      Attributes are properties of individuals or classes. If the individual is a book in a library, its properties can be the number of pages, the title, or the author. For a class like "mammal", an attribute can be a reference to its fur.
  48. Ontologies
      Relations are ways in which classes and individuals can be related to one another.
      Ontology: information science (Wikipedia, 2012)
      Relations are similar to the relations used in thesauri but, unlike there, no fixed list of valid relations exists. They can be the common hierarchical relations, or a relation such as "eat", relating animals to the animals they eat.
  50. Ontologies
      Function terms: complex structures formed from certain relations that can be used in place of an individual term in a statement.
      Ontology: information science (Wikipedia, 2012)
      Suppose you are adding Portuguese rivers to an ontology. One can define a simple macro to add some default relations to the river:
      River(name) = { Term → name; Is a → river; Is at → Portugal }
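The River(name) macro can be read as a function that expands a name into a bundle of default relations. A minimal sketch under that reading follows; the dictionary keys and the two extra river names are illustrative choices, not taken from the slides.

```python
def river(name):
    """Expand River(name) into the default relations listed on the slide."""
    return {
        "term": name,         # Term  -> name
        "is-a": "river",      # Is a  -> river
        "is-at": "Portugal",  # Is at -> Portugal
    }


# Illustrative use: one call per river instead of three statements each.
entries = [river(name) for name in ("Douro", "Tejo", "Mondego")]
```

The gain is purely notational: each new river costs one call rather than three separate relation statements.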
  52. Ontologies
      Restrictions: formally stated descriptions of what must be true in order for some assertion to be accepted as input.
      Ontology: information science (Wikipedia, 2012)
      We can require that the capital of a country is a city:
      add (X capital-of Y) iff X is-a City
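One way to picture the restriction is as a guard on insertion: the capital-of assertion is accepted only when the city fact is already known. The sketch below assumes facts are stored as (subject, relation, object) triples; `RestrictionError` and `add_capital` are hypothetical names introduced for this illustration.

```python
class RestrictionError(ValueError):
    """Raised when an assertion violates a restriction (hypothetical name)."""


# Known facts, stored as (subject, relation, object) triples.
facts = {("Lisbon", "is-a", "City")}


def add_capital(capital, country):
    """Accept (capital capital-of country) only if capital is already a City."""
    if (capital, "is-a", "City") not in facts:
        raise RestrictionError(f"{capital} is not known to be a City")
    facts.add((capital, "capital-of", country))
```

With these facts, `add_capital("Lisbon", "Portugal")` is accepted, while `add_capital("Douro", "Portugal")` is rejected because no fact states that Douro is a city.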
  54. Ontologies
      Rules: statements in the form of an antecedent-consequent sentence that describe the logical inferences that can be drawn from an assertion in a particular form.
      Ontology: information science (Wikipedia, 2012)
      On the other hand, if we trust whoever is editing the ontology, we can automatically classify X as a city, and its country as a… country:
      X capital-of Y ⇒ X is-a City ∧ Y is-a Country
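Read as an inference rule, the implication adds facts rather than rejecting input. A minimal sketch, again assuming facts are (subject, relation, object) triples and using a hypothetical function name:

```python
def apply_capital_rule(facts):
    """Apply: X capital-of Y  =>  X is-a City  and  Y is-a Country."""
    inferred = set(facts)
    for subject, relation, obj in facts:
        if relation == "capital-of":
            inferred.add((subject, "is-a", "City"))
            inferred.add((obj, "is-a", "Country"))
    return inferred


known = {("Lisbon", "capital-of", "Portugal")}
closed = apply_capital_rule(known)
```

Here the single asserted fact yields two inferred ones, so `closed` holds three triples. Compare this with a restriction, which would have refused the assertion until the city fact existed.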
  56. Ontologies
      Axioms: assertions (including rules) in a logical form that together comprise the overall theory that the ontology describes in its domain of application.
      Ontology: information science (Wikipedia, 2012)
      Axioms differ from rules in that they are tests that guarantee the structure of the ontology. They are not used to infer new relations. They assert.
  58. Ontologies
      Events: the changing of attributes or relations.
      Ontology: information science (Wikipedia, 2012)
      Similar to rules, but they react to events. For example, if the user adds a feature stating that an individual lays eggs, classify it as oviparous.
      Note that the division into Rules, Axioms and Events is not universal; it depends heavily on the application used to support the ontology.
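The egg-laying example can be sketched as a handler that runs whenever a feature is added. The class and function names below are hypothetical and do not correspond to any particular ontology editor.

```python
class Individual:
    """A bare-bones individual with features and inferred classes."""

    def __init__(self, name):
        self.name = name
        self.features = set()
        self.classes = set()


def on_feature_added(individual, feature):
    """Event handler: reacts to a change instead of scanning all facts."""
    if feature == "lays eggs":
        individual.classes.add("Oviparous")


def add_feature(individual, feature):
    """Record the feature, then fire the event."""
    individual.features.add(feature)
    on_feature_added(individual, feature)
```

Calling `add_feature(hen, "lays eggs")` on an individual `hen` classifies it as oviparous at the moment of the change; features that no handler cares about are stored without side effects.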
  60. Ontologies: Example 1 (figure not reproduced in this transcript)
  61. Ontologies: Example 2 (figure not reproduced in this transcript)
  62. Ontologies: Pros and Cons
      Pros:
      - more flexible than thesauri (a graph with ad-hoc relationships);
      - lots of formalisms and standards (OWL, SKOS, …);
      - lots of editing tools (like Protégé);
      - languages for querying and completion (like SPARQL).
      Cons:
      - as a classification approach, an ontology requires an authority to define it, just like taxonomies or thesauri;
      - complexity: not everybody is able to create a detailed ontology.
  63. Further Reading
      Folksonomies:
      - Folksonomy Coinage and Definition, http://vanderwal.net/folksonomy.html
      - Folksonomies: A User-Driven Approach to Organizing Content, http://www.uie.com/articles/folksonomies/
      - Folksonomies: power to the people, http://www.iskoi.org/doc/folksonomies.htm
      - Folksonomies: Tidying up Tags?, http://www.dlib.org/dlib/january06/guy/01guy.html
      - Folksonomies - Cooperative Classification and Communication Through Shared Metadata, http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html
  64. Further Reading
      Taxonomies:
      - Taxonomy, http://en.wikipedia.org/wiki/Taxonomy
      - Perspectives on Taxonomy, Classification, Structure and Find-ability, http://www.serviceinnovation.org/included/docs/kcs_taxonomy.pdf
      - Universal Decimal Classification, http://www.udcc.org/udcsummary/php/index.php
      Thesauri:
      - Thesaurus, http://en.wikipedia.org/wiki/Thesaurus
      - Thesaurus principles and practice, http://www.willpowerinfo.co.uk/thesprin.htm
  65. Further Reading
      Ontologies:
      - Ontology (information science), http://en.wikipedia.org/wiki/Ontology_(information_science)
      - Protégé Ontology Editor, http://protege.stanford.edu/
      - OWL Web Ontology Language, http://www.w3.org/TR/owl-features/
      - SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/
