Metadata first, ontologies second

    Metadata first, ontologies second - Presentation Transcript

    1. Towards a solution to extract knowledge from the social web (“metadata first, ontologies second”). Project: Collaborative Ontology Building System (CollOnBus), INTEK Nets 2005-2007. Aitor Almeida, Borja Sotomayor, Joseba Abaitua, Diego López de Ipiña
    2. Social web: source of knowledge
      • Crowds share and tag resources of different types:
        • pictures, music, posts, videoclips, slides, books, bookmarks, etc.
      • Social tagging (or crowd-tagging) is a very effective and economical way of generating knowledge
          • Crowdsourcing: “the trend of leveraging the mass collaboration enabled by Web 2.0 technologies to achieve business goals.”
            • <http://en.wikipedia.org/wiki/Crowdsourcing>
    3. Related work (since 2006)
      • mapping tags to ontologies
      • Schmitz 2006. Inducing Ontology from Flickr tags. WWW’2006: Collaborative Web Tagging workshop
      • Abbasi et al. 2007. Organizing Resources on Tagging Systems using T-ORG. ESWC2007 SemNet workshop
      • identifying semantic relations
      • Specia, Motta. 2007. Integrating Folksonomies with the Semantic Web. ESWC2007
      • transforming folksonomies into formal representations
      • Marlow et al. 2006. Tagging, Taxonomy, Flickr, Article, ToRead. WWW’2006: Collaborative Web Tagging workshop
      • Hotho et al. 2006. Trend Detection in Folksonomies. Semantics And Digital Media Technology SAMT2006
      • Maala et al. A Conversion Process From Flickr Tags to RDF Descriptions. BIS2007 workshop
    4. Which knowledge representation model?
      • Extracting knowledge from data sharing Web 2.0 sites, but into which formal representation?
      • Semantic Networks
        • Lexical networks (WordNet)
      • Taxonomies
        • e.g. categories from Wikipedia, thesauri
      • Metadata
        • “mapping to Dublin Core is a weak choice”
      • Ontologies
      • “metadata first, ontologies second”
    5. Crowds tagging pictures
    6. Crowds tagging pictures Aitor Almeida Borja Sotomayor Diego López de Ipiña
    7. Crowds tagging pictures
    8. Crowds tagging posts
    9. Crowds tagging slides
    10. Crowds tagging books
    11. Crowds tagging URLs
    12. Crowd-sharing of tags
      • Flickr, del.icio.us... group tags by social sharing (or “co-usage”)
        • but the semantic information that socially shared tags acquire is poorly exploited
    13. Mapping folksonomies into tag clusters
      • RawSugar <http://rawsugar.com/>
        • allows users to assign hierarchies to their tags, improving the navigation and searching of folksonomies
        • non-expert users will find it easier to tag resources without any restrictions
    14. Tag clustering
      • TAG clustering is the main technique used to exploit the wealth of social tagging
        • but semantic relations are not detected
    15. Beyond tag clusters?
    16. Should we map them into ontologies?
    17. Better mapping 1st into metadata
    18. Metadata vs ontologies
      • Why are metadata structures better than ontologies (for resource classification and categorisation)?
      • Let’s reflect on the different knowledge representations and on who uses them:
        • Folksonomies (crowds)
        • Taxonomies, ontologies (knowledge engineers, AI/SW practitioners)
        • Metadata structures (librarians, archivists, documentalists)
    19. What are metadata?
    20. TAG vs metadata?
    21. Metadata vs ontologies
      • Why are metadata structures better?
        • Because metadata provide a wide and complete range of facets for representing knowledge about an entity or resource
        • Each facet (or data type) could be part of one or several ontological structures
        • Facet: “any of the definable aspects that make up a subject (as of contemplation) or an object (as of consideration)”
        • “A faceted classification system allows the assignment of multiple classifications to an object, enabling the classifications to be ordered in multiple ways, rather than in a single, pre-determined, taxonomic order” (Wikipedia).
    22. Better mapping 1st folksonomies into metadata structures
    23. Dublin Core Metadata Initiative <http://jodi.tamu.edu/Articles/v02/i02/Greenberg/metadataform.gif>
    24. Dublin Core Metadata Initiative
    25. Dublin Core Metadata Initiative
    26. Our mapping tool: folk2onto (? folk2meta) designed by Borja Sotomayor
    27. folk2onto: Tag Distiller
      • Tag Distiller :
        • Downloads tags from Web 2.0 sites
        • Matches each tag against WordNet (taking into account the tag’s context/cloud)
        • Filters out synonyms
        • Keeps the list of remaining tags
        • Generates an XML file
          • Implemented by Aitor Almeida
    28. TAG clouds from del.icio.us
      • http://del.icio.us/url/check?url=site
      • Looks for <title> and reads its content: the hash
      • Gets the RSS feed at
          • http://del.icio.us/rss/url/ + hash
      • Then the tag cloud is read from the elements
        • <rdf:li resource="http://del.icio.us/tag/">
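The last step on this slide, reading tags out of the `rdf:li` elements, could look roughly like the Python below. This is a minimal sketch and not the Distiller's actual code: the function name `tags_from_feed` and the sample feed are hypothetical, and the example parses an in-memory string instead of fetching anything from del.icio.us.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TAG_PREFIX = "http://del.icio.us/tag/"

# Hypothetical feed fragment, shaped after the element shown on the slide.
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Bag>
    <rdf:li resource="http://del.icio.us/tag/database"/>
    <rdf:li resource="http://del.icio.us/tag/performance"/>
    <rdf:li resource="http://del.icio.us/tag/database"/>
  </rdf:Bag>
</rdf:RDF>"""


def tags_from_feed(feed_xml: str) -> List[str]:
    """Return the distinct tag names found in rdf:li elements, in feed order."""
    root = ET.fromstring(feed_xml)
    tags: List[str] = []
    for li in root.iter(f"{{{RDF_NS}}}li"):
        # Accept both a plain and an rdf-qualified "resource" attribute.
        resource = li.get("resource") or li.get(f"{{{RDF_NS}}}resource") or ""
        if not resource.startswith(TAG_PREFIX):
            continue
        tag = resource[len(TAG_PREFIX):]
        if tag and tag not in tags:
            tags.append(tag)
    return tags


print(tags_from_feed(SAMPLE_FEED))
```

Repeated tags are collapsed here so that each tag appears once in the cloud; whether the original tool kept counts per tag is not stated on the slide.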
    29. TAG clouds from Technorati
      • Technorati: blog aggregator
        • We can get tag clouds from Technorati through: http://api.technorati.com/blogposttags?key= [apikey] &url= [blog URL]
    30. TAG clouds from Technorati
        • <?xml version="1.0" encoding="utf-8"?>
        • <!-- generator="Technorati API version 1.0 /blogposttags" -->
        • <!DOCTYPE tapi PUBLIC "-//Technorati, Inc.//DTD TAPI 0.02//EN" "http://api.technorati.com/dtd/tapi-002.xml">
        • <tapi version="1.0">
        • <document>
        • <result>
        • <querycount>13</querycount>
        • </result>
        • <item>
        • <tag>christmas cookie recipes</tag>
        • <posts>274</posts>
        • </item>
        • …
    31. Tagged URL at Technorati
      • All <tag> elements are downloaded
      • To get the “title” http://api.technorati.com/bloginfo?key= [apikey] &url= [blog url]
      • And <name> is retrieved
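Collecting the `<tag>` elements from a `blogposttags` response could be sketched in Python as below. The sample document reuses the elements shown on the previous slide; the closing `</document>` and `</tapi>` tags are an assumption, since the slide truncates the response, and the function name `parse_blogposttags` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Response fragment from the slide; the two closing tags are assumed.
SAMPLE_RESPONSE = """<tapi version="1.0">
<document>
<result>
<querycount>13</querycount>
</result>
<item>
<tag>christmas cookie recipes</tag>
<posts>274</posts>
</item>
</document>
</tapi>"""


def parse_blogposttags(response_xml: str) -> List[Tuple[str, int]]:
    """Return (tag, post count) pairs for every <item> in the response."""
    root = ET.fromstring(response_xml)
    cloud: List[Tuple[str, int]] = []
    for item in root.iter("item"):
        tag = item.findtext("tag")
        if tag is None:
            continue
        posts = (item.findtext("posts") or "").strip()
        # Fall back to 0 when the count is missing or not a number.
        cloud.append((tag.strip(), int(posts) if posts.isdigit() else 0))
    return cloud


print(parse_blogposttags(SAMPLE_RESPONSE))
```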
    32. semantic relations in WordNet
      • WordNet relations for tag ‘Spanish’:
    33. TAG filtering algorithm
      • Tags are filtered out by means of WordNet
      • If a TAG has only one meaning (synset) that meaning is assigned
      • If it has more than one, then
        • T: the resource’s tag set
        • Related(a,b): gives 1 if a and b have some type of relation (hypernym, hyponym, holonym, meronym)
        • w: weights
      • Several iterations are made until a meaning is found (10 iterations max.)
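The transcript does not preserve the scoring formula, so the Python below is only an illustration of the general idea: a tag with several candidate senses is given the sense that is related to the most other tags in T. The toy sense inventory, the relation pairs and the names `related` and `pick_sense` are all assumptions; the weights w and the repeated iterations mentioned on the slide are deliberately left out.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Toy data standing in for WordNet (hypothetical sense identifiers).
SENSES: Dict[str, List[str]] = {
    "jaguar": ["jaguar.animal", "jaguar.car"],
    "cat": ["cat.animal"],
    "zoo": ["zoo.place"],
}
RELATIONS: Set[Tuple[str, str]] = {("jaguar.animal", "cat.animal")}
TAGS = ["jaguar", "cat", "zoo"]


def related(a: str, b: str, relations: Set[Tuple[str, str]]) -> int:
    """Return 1 if the two senses are linked by any relation, else 0."""
    return 1 if (a, b) in relations or (b, a) in relations else 0


def pick_sense(tag: str, tag_set: Iterable[str],
               senses: Dict[str, List[str]],
               relations: Set[Tuple[str, str]]) -> Optional[str]:
    """Choose a sense for `tag` using the other tags as context."""
    candidates = senses.get(tag, [])
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]  # a single synset is assigned directly
    others = [t for t in tag_set if t != tag]

    def context_score(sense: str) -> int:
        # Count the context tags that have at least one related sense.
        return sum(
            max((related(sense, o, relations) for o in senses.get(other, [])),
                default=0)
            for other in others
        )

    best = max(candidates, key=context_score)
    return best if context_score(best) > 0 else None


print(pick_sense("jaguar", TAGS, SENSES, RELATIONS))
```

Returning `None` when no candidate has any related context tag is a design choice of this sketch; it corresponds loosely to a tag for which no meaning is found.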
    34. TAG filtering algorithm
      • Once senses have been discarded, synonyms are also filtered out
      • Words then are grouped in senses using WordNet’s relation network
      • The output is exported to:
        • an XML file with senses
        • an XML file with the tags that were discarded
        • an RDF file containing WordNet’s relation network
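Synonym filtering can be pictured as keeping one tag per sense identifier. The Python below is a sketch under that assumption; the slides do not say which synonym is kept, so keeping the first one seen is a choice made here. The identifiers reuse the `idlex` values from the sample XML, but giving "database" the same identifier as "bd" is invented for illustration, and `filter_synonyms` is a hypothetical name.

```python
from typing import List, Tuple

Tagged = Tuple[str, int]  # (lemma, sense identifier)

SAMPLE: List[Tagged] = [
    ("database", 5604473),  # assumed to share a sense with "bd"
    ("bd", 5604473),
    ("tune", 236726),
]


def filter_synonyms(tagged: List[Tagged]) -> Tuple[List[Tagged], List[Tagged]]:
    """Keep the first tag seen for each sense; return (kept, dropped)."""
    seen = set()
    kept: List[Tagged] = []
    dropped: List[Tagged] = []
    for lemma, sense_id in tagged:
        if sense_id in seen:
            dropped.append((lemma, sense_id))
        else:
            seen.add(sense_id)
            kept.append((lemma, sense_id))
    return kept, dropped


kept, dropped = filter_synonyms(SAMPLE)
print(kept, dropped)
```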
    35. TAG XML file
      • <?xml version="1.0" encoding="UTF-8"?>
      • <resource>
      • <tittle>PostgreSQL: Perguntas Frequentes (FAQ) sobre PostgreSQL</tittle>
      • <type>Text</type>
      • <format>text/html</format>
      • <identifier>www.postgresql.org/docs/faqs.FAQ_brazilian.html</identifier>
      • <tags>
      • <tag>
      • <lemma>tune</lemma>
      • <idlex>236726</idlex>
      • </tag>
      • <tag>
      • <lemma>bd</lemma>
      • <idlex>5604473</idlex>
      • </tag>
    36. TAG file without senses
      • <resource>
      • <tittle>Wired News: The Virus That Ate DHS</tittle>
      • <type>Text</type>
      • <format>text/html</format>
      • <identifier>www.wired.com/news/technology/0,72051-0.html?tw=rss.index</identifier>
      • <tags>
      • <tag>bit200f06</tag>
      • <tag>group141</tag>
      • <tag>dhs</tag>
      • <tag>group35</tag>
      • <tag>malware</tag><tag>group91</tag><tag>group17</tag>
      • <tag>group53</tag>
      • <tag>computer_security</tag>
      • </tags>
      • </resource>
    37. WordNet’s sense sets
      • Words are grouped in sense sets
        • If Related(a,b) = 1, the words are grouped in the same set
        • The relation depth must be 3 or less
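One way to read the two rules above is as a bounded graph search: two words count as related when a path of at most three relation links joins them, and related words are merged into one set. The Python below sketches that reading with a toy relation graph; the graph, the function names and the transitive merging of sets are assumptions, not details taken from the slides.

```python
from collections import deque
from typing import Dict, List, Set

# Toy undirected relation graph (hypothetical data, not WordNet).
GRAPH: Dict[str, List[str]] = {
    "dog": ["canine"],
    "canine": ["dog", "carnivore"],
    "carnivore": ["canine", "feline"],
    "feline": ["carnivore", "cat"],
    "cat": ["feline"],
    "sql": ["database"],
    "database": ["sql"],
}


def within_depth(graph: Dict[str, List[str]], start: str, goal: str,
                 max_depth: int = 3) -> bool:
    """Breadth-first search: is `goal` reachable in at most max_depth links?"""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == goal:
            return True
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return False


def group_senses(words: List[str], graph: Dict[str, List[str]],
                 max_depth: int = 3) -> List[Set[str]]:
    """Merge each word into every existing set that holds a related word."""
    groups: List[Set[str]] = []
    for word in words:
        touching = [g for g in groups
                    if any(within_depth(graph, word, m, max_depth) for m in g)]
        merged = {word}
        for g in touching:
            merged |= g
            groups.remove(g)
        groups.append(merged)
    return groups


print(group_senses(["dog", "cat", "feline", "sql", "database"], GRAPH))
```

In this example "dog" and "cat" are four links apart, so they are not directly related, yet they end up in the same set because "feline" is within three links of both. A stricter reading, where every pair in a set must be within the limit, would need a different grouping step.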
    38. folk2onto: Tag Trainer
    39. folk2onto: Map Trainer
    40. folk2onto: Tag Mapper
      • The Mapper makes tag-element associations
      • These associations are made according to the senses assigned by the Distiller
      • The mapping targets are Dublin Core metadata records
    41. folk2onto: Dublin Core
      • The Distiller gets 4 elements from the tag source (del.icio.us, Technorati, etc.):
        • Title : URL’s title -> from the <title> XML tag
        • Type : content type -> depending on the source (here both are “Text”)
        • Format : MIME type -> depending on the source (here both are text/html)
        • Identifier : we take the resource’s URL
    42. folk2onto: Dublin Core
      • The Tag-Mapper deals with:
        • Subject : the “topic”.
        • Language : en, es, fr, de, ru...
        • Coverage : when, where (about the topic)
        • Rights : type of licence
    43. folk2onto: mapping formulae
      • When a TAG has one mapping, that TAG is used
      • If it has more than one:
      • If it has no mapping, then:
    44. folk2onto: file mapping
      • <rdf:RDF
      • xmlns:j.0="http://purl.org/dc/elements/1.1"
      • xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
      • <rdf:Description rdf:nodeID="A0">
        • <rdf:type rdf:resource="http://purl.org/dc/elements/1.1identifier"/>
        • <j.0:identifier>www.postgresql.org/docs/faqs.FAQ_brazilian.html</j.0:identifier>
        • <j.0:type>Text</j.0:type>
        • <j.0:format>text/html</j.0:format>
        • <j.0:tittle>PostgreSQL: Perguntas Frequentes (FAQ) sobre PostgreSQL</j.0:tittle>
        • <j.0:subject>database</j.0:subject>
        • <j.0:subject>performance</j.0:subject>
        • <j.0:subject>bd</j.0:subject>
      • </rdf:Description>
      • </rdf:RDF>
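A record of this shape can be produced with Python's standard library, as in the sketch below. It is not folk2onto's serialiser: the function name `build_dc_record` is hypothetical, and the sketch uses the standard Dublin Core element namespace, `http://purl.org/dc/elements/1.1/` with a trailing slash, and the standard element name `title`, which differ slightly from the sample output shown above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # standard DC elements namespace

# Field values taken from the sample record on the slide.
RECORD: Dict[str, List[str]] = {
    "identifier": ["www.postgresql.org/docs/faqs.FAQ_brazilian.html"],
    "type": ["Text"],
    "format": ["text/html"],
    "title": ["PostgreSQL: Perguntas Frequentes (FAQ) sobre PostgreSQL"],
    "subject": ["database", "performance", "bd"],
}


def build_dc_record(fields: Dict[str, List[str]]) -> str:
    """Serialise Dublin Core fields as a single rdf:Description."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(root, f"{{{RDF_NS}}}Description")
    for name, values in fields.items():
        for value in values:  # repeated elements, e.g. several subjects
            ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(build_dc_record(RECORD))
```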
    45. Mapping trainer
    46. folk2onto: 6 tests (A-F)
      • Experiment A : Selecting random synsets for the tags.
      • Experiment B : Without any limit in the semantic relation depth. Only taking into account the trained synsets (frec=0, wordnet=0, trained=1).
      • Experiment C : Without any limit in the semantic relation depth. Only taking into account the context (frec=0, wordnet=1, trained=0).
      • Experiment D : Without any limit in the semantic relation depth. Taking the context and the trained synsets into account (frec=0, wordnet=0.4, trained=0.6).
      • Experiment E : Without any limit in the semantic relation depth. Taking all three components of the equation (familiarity, context and trained synsets) into account (frec=0.1, wordnet=0.3, trained=0.6).
      • Experiment F : Limiting the semantic relation depth to 3 and taking the context and the trained synsets into account. (frec=0, wordnet=0.4, trained=0.6).
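The three weights suggest a weighted sum over a familiarity score, a context score and a trained-synset score. The exact equation is not preserved in the transcript, so the Python below is only a plausible reading: the linear combination, the component scores in the example and the names `Weights`, `score` and `best_synset` are all assumptions. Only the weight values come from the experiment list.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Weights:
    frec: float      # weight of the familiarity component
    wordnet: float   # weight of the context component
    trained: float   # weight of the trained-synset component


# Weight settings as listed for experiments B to F.
EXPERIMENTS: Dict[str, Weights] = {
    "B": Weights(0.0, 0.0, 1.0),
    "C": Weights(0.0, 1.0, 0.0),
    "D": Weights(0.0, 0.4, 0.6),
    "E": Weights(0.1, 0.3, 0.6),
    "F": Weights(0.0, 0.4, 0.6),
}

# Hypothetical (familiarity, context, trained) scores per candidate synset.
CANDIDATES: Dict[str, Tuple[float, float, float]] = {
    "jaguar.car": (0.9, 0.2, 0.1),
    "jaguar.animal": (0.3, 0.8, 0.7),
}


def score(w: Weights, familiarity: float, context: float,
          trained: float) -> float:
    """Assumed linear combination of the three components."""
    return w.frec * familiarity + w.wordnet * context + w.trained * trained


def best_synset(w: Weights,
                candidates: Dict[str, Tuple[float, float, float]]) -> str:
    """Return the candidate synset with the highest combined score."""
    return max(candidates, key=lambda name: score(w, *candidates[name]))


for label, weights in EXPERIMENTS.items():
    print(label, best_synset(weights, CANDIDATES))
```

Experiments D and F share the same weights; they differ only in the relation depth limit, which this sketch does not model.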
    47. folk2onto: tests output
      • Experiment | Correct synsets | Erroneous synsets
      • A | 706 (32.5%) | 1466 (67.5%)
      • B | 1594 (73.4%) | 578 (26.6%)
      • C | 1199 (55.2%) | 973 (44.8%)
      • D | 1492 (68.7%) | 680 (31.3%)
      • E | 1349 (62.1%) | 823 (37.9%)
      • F | 1894 (87.2%) | 278 (12.8%)
    48. folk2onto: tests output
    49. Open issues
      • Tag filtering through WordNet
        • blog, wiki
        • xml, rdf, rss
        • wordpress, tuenti, flickr
        • social, open
      • “tags can be about so many things, mapping to Dublin Core is a weak choice”
      • Mappings
        • Coverage: Japan
        • Language: Spanish
      • Learning the right synset of e.g. "jaguar"
        • "vehicle", "video game console", or "cat of prey"
        • "<dc:subject>Jaguar</dc:subject>"
      • Word-sense disambiguation
        • tag-category disambiguation
    50. That was all about CollOnBus/folk2onto
      • Thank you very much!
      • Any questions?

    + JosebaAbaitua, 2 years ago


    © All Rights Reserved
