Metadata first, ontologies second - Presentation Transcript
Towards a solution to extract knowledge from the social web (“metadata first, ontologies second”)
Project: Collaborative Ontology Building System (CollOnBus), INTEK Nets 2005-2007
Aitor Almeida, Borja Sotomayor, Joseba Abaitua, Diego López de Ipiña
Social web: source of knowledge
Crowds share and tag resources of different types:
pictures, music, posts, videoclips, slides, books, bookmarks, etc.
Social tagging (or crowd-tagging) is a very effective and economical way of generating knowledge
Crowdsourcing: “the trend of leveraging the mass collaboration enabled by Web 2.0 technologies to achieve business goals.”
<http://en.wikipedia.org/wiki/Crowdsourcing>
Related work (since 2006)
mapping tags to ontologies
Schmitz 2006. Inducing Ontology from Flickr tags. WWW’2006: Collaborative Web Tagging workshop
Abbasi et al. 2007. Organizing Resources on Tagging Systems using T-ORG. ESWC2007 SemNet workshop
identifying semantic relations
Specia, Motta. 2007. Integrating Folksonomies with the Semantic Web. ESWC2007
transforming folksonomies into formal representations
Marlow et al. 2006. Tagging, Taxonomy, Flickr, Article, ToRead. WWW’2006: Collaborative Web Tagging workshop
Hotho et al. 2006. Trend Detection in Folksonomies. Semantics And Digital Media Technology SAMT2006
Maala et al. 2007. A Conversion Process From Flickr Tags to RDF Descriptions. BIS2007 workshop
Which knowledge representation model?
Extracting knowledge from data-sharing Web 2.0 sites, but into which formal representation?
Semantic Networks
Lexical networks (WordNet)
Taxonomies
e.g. categories from Wikipedia, thesauri
Metadata
“mapping to Dublin Core is a weak choice”
Ontologies
“metadata first, ontologies second”
Crowds tagging pictures
Crowds tagging pictures Aitor Almeida Borja Sotomayor Diego López de Ipiña
Crowds tagging posts
Crowds tagging slides
Crowds tagging books
Crowds tagging URLs
Crowd-sharing of tags
Flickr, del.icio.us... group tags by social sharing (or “co-usage”)
but the semantic information that socially shared tags acquire is poorly exploited
Mapping folksonomies into tag clusters
RawSugar <http://rawsugar.com/>
allows users to assign hierarchies to their tags, improving the navigation and searching of folksonomies
non-expert users will find it easier to tag resources without any restrictions
Tag clustering
Tag clustering is the main technique used to get more value out of social tagging
but semantic relations are not detected
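The co-usage grouping described above can be sketched in a few lines of Python. This is only an illustration, not the method used by Flickr, del.icio.us or RawSugar: the sample bookmarks, the function names and the `min_count` threshold are all hypothetical. It counts how often two tags are assigned to the same resource and groups the tags that are linked by frequent co-usage.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tagged_resources):
    """Count how many resources each unordered pair of tags shares."""
    counts = Counter()
    for tags in tagged_resources:
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def clusters(tagged_resources, min_count=2):
    """Group tags connected by pairs co-used at least min_count times.

    Tags with no frequent partner are left out of the result.
    """
    parent = {}

    def find(tag):
        # Union-find lookup with path halving.
        parent.setdefault(tag, tag)
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    for (a, b), n in cooccurrence(tagged_resources).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = {}
    for tag in list(parent):
        groups.setdefault(find(tag), set()).add(tag)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sample data: one list of tags per bookmarked resource.
bookmarks = [
    ["database", "postgresql", "performance"],
    ["database", "postgresql", "faq"],
    ["rdf", "xml", "semanticweb"],
    ["rdf", "xml"],
]
```

Here `clusters(bookmarks)` groups "database" with "postgresql" and "rdf" with "xml", but nothing in the result says which relation holds inside each group (synonym, broader term, part of), which is the limitation noted above.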
Beyond tag clusters?
Should we map them into ontologies?
Better to map them first into metadata
Metadata vs ontologies
Why are metadata structures better than ontologies (for resource classification and categorisation)?
Let’s reflect on different knowledge representations and on who uses them:
Because metadata provide a wide and complete range of facets for representing knowledge about an entity or resource
Each facet (or data type) could be part of one or several ontological structures
Facet: “any of the definable aspects that make up a subject (as of contemplation) or an object (as of consideration)”
“A faceted classification system allows the assignment of multiple classifications to an object, enabling the classifications to be ordered in multiple ways, rather than in a single, pre-determined, taxonomic order” (Wikipedia).
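A small Python sketch may make the idea of faceted classification concrete. The resources, their values and the function name are hypothetical, and the facet names are simply borrowed from Dublin Core elements (subject, coverage, language). The point is that the same collection can be ordered by any facet, with no single fixed taxonomy.

```python
from collections import defaultdict

# Hypothetical resources; each facet can hold one or more values.
resources = {
    "photo-1": {"subject": ["temple"], "coverage": ["Japan"], "language": ["Spanish"]},
    "post-7": {"subject": ["database", "performance"], "language": ["Portuguese"]},
    "slide-3": {"subject": ["database"], "language": ["Spanish"]},
}


def group_by_facet(resources, facet):
    """Order the same collection by whichever facet is requested."""
    index = defaultdict(list)
    for name, facets in resources.items():
        # A resource lacking the facet is simply absent from this view.
        for value in facets.get(facet, []):
            index[value].append(name)
    return {value: sorted(names) for value, names in sorted(index.items())}
```

Calling `group_by_facet(resources, "language")` and `group_by_facet(resources, "subject")` gives two different orderings of the same three resources, and a resource with several subjects appears under each of them.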
Better to map folksonomies first into metadata structures
<j.0:title>PostgreSQL: Perguntas Frequentes (FAQ) sobre PostgreSQL</j.0:title>
<j.0:subject>database</j.0:subject>
<j.0:subject>performance</j.0:subject>
<j.0:subject>bd</j.0:subject>
</rdf:Description>
</rdf:RDF>
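As a rough sketch of how a tagged bookmark could be serialised into a description like the fragment above, the following uses Python's standard `xml.etree.ElementTree`. Several things here are assumptions: the transcript does not show which namespace the `j.0` prefix stands for, so the Dublin Core element namespace and the `dc` prefix are a guess, and the bookmark URL and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"  # assumed target namespace

ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def bookmark_to_rdf(url, title, tags):
    """Serialise one bookmark: the title once, plus one subject per tag."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": url})
    ET.SubElement(desc, f"{{{DC}}}title").text = title
    for tag in tags:
        ET.SubElement(desc, f"{{{DC}}}subject").text = tag
    return ET.tostring(root, encoding="unicode")


xml_text = bookmark_to_rdf(
    "http://example.org/postgresql-faq",  # hypothetical URL
    "PostgreSQL: Perguntas Frequentes (FAQ) sobre PostgreSQL",
    ["database", "performance", "bd"],
)
```

Every tag ends up as a plain subject value, which illustrates why a direct mapping of this kind keeps the tags but adds little structure to them.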
Mapping trainer
folk2onto: 6 tests (A-F)
Experiment A: Selecting random synsets for the tags.
Experiment B: Without any limit in the semantic relation depth. Only taking into account the trained synsets (frec=0, wordnet=0, trained=1).
Experiment C: Without any limit in the semantic relation depth. Only taking into account the context (frec=0, wordnet=1, trained=0).
Experiment D: Without any limit in the semantic relation depth. Taking the context and the trained synsets into account (frec=0, wordnet=0.4, trained=0.6).
Experiment E: Without any limit in the semantic relation depth. Taking all three components of the equation (familiarity, context and trained synsets) into account (frec=0.1, wordnet=0.3, trained=0.6).
Experiment F: Limiting the semantic relation depth to 3 and taking the context and the trained synsets into account (frec=0, wordnet=0.4, trained=0.6).
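The weights in the experiments above suggest a score that combines three components per candidate synset. The slides name the weights (frec, wordnet, trained) but do not show the equation itself, so the linear weighted sum below is an assumption, and the function name and the component scores for the tag "jaguar" are hypothetical values chosen only for illustration.

```python
def choose_synset(candidates, frec=0.1, wordnet=0.3, trained=0.6):
    """Return the candidate synset with the highest weighted score.

    Assumption: familiarity, context and trained scores are combined as
    a linear weighted sum; the slides give the weights but not the equation.
    """
    def score(c):
        return (frec * c["familiarity"]
                + wordnet * c["context"]
                + trained * c["trained"])

    return max(candidates, key=score)["synset"]


# Hypothetical component scores in [0, 1] for the tag "jaguar".
jaguar = [
    {"synset": "cat", "familiarity": 0.9, "context": 0.2, "trained": 0.1},
    {"synset": "vehicle", "familiarity": 0.5, "context": 0.7, "trained": 0.8},
    {"synset": "video game console", "familiarity": 0.1, "context": 0.1, "trained": 0.1},
]
```

With the default weights (those of Experiment E) this sample picks "vehicle"; calling `choose_synset(jaguar, frec=1, wordnet=0, trained=0)` picks "cat" instead, which shows how much the choice depends on the weighting.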
folk2onto: tests output
Experiment   Correct synsets   Erroneous synsets
A            706 (32.5%)       1466 (67.5%)
B            1594 (73.4%)      578 (26.6%)
C            1199 (55.2%)      973 (44.8%)
D            1492 (68.7%)      680 (31.3%)
E            1349 (62.1%)      823 (37.9%)
F            1894 (87.2%)      278 (12.8%)
Open issues
Tag filtering through WordNet
blog, wiki
xml, rdf, rss
wordpress, tuenti, flickr
social, open
“tags can be about so many things
mapping to Dublin Core is a weak choice”
Mappings
Coverage: Japan
Language: Spanish
Learning the right synset of, e.g., "jaguar":
"vehicle", "video game console", or "cat of prey"