SAnno: a unifying framework for semantic annotation


A talk presenting SAnno, given at IDSIA on 2010/06/01.


  1. SAnno: a unifying framework for semantic annotation
      Davide Eynard, IDSIA, 01/06/2010
  2. Introduction
      • S(emantic)Anno(tations)
      • … in Italian, “sanno” also means “they know”
      • Basic principle: anyone should be able to say anything about anything else
        • Well, this should hold in general :-)
        • Actually, in our case it is “anything about any URI”
      • And we would like everyone to say it in a formal way
      • But first, a little step back in time...
  3. Participation and semantics
      [Figure: diagram labeled “Data” and “Structure”]
  4. SAnno's grandfather: Speakinabout [1]
      • Purpose: produce semantic annotations about named entities
        • When you read “Harry Potter”, is it the book or the movie?
      • Exploits user gratification
        • When users annotate a string as matching a specific concept, they are shown a list of services/search engines related to it
      • Relies on user-provided data:
        • Freebase types
        • user-generated search templates, built inside a wiki system
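The gratification step can be sketched as follows, assuming search templates are stored as URL patterns with a `{q}` placeholder and keyed by Freebase-style type IDs. The `SEARCH_TEMPLATES` table, the `services_for` helper and every URL are hypothetical; the sketch only shows how choosing a concept for a string selects the services to display:

```python
from urllib.parse import quote_plus

# Hypothetical user-generated search templates, keyed by a Freebase-style type ID.
# "{q}" marks where the annotated entity name is inserted; URLs are placeholders.
SEARCH_TEMPLATES = {
    "/book/book": [
        "https://books.example.org/search?q={q}",
        "https://library.example.org/find?title={q}",
    ],
    "/film/film": [
        "https://movies.example.org/search?q={q}",
    ],
}


def services_for(entity_name: str, type_id: str) -> list:
    """Return the service URLs to show once the user has said that
    `entity_name` is an instance of `type_id`."""
    templates = SEARCH_TEMPLATES.get(type_id, [])
    # URL-encode the name so spaces and punctuation are safe in a query string
    q = quote_plus(entity_name)
    return [t.format(q=q) for t in templates]


if __name__ == "__main__":
    # The same string yields different services depending on the chosen concept
    for type_id in ("/book/book", "/film/film"):
        print(type_id, services_for("Harry Potter", type_id))
```

Disambiguating “Harry Potter” as a book rather than a movie is exactly what changes the list of services; the string alone is not enough.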
  5. SAnno's grandfather: Speakinabout [1]
  6. SAnno's grandfather: Speakinabout [1]
  7. SAnno's father: RDFMonkey [2]
      • Purpose: augment the browsing experience by providing information/services related to the visited URL
      • Relies on Freebase types
        • … as in SpeakinAbout, but without requiring user interaction
        • Types are found by searching backlinks in Freebase (i.e. which topics link to the visited page)
      • Related services are shown as widgets inside a browser extension
        • The app could load widgets at runtime (from Freebase itself or from another collaborative system)
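The backlink-based lookup can be sketched with an in-memory stand-in for Freebase (the real service is no longer online). The `TOPICS` and `WIDGETS` tables, the widget names and both helper functions are hypothetical; the sketch only shows that types, and then widgets, are derived from the topics linking to the visited URL, with no user input:

```python
# Hypothetical stand-in for a Freebase-like knowledge base: each topic lists
# its types and the URLs it links to. A topic linking to the visited URL is
# one of that page's backlinks.
TOPICS = {
    "Lugano": {
        "types": {"/location/citytown"},
        "links": {"https://city.example.org/"},
    },
    "Some Band": {
        "types": {"/music/artist"},
        "links": {"https://band.example.org/"},
    },
    "Some Novel": {
        "types": {"/book/book"},
        "links": {"https://novel.example.org/", "https://band.example.org/"},
    },
}

# Hypothetical mapping from types to widgets shown in the browser extension.
WIDGETS = {
    "/location/citytown": ["map", "weather"],
    "/music/artist": ["discography", "concerts"],
    "/book/book": ["bookstores"],
}


def types_for_url(url: str) -> set:
    """Collect the types of every topic that links to `url`."""
    found = set()
    for topic in TOPICS.values():
        if url in topic["links"]:
            found |= topic["types"]
    return found


def widgets_for_url(url: str) -> list:
    """Turn the types found via backlinks into an ordered, duplicate-free widget list."""
    widgets = []
    for t in sorted(types_for_url(url)):
        for w in WIDGETS.get(t, []):
            if w not in widgets:
                widgets.append(w)
    return widgets
```

A page linked by topics of several types simply collects the widgets of all of them.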
  8. SAnno's father: RDFMonkey [2]
      [Figure panels: Cities, Musical Artists, Books]
  9. The problem
      • We already have semantics on the annotation (e.g. Annotea), but how can we have semantics within the annotation?
      • Good starting points:
        • some participative systems already provide semi-structured information (e.g. infoboxes in Wikipedia)
        • some communities of practice have already built their own bottom-up ways to structure information (e.g. machine tags)
        • some (relatively new) systems allow users, with some additional effort, to save information in a structured way almost without being aware of it (e.g. semantic wikis)
      • Challenges:
        • provide a shared way to describe annotations coming from heterogeneous systems
        • aggregate this information to provide something new and useful
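Machine tags show how little is needed to recover structure from a bottom-up convention. Assuming the Flickr-style form `namespace:predicate=value` (the regular expression below is a simplified reading of that syntax, and `machine_tag_to_triple` is a hypothetical helper), a tag on a resource maps directly to a triple:

```python
import re
from typing import Optional, Tuple

# Simplified machine-tag syntax: namespace:predicate=value, where namespace and
# predicate start with a letter and contain letters, digits or underscores.
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def machine_tag_to_triple(resource: str, tag: str) -> Optional[Tuple[str, str, str]]:
    """Turn a machine tag attached to `resource` into (subject, predicate, object)."""
    m = MACHINE_TAG.match(tag)
    if m is None:
        return None  # an ordinary free-text tag carries no explicit structure
    namespace, predicate, value = m.groups()
    value = value.strip('"')  # values may be quoted, e.g. upcoming:event="123"
    return (resource, f"{namespace}:{predicate}", value)
```

The predicate is kept as a prefixed name here; mapping prefixes such as `geo` to full URIs would be a separate, community-specific step.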
  10. SAnno as a framework
      • SAnno is made up of many different parts, which together provide something (we consider) new and useful:
        • an ontology to describe annotations (the “shells” that contain metadata about a resource)
        • an ontology describing the types of properties we are already able to aggregate
        • a set of conversion tools able to translate existing annotations from other systems into our notation
        • a system to show the results of aggregating different annotations
        • a system to manage provenance, authorship, and filters on incoming annotations
  11. The annotations ontology
      • Every annotation can be considered as a “Post-it”, a piece of paper where something is written about something else
        • … you can say things about what is written there, but also about the Post-it itself
        • The annotation is about a resource; it is created by someone on a specific date, comes from a particular annotation system, and might be connected to a specific community
      • Main goal: do not reinvent everything from scratch
        • Reuse well-known ontologies such as DC, SIOC, etc.
        • Use named graphs as an alternative to reification
      • Start in an easy way: restrict annotations to URLs
        • This is also a way to provide instant gratification to users: show annotations while they are browsing a website
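The “Post-it” model maps naturally onto named graphs: the annotation URI names a graph holding what is written on the Post-it, and the same URI is the subject of statements about the Post-it itself. A minimal sketch serializing one annotation as N-Quads, assuming Dublin Core terms for creator and date; the `ANN` namespace and its `annotates`, `fromSystem` and `community` properties are hypothetical placeholders, not the actual SAnno ontology:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple

DCT = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
ANN = "http://example.org/sanno#"  # hypothetical namespace, not SAnno's real one


def term(value: str) -> str:
    """Serialize a value as an IRI if it looks like a URL, else as a plain literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


@dataclass
class Annotation:
    uri: str                 # names both the Post-it and the graph with its content
    about: str               # the annotated resource (a URL)
    creator: str
    created: date
    system: str              # originating annotation system
    community: Optional[str] = None
    content: List[Tuple[str, str, str]] = field(default_factory=list)

    def to_nquads(self) -> List[str]:
        me = f"<{self.uri}>"
        # Statements about the Post-it itself go into the default graph
        lines = [
            f"{me} <{ANN}annotates> {term(self.about)} .",
            f"{me} <{DCT}creator> {term(self.creator)} .",
            f'{me} <{DCT}created> "{self.created.isoformat()}"^^<{XSD_DATE}> .',
            f"{me} <{ANN}fromSystem> {term(self.system)} .",
        ]
        if self.community:
            lines.append(f"{me} <{ANN}community> {term(self.community)} .")
        # What is written on the Post-it goes into the named graph <uri>,
        # so individual triples never need to be reified
        for s, p, o in self.content:
            lines.append(f"{term(s)} <{p}> {term(o)} {me} .")
        return lines
```

Provenance questions (“who said this, and where?”) then become queries over the default graph, while the content triples stay untouched.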
  12. The aggregation ontology
      • Aggregation deals with the contents of the annotation (i.e. the triples found in the named graph)
      • Objectives:
        • avoid constraining users to a specific vocabulary for annotations
        • find a way to collect different annotations and provide something new and interesting by aggregating them
      • Our approach:
        • properties used inside annotations can be described as belonging to families we already know how to deal with
        • examples: very specific (tags, ratings), more general (transitive relations)
        • properties from external vocabularies are mapped as subproperties of ours
        • … by whom? Highly experienced users who have incentives to do this (think about users building templates in Wikipedia...)
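The subproperty mapping can be sketched as follows: each external property is followed up its (transitive) subproperty chain until it reaches a known family, and the family decides how values are combined. Here tags are counted and ratings averaged; the property URIs, the `SUBPROPERTY_OF` table and these two aggregation rules are assumptions for illustration:

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical family properties and external properties mapped onto them
TAG = "http://example.org/sanno#tag"
RATING = "http://example.org/sanno#rating"
FAMILIES = {TAG, RATING}
SUBPROPERTY_OF = {
    "http://example.org/bookmarks#label": TAG,
    "http://example.org/wiki#keyword": "http://example.org/bookmarks#label",  # a chain
    "http://example.org/reviews#stars": RATING,
}


def family_of(prop):
    """Follow subPropertyOf links transitively until a known family is reached."""
    seen = set()
    while prop not in FAMILIES:
        if prop in seen or prop not in SUBPROPERTY_OF:
            return None  # unmapped (or cyclic) property: cannot aggregate it yet
        seen.add(prop)
        prop = SUBPROPERTY_OF[prop]
    return prop


def aggregate(triples):
    """Aggregate (subject, property, object) triples per resource, by family."""
    tags = defaultdict(Counter)
    ratings = defaultdict(list)
    for s, p, o in triples:
        fam = family_of(p)
        if fam == TAG:
            tags[s][o] += 1
        elif fam == RATING:
            ratings[s].append(float(o))
    return {
        "tags": {s: dict(c) for s, c in tags.items()},
        "rating": {s: mean(v) for s, v in ratings.items()},
    }
```

Users keep annotating with whatever vocabulary they like; only the mapping table grows when a new external property becomes aggregatable.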
  13. Conversion tools
      • Our worst enemy: the bootstrap
        • who is going to annotate the first resources? I don't have time!
      • Our best friends: already existing annotation systems
        • why don't we convert existing data to our notation and show the advantages of our approach?
      • Different families of conversion tools:
        • easy: already existing APIs, with real-time search functionalities (i.e.
        • medium: conversions from existing structured repositories such as SPARQL endpoints (advantage: the conversion is very clean, you just need one tool and different CONSTRUCT queries)
        • a little harder: Web scraping when no other sources are available
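The “one tool, different CONSTRUCTs” idea for SPARQL endpoints can be sketched as a table of CONSTRUCT templates plus a single function that builds a SPARQL Protocol GET request (the query travels in the `query` parameter). The source vocabularies, the `ann:` properties and the endpoint URL are hypothetical placeholders:

```python
from urllib.parse import urlencode

# One CONSTRUCT per source: each rewrites the source's own properties into
# (hypothetical) family properties. All vocabularies here are placeholders.
CONSTRUCTS = {
    "wiki": """
PREFIX ann: <http://example.org/sanno#>
PREFIX src: <http://example.org/wiki-vocab#>
CONSTRUCT { ?page ann:tag ?kw }
WHERE     { ?page src:keyword ?kw }
""",
    "reviews": """
PREFIX ann: <http://example.org/sanno#>
PREFIX rev: <http://example.org/review-vocab#>
CONSTRUCT { ?item ann:rating ?stars }
WHERE     { ?r rev:about ?item ; rev:stars ?stars }
""",
}


def conversion_request(endpoint: str, source: str) -> str:
    """Build the GET URL that asks `endpoint` to run the CONSTRUCT for `source`."""
    return endpoint + "?" + urlencode({"query": CONSTRUCTS[source].strip()})


if __name__ == "__main__":
    print(conversion_request("http://example.org/sparql", "wiki"))
```

The URL could then be fetched with `urllib.request`, asking for an RDF serialization such as Turtle through an `Accept: text/turtle` header; adding a new source means adding a query, not a new tool.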
  14. Annotation client
      • Actually, we have two possible clients in mind:
        • a browser extension which shows annotations while users are browsing the Web
        • an independent service able to aggregate heterogeneous information related to similar resources (e.g. URLs marked as being MP3 files)
      • Filter annotations according to author, date, originating system, and community
        • users should be able to “subscribe” to some annotating communities and ignore others
      • The system is designed to be distributed, as data can come from different, unrelated sources
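Client-side filtering can be sketched as a single pass over annotation metadata. The `AnnotationMeta` record and `filter_annotations` function are hypothetical, and keeping annotations that belong to no community at all is an assumed policy, not something the slides specify:

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class AnnotationMeta:
    """Hypothetical record of the metadata a client can filter on."""
    about: str
    creator: str
    created: date
    system: str                       # originating annotation system
    community: Optional[str] = None   # None: not tied to any community


def filter_annotations(
    annotations: Iterable[AnnotationMeta],
    subscribed: Set[str],
    authors: Optional[Set[str]] = None,
    systems: Optional[Set[str]] = None,
    since: Optional[date] = None,
) -> List[AnnotationMeta]:
    """Keep annotations from subscribed communities (or from no community),
    optionally restricted by author, originating system and creation date."""
    kept = []
    for a in annotations:
        if a.community is not None and a.community not in subscribed:
            continue  # the user ignores this community
        if authors is not None and a.creator not in authors:
            continue
        if systems is not None and a.system not in systems:
            continue
        if since is not None and a.created < since:
            continue
        kept.append(a)
    return kept
```

Because every criterion is optional, the same function serves both the browser extension (filter by the current URL's annotations) and an aggregating service.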
  15. The prototype
      • Early annotation ontology
      • Property families: tag, rating, generically related URI
      • Conversions from Semantic MediaWiki (SMW) and Delicious
      • Visualization as a web service + Firefox extension
      • No subscriptions yet
  16. The prototype
  17. The end
      Thank you! Questions?
      References:
      [0] D. Laniado, D. Eynard and M. Colombetti. Using WordNet to turn a folksonomy into a hierarchy of concepts. Semantic Web Applications and Perspectives, pp. 192–201, 2007.
      [1] D. Eynard and M. Colombetti. Exploiting User Gratification for Collaborative Semantic Annotation. Proceedings of SWUI 2008, April 2008.
      [2] D. Eynard. Using semantics and user participation to customize personalization. HP Labs Technical Report HPL-2008-197, September 2008.
      [3] L. Mazzola, D. Eynard and R. Mazza. GVIS: a framework for graphical mashups of heterogeneous sources to support data interpretation. HSI 2010, May 2010.
  18. Contact
      Davide Eynard
      Tel. 02 2399 4010
      Fax 02 2399 3411
      Project page @AIRLab: