WP2 1st Review


Published in: Technology, Education
  1. 1. INSEMTIVES: Incentives for Semantics
     WP 2 - Models and Methods for the Creation and Usage of Lightweight, Structured Knowledge
     Pierre Andrews, Ilya Zaihrayeu (UNITN)
     5/17/2011, www.insemtives.eu
  2. 2. Work package dependencies
  3. 3. Motivation
     The current Web 2.0 annotation model lacks formal semantics and therefore suffers from several shortcomings, e.g.:
       • Searching for “images” should return resources annotated with “picture”, but does not (synonymy problem)
       • Searching for “java” (books) should not return resources annotated with “java” (drink), but does (polysemy problem)
       • Searching for “animals” should return resources annotated with “dogs”, but does not (specificity gap problem)
     These problems reduce the quality of service for the end user (e.g., correctness and completeness of search results).
     An annotation model with formal semantics can address these (and other) problems and enable new services.
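The three problems on this slide can be illustrated with a minimal sketch in which resources are annotated with concepts rather than strings. The concept labels, the `TERM_TO_CONCEPT` and `PARENT` tables, and the resource names are all made up for illustration; they are not the project's vocabulary or its search algorithm.

```python
# Toy vocabulary; every label below is illustrative, not project data.
TERM_TO_CONCEPT = {
    "images": "picture#image",   # "images" and "picture" share one concept
    "picture": "picture#image",
    "animals": "animal#organism",
    "dogs": "dog#animal",
}
# More specific concept -> more general concept (is-a links).
PARENT = {"dog#animal": "animal#organism"}

# Resources annotated with disambiguated concepts instead of raw tags.
ANNOTATIONS = {
    "photo1": {"picture#image"},   # tagged "picture"
    "book1": {"java#language"},    # tagged "java" (programming)
    "menu1": {"java#coffee"},      # tagged "java" (drink)
    "photo2": {"dog#animal"},      # tagged "dogs"
}


def ancestors(concept):
    """Yield the concept itself and every more general concept above it."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)


def semantic_search(query_concept, annotations):
    """Return resources annotated with the query concept or a more specific one."""
    return sorted(
        resource
        for resource, concepts in annotations.items()
        if any(query_concept in ancestors(c) for c in concepts)
    )


print(semantic_search(TERM_TO_CONCEPT["images"], ANNOTATIONS))   # synonymy
print(semantic_search("java#language", ANNOTATIONS))             # polysemy
print(semantic_search(TERM_TO_CONCEPT["animals"], ANNOTATIONS))  # specificity gap
```

Matching on concepts handles synonymy and polysemy; walking the is-a links handles the specificity gap.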
  4. 4. Aims and outcomes
     Key problem: how to enrich the Web 2.0 user-annotation-resource model with semantics and semantics-aware services that an ordinary user can comprehend and use?
     Models for annotations:
       • Structural complexity (e.g., tags vs. attributes)
       • Vocabulary support (e.g., free tags vs. controlled tags)
       • Collaboration level (e.g., single user vs. shared vocabularies)
     Services:
       • Annotation bootstrapping (help the user at the initial phase)
       • Vocabulary evolution (help users “talk” the same annotation language)
       • Annotation evolution (keep annotations synced with the vocabulary)
       • Semantic search
       • Annotation linkage (enable cross-platform services)
     [Diagram labels: Semantics-aware services, User, annotation, semantics, Resource]
  5. 5. Research: key challenges
       • What is the right level of semantic complexity that lets the ordinary user generate semantic annotations and still provides the user with useful new services?
       • How to (semi-)automatically extract annotations from user-generated content and link them to the underlying model?
       • How to support users in (re)using the same semantics for annotations?
       • How to keep semantic annotations up to date when the underlying semantic model changes?
  6. 6. WP 2 timeline and deliverables
     Timeline axis (months): 0, 6, 12, 18, 24, 30, 36
     Task 2.1, Designing models (UIBK):
       • D2.1.1: State of the art and requirements from the use case partners
       • D2.1.2: Specification of the model
     Task 2.2, Designing methods (UNITN):
       • D2.2.1: Report on bootstrapping semantic annotations and on reaching consensus in the use of semantics
       • D2.2.2+D2.2.3: Report on linking semantic annotations to external sources and on keeping them up to date when the underlying semantic model changes
     Task 2.3, Research on Information Retrieval (IR) methods for semantic content (ONTO):
       • D2.3.1: Requirements for semantics-aware IR methods
       • D2.3.2: Specification for semantics-aware IR methods
     D2.4: Report on the refinement of the proposed models, methods and semantic search
  7. 7. Outcomes
       • Semantic annotation model
       • Annotation bootstrapping algorithm
       • Consensus reaching algorithm
       • Algorithm for supporting the evolution of annotations over time
       • Algorithms for semantic search and faceted navigation
       • Algorithms for linking semantic annotations to external sources
  8. 8. User involvement
     [Diagram: annotation lifecycle (D2.2.1, D2.2.2, D2.2.3)]
     Stages:
       • 1. Uncontrolled annotation
       • 2. Vocabulary evolution via consensus on the use (consensus, ontology maturing)
       • 3. Annotation evolution
     Actors: User (creator), User (annotator), User (consumer)
     Elements: Resource 1, Resource 2, file, context, bootstrapped annotations, manually added annotations, controlled annotations, uncontrolled annotations, external sources (DBpedia, YAGO, etc.)
     Actions: publish, bootstrapping, annotate, consult and import, link to, search and navigate
  9. 9. Model
     [Diagram: annotation model space along three dimensions]
       • Structural Complexity: tags, attributes, relations
       • Vocabulary Type: uncontrolled, authority file, taxonomy, ontology
       • Community Type: single user (private), single user (public), collective
     Examples positioned in the model:
       • Flickr: tags, attributes, single user (public), uncontrolled
       • TID+SeekDa: tags, attributes, relations, taxonomy
       • PGP: tags, attributes, relations, ontology, taxonomy
  20. 20. Bootstrapping: motivation
     Data on the web grows very fast:
       • 161 exabytes (on the order of 10^8 TB) of information was created or replicated worldwide in 2006
       • 6x growth is predicted by 2010
     The largest source of data is user-generated content:
       • 4+ billion devices (cameras, phones, PCs, CCTVs), mostly producing multimedia
       • will increase 50% by 2010
     However, at publishing time, the metadata encoded in the local context in which the multimedia items resided is lost.
     Source: invited talk by Michael Brodie at VLDB 2007
  21. 21. Bootstrapping: Tag Use at Flickr (13/01/2010)
     Sample from Flickr:
       • 2,403,594 photos
       • 482,006 tags
       • 13,163,034 tag-photo pairs
     Images with 0 or 1 tag: 27.3%
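The totals on this slide imply roughly 5.5 tags per photo on average (13,163,034 / 2,403,594), even though 27.3% of images have at most one tag. A small sketch of how such figures could be derived from raw tag-photo pairs follows; the function name and the toy data are hypothetical, and this is not the script used for the reported sample.

```python
from collections import Counter


def tags_per_photo_stats(photo_ids, tag_photo_pairs):
    """Summarise how densely a photo collection is tagged.

    photo_ids: every photo in the sample, including untagged ones.
    tag_photo_pairs: iterable of (tag, photo_id); duplicates are ignored.
    """
    pairs = set(tag_photo_pairs)
    tags_on_photo = Counter(photo for _tag, photo in pairs)
    total = len(photo_ids)
    sparse = sum(1 for photo in photo_ids if tags_on_photo[photo] <= 1)
    return {
        "mean_tags_per_photo": len(pairs) / total,
        "share_with_at_most_one_tag": sparse / total,
    }


# Toy data: p1 has two tags, p2 has one, p3 and p4 have none.
photos = ["p1", "p2", "p3", "p4"]
pairs = [("sea", "p1"), ("adriatic", "p1"), ("java", "p2")]
print(tags_per_photo_stats(photos, pairs))
```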
  24. 24. Bootstrapping Algorithm
     Input (folder path): Photos/conferences/Brain and Computer Science/java
     Pipeline:
       • Tokenisation
       • Lemmatisation
       • Multi-word detection, yielding: photo, conference, brain science, computer science, java
       • Word sense disambiguation against the WordNet knowledge base, yielding: conference#1, brain science#1, computer science#1, Java#1 (programming)
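The pipeline on this slide can be sketched end to end with toy resources. The sketch below is only an illustration under stated assumptions: `LEMMAS`, `MULTIWORDS` and `SENSES` are tiny hand-written stand-ins for a lemmatiser and for WordNet, the sense labels are illustrative rather than real WordNet sense numbers, and the overlap-based disambiguation is a simple substitute, not the project's algorithm.

```python
# Toy stand-ins for a lemmatiser and for WordNet; all entries are illustrative.
LEMMAS = {"photos": "photo", "conferences": "conference"}
MULTIWORDS = {"brain science", "computer science"}
# term -> list of (sense label, context terms that support this sense)
SENSES = {
    "java": [
        ("java#island", {"indonesia", "travel"}),
        ("java#coffee", {"drink", "cafe"}),
        ("java#language", {"computer science", "programming"}),
    ],
}


def tokenise(path):
    """Split a folder path into lower-cased word lists, one per folder."""
    return [segment.lower().split() for segment in path.split("/") if segment]


def lemmatise(tokens):
    """Map each token to its lemma, leaving unknown tokens unchanged."""
    return [LEMMAS.get(token, token) for token in tokens]


def detect_multiwords(tokens):
    """Group tokens into known multi-word terms; expand 'X and Y Z' to 'X Z', 'Y Z'."""
    if "and" in tokens:
        i = tokens.index("and")
        left, right = tokens[:i], tokens[i + 1:]
        if left and right:
            expanded = [" ".join(left + right[-1:]), " ".join(right)]
            if all(term in MULTIWORDS for term in expanded):
                return expanded
    phrase = " ".join(tokens)
    if phrase in MULTIWORDS:
        return [phrase]
    return [token for token in tokens if token != "and"]


def disambiguate(term, context):
    """Pick the sense whose cue terms overlap most with the other path terms."""
    candidates = SENSES.get(term)
    if not candidates:
        return f"{term}#1"  # assumption: a single sense when none is listed
    return max(candidates, key=lambda sense: len(sense[1] & context))[0]


def bootstrap(path):
    """Turn a folder path into a list of disambiguated annotation candidates."""
    terms = []
    for tokens in tokenise(path):
        terms.extend(detect_multiwords(lemmatise(tokens)))
    return [disambiguate(term, set(terms) - {term}) for term in terms]


print(bootstrap("Photos/conferences/Brain and Computer Science/java"))
```

Here "java" resolves to the programming sense only because "computer science" appears elsewhere in the same path, which is the kind of local context the slide deck says is lost at publishing time.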
  25. 25. Reaching Consensus: Tag reuse in Flickr
     Sample from Flickr:
       • 2,403,594 photos
       • 482,006 tags
       • 13,163,034 tag-photo pairs
     About 49% of tags are used only once.
     Frequently used tags make up a small part of the total vocabulary (fewer than 1% of tags are used more than 512 times).
     [Chart: % of total vocabulary plotted against how many times a tag is used]
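The two shares quoted on this slide (tags used only once, and tags used more than 512 times) can be computed from tag-photo pairs along the following lines. This is a hedged sketch: the function name and toy data are hypothetical, and the threshold is lowered in the example so that the small sample produces a non-trivial result.

```python
from collections import Counter


def tag_reuse_profile(tag_photo_pairs, threshold=512):
    """Return the share of the vocabulary used once and used more than `threshold` times."""
    usage = Counter(tag for tag, _photo in set(tag_photo_pairs))
    vocabulary = len(usage)
    if vocabulary == 0:
        return {"share_used_once": 0.0, "share_frequent": 0.0}
    used_once = sum(1 for count in usage.values() if count == 1)
    frequent = sum(1 for count in usage.values() if count > threshold)
    return {
        "share_used_once": used_once / vocabulary,
        "share_frequent": frequent / vocabulary,
    }


# Toy data: "sea" is used three times, the other three tags once each.
pairs = [
    ("sea", "p1"), ("sea", "p2"), ("sea", "p3"),
    ("adriatic", "p1"), ("java", "p2"), ("coffee", "p2"),
]
print(tag_reuse_profile(pairs, threshold=2))
```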
  28. 28. Consensus through User Interaction: Tag Recommendation
     [Diagram: the free tag “javaIsland” is read as “java island”; the tag “java” has candidate senses Java#island, Java#coffee and Java#language; related senses shown: Sea#water mass, Island#land mass]
     Suggestions offered while the user types “Jav”:
       • Java#island
       • java#coffee
       • Javanese#citizen
       • ...
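Sense-level tag recommendation of this kind can be sketched as a prefix lookup over a controlled vocabulary, ranked by how closely each candidate relates to senses the user has already chosen. The `VOCABULARY` table, its relatedness links and the ranking rule are assumptions made for illustration; the slide does not specify how the project ranks suggestions.

```python
# Illustrative controlled vocabulary: sense label -> related sense labels.
# Both the senses and the links are hypothetical.
VOCABULARY = {
    "java#island": {"island#land mass", "sea#water mass"},
    "java#coffee": {"drink#beverage"},
    "java#language": {"computer science#discipline"},
    "javanese#citizen": {"java#island"},
    "sea#water mass": {"island#land mass"},
}


def recommend(prefix, chosen=frozenset(), limit=3):
    """Suggest senses whose term starts with `prefix`, most context-related first."""
    prefix = prefix.lower()
    chosen = set(chosen)
    matches = [
        sense for sense in VOCABULARY
        if sense.split("#")[0].startswith(prefix)
    ]
    # Rank by overlap with already chosen senses, then alphabetically.
    matches.sort(key=lambda sense: (-len(VOCABULARY[sense] & chosen), sense))
    return matches[:limit]


print(recommend("Jav"))
print(recommend("Jav", chosen={"sea#water mass"}))
```

Offering senses instead of raw strings nudges different users toward the same concept for the same meaning, which is the consensus effect the slide title refers to.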
  31. 31. Consensus through clustering
     [Diagram labels: tag “SEA”, sense Sea#water mass]
  32. 32. Evolution in Time
     [Diagram: a resource tagged “Sea”, with senses Sea#1 and Adriatic#1; after two “evolve” steps the vocabulary shows water#2, Sea#1, …, Adriatic#1, …]
  33. 33. Evolution in time through clustering
     [Diagram labels: Adriatic#sea, Sea#water mass]
  34. 34. Outlook
       • Further specification and refinement of models, algorithms, and semantic IR methods (to be reported in D2.4, due in M20)
       • Know-how transfer and integration with WP3 and WP4 as well as with the use case WPs
       • Work on the implementation of models and algorithms (part of WP3)