Multipedia: Enriching DBpedia with Multimedia information



Presentation given by Andrés García at K-CAP 2011 on the selection of images for DBpedia terms


  1. 1. Multipedia: Enriching DBpedia with Images
     Andrés García-Silva†, Asunción Gómez-Pérez†, Max Jakob*, Pablo Mendes* and Chris Bizer*
     † {hgarcia, ocorcho, asun}
     Facultad de Informática, Universidad Politécnica de Madrid
     Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid, Spain
     * Web-based Systems Group, Freie Universität Berlin, Germany
  2. 2. Introduction
     Enriching ontologies with multimedia
     Images and videos complement the information about concepts and entities in existing knowledge bases.
     Multimodal ontologies can help in QA systems, user interfaces, and search and recommendation processes.
     [Diagram: ontology fragment with the nodes Pathology and Bone and the relations depicts, isA and occurs; example query: «Show me X-ray Images with fractures of the Femur»]
     Radhouani, S., Lim, J.H., Chevallet, J.-P., Falquet, G.: Combining textual and visual ontologies to solve medical multimodal queries. In: IEEE International Conference on Multimedia and Expo, pp. 1853-1856 (2006).
     Garcia-Silva et al.
  3. 3. Introduction
     Goal:
     Populate a general-purpose ontology with images from the Web.
     - Find relevant images for ontology instances with ambiguous names
     DBpedia knowledge base
     - Collects facts from Wikipedia, covering 3.5 million entities.
     - Entities are classified into a consistent cross-domain ontology: 272 classes and 1.6 million instances.
     - Has evolved into a hub in the linked data cloud.
     Images in DBpedia
     - Wikipedia images are represented in DBpedia (foaf:depiction).
     - About 70% of the Wikipedia articles don't have images.
  4. 4. Introduction
     Challenges
     Ambiguity of instance labels
     Querying the web for images related to the resource dbpedia:hornet
  5. 5. Related Work
  6. 6. Enriching DBpedia with Multimedia
     [Pipeline diagram, example resource dbpr:Hornet]
     Get Context: the Wikipedia-based Context Index provides related terms.
     Retrieve Images: one query per context term and dbpr name is sent to the image search engines, producing rankings of images (one per query).
     Aggregate: the per-query rankings are combined into a ranking of images.
     Generate tag-based ranking: the list of images, annotated with tags, produces a ranking of images.
     Aggregate: the rankings are combined into a ranking of images.
  7. 7. Enriching DBpedia with Multimedia
     Get Context
     Input: dbpr:Hornet (Wikipedia article)
     Wikipedia-based Context Index
     Output: family, wasps, insect
  8. 8. Enriching DBpedia with Multimedia
     Retrieve Images
     Resource: dbpr:Hornet; context: family, wasps, insect
     Queries sent to the image search engines:
     Q0 = Hornet
     Q1 = Hornet and Family
     Q2 = Hornet and Wasps
     Q3 = Hornet and insect
     Image rankings:
     R0 = img0,1; img0,2 ... img0,k
     R1 = img1,1; img1,2 ... img1,l
     R2 = img2,1; img2,2 ... img2,m
     R3 = img3,1; img3,2 ... img3,n
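The query construction on the Retrieve Images slide can be sketched in Python as follows. This is only an illustration, not the authors' code: the function name `build_queries` is hypothetical, and the literal `and` connector simply mirrors how the queries are written on the slide, since the presentation does not give the actual search-engine syntax.

```python
def build_queries(resource_name, context_terms):
    """Return Q0 (the bare resource name) followed by one
    'name and term' query per context term."""
    queries = [resource_name]  # Q0: the resource name alone
    for term in context_terms:
        # Q1..Qn: the name combined with a single context term
        queries.append(f"{resource_name} and {term}")
    return queries


queries = build_queries("Hornet", ["family", "wasps", "insect"])
for i, query in enumerate(queries):
    print(f"Q{i} = {query}")
```

Each query would then be sent to an image search engine, giving one ranking Ri per query Qi.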
  9. 9. Enriching DBpedia with Multimedia
     Aggregate
     R0 = img0,1; img0,2 ... img0,k
     R1 = img1,1; img1,2 ... img1,l
     R2 = img2,1; img2,2 ... img2,m
     R3 = img3,1; img3,2 ... img3,n
     Borda count
     - Positional method, very easy to compute
  10. 10. - Each query result Ri is a voter and the images imgj are candidates:
      For each candidate imgj in Ri:
      Si(imgj) = number of candidates ranked below imgj in Ri
      S(imgj) = sum of Si(imgj) over all query results Ri
      Output: imgj ordered by S(imgj) value
      Rcontext-based = img1; img2 ... imgp
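The Borda count described on this slide can be sketched in Python as below. The sketch makes two assumptions that the slide does not state: an image missing from a ranking receives no points from that voter, and ties in the total score are broken alphabetically by image identifier so that the output is deterministic. The function name `borda_aggregate` and the image identifiers are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate several rankings (best image first) with a Borda count.

    Each ranking is a voter; an image scores, per ranking, the number
    of candidates ranked below it in that ranking.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, image in enumerate(ranking):
            # n - 1 - position candidates are ranked below this image
            scores[image] += n - 1 - position
    # Highest total first; ties broken by identifier (an assumption)
    return sorted(scores, key=lambda img: (-scores[img], img))


r0 = ["a", "b", "c"]
r1 = ["b", "a"]
r2 = ["b", "c", "d"]
context_based = borda_aggregate([r0, r1, r2])
print(context_based)  # ['b', 'a', 'c', 'd']
```

Here image `b` collects 1 + 1 + 2 = 4 points, `a` collects 2 + 0 = 2, `c` collects 0 + 1 = 1 and `d` collects 0, which gives the order shown.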
  11. 11. Enriching DBpedia with Multimedia
      Generate tag-based ranking
      List of images: L = R0 ∪ R1 ∪ R2 ∪ R3
      1) Measuring relatedness between a DBpedia resource and an image:
         - Overlap of terms between the context of the former and the tags of the latter.
      2) Vector Space Model to represent the DBpedia resource and the images:
         - TF as weighting scheme,
         - cosine function to measure similarity.
      3) Generate a ranking of images according to the similarity value:
         Rtag-based = img1; img2 ... imgq
      Aggregate
      Rcontext-based = img1; img2 ... imgp and Rtag-based = img1; img2 ... imgq are aggregated into Rfinal = img1; img2 ... imgl
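A minimal Python sketch of the tag-based ranking, assuming raw term counts as the TF weights and plain whitespace-free tokens: the resource context and each image's tags become term-frequency vectors, and images are ordered by cosine similarity to the context. The function names and the example tags are invented for illustration; the presentation does not specify tokenization, normalization or tie-breaking.

```python
import math
from collections import Counter


def cosine_tf(terms_a, terms_b):
    """Cosine similarity between two term lists using raw TF weights."""
    tf_a, tf_b = Counter(terms_a), Counter(terms_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def tag_based_ranking(context_terms, image_tags):
    """Order images by similarity between their tags and the context."""
    sims = {img: cosine_tf(context_terms, tags)
            for img, tags in image_tags.items()}
    # Most similar first; ties broken by identifier (an assumption)
    return sorted(sims, key=lambda img: (-sims[img], img))


context = ["family", "wasps", "insect"]
image_tags = {  # hypothetical tags, for illustration only
    "img_a": ["insect", "wasps", "macro"],
    "img_b": ["car", "hudson"],
    "img_c": ["insect"],
}
tag_based = tag_based_ranking(context, image_tags)
print(tag_based)  # ['img_a', 'img_c', 'img_b']
```

With these inputs `img_a` shares two of three terms with the context (similarity 2/3), `img_c` shares one (about 0.58) and `img_b` shares none, so images of the insect sense are ranked above an image of an unrelated sense.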
  12. 12. Experiments
      How many context words produce the best results?
      Apple context: «juice, fruit, apples, capital, michigan, orange»
  13. 13. Experiments
      Ambiguity
      Search engines work well for:
      - unambiguous names
      - ambiguous names referring to a dominant sense, e.g. dbpedia:Stonehenge
      However, they fail for ambiguous names:
      - that lack a dominant sense, e.g. dbpedia:Apple
      - that do not refer to the dominant sense, e.g. dbpedia:Blackberry
  14. 14. Experiments
      Dominance:
      Dataset:
      - 10 classes, with 15 dbpr randomly selected per class
      - Each dbpr must 1) be popular and 2) have a dominance under 0.7
      - With this limit we found dbpr for Mammals, Birds and Insects
      - Increasing the dominance limit to 0.9, we found dbpr for the rest of the classes
  15. 15. Experiments
      15 people evaluated the results of three approaches.
      Each image was rated by 3 evaluators.
  16. 16. Experiments
  17. 17. Conclusions
      - Multipedia: an approach to automatically populate an ontology with images related to existing instances.
      - We focused on the particularly challenging problem of ambiguity in instance names.
      - Human-driven evaluation of the approach, involving 15 users and a total of 2250 image ratings and covering DBpedia resources from several classes.
      - A variation of Multipedia improves average precision by 9.4% over a baseline of keyword queries to commercial image search engines.
      - We have validated that, in contrast to the baseline, our approach achieves the highest precision with ambiguous names lacking a dominant sense.