Improving Flickr discovery through Wikipedias

Position paper presented at the "Between Ontologies and Folksonomies" (BOF) workshop at CCT2007.

Improving Flickr discovery through Wikipedias

  1. Index: Introduction, Folksonomies, Linguistic issues, Introducing Flickrpedia, Concluding Remarks. Improving Flickr discovery through Wikipedias. Federico Gobbo, {federico.gobbo}@uninsubria.it, Università degli Studi dell’Insubria, Varese, Italy. (cc) Some rights reserved.
  2. Outline: (1) Introduction: Why folksonomies are interesting; (2) Folksonomies: Why folksonomies differ?; (3) Linguistic issues: Augmented folksonomies through natural language; (4) Introducing Flickrpedia: Multilingual diversity as the source of knowledge; (5) Concluding Remarks.
  3. Why folksonomies are interesting. A key question of information retrieval today: how can meaningful metadata be added to web content, in order to increase the utility of information by improving the precision of information retrieval for search engines?
  4. Why folksonomies are interesting. Folksonomies, a tentative answer. What are they? folksonomy = folks + taxonomy. A folksonomy is made of tags or labels, usually single-word metadata attached to online items (documents, photos, videos, etc.), in order to add contextual meaning to the items themselves. Folksonomies are a tentative effort toward the goal of improving the precision of information retrieval.
  5. Why folksonomies differ? Folksonomies and traditional taxonomies. Unlike traditional taxonomies, there is no explicit hierarchy between tags, nor are tags exclusive. For example, the photo of a cat may be tagged as ‘cat’ and ‘european’ and ‘animal’, but there is nothing that says that all cats are animals: tags can be seen as common facets of the item itself (Schmitz 2006). There is no central authority, and this is the main reason why folksonomies are becoming more and more popular among web resource users.
  6. Why folksonomies differ? The two different scopes of folksonomies. Each tag has two different scopes at the same time: personomy, the user-defined one (Quintarelli 2005); consensus, the socially shared meaning. Consensus is becoming more and more important, as the wide use of tag suggestion interfaces in web applications suggests.
  7. Why folksonomies differ? Folksonomies and the Long Tail (see the video!)
  8. Why folksonomies differ? The key concept of serendipity. Consensus permits serendipity, i.e. users dig the web through tags, finding new, unexpected and useful content that is not easily accessible via traditional search engines. Tags are used as filters, i.e. a query on several tags returns the items tagged with any of the given tags, or with all of them, depending on the application (Golder and Huberman 2006). The purpose of this paper is to improve serendipity by allowing people to dig folksonomies regardless of the natural language(s) they master.
  9. Augmented folksonomies through natural language. Tags as linguistic objects. Tags are words, i.e. alphabetical strings meaningful in some natural language. There is no controlled language. In particular, the following features go unrecognized: synonymy (different word strings, analogous meaning); homography (identical word string, totally different meaning); different encoding strategies are possible (e.g. ‘28-03-2008’, ‘2008March3’, ‘3rd March 2008’); misspellings are very frequent, so standard NLP techniques are ruled out. Guy and Tonkin (2006) even advocated tag literacy education.
  10. Augmented folksonomies through natural language. The linguistic divide in folksonomies. Multilingualism is an issue not yet fully explored in folksonomies. In fact, tags are written in a human language and users are inclined to write in the languages they are comfortable in. It is certainly desirable for a user not comfortable in English or another major language (in terms of presence on the web) to search and find tags using a search engine interface in his or her own tongue, while the engine searches the corresponding tags in English and in other major human languages.
  11. Multilingual diversity as the source of knowledge. How to overcome the linguistic divide? A proposal: through a special web application which extracts the language-tag pairs in every available language before passing the tags to the folksonomy search engine. The claim is an improvement in serendipity: when searching in 20 natural languages at the same time, some interesting data will be found that a single-language search would leave undiscovered.
  12. Multilingual diversity as the source of knowledge. Flickr and its API. Flickr is one of the most popular web applications for photos (a search for ‘flowers’ currently returns more than 2 million photos). Photos are freely tagged by users, so it can be considered a folksonomy. Open source APIs in major programming languages are available, and people can query the Flickr repository with an authentication key given on request. http://www.flickr.com/services/api
  13. Multilingual diversity as the source of knowledge. Flickrpedia = Flickr + Wikipedias. Flickrpedia is built on a Ruby API and on the Ruby on Rails development framework (Thomas 2005, Thomas and Heinemeier-Hansson 2005). Users query Flickr by writing a tag and specifying its natural language. The system crawls the Wikipedia in the corresponding language and looks for an appropriate page. With the help of regular expressions, Flickrpedia parses the web page and extracts, from the appropriate web page box, the existing language pairs for the same topic in other languages.
  14. How Flickrpedia works. A German user enters the query ‘Flugzeug’ (German) in Flickrpedia; the system crawls the Wikipedia in that language, parsing with the help of regular expressions, and extracts Airplane (English), Avion (French), Hegazkin (Basque), ...; the German user obtains the desired photos from Flickr!
  15. The web page box for “alternate languages” in Wikipedia. An example: the German word ‘Flugzeug’
  16. Multilingual diversity as the source of knowledge. The results for the German word ‘Flugzeug’. On 11 April 2007, Flickr found fewer than 10,000 photos while Flickrpedia found more than 20,000 for the same query, including many unexpected and relevant photos.
  17. Don’t trust me: try it yourself! Word searched: ‘Flugzeug’, i.e. airplane in German. http://buffy.sciva.uninsubria.it/~rl608838/search
  18. Flickrpedia until now. Flickrpedia only needs to store the Wikipedias for the existing natural languages, currently 85. Large and extemporaneous shared information repositories, like Flickr, can be managed through other semi-structured information repositories such as the Wikipedias. Flickrpedia, if refined beyond its current prototype phase, may help users with poor knowledge of major languages to retrieve information using only their lesser-used languages.
  19. Further direction of Flickrpedia. Flickrpedia is far from perfect: homographs are still unmanaged, even though Wikipedias have disambiguation pages, and it is not clear which Wikipedias to choose in order to optimize serendipity. For now the parsed Wikipedias are the biggest ones in terms of wiki pages, but this gives no guarantee of increased serendipity. Finally, the API provided by Flickr imposes severe limits: up to 20 tags can be inserted in a single query request, and up to 60 thumbnails may be returned.
  20. Beyond Flickrpedia. This approach isn’t limited to Flickr as the underlying folksonomy. Our research direction is towards generalization, i.e. users can choose the appropriate folksonomy on which to perform multilingual queries. It remains to be demonstrated how to apply this approach to folksonomies whose semantic referents are not photos: an airplane or a flower remains such in almost every human language, more or less. The real underlying problem is how to measure serendipity, i.e. specific and precise metrics for serendipity are needed.
  21. Thank you. Any questions? Download these slides at the following permalink: http://purl.org/net/fgobbo (cc) F. Gobbo 2007. Published in Italy. Attribution-NonCommercial-ShareAlike 2.5
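
The pipeline described in the slides (parse the Wikipedia language box with regular expressions, then pass the collected tags to Flickr) might look roughly like the following Python sketch. It is not the author's Ruby on Rails code. The HTML pattern, the sample markup, and the helper names `extract_language_pairs`, `build_tags` and `build_search_params` are assumptions made for illustration, and the `api_key` value is a placeholder. The sketch only builds the parameters for Flickr's `flickr.photos.search` method and sends no request; the limits of 20 tags and 60 thumbnails are taken from the slides.

```python
import re
from urllib.parse import unquote

# ASSUMPTION: interlanguage links are rendered roughly as
#   <li class="interwiki-fr"><a href="http://fr.wikipedia.org/wiki/Avion">...
# Wikipedia's real markup has changed over the years; this pattern is illustrative.
INTERLANG_LINK = re.compile(
    r'<li class="interwiki-(?P<lang>[a-z-]+)[^"]*">\s*'
    r'<a href="[^"]*/wiki/(?P<title>[^"]+)"'
)

MAX_TAGS = 20    # per-query tag limit reported in the slides
MAX_THUMBS = 60  # thumbnail limit reported in the slides

# Hypothetical excerpt of the language box on the German 'Flugzeug' page.
SAMPLE_HTML = """
<li class="interwiki-en"><a href="http://en.wikipedia.org/wiki/Airplane">English</a></li>
<li class="interwiki-fr"><a href="http://fr.wikipedia.org/wiki/Avion">Français</a></li>
<li class="interwiki-eu"><a href="http://eu.wikipedia.org/wiki/Hegazkin">Euskara</a></li>
"""


def extract_language_pairs(html):
    """Return {language_code: page_title} for each interlanguage link found."""
    pairs = {}
    for match in INTERLANG_LINK.finditer(html):
        title = unquote(match.group("title")).replace("_", " ")
        pairs[match.group("lang")] = title
    return pairs


def build_tags(source_tag, pairs, max_tags=MAX_TAGS):
    """Source tag first, then translations: lowercased, de-duplicated, capped."""
    tags = []
    for candidate in [source_tag, *pairs.values()]:
        tag = candidate.lower()
        if tag not in tags:
            tags.append(tag)
    return tags[:max_tags]


def build_search_params(tags, api_key):
    """Build parameters for flickr.photos.search (no request is sent here)."""
    return {
        "method": "flickr.photos.search",
        "api_key": api_key,        # placeholder; a real key is given on request
        "tags": ",".join(tags),    # comma-delimited tag list
        "tag_mode": "any",         # match photos carrying any of the tags
        "per_page": MAX_THUMBS,
    }


if __name__ == "__main__":
    pairs = extract_language_pairs(SAMPLE_HTML)
    tags = build_tags("Flugzeug", pairs)
    print(build_search_params(tags, api_key="YOUR_API_KEY"))
```

Using `tag_mode` set to `any` mirrors the serendipity goal stated in the slides: a photo tagged only ‘avion’ or only ‘hegazkin’ is still returned for the German query. Homographs are not handled here, which matches the limitation the slides acknowledge.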