Europeana Tech 2011

  • Of the different research topics I am involved in, today I address some results from projects in which we use explicit semantics and user interaction to make crowd knowledge effectively processable by machines.

    *****
    From Crowd Knowledge to Machine Knowledge: Use cases with semantics and user interaction in Dutch cultural heritage collections

    In this talk I will discuss several projects, for example with the Dutch national archive for sound and vision or with the Rijksmuseum Amsterdam, where we have experimented with semantics-based technologies and user interaction paradigms to provide systems with additional support for users to turn their lay knowledge into machine-processable knowledge. Turning this crowd knowledge into machine knowledge makes the systems more intelligent and thus lets them capitalize on the knowledge assets in the crowds of users.

    The problem our systems address is making the massive set of interesting multimedia content in these cultural heritage institutions accessible to a large community of users. On the one hand, this content is difficult for the average user to find, as it has been indexed by experts (curators, art historians, etc.) who use a very specific vocabulary that is unknown to the general audience. On the other hand, these professionals can no longer cope with the demand for annotation of the ever-growing multimedia content in these collections.

    Our solution to both these problems is to exploit crowdsourcing on the web, where there is a lot of specific domain knowledge and processing power available that such archives and museums would gladly incorporate. However, turning this knowledge from the mass of lay users into additional intelligence in content management systems is not a simple challenge, for several reasons: (1) lay users use different vocabularies and are also interested in different aspects of the collection items; (2) the collected user metadata needs to be validated for quality and correctness; (3) crowdsourcing users need to be continuously supported and motivated in order to supply more and better knowledge in such systems. I will present our data analysis of the crowdsourced content, our techniques for quality control and incorporating domain semantics, and the results of our attempts to employ different interaction strategies and user interfaces (for example games and simplified interfaces for interactive annotation) to engage and stimulate the users.

    Demos:
    -----------
    Video-tagging game: http://woordentikkertje.manbijthond.nl/
    Art recommender and personalized museum tours: http://www.chip-project.org/demo/ (3rd prize winner of ISWC Semantic Web challenge)
    Historical events in museum collections: http://agora.cs.vu.nl/demo/

    Relevant Papers:
    ------------------------
    - On the role of user-generated metadata in audio visual collections, at K-CAP2011
      http://portal.acm.org/citation.cfm?id=1999702

    - Digital Hermeneutics: Agora and the Online Understanding of Cultural Heritage, at WebSci2011 (Best paper nominee)
      http://www.websci11.org/fileadmin/websci/Papers/116_paper.pdf

    - The effects of transparency on trust in and acceptance of a content-based art recommender, UMUAI journal (Best journal paper for 2008)
      http://www.springerlink.com/content/81q34u73mpp58u75/

    - Recommendations based on semantically enriched museum collections, Journal of Web Semantics
      http://www.sciencedirect.com/science/article/pii/S1570826808000681

    - Enhancing Content-Based Recommendation with the Task Model of Classification, at EKAW2010
      http://www.springerlink.com/content/p78hl5r283x79r13/

    Short bio:
    -----------------
    Lora Aroyo is an associate professor at the Web and Media group, at the Department of Computer Science, Free University Amsterdam, The Netherlands. Her research interests are in using semantic web technologies for modeling user interests and context, recommendation systems and personalized access in Web-based applications. Typical example domains are cultural heritage collections, multimedia archives and interactive TV. She has coordinated the research work in the CHIP project on Cultural Heritage Information Personalization (http://chip-project.org). Currently she is a scientific coordinator of the EU Integrated Project NoTube, dealing with the integration of Web and TV data with the help of semantics (http://notube.tv), a project leader of the VU INTERTAIN Experimental Research Lab initiative (http://www.cs.vu.nl/intertain), and involved in the research on motivational user interaction for video-tagging games in the PrestoPrime project (http://www.prestoprime.org/) and on modeling historic events in the Agora project (http://agora.cs.vu.nl/). She has organized numerous workshops in the areas of personalized access to cultural heritage, e-learning, interactive television, as well as on visual interfaces to the social and semantic web (PATCH, FutureTV, PersWeb, VISSW and DeRIVE). Lora has been actively involved in both the Semantic Web community (PC co-chair and conference chair for ESWC2009 and ESWC2010 and PC co-chair for ISWC2011) and the Personalization and User Modeling community (on the editorial board of the UMUAI journal and on the steering committee of the UMAP conference).

    More information can be found at:
    -----------------------------------------------
    Webpage: http://www.cs.vu.nl/~laroyo
    Slideshare: http://www.slideshare.net/laroyo
    Twitter: @laroyo
  • Nowadays AV collections are undergoing a transformation from archives of analog material to large digital (online) data stores, as videos are very much wanted by different types of end users.

    For example, the Netherlands Institute of Sound and Vision archives all radio and TV material broadcast in the Netherlands and has approximately 700,000 hours of radio and television programs available online.

    Facilitating successful access to AV collection items demands quality metadata associated with them.

    Traditionally, in AV archives it is the task of professional catalogers to manually describe the videos. In the process they usually follow well-defined, well-established guidelines and rules. They may also make use of auxiliary materials such as controlled vocabularies and thesauri.

    However, as we all know, video is a medium that is extremely rich in meaning. Directors and screenwriters create entire universes with complex interplay between characters, objects and events. Sometimes they employ rich and complex abstract symbolic language. This makes the task of describing the meaning of a video as complicated as describing the real world, which is no trivial matter.

    As a result, the process of annotation is tedious, time-consuming and inevitably incomplete. According to some research, it takes approximately five times the duration of the material to annotate it completely. So, for example, for a documentary that lasts one hour, it will take a cataloger approximately five hours to fully describe it.

    Consequently, professional annotations are coarse-grained, in the sense that they refer to the entire video and describe its prevalent topics. Catalogers may occasionally provide more fine-grained, shot-level descriptions for a video, but this is the exception rather than the rule and is reserved for the most important pieces of the AV collection.
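The cost estimate in the note above can be made concrete with a little arithmetic. The following Python sketch assumes the quoted factor of roughly five hours of cataloging per hour of material. The function name and the extrapolation to the whole collection are illustrative assumptions, not figures from the talk.

```python
def annotation_hours(material_hours: float, factor: float = 5.0) -> float:
    """Estimate cataloging effort: roughly `factor` hours of work per hour of material."""
    if material_hours < 0:
        raise ValueError("material_hours must be non-negative")
    return material_hours * factor


# The one-hour documentary from the note above.
print(annotation_hours(1))        # 5.0
# Hypothetical extrapolation to the approx. 700,000 hours mentioned above.
print(annotation_hours(700_000))  # 3500000.0
```

Under these assumptions, fully describing the online collection would take on the order of millions of cataloger hours, which is why complete professional annotation is out of reach.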
  • Our solution to both these problems is to exploit crowdsourcing on the web, where there is a lot of specific domain knowledge and processing power available that such archives and museums would gladly incorporate. However, turning this knowledge from the mass of lay users into additional intelligence in content management systems is not a simple challenge, for several reasons: (1) lay users use different vocabularies and are also interested in different aspects of the collection items; (2) the collected user metadata needs to be validated for quality and correctness; (3) crowdsourcing users need to be continuously supported and motivated in order to supply more and better knowledge in such systems. I will present our data analysis of the crowdsourced content, our techniques for quality control and incorporating domain semantics, and the results of our attempts to employ different interaction strategies and user interfaces (for example games and simplified interfaces for interactive annotation) to engage and stimulate the users.

    understanding the user-generated data
    contextualizing the user-generated metadata
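Point (2) above, validating user metadata, is often handled in tagging games by player agreement: a tag counts as verified only when different players enter it independently for the same fragment. The sketch below illustrates that idea in Python. The 10-second window, the threshold of two players, the data layout and all names are assumptions made for illustration, not a description of the systems presented in the talk.

```python
from collections import defaultdict


def verified_tags(entries, window=10.0, min_players=2):
    """Return (video, tag) pairs entered by at least `min_players` distinct
    players within `window` seconds of each other.

    entries: iterable of (video_id, player_id, tag, time_in_seconds).
    """
    by_key = defaultdict(list)
    for video, player, tag, t in entries:
        # Normalize case and whitespace so "Koets" and "koets" match.
        by_key[(video, tag.strip().lower())].append((t, player))

    verified = set()
    for key, hits in by_key.items():
        hits.sort()
        start = 0
        for end in range(len(hits)):
            # Shrink the window until it spans at most `window` seconds.
            while hits[end][0] - hits[start][0] > window:
                start += 1
            players = {p for _, p in hits[start:end + 1]}
            if len(players) >= min_players:
                verified.add(key)
                break
    return verified


entries = [
    ("v1", "anna", "koets", 12.0),
    ("v1", "bram", "Koets", 15.5),  # same tag, other player, 3.5 s later
    ("v1", "anna", "regen", 40.0),
    ("v1", "anna", "regen", 42.0),  # same player twice: no agreement
    ("v1", "bram", "kroon", 5.0),
    ("v1", "cees", "kroon", 90.0),  # too far apart in time
]
print(sorted(verified_tags(entries)))  # [('v1', 'koets')]
```

Agreement of this kind filters out idiosyncratic entries, but it does not catch shared misspellings or resolve ambiguous names, so it complements rather than replaces linking tags to a vocabulary.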
  • - group by type (faceted)
    - concept is uniquely identified (see full name for Juliana and Bernhard)
    - background information
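The points in this note can be sketched as a small data structure: once a tag has been linked to a concept, it carries a unique identifier and a type, and the display can group concepts by type. The URIs, type names and field names below are hypothetical placeholders, not the identifiers used by the actual system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    uri: str    # hypothetical unique identifier for the concept
    label: str  # human-readable name
    type: str   # facet the concept is grouped under


def group_by_type(concepts):
    """Group concept labels by type, with facets and labels in sorted order."""
    facets = defaultdict(list)
    for c in concepts:
        facets[c.type].append(c.label)
    return {t: sorted(labels) for t, labels in sorted(facets.items())}


linked = [
    Concept("http://example.org/person/juliana", "Juliana", "person"),
    Concept("http://example.org/person/bernhard", "Bernhard", "person"),
    Concept("http://example.org/place/engeland", "Engeland", "location"),
    Concept("http://example.org/object/koets", "koets", "object"),
]
print(group_by_type(linked))
# {'location': ['Engeland'], 'object': ['koets'], 'person': ['Bernhard', 'Juliana']}
```

Because each concept has its own identifier, two people who share a first name stay distinct, and background information can be attached to the identifier rather than to the tag string.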
  • Europeana Tech 2011

    1. COLLECTING AND MANAGING USER-GENERATED METADATA. Erwin Verbruggen, Jacco van Ossenbruggen, Lora Aroyo, Johan Oomen, Maarten Brinkerink, Guus Schreiber, Riste Gligorov, Lotte Baltussen, Michiel Hildebrand
    2. VIDEO CONTENT ANNOTATION
    3. VIDEO CONTENT ANNOTATION: time consuming (5 times the duration of the video); coarse grained; professional vocabulary
    4. VIDEO SEARCH BEHAVIOR. Today's and Tomorrow's Retrieval Practice in the Audiovisual Archive. Bouke Huurnink et al. ACM International Conference on Image and Video Retrieval 2010
    5. VIDEO SEARCH BEHAVIOR. People buy fragments: broadcast (33%), stories (17%), fragments (49%). User vocabulary: 35% of clicked results not found by title or term. Today's and Tomorrow's Retrieval Practice in the Audiovisual Archive. Bouke Huurnink et al. ACM International Conference on Image and Video Retrieval 2010
    6. improve support to find fragments
    7. time-based annotations in a user vocabulary
    8. crowdsourcing?
    9. Winner EuroITV Competition; Best Archives on the Web Award. Emerging Practices in the Cultural Heritage Domain - Social Tagging of Audiovisual Heritage. Johan Oomen, Lotte Belice Baltussen, et al. Web science conference 2010
    10. Labeling images with a computer game. Luis von Ahn and Laura Dabbish. SIGCHI conference on Human factors in computing systems 2004
    11.          Pilot I   Pilot II
        months   8         2
        videos   612       1,544
        players  2,000     438
        tags     420,000   172,000
    12. Tag cloud: westminster abbey, abbey, priester, geestelijken, Hyde, hye park, beefeater, bernhard, hek, paarden, tocht, aankomst, kerk, intocht, engeland, koets, kroning, mensenmassa, parade, juliana, koning, kroon, stoet, regen. On the Role of User-generated Metadata in Audio Visual Collections. Riste Gligorov, Michiel Hildebrand et al. KCAP International Conference on Knowledge Capture 2011
    13. Tag cloud (as on slide 12): time-based ("bernhard" highlighted)
    14. Tag cloud (as on slide 12): user vocabulary: 8% in professional vocabulary, 23% in Dutch lexicon, 89% found on google
    15. Tag cloud (as on slide 12): user vocabulary as on slide 14; objects (57%)
    16. Tag cloud (as on slide 12): user vocabulary as on slide 14; objects (57%), persons (31%): bernhard, juliana
    17. Tag cloud (as on slide 12): user vocabulary as on slide 14; objects (57%), persons (31%), locations (7%): engeland
    18. Tag cloud (as on slide 12): user vocabulary as on slide 14; objects (57%), persons (31%), locations (7%); no events, no scenes
    19. Tag cloud (as on slide 12): "just" tags
    20. Tag cloud (as on slide 12): "just" tags: typos
    21. Tag cloud (as on slide 12): "just" tags: typos, no unique reference
    22. Tag cloud (as on slide 12): "just" tags: typos, no unique reference, no synonym
    23. Linking user-generated video annotations to the web of data. Michiel Hildebrand and Jacco van Ossenbruggen, International conference on multimedia modelling 2012
    24. Linking user-generated video annotations to the web of data. Michiel Hildebrand and Jacco van Ossenbruggen, International conference on multimedia modelling 2012
    25. Linking user-generated video annotations to the web of data. Michiel Hildebrand and Jacco van Ossenbruggen, International conference on multimedia modelling 2012
    26. grouped by type (faceted)
    27. concepts uniquely identified
    28. background information
    29. Would you include user-generated metadata in your collection? (no / maybe / yes) Why not? What (quality) criteria are important to you? waisda.nl
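
The vocabulary figures on slide 14 (the share of user tags found in the professional vocabulary or in the Dutch lexicon) are overlap measurements between a tag set and a reference word list. A minimal Python sketch of such a measurement follows. It assumes exact, case-insensitive matching on distinct tags, and the two reference lists are made-up stand-ins, so the printed numbers are not the percentages from the slides.

```python
def coverage(tags, vocabulary):
    """Fraction of distinct tags (case-insensitive) that occur in `vocabulary`."""
    distinct = {t.strip().lower() for t in tags if t.strip()}
    if not distinct:
        return 0.0
    vocab = {v.strip().lower() for v in vocabulary}
    return len(distinct & vocab) / len(distinct)


# A few tags from the example tag cloud; both vocabularies are hypothetical.
tags = ["koets", "kroning", "hye park", "beefeater", "regen"]
professional = {"kroning"}               # stand-in for thesaurus terms
lexicon = {"koets", "kroning", "regen"}  # stand-in for lexicon entries

print(coverage(tags, professional))  # 0.2
print(coverage(tags, lexicon))       # 0.6
```

Exact matching treats a misspelling such as "hye park" as out of vocabulary, which is one reason typos matter when user tags are compared with curated term lists.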
