Tell me why! Ain't nothin' but a mistake? Describing Media Item Differences with Media Fragments URI and Speech Synthesis

Presentation Transcript

    • Tell me why! Ain't nothin' but a mistake? Describing Media Item Differences with Media Fragments URI and Speech Synthesis
      Thomas Steiner (tomac@google.com, @tomayac)
      Raphaël Troncy (raphael.troncy@eurecom.fr, @rtroncy)
      http://www.ourprg.com/wp-content/uploads/2013/03/wallpapers ru corvuscorax 2560x1440 chelyabinskiy meteor.jpg
    • Introduction
      Context of this work:
      ● Event summarization based on multimedia data shared publicly on social networks.
      ● We developed an application that auto-generates media galleries.
    • Media gallery creation steps
      1) Extract media items from multiple social networks
      [Rizzo2012] G. Rizzo, T. Steiner, R. Troncy, R. Verborgh, J.-L. Redondo García, R. Van de Walle. What fresh media are you looking for?: retrieving media items from multiple social networks. In Proceedings of the 2012 international workshop on Socially-aware multimedia, pp. 15–20, 2012.
    • Media gallery creation steps (cont.)
      2) Deduplicate visually similar media items
      [Steiner2013_1] Thomas Steiner, Ruben Verborgh, Joaquim Gabarró Vallés, and Rik Van de Walle. Near-duplicate Photo Deduplication in Event Media Shared on Social Networks. In Proceedings of the International Conference on Advanced IT, Engineering and Management, 2013.
    • Media gallery creation steps (cont.)
      3) Rank media item clusters
      [Steiner2013_2] Thomas Steiner. A Meteoroid on Steroids: Ranking Media Items Stemming from Multiple Social Networks. In Companion Publication of the IW3C2 WWW 2013 Conference, May 13–17, 2013, Rio de Janeiro, Brazil.
    • Media gallery creation steps (cont.)
      4) Compile media galleries
      [Steiner2012_1] T. Steiner, R. Verborgh, J. Gabarro, R. Van de Walle. Defining aesthetic principles for automatic media gallery layout for visual and audial event summarization based on social networks. In Quality of Multimedia Experience (QoMEX), 2012 Fourth International Workshop on, 2012.
      [Steiner2013_3] Thomas Steiner and Christopher Chedeau. To Crop, Or Not to Crop: Compiling Online Media Galleries. In Companion Publication of the IW3C2 WWW 2013 Conference, May 13–17, 2013, Rio de Janeiro, Brazil.
    • Research Question
      "Given a complex algorithm like a media item clustering algorithm, can we use Media Fragments URIs together with speech synthesis to describe the algorithm's results rationales?"
      ● Human raters who evaluate algorithm results are non-experts.
      ● Spoken explanations can help algorithm developers improve the algorithms.
      ● The proof-of-concept has potential for generalization.
    • Media Fragments URIs
      A media item tile is a spatial media fragment.
      xywh.js: polyfill for spatial media fragments
      <img src="kitten.jpg#xywh=100,100,50,50"/>
      <img src="kitten.jpg#xywh=pixel:100,100,50,50"/>
      <img src="kitten.jpg#xywh=percent:25,25,50,50"/>
      Available as open source on GitHub: https://github.com/tomayac/xywh.js
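To illustrate the three `xywh` forms shown on this slide, here is a minimal Python sketch that extracts the spatial dimension from a URI fragment. It is not the xywh.js implementation; the names `SpatialFragment`, `parse_xywh`, and `to_pixels` are hypothetical, and the sketch assumes integer values and treats a missing unit as `pixel`, as in the first example above.

```python
import re
from typing import NamedTuple, Optional, Tuple
from urllib.parse import urldefrag


class SpatialFragment(NamedTuple):
    unit: str  # "pixel" or "percent"
    x: int
    y: int
    w: int
    h: int


# Optional "pixel:" or "percent:" prefix followed by four integers.
_XYWH = re.compile(r"^(?:(pixel|percent):)?(\d+),(\d+),(\d+),(\d+)$")


def parse_xywh(uri: str) -> Optional[SpatialFragment]:
    """Return the xywh dimension of a URI, or None if absent or malformed."""
    _, fragment = urldefrag(uri)
    for pair in fragment.split("&"):
        name, _, value = pair.partition("=")
        if name != "xywh":
            continue
        match = _XYWH.match(value)
        if match is None:
            return None
        unit = match.group(1) or "pixel"  # unit defaults to pixels
        x, y, w, h = (int(group) for group in match.groups()[1:])
        return SpatialFragment(unit, x, y, w, h)
    return None


def to_pixels(frag: SpatialFragment, width: int, height: int) -> Tuple[int, int, int, int]:
    """Resolve a fragment against the media size (percent values are truncated)."""
    if frag.unit == "pixel":
        return (frag.x, frag.y, frag.w, frag.h)
    return (
        frag.x * width // 100,
        frag.y * height // 100,
        frag.w * width // 100,
        frag.h * height // 100,
    )
```

For a 400x200 image, `to_pixels(parse_xywh("kitten.jpg#xywh=percent:25,25,50,50"), 400, 200)` resolves the percent form to the pixel rectangle `(100, 50, 200, 100)`.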
    • Media Fragments URIs (cont.)
      We use a tile-wise, average-histogram-based media item deduplication algorithm with face detection. It makes use of Media Fragments URIs [Troncy2012] to make semantic statements about fragments of media items:
      @base <http://example.org/> .
      @prefix ma: <http://www.w3.org/ns/ma-ont#> .
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .
      @prefix db: <http://dbpedia.org/resource/> .
      @prefix dbo: <http://dbpedia.org/ontology/> .
      @prefix col: <http://purl.org/colors/rgb/> .
      <video> a ma:MediaResource .
      <video#t=,10&xywh=0,0,30,40> a ma:MediaFragment ;
          foaf:depicts db:Face .
      <video#t=,10&xywh=0,0,10,10> a ma:MediaFragment ;
          dbo:colour col:f00 .
      [Troncy2012] R. Troncy, E. Mannens, S. Pfeiffer, D. Van Deursen, M. Hausenblas, P. Jagenstedt, J. Jansen, Y. Lafon, C. Parker, and T. Steiner, “Media Fragments URI 1.0 (basic),” Recommendation, W3C, 2012.
    • Deduplicating media items
      Each tile of a media item has its own unique URI:
      ● http://example.org/image.png#xywh=0,0,10,10
      We can leverage this fact to make semantic statements about media item similarity, for example, to debug the deduplication algorithm.
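As a rough illustration of how every tile can get its own URI, the following Python sketch enumerates spatial media fragment URIs for a regular grid. The function name `tile_uris` and the grid parameters are assumptions for illustration; the slides do not state how the application sizes its tiles, and this sketch simply ignores leftover pixels when the image size is not divisible by the grid.

```python
from typing import Iterator


def tile_uris(base_uri: str, width: int, height: int, cols: int, rows: int) -> Iterator[str]:
    """Yield one xywh media fragment URI per tile, row by row."""
    tile_w = width // cols   # leftover pixels on the right edge are ignored
    tile_h = height // rows  # leftover pixels on the bottom edge are ignored
    for row in range(rows):
        for col in range(cols):
            x, y = col * tile_w, row * tile_h
            yield f"{base_uri}#xywh={x},{y},{tile_w},{tile_h}"


# A 20x20 image split into a 2x2 grid; the first tile is the URI from the slide.
uris = list(tile_uris("http://example.org/image.png", 20, 20, 2, 2))
```

Because each URI is stable for a given grid, statements such as "this tile's average color is close to its counterpart's" can be attached to it and looked up later.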
    • Deduplicating media items (cont.)
      Algorithm Matching Conditions
      Cond. 1: Out of m tiles of a media item with n tiles (m <= n), the average colors of at least tiles_threshold tiles must differ by no more than similarity_threshold from those of their counterpart tiles.
      Cond. 2: The numbers f1 and f2 of detected faces in the two media items must be the same. Note that the algorithm only detects faces; it does not recognize them.
      Cond. 3: If the average colors of a tile and its counterpart tile are within the black-and-white tolerance bw_tolerance, these tiles are not considered, and tiles_threshold is decreased accordingly.
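A minimal Python sketch of the three matching conditions, under several assumptions that the slides do not confirm: tiles are given as average RGB colors, the color difference is the largest per-channel difference, "within the black-and-white tolerance" means every channel of both tiles lies within bw_tolerance of 0 or of 255, and Cond. 1 is read as requiring at least the (reduced) tiles_threshold number of similar tiles, which is what the debug output on a later slide suggests. All function names are hypothetical.

```python
from typing import Sequence, Tuple

Color = Tuple[int, int, int]  # average RGB color of one tile


def color_distance(a: Color, b: Color) -> int:
    """Largest per-channel absolute difference (an assumed metric)."""
    return max(abs(x - y) for x, y in zip(a, b))


def near_black_or_white(color: Color, bw_tolerance: int) -> bool:
    """True if every channel is within bw_tolerance of 0 or of 255."""
    return (all(ch <= bw_tolerance for ch in color)
            or all(ch >= 255 - bw_tolerance for ch in color))


def items_match(tiles_a: Sequence[Color], tiles_b: Sequence[Color],
                faces_a: int, faces_b: int,
                tiles_threshold: int, similarity_threshold: int,
                bw_tolerance: int) -> bool:
    # Cond. 2: the numbers of detected faces must be equal.
    if faces_a != faces_b:
        return False
    similar = 0
    ignored = 0
    for a, b in zip(tiles_a, tiles_b):
        # Cond. 3: skip tile pairs that are both almost black or almost white.
        if near_black_or_white(a, bw_tolerance) and near_black_or_white(b, bw_tolerance):
            ignored += 1
        # Cond. 1: count tile pairs whose average colors are close enough.
        elif color_distance(a, b) <= similarity_threshold:
            similar += 1
    # Cond. 3 (cont.): the threshold shrinks by the number of ignored tiles.
    effective_threshold = max(tiles_threshold - ignored, 0)
    return similar >= effective_threshold


left = [(0, 0, 0), (100, 100, 100), (200, 10, 10), (50, 60, 70)]
right = [(1, 1, 1), (105, 95, 100), (190, 20, 5), (250, 60, 70)]
```

With these four example tiles, one pair is ignored as near-black and two pairs count as similar, so the items match for a tiles_threshold of 3 but not for 4.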
    • Deduplicating media items (cont.)
      We use speech generation and a speech synthesizer to turn RDF statements about the visual similarity of media item tiles into spoken statements.
      Based on Speak.js (https://github.com/kripken/speak.js)
    • Deduplicating media items (cont.)
      Human Rater Decisions
      Clustering Consent: Two or more media items are clustered by the algorithm and the human rater agrees. The human rater wants to understand why they were clustered.
      Clustering Dissent: Two or more media items are clustered by the algorithm, but the human rater thinks that they should not have been clustered. The human rater wants to understand why they were incorrectly clustered.
      Non-Clustering Dissent: Two or more media items are not clustered by the algorithm, but the human rater thinks that they should have been clustered. The human rater wants to understand why they were not clustered.
    • Deduplicating media items (cont.)
      Low-level debug output
      - Similarity threshold: 15 (Cond. 1)
      - Tiles threshold: 67 (Cond. 1)
      - Similar tiles: 52 (Cond. 1)
      - Faces left: 0. Faces right: 0 (Cond. 2)
      - BW tolerance: 1 (Cond. 3)
      - Not considered tiles: 22 (Cond. 3)
      - Effective tiles threshold: 45 (Cond. 3)
      This output needs to be translated into plain human language to be understandable by non-domain experts.
    • Natural Speech Generation
      Reiter and Dale [Reiter2000] differentiate three phases of natural language generation:
      Document planning determines the content and structure of a document.
      Microplanning decides which words, syntactic structures, etc. are used to communicate the chosen content and structure.
      Realization maps the abstract representations used by microplanning into text.
      [Reiter2000] E. Reiter and R. Dale, Building Natural Language Generation Systems, Studies in Natural Language Processing. Cambridge University Press, 2000.
    • Natural Speech Generation (cont.)
      Document Planning: We need to convey the currently selected tiles_threshold and similarity_threshold, the numbers f1 and f2 of detected faces in each media item, and the number of tiles not considered given the bw_tolerance parameter.
      Microplanning: We need to decide which matching condition of the algorithm to highlight first. Afterwards, we need to elaborate on secondary matching conditions such as detected faces and black-and-white tolerance. The grammatical number (singular or plural) needs to be taken into account. The microplanner also needs to decide when exact values (e.g., “99% of all tiles”) and when approximations of calculated values (e.g., “roughly 50%”) better suit the human evaluators’ needs.
      Realization: We need to map the abstract representations used by the microplanning step into text.
    • Natural Speech Generation (cont.) “However, 22 tiles were not considered, as they are either too bright or too dark, which is a common source of clustering issues.”
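The microplanning and realization steps can be sketched in Python as a small template-based generator fed with the values from the low-level debug output. This is an illustrative sketch, not the authors' generator: the function names `noun` and `explain`, the sentence templates other than the one quoted on the slide, and the decision rule (similar tiles must reach the effective threshold) are assumptions. It covers grammatical number only, not the choice between exact and approximate values.

```python
def noun(count: int, singular: str, plural: str) -> str:
    """Return the count followed by the noun in the right grammatical number."""
    return f"{count} {singular if count == 1 else plural}"


def explain(similar: int, tiles_threshold: int, ignored: int,
            faces_left: int, faces_right: int) -> str:
    """Turn debug values into a short explanation suitable for speech synthesis."""
    effective = max(tiles_threshold - ignored, 0)
    clustered = similar >= effective and faces_left == faces_right
    verdict = "clustered" if clustered else "not clustered"
    look = "looks" if similar == 1 else "look"

    # Primary matching condition first: tile similarity.
    sentences = [
        f"The two media items were {verdict}.",
        f"{noun(similar, 'tile', 'tiles')} {look} similar; "
        f"the required minimum is {effective}.",
    ]
    # Secondary condition: tiles skipped due to black-and-white tolerance.
    if ignored > 0:
        verb = "was" if ignored == 1 else "were"
        pronoun = "it is" if ignored == 1 else "they are"
        sentences.append(
            f"However, {noun(ignored, 'tile', 'tiles')} {verb} not considered, "
            f"as {pronoun} either too bright or too dark, "
            "which is a common source of clustering issues."
        )
    # Secondary condition: detected faces.
    if faces_left == faces_right == 0:
        sentences.append("No faces were detected in either media item.")
    elif faces_left == faces_right:
        sentences.append(
            f"Both media items contain {noun(faces_left, 'detected face', 'detected faces')}."
        )
    else:
        sentences.append(
            f"The left media item contains {noun(faces_left, 'detected face', 'detected faces')}, "
            f"but the right one contains {noun(faces_right, 'detected face', 'detected faces')}."
        )
    return " ".join(sentences)


# Values taken from the low-level debug output slide.
text = explain(similar=52, tiles_threshold=67, ignored=22, faces_left=0, faces_right=0)
```

The resulting string could then be handed to a speech synthesizer; with the debug values above it contains the sentence quoted on this slide.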
    • Live Demo
      Slides: http://bit.ly/icme2013
      Demo: http://social-media-illustrator.herokuapp.com
      This Paper: http://www.lsi.upc.edu/~tsteiner/papers/2013/tell-me-why-aint-nothin-but-a-mistake-describing-media-item-differences-icme2013.pdf
      Other Papers: http://www2013.org/companion/p31.pdf http://www2013.org/companion/p201.pdf
      Questions here, or tomac@google.com @tomayac