Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

    Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones - Presentation Transcript

    1. Research & Development. Text vs. Speech: A Comparison of Tagging Input Modalities for Camera Phones. Mauro Cherubini, Xavier Anguera, Nuria Oliver, and Rodrigo de Oliveira
    2. people do not want to tag their pictures intro → hypotheses → methodology → results → implications
    3. research question: Assuming that users are willing to input at least one tag, which input modality can help the production and retrieval of the pictures?
    4. hypothesis 1: Speech is preferred to text as an annotation mechanism on mobile phones (objective measure). Support: Mitchard and Winkles (2002)
    5. hypothesis 1-bis: Speech annotations are preferred by users even if this means spending more time on the task (subjective measure). Support: Perakakis and Potamianos (2008)
    6. hypothesis 2: The longer the tag, the larger the advantage of voice over text for annotating pictures on mobile phones. Support: Hauptmann and Rudnicky (1990)
    7. hypothesis 3: Retrieving pictures on mobile phones with speech is not faster than with text (objective measure). Support: Mills et al. (2000)
    8. the user study: field study (4 weeks) and controlled experiment (T1 - T2 - T3 - T4). 3 experimental conditions: a. Speech only b. Text only c. Speech and Text
    9. MAMI
    10. features of MAMI: •  processing is done entirely on the mobile phone •  speech is not transcribed •  to compare the waveforms of the audio tags, MAMI uses the Dynamic Time Warping algorithm
    11. task 1: remember the tag (stimulus, retrieval). Pictures taken during the field trial
    12. task 2: remember the context (stimulus, retrieval). TASK 2, PICTURE 1: three little bushes, Garden, Tree, Stairs
    13. task 3: remember the picture (stimulus, retrieval). Text Audio tags were converted into textual tags and vice versa
    14. task 4: remember the sequence (assignment, retrieval). TASK 4: Three pictures among the oldest and three pictures among the newest.
    15. metrics: •  time to completion •  false positives •  retrieval errors
    16. results H1
    17. results H1-bis: All participants in the BOTH group felt that tagging with text was more effective than tagging with voice. Voice: 3.33 [0.81], Text: 4.34 [0.81] (Mean [SD]); 1 = completely agree, 5 = completely disagree
    18. results H2
    19. results H3
    20. results H3 - continued
    21. take away 1: speech is not a given. The advantage of audio as an input modality for tagging pictures on mobile phones is not a given. Why? 1. retrieval precision 2. privacy
    22. take away 2: input mistakes. We address text input mistakes immediately. On the contrary, mistakes in audio recordings are less frequently addressed
    23. take away 3: memory. Speech does not help memorizing the tags
    24. implication 1: allow multiple modalities (© Pixar, 2008)
    25. implication 2: enable audio inspection
    26. implication 3: enable modality synesthesia (© Disney, 1940)
    27. Research & Development end thanks martigan@gmail.com mauro@tid.es http://www.i-cherubini.it/mauro/blog/ http://research.tid.es/multimedia/
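
Slide 10 states that MAMI compares the waveforms of audio tags with Dynamic Time Warping and does not transcribe speech. The slides give no detail on the features, local distance, or path constraints used, so the following is only a minimal textbook sketch in Python. It assumes each audio tag has already been reduced to a short sequence of numbers; `dtw_distance` and `closest_tag` are hypothetical names for illustration, not MAMI's API.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW between two non-empty 1-D numeric sequences.

    Assumption: absolute difference as the local cost and the three
    standard moves (insertion, deletion, match). MAMI's real settings
    are not given in the slides.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def closest_tag(query, stored_tags):
    """Return the name of the stored sequence nearest to `query` under DTW.

    `stored_tags` is a hypothetical dict mapping a tag name to its sequence.
    """
    return min(stored_tags, key=lambda name: dtw_distance(query, stored_tags[name]))
```

Because DTW allows one sequence to be stretched against the other, a tag spoken more slowly at retrieval time can still align closely with the recording made at tagging time, which is what makes matching possible without transcription.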