Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones
Research & Development
Text vs. Speech
A Comparison of Tagging Input Modalities
for Camera Phones
Mauro Cherubini, Xavier Anguera,
Nuria Oliver, and Rodrigo de Oliveira
people do not want to tag
their pictures
intro → hypotheses → methodology → results → implications
research question:
Assuming that users are willing to
input at least one tag, which input
modality can help the production and
retrieval of the pictures?
hypothesis 1
Speech is preferred to text as an
annotation mechanism on mobile
phones (objective measure)
Support:
- Mitchard and Winkles (2002)
hypothesis 1-bis
Speech annotations are preferred by
users even if this means spending more
time on the task (subjective measure)
Support:
- Perakakis and Potamianos (2008)
hypothesis 2
The longer the tag the larger the
advantage of voice over text for
annotating pictures on mobile phones
Support:
- Hauptmann and Rudnicky (1990)
hypothesis 3
Retrieving pictures on mobile phones
with speech is not faster than with text
(objective measure)
Support:
- Mills et al. (2000)
the user study
field study (4 weeks)
controlled experiment
T1 - T2 - T3 - T4
3 experimental conditions:
a. Speech only
b. Text only
c. Speech and Text
features of MAMI
• processing is done entirely on the mobile
phone
• speech is not transcribed
• to compare the waveforms of the audio tags,
MAMI uses the Dynamic Time Warping
algorithm
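A minimal sketch of how Dynamic Time Warping can match a spoken query against stored audio tags without transcribing them. The slides do not describe MAMI's actual features or implementation, so everything here is an assumption for illustration: the sequences are plain numbers standing in for per-frame audio features, and the names dtw_distance and best_match are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two 1-D sequences (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between frames
            # extend the cheapest of: stretch a, stretch b, or step both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_match(query, stored_tags):
    """Return the name of the stored tag with the smallest DTW cost to the query."""
    return min(stored_tags, key=lambda name: dtw_distance(query, stored_tags[name]))


# Hypothetical feature sequences; real audio tags would be much longer.
stored = {"garden": [1, 2, 3, 2], "stairs": [5, 5, 6, 7]}
match = best_match([1, 2, 2, 3, 2], stored)  # same shape as "garden", spoken more slowly
```

Because DTW lets one sequence stretch or compress in time relative to the other, the same tag spoken at a different pace can still align closely with its stored recording.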
task 1: remember the tag
stimulus
retrieval
Pictures taken during the field trial
task 2: remember the context
stimulus
retrieval
TASK 2
PICTURE 1
three little bushes
Garden
Tree
Stairs
task 3: remember the picture
stimulus
retrieval
Text
Audio tags were converted into
textual tags and vice versa
task 4: remember the sequence
assignment
retrieval
TASK 4
Three pictures among
the oldest and three
pictures among the
newest.
metrics
• time to completion
• false positives
• retrieval errors
results H1-bis
All participants in the BOTH group felt that tagging
with text was more effective than tagging with voice.
Voice: 3.33 [0.81], Text: 4.34 [0.81] (Mean [SD])
1 = completely agree; 5 = completely disagree
take away 1:
speech is not a given
the advantage of audio as an input modality for tagging
pictures on mobile phones is not a given
why?
1. retrieval precision
2. privacy
take away 2:
input mistakes
we address text input mistakes immediately.
in contrast, mistakes in audio recordings are
addressed less frequently
take away 3:
memory
speech does not help users memorize the tags
Speech and typed text are two common input modalities for mobile phones. However, little research has compared them in their ability to support annotation and retrieval of digital pictures on mobile devices. In this paper, we report the results of a month-long field study in which participants took pictures with their camera phones and had the choice of adding annotations using speech, typed text, or both. Subsequently, the same subjects participated in a controlled experiment where they were asked to retrieve images based on annotations as well as retrieve annotations based on images in order to study the ability of each modality to effectively support users' recall of the previously captured pictures. Results demonstrate that each modality has advantages and shortcomings for the production of tags and retrieval of pictures. Several guidelines are suggested when designing tagging applications for portable devices.