Context: Real-time speech translation technology is available today, but we still lack a complete understanding of how it may affect communication in global software projects. Goal: To investigate the adoption of combined speech recognition and machine translation to overcome language barriers among stakeholders who remotely negotiate software requirements.
Method: We performed an empirical simulation-based study involving the Google Web Speech API and the Google Translate service, two groups of four subjects speaking Italian and Brazilian Portuguese, and a test set of 60 technical and non-technical utterances.
Results: Our findings revealed that, overall: (i) satisfactory speech recognition accuracy was achieved, although significantly affected by speaker and utterance differences; (ii) adequate translations tend to follow accurate transcripts, meaning that speech recognition is the most critical part of speech translation technology.
Conclusions: Results provide initial but positive evidence for the possibility of using speech translation technologies to help globally distributed team members communicate in their native languages.
1. An Empirical Simulation-based Study of Real-Time Speech Translation for Multilingual Global Project Teams
Fabio Calefato
Filippo Lanubile
University of Bari, Italy
Rafael Prikladnicki
João Henrique Stocker Pinto
PUCRS, Brazil
ESEM'14 - Turin, Sept. 18-19, 2014 1
2. Motivation
• Global software projects are challenged by language differences
  – Especially in requirements meetings
• Speech translation technology can support remote meetings in countries with
  – Opportunities for global projects
  – A lack of English-speaking professionals
• Goal:
  Evaluate the feasibility of adopting real-time speech translation to support multilingual requirements meetings
3. Speech Translation = Speech Recognition + Machine Translation

Speech Recognition
• First prototypes date back to the early '70s
  – Appropriate for dictation only, not for real-time captioning of speech
  YET
  – Recent progress, especially with mobile devices
  – Need for further investigation

Machine Translation
• First prototypes date back to the '50s
  – Still far from 100% accurate for multilingual group communication
  YET
  – Not disruptive of the conversation flow
  – Does not prevent completion of complex tasks
  – Even grants more balanced discussions
4. Research questions
• RQ1: How well does speech translation work
for continuous speech in global software
projects?
• RQ2: How does technical jargon affect speech
translation in global software projects?
5. Simulation-based study
• 8 Participants
– Software engineering professionals
– 4 from Bari (Italy) and 4 from Porto Alegre (Brazil)
– 7 males, 1 female
6. Instrumentation
• 60 sentences from 5 requirements workshop logs
– Half containing jargon
– Half generic
– Increasing length (# words 5-30)
• Manually translated EN -> IT, PT
• Google Chrome Web Speech API Demo + Google
Translate
[Slides 7-8: screenshots of the simulation pipeline: a speech transcript (IT / PT) produced by speech recognition, and the resulting translation (IT / PT / EN)]
9. Translation Adequacy scoring scheme

Category 4: Completely adequate. The translation clearly reflects the information contained in the original sentence. It is perfectly clear, intelligible, grammatically correct, and reads like ordinary text.

Category 3: Fairly adequate. The translation generally reflects the information contained in the original sentence, despite some inaccuracies or infelicities in the text. It is generally clear and intelligible, and one can (almost) immediately understand what it means.

Category 2: Poorly adequate. The translation poorly reflects the information contained in the original sentence. It contains grammatical errors and/or poor word choices. The general idea of the text is intelligible only after considerable study.

Category 1: Completely inadequate. The translation is unintelligible and it is not possible to obtain the information contained in the original sentence. Studying the meaning of the text is hopeless and, even allowing for context, one feels that guessing would be too unreliable.

Adapted from: D. Arnold et al., "Machine Translation: An Introductory Guide" (1994)
10. Results: Speech Recognition Accuracy (1/2)

• Minimal differences in mean accuracy
  – by Language and Lexicon
  – by Speaker (except PT-Speaker2)

Language   Mean          Lexicon   Mean
IT         .81           Generic   .80
PT         .75           Jargon    .77

Speaker       Language   Mean
PT-Speaker1   PT         .78
PT-Speaker2   PT         .68
PT-Speaker3   PT         .79
PT-Speaker4   PT         .73
IT-Speaker1   IT         .76
IT-Speaker2   IT         .88
IT-Speaker3   IT         .78
IT-Speaker4   IT         .82
11. Results: Speech Recognition Accuracy (2/2)

UNIANOVA

Source                            df   Mean Square   F         Sig.
Intercept                         1    290.785       587.408   .017
Language                          1    .460          2.948     .144
Speaker(Language)                 6    .166          12.907    .003†
Lexicon                           1    .125          3.285     .104
Replication(Lexicon)              58   .082          1.740     .018†
Language * Lexicon                1    .003          .068      .797
Language * Replication(Lexicon)   58   .047          3.004     .000
Lexicon * Speaker(Language)       6    .013          .817      .557
† Significant at 5% level

• Speaker(Language) and Replication(Lexicon) are the only significant factors
14. Conclusions: RQ1
How well does speech translation work for
continuous speech?
• Our study setup: Simulation of a conversation
– Similar to: Automatic generation of closed captioning
from webcasts
• Our findings
– In line with baseline: 75% word accuracy [1]
YET
• Adequate speech translations 31-41%
• Some domains more critical than others
[1] Munteanu et al. “Collaborative editing for improved usefulness and usability of transcript-enhanced webcasts”, CHI’08
15. Conclusions: RQ2
How does technical jargon affect speech
translation?
• No evidence that jargon generates worse
speech translations
– At least, in the CS domain
HOWEVER
• Professionals read jargon differently
– e.g., “SQL” → SEQUEL, spelled in Italian, in
English…
16. Study limitations & future work

Limitations:
- Simulation-based study
  - What would happen in a real setting?
  - Refine the transcription accuracy construct (errors)
- One technology only
  - i.e., Google's Web Speech API and Translate
- Effect of accents, pronunciations, gender?
  - i.e., only 8 speakers, 1 female

Future work:
+ Run a controlled experiment
  + Multi-language group task
  + Distinguish between incorrect and missing words
+ Compare more speech translation solutions
  + e.g., Nuance, Sphinx, Bing
+ Involve more speakers in experiments
  + Also include EN native speakers
Some background information on speech translation, which is the combination of two technologies: SR + MT.
SR
Research in the past decade showed it appropriate for dictation only, not for real-time captioning of speech [1].
However, recent technological progress in automatic speech recognition has also found its way into mobile devices, which definitely calls for further investigation, especially in combination with machine translation.
MT More established technology (~60 years in the making)
Our findings indicate that state-of-the-art MT technology is already a viable solution for multilingual group communication, since it is not disruptive of the conversation flow, it does not prevent groups from completing complex tasks, and it even grants more balanced discussions. Yet, currently available MT technology is still far from 100% accurate and, as such, its adoption comes with costs. In fact, translation inaccuracies need to be repaired by rephrasing the original content, thus causing a decrease in efficiency.
I am going to present a study whose overall goal is to assess the use of real-time speech translation to support communication in multilingual requirements meetings.
Let me first discuss the research questions addressed by this study.
RQ1 –Research from the past decade has shown evidence that the speech recognition technology available was unsuitable for providing real-time captioning or transcription of speech [14]. Although commercial speech recognition tools available today claim to achieve a word recognition accuracy as high as 99%, they have been developed for dictation rather than to produce a transcript from a continuous and unbroken stream without any punctuation [2][3][18].
RQ2 – When stakeholders communicate during requirements meetings, many technical words are used. On top of that, technical words might even be in a language different from the one used by speakers. For instance, lawyers sometimes use Latin jargon; computer scientists typically use technical words in English. As such, technical jargon is less likely to occur both in real communication and in training sets used to build language models for speech recognition engines. Therefore, it has been previously observed that speech recognition errors are more likely to occur in words given a very low probability by the language model [16].
This simulation-based study is a necessary preliminary step towards the design of future experiments that will involve real-time communication among individuals, augmented with speech translation.
As the test set, we selected 60 sentences of growing length (word count, min. 5, max. 30). The sentences were selected from real chat logs in English, collected from five requirements workshops run as part of an experiment on the effects of text-based communication in distributed requirements engineering [6]. Participants in each workshop ranged from five to eight undergraduate students attending a requirements engineering course at the University of Victoria, Canada. During a workshop, the participants, either acting as a client or as a developer, had first to elicit the requirements specification of a web application (first session); then, they had to negotiate and reach closure on the previously collected requirements (second session).
Generic utterances contained only words that are included in an Italian or Portuguese dictionary.
Jargon utterances contained one or more technical terms or characteristic acronyms used by software developers.
The selected sentences were manually translated by two of the researchers from English into both Italian and Brazilian Portuguese. The original utterances in English, together with the manual translations into Italian and Portuguese, formed the experimental sample.
We were extremely careful to keep both the original meaning and the interaction style intact.
For each of the 60 utterances in the sample, a speaker started by clicking on the microphone icon and spoke until the end of the utterance. Participants spoke in a colloquial style at their own pace. If the researcher realized that the spoken utterance differed from the original content, or the speaker stopped before reaching the end of the utterance, then the researcher invited the speaker to try again. On average, a speaker finished the simulation in about 30 minutes, at a pace of two utterances per minute.
We have different independent and dependent variables for speech recognition and speech translation.
As for SR:
Lexicon is a fixed-effect factor.
The 30 replications under each lexicon level are considered a random-effect factor nested under Lexicon.
Transcript accuracy is the standard measure used to evaluate speech recognition system performance, where errors include missing and wrong words.
As T_acc is defined in [-1, 1], it is then normalized as T'_acc = (T_acc + 1) / 2 to express its values as percentages ([0, 1]).
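The accuracy computation can be sketched in code (a minimal sketch with hypothetical helper names, assuming T_acc = (N - errors) / N over word-level edit errors; the study's exact error-counting procedure may differ):

```python
# Sketch of transcript accuracy (T_acc) and its normalization (T'_acc).
# Assumption: errors are counted as word-level edit operations
# (missing, wrong, and extra words); the paper may count differently.

def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two transcripts."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # missing word
                            curr[j - 1] + 1,          # extra word
                            prev[j - 1] + (r != h)))  # wrong word
        prev = curr
    return prev[-1]

def transcript_accuracy(reference, hypothesis):
    """T_acc = (N - errors) / N, bounded below at -1."""
    n = len(reference.split())
    return max((n - word_errors(reference, hypothesis)) / n, -1.0)

def normalized_accuracy(t_acc):
    """T'_acc = (T_acc + 1) / 2, mapping [-1, 1] onto [0, 1]."""
    return (t_acc + 1) / 2
```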
As for MT
In the notation, the language on the left is the source language in which the speaker read the sentences; the one on the right is the target language into which they were translated.
Adequacy
Whereas the effectiveness of a machine translation service relates to the fluency and fidelity of the translated output, the effectiveness of a speech recognition system relates to the number of words correctly recognized in a spoken sentence (errors in the speech recognition process negatively affect the outcome of machine translation).
The scoring scheme is:
- not too fine-grained
- without middle values: this avoids central tendency bias by forcing raters to judge a translation as either adequate (points 3-4) or inadequate (points 1-2)
- clear in its descriptions to raters
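The forced adequate/inadequate split can be sketched as follows (hypothetical ratings, for illustration only):

```python
# Sketch: binarize 4-point adequacy ratings into adequate (3-4)
# vs. inadequate (1-2); the ratings below are made up for illustration.

def is_adequate(rating):
    return rating >= 3

ratings = [4, 2, 3, 1, 3]  # hypothetical ratings from one language pair
adequate = sum(is_adequate(r) for r in ratings)
inadequate = len(ratings) - adequate
```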
Once all the subjects completed their tasks, one researcher at UniBari rated the quality of all the translations from Italian to English (IT->EN) and from Portuguese to Italian (PT->IT); likewise, one researcher at PUCRS rated the translations from Brazilian Portuguese to English (PT->EN) and from Italian to Brazilian Portuguese (IT->PT).
We note that the sets of language pairs rated by the two researchers were disjoint, so no inter-rater agreement (e.g., through Cronbach's alpha) could be measured.
The left-hand-side table reports the mean values of the transcript accuracy measured by language and lexicon. In both cases, we observe minimal differences. In fact, the mean accuracy for utterances spoken in Italian is 81%, whereas for Brazilian Portuguese it is 75%.
Likewise, slightly better accuracy results were achieved on average for generic utterances (80%) as compared to jargon utterances (77%).
The right-hand-side table, instead, reports the average accuracy per speaker. In this case, too, we cannot observe large differences. The only noticeable result is the performance of the Brazilian subject PT-Speaker2, who achieved the lowest accuracy (68%), especially when compared to the best accuracy, achieved by the Italian subject IT-Speaker2 (88%).
Finally, to identify any differences in transcript accuracy produced by the factors and their interactions, we ran a univariate analysis of variance (UNIANOVA procedure). The analysis showed that differences in the speakers and the sentences (replications) significantly affected the result of the speech recognition process.
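The full model is a nested UNIANOVA, but the underlying F-ratio can be illustrated with a plain one-way ANOVA over a single factor such as Speaker (a simplified sketch, not the study's actual analysis, which also nests replications and tests interactions):

```python
# Simplified sketch: one-way ANOVA F statistic, F = MS_between / MS_within.
# Each inner list holds the accuracy values observed for one speaker.

def one_way_f(groups):
    k = len(groups)                      # number of groups (speakers)
    n = sum(len(g) for g in groups)      # total observations
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```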
Goal: identify differences in the quality of translation produced according to the various combinations of language pairs and lexicon
We first evaluated translation results by language pair, counting how many sentences were rated adequate (i.e., categories 4 and 3) and inadequate (i.e., categories 1 and 2). The figure on the left shows this breakdown.
We can observe a similar behavior for all the combinations, with a minimum of 75 (PT->IT, 31%) and a maximum of 99 (IT->EN, 41%) adequately translated utterances.
Then, we performed a similar analysis evaluating the adequacy of translation results further grouped by Lexicon. The figure on the right-hand side shows that, for all four language pairs, the inadequate translations again outnumber the adequate ones regardless of the lexicon. In other words, generic utterances were translated no more adequately than jargon utterances, independently of the language pair.
Finally, because recognition and translation are clearly interdependent, as translation adequacy is affected by the accuracy of the transcript produced in the first step of the process, we computed Spearman's rho to measure this correlation.
The results in the table show that, regardless of both the lexicon and the language pair, there is a moderate positive correlation between transcription accuracy and translation adequacy. In other words, when the speech recognition component produced an inaccurate transcription, the machine translation tended to produce a less adequate translation.
One explanation for the only moderate correlation is that, while there may be cases where an inadequate translation occurred despite an accurate transcription, the opposite, an adequate translation from an inaccurate transcription, can never happen.
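Spearman's rho itself can be sketched in a few lines (a pure-Python sketch using mean ranks for ties; the inputs would be per-utterance accuracy and adequacy values, not reproduced here):

```python
# Sketch of Spearman's rank correlation: Pearson correlation of rank vectors,
# with tied values receiving their mean rank.

def rank(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                        # extend over a run of tied values
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```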
75% is perfectly in line with our own findings.
Yet, speech recognition in our case is only the first of two steps. The final translation results show that, no matter what the language pair was, the number of inadequate translations outnumbered the number of adequate ones.
Moreover, the correlation test, which showed only a moderate correlation between accurate transcriptions and adequate translations, indicates that speech recognition is the critical component of a speech translation system, so 75% as a baseline might be too low.
And when we’re talking about domains, we’re talking about technical words part of the specific domain vocabulary, or in one word, jargon
A more encouraging result, with respect to RQ2, is that we found no evidence that jargon generates worse speech translations.
As future work, we intend to seek confirmation of these initial results.
In the simulation just described, several professional developers read utterances unrelated to each other into a speech translation system. Although collected from several real requirements meetings, such a set does not fully represent a real requirements workshop augmented with speech translation. In fact, our simulation does not take into account factors like task completion, communication flow, context, and grounding. Therefore, we acknowledge the need to perform future controlled experiments that involve cross-language group communication augmented with speech translation. In particular, we will compare groups of people who communicate through a speech translation system, using either English or their native languages, to complete communication-intensive tasks in the context of globally distributed development teams.
In our simulation we only used one speech translation system (Google Translate mobile). Therefore, findings might not extend to other existing speech translation technologies available. We acknowledge the need to compare the performance of more systems in our future work.
Finally, our findings showed that speech recognition accuracy was significantly affected by speaker and utterance differences. As such, we acknowledge that the limited number of speakers (4 for each of the two source languages) and utterances (30 for each of the two kinds of lexicon) is not ideal from a statistical point of view. Such limitations will be addressed in future replications.
Therefore, in future controlled experiments we will involve more subjects, possibly also native EN speakers, to understand the effect of non-native pronunciation when using EN as a lingua franca in multilingual groups.