Mark Seligman of Spoken Translation, Inc. will provide some history of the speech translation field since the 1990s; give a quick tour of the state of the art; and speculate about future developments, with special interest in deep semantics.
3. In the beginning …
• NEC demo, ITU Telecom World (1983)
• Chat: Uni-verse, Amikai, CompuServe, Global-Link; (later) Ortsbo
• C-STAR(-2) consortium (1991?)
– ATR trilateral demo (1993)
– English, French, Japanese, Italian, Korean, German, Chinese
– First unrestricted SLT (1997, 1998)
• GETA group (Seligman)/CompuServe
• Information Transcript Project (1995, 1996)
– MIT – Biennale d’Art Contemporain, Lyon
• IBM Voice Type
• Global-Link translation software
• Mac text-to-speech
• CU-SeeMe, video, audio
• Robert Palmquist
– Mobile device (1993)
– English-Spanish, Office of Naval Research (1997)
– SpeechGear launch (2001)
– Mobile device for Japanese<>English (2003)
– “Interpreter” mobile phone (2004)
– Compadre: Interact 4.0, English<>35 other languages (2009)
• Talk&Translate
4. (Still) In the beginning …
• Verbmobil (Wolfgang Wahlster)
– German Federal Ministry of Research and Technology (1993-2000)
• Linguatec (>Lingenio)
– First boxed product (1998)
• More Japanese
– NEC mobile device with Japanese-English (2006)
– ATR-Trek, Shabette Honyaku, mobile phone (2007)
• Apps
– Google Translate (2006-2007)
– Jibbigo, first offline (2009)
– …
• Healthcare system pilots
– Spoken Translation, Inc., Converser for Healthcare 3.0 (2011)
– Sehda (>Fluential) (2009?)
• Phone-based
– SpeechTrans (multi-lingual conferences) (2014)
– Lexifone (2014)
5. Converser for Healthcare 3.0: Kaiser Permanente Pilot
• Three departments at San Francisco Medical Center
– Pharmacy:
• Consulting or Drop-off use case
– Shortcuts: Consultation: Typhoid Vaccine
• Pickup use case
• Greeter use case
– Inpatient Nursing
• Shortcuts: IV, External Catheter, Pain Assessment
– Eye Care
• Shortcuts: Informed Consent for Cataract Surgery
10. More Platforms
• Watches
– Apple Watch
– SpeechTrans
• Glasses
– Augmented reality
– In 3 years, not 300!
• Issues:
– Cloud vs. on-device
– Must be everyday device
– Must be cool!
– Watch Apple!
11. The Way Forward
• Improve statistical MT
– User feedback + machine learning
– More, better data
• Knowledge source integration
– Discourse
– Domain
– Prosody …
• The Return of Semantics
– Interlingua/ontologies
– True (grounded) semantics
• New tech
– Deep neural networks
– Other AI
12. Improve Statistical MT
• User feedback (over) + machine learning
• More, better data
• Parsing > hybrid MT
14. Knowledge Source Integration
• Think: IBM’s Watson
• … or Verbmobil (till 2000)
• Nine Issues in Speech Translation
– Discourse
– Speech acts
– Topic tracking
– Domain
– Prosody
– Pauses
– Pitch, stress
– Translation mismatches
– System architecture, data structures
15. The Return of Semantics: Interlingua/Ontologies
• Example: 車_car を_obj 運転_driving する_do 人_person
• [Figure: syntactic structure of the example (tree with node labels NP, VP, PP, V, N, VN) beside its semantic structure (nodes: drive, person, car; relations: mod, agt, obj)]
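The semantic structure for the example phrase 車を運転する人 can be sketched as a small labeled graph. This is only an illustrative reading of the slide, not the representation used by any particular system: it assumes that "drive" takes "person" as agent (agt) and "car" as object (obj), and that the driving event modifies (mod) the head noun "person". The class and function names (Node, build_example, triples) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality; the graph below contains a cycle
class Node:
    """A concept node in a toy interlingua-style semantic graph (hypothetical)."""
    concept: str
    # Labeled outgoing edges: relation name -> target node.
    relations: dict = field(default_factory=dict)


def build_example():
    """Build one assumed reading of the slide's semantic structure."""
    person = Node("person")
    car = Node("car")
    # Assumption: the driving event has an agent and an object.
    drive = Node("drive", {"agt": person, "obj": car})
    # Assumption: the event modifies the head noun "person".
    person.relations["mod"] = drive
    return person


def triples(root):
    """Return (source, relation, target) triples reachable from root, each edge once."""
    seen, out, stack = set(), [], [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # already expanded; avoids looping on the person/drive cycle
        seen.add(id(node))
        for rel, target in node.relations.items():
            out.append((node.concept, rel, target.concept))
            stack.append(target)
    return out


if __name__ == "__main__":
    for triple in triples(build_example()):
        print(triple)
```

Because the graph names concepts and relations rather than Japanese words or word order, a generator for another language could in principle start from the same triples, which is the appeal of an interlingua.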
16. Real-time Voice/Text
• [Figure: English Voice and English Real-time Text connected through Machine Translation to French voice (Voix Français) and French real-time text (Texte Français en Temps Vrais)]
• Example: “Do you come here often?” > “Est-ce que tu viens souvent ici?”
20. New Tech
• MT via deep neural networks
– Google DeepMind
– Kyunghyun Cho (NYU)
• Other AI
– Kurzweil’s brain project
– Microsoft Research
– …
• Issue: Is black box sufficient?