Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Block-based Speech to Speech Translation

354 views

Published on

This bachelor’s thesis explores different ways of building a block-based Speech Translation system with the aim of generating huge amounts of parallel speech data. The first goal is to research and manage to run suitable tools to implement each one of the three blocks that integrates the Speech Translation system: Speech Recognition, Translation and Speech Synthesis. We experiment with some open-source toolkits and we manage to train a speech recognition system and a neural machine translation system. Then, we test them to evaluate their performance. As an alternative option, we use the cloud computing solutions provided by Google Cloud to implement the three sequential blocks and we successfully build the overall system. Finally, we make a comparative study between an in-house software development versus Cloud computing implementation.

https://imatge.upc.edu/web/publications/block-based-speech-speech-translation

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

Block-based Speech to Speech Translation

  1. 1. Block-based Speech-to-Speech Translation Author Sandra Roca Advisors Xavier Giró, Marta Costa, Amanda Duarte 1 October, 2018
  2. 2. Contents • Introduction • Project Development • Results • Costs • Conclusions 2
  3. 3. How many languages are there in the world? 3 How many people has a good English level?
  4. 4. Introduction | Motivation 4
  5. 5. Introduction | Context Good morning Guten morgenASR NMT Speech synthesis“Good morning” “Guten morgen” Lip detector Lip generator 5 DeepLipDub: Putting a Mouth to Someone’s Words
  6. 6. Introduction | Classic S2ST system Source language Source language Target language Target language 6
  7. 7. Project Development 7 VS IN-HOUSE CLOUD GPI Platform Google Cloud Platform
  8. 8. Open-Source Toolkits IN-HOUSE Speech Recognition Speech (Awni) Deepspeech. pytorch OpenNMT Translation Speech Synthesis https://github.com/awni/speech https://github.com/SeanNaren/deepspeech.pytorch https://github.com/OpenNMT/OpenNMT-py https://github.com/marytts/marytts
  9. 9. Speech Recognition | Dataset 9 Subset hours Total speakers Train-clean-360 363.3 921 Dev-clean 5.4 40 Test-clean 5.4 40 LibriSpeech. An ASR corpus IN-HOUSE
  10. 10. “Speech. A PyTorch implementation for Speech-to-Text” https://github.com/awni/speech  Data preparation  Training Edit Configuration file: ctc_config.json 7.2% of CER  Testing 7.1% of CER 10 IN-HOUSE Speech Recognition | Toolkit
  11. 11. Translation | Toolkit • Data preparation Extract features Generate vocabulary files • Training and Translation Training with a small dataset Using a pre-trained model 11 IN-HOUSE OpenNMT
  12. 12. Implementation with Google Cloud 12 Cloud Speech API Cloud Translation API Cloud Text-to-Speech API
  13. 13. API Request Configuration fields • Encoding • Sample Rate • Language Code Audio Data 13 Speech-to-Text
  14. 14. Pre-trained models / Custom built models More tan 100 languages Requested fields • Source language • Target language • Text to be translated 14 Translation
  15. 15. Text-to-Speech API Request Voice configuration • Language • Voice • Gender Encoding format Text 15
  16. 16. Results. Speech Recognition 16 Prediction you must look at him in the face fight him cocquer him with what scave you may you ned not think to keep out ot the way of him Label you must look at him in the face fight him conquer him with what scathe you may you need not think to keep out of the way of him Prediction slaing makes one shuter Label slang makes one shudder
  17. 17. Results | Cloud-based implementation 17
  18. 18. 18 Results | Cloud-based implementation “I really like that account of himself better than anything else he said.” “Ich mag diesen Bericht über sich selbst besser als alles andere, was er gesagt hat.” “Realmente me gusta esa cuenta de sí mismo mejor que cualquier otra cosa, dijo.”
  19. 19. Costs | In-house development • Virtual machines with GPUs • Access servers • Storage • Data centre • Software • Technical staff • Electricity cost !!! 19
  20. 20. Costs | Speech-to-Text Subset hours Cost Dev-clean 5,4 6,75 € Dev-other 5,4 6,75€ Test-clean 5,3 6,625 € Test-other 5,1 6,375 € Train-clean-100 100,6 125,75 € Train-clean-360 363,6 454,5 € Train-other-500 496,7 620,875 € Total 982,1 1.127,625 € 20 Price: 0,0052€ / 15 seconds  1,25€ / 1 hour
  21. 21. Costs | Translation Subset Total characters Cost Dev-clean 286.844 5,02 € Dev-other 264.090 4,62 € Test-clean 278.974 4,88 € Test-other 271.063 4,74 € Train-clean-100 5.427.481 91,83 € Train-clean-360 19.053.548 333,44 € Train-other-500 25.282.513 442,44 € Total 50.684.513 886,98 € 21 Price: 17,5€ / 1 million characters.
  22. 22. Costs | Text-to-Speech 22 • Price: 3,5€ / 1 million characters • Estimated character count ratio  1:1,11 (English:German) Subset Total characters Cost Dev-clean 286.844 5,02 € Dev-other 264.090 4,62 € Test-clean 278.974 4,88 € Test-other 271.063 4,74 € Train-clean-100 5.427.481 91,83 € Train-clean-360 19.053.548 333,44 € Train-other-500 25.282.513 442,44 € Total 50.684.513 886,98 €
  23. 23. Costs | Speech-to-Speech Translation 23 Total cost of the overall system Speech-to-Text 1.127,625 € Translate 886,98 € Text-to-Speech 197,61 € Total 2.212,22 €
  24. 24. Conclusions • It’s very challenging to train a system from scratch • Better choice: use Google Cloud API • We are able to create a parallel corpus • GC advantages: multiple languages 24
  25. 25. Thanks! 25 Q&A

×