Latest Updates and Experiences in Launching Local Language Tools, Karel Bourgois

Business and Service Development at Alan Quayle Business & Service Development
Nov. 11, 2022

  1. From Speech to Knowledge: Latest Updates and Experiences in Launching Local Language Tools
  2. Who am I? Karel Bourgois • 20+ years in telecom • Entrepreneur • Ecosystem: Le Voice Lab
  3. Voxist voicemail, since 2016. Confidential and Proprietary. Copyright (c) by Voxist. All Rights Reserved. Main features: • Custom greetings • Speech to text. Products / Clients: • B2C: tens of thousands of users, over 5% of them paying • B2B: consulting firms, law firms, entrepreneurs…
  4. 2022 Business Model
  5. Telcos' voicemail apps have low ratings
  6. Le Voice Lab (diagram; SA1, SA2, SA3): DATA: private, public, augmented • ENGINES: unified voice-related APIs (ASR, TTS, NLP, ...) • SERVICES: vocal assistants, emotions, voice ID, translation, subtitles, … Other diagram labels: Corporate Labs; APIs; Corporate/Labs/Gov (€); Marketplace (€); Open Source
  7. APIs in the cloud & on-premise. Current features: • Transcription in French & English • Punctuation • Speaker separation (diarization). Coming soon: • Spanish, Portuguese, German, Italian • TTS: create your own assistant voices • Real-time translation. Products / Clients: • A French vocal assistant manufacturer • Le Voice Lab. Distributors: • OVH • Eden.ai
  8. Why Now
  9. Traditional ASR approach: this solution splits the optimization of the ASR problem into three components.
  10. Traditional ASR approach (continued): • Acoustic model: a neural network transduces signal frames into sequences of phonemes (triphones), using EM techniques plus lattice-free MMI (maximum mutual information) • Phonetic lexicon: decomposes words into basic acoustic units • Language model: an n-gram model whose probabilities are estimated from frequency counts
  11. Traditional ASR issues: large annotated datasets required. Traditional ASR requires annotated data for (1) acoustic modeling: large amounts of audio with the corresponding transcripts, and even phonemes; and (2) lexicon creation: all the ways of pronouncing the same phonemes/words. ⇒ This also requires very specialized linguistic skills. This is the approach of ASR toolkits such as Kaldi, HTK, Sphinx, Julius and RASR, which were created before E2E solutions were available. (Kaldi's main contributor, Daniel Povey, now works at Xiaomi in China on a new E2E ASR engine called K2.)
  12. New ASR approaches: end-to-end (E2E) neural networks. A neural network predicts the sequence of characters directly from speech, trained with the differentiable CTC (Connectionist Temporal Classification) loss.
  13. Advantages of the new ASR approach: self-supervised techniques. The idea is to learn the language model directly from speech: - much less annotated data is needed - less specialized linguistic skills are required - no phonetic lexicons
  14. Voxist hybrid approach: self-supervised & domain-specific. The lexicon and language model are created for the target domain using client data.
  15. Voxist results: WER (%) on 40 h (GigaSpeech), lower is better
      Model           WER (%)
      Google          18.9
      Kaldi           14.9
      MS              12.4
      Pika            12.3
      ESPnet          10.3
      WeNet           10.6
      Voxist basic    10.2
      Voxist hybrid    9.8
  16. Voxist tech can also bypass ASR and extract intents directly: self-supervised learning applied to SLU (spoken language understanding).
  17. Video to text & knowledge management. Current features: • Video indexing and semantic search • Video subtitles. Coming soon: • Audio search without ASR • Multimodal sentiment analysis • Automatic translation
  18. What Next? A telco vocal assistant? • ASR + TTS • Conversational agent • Noise reduction / speech enhancement. All in the cloud-native mobile core networks of tomorrow…
  19. Karel Bourgois, Founder. karel@voxist.com, @bourgois
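Slide 10 says the language model of a traditional pipeline is an n-gram model whose probabilities are estimated from frequencies. A minimal sketch of that idea for n = 2 (a bigram model) with plain maximum-likelihood counts follows; the toy corpus and the `<s>`/`</s>` boundary tokens are illustrative assumptions, and real systems add smoothing (for example Kneser-Ney) so that unseen word pairs do not get probability zero.

```python
from collections import Counter


def train_bigram(sentences):
    """Maximum-likelihood bigram model: P(w | prev) = count(prev, w) / count(prev)."""
    context_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        # Sentence-boundary tokens let the model score the first and last words.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        context_counts.update(tokens[:-1])
        pair_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(prev, word):
        # Unseen contexts get probability 0 here; smoothing would fix this.
        if context_counts[prev] == 0:
            return 0.0
        return pair_counts[(prev, word)] / context_counts[prev]

    return prob


def sentence_prob(prob, sentence):
    """Probability of a whole sentence as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p *= prob(prev, word)
    return p


corpus = ["call me back", "call me later", "please call back"]
p = train_bigram(corpus)
print(p("call", "me"))  # 2 of the 3 "call" contexts are followed by "me"
print(sentence_prob(p, "call me back"))
```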
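Slide 11 notes that the lexicon must list all the ways of pronouncing each word. The sketch below uses a hypothetical two-word lexicon with ARPAbet-style phoneme symbols to show why variants matter: every alternative pronunciation multiplies the number of phoneme sequences the decoder has to consider, and any word missing from the lexicon cannot be recognized at all.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to one or more pronunciations,
# written as ARPAbet-style phoneme tuples (stress marks omitted).
LEXICON = {
    "either": [("IY", "DH", "ER"), ("AY", "DH", "ER")],
    "way": [("W", "EY")],
}


def expand(words, lexicon):
    """Return every phoneme sequence the word sequence can be pronounced as.

    Raises KeyError for out-of-vocabulary words: the gap a lexicon-based system
    must close by adding entries by hand or with a grapheme-to-phoneme model.
    """
    variants = [lexicon[w] for w in words]
    # Cartesian product: one pronunciation choice per word.
    return [
        tuple(phone for pron in choice for phone in pron)
        for choice in product(*variants)
    ]


for seq in expand(["either", "way"], LEXICON):
    print(" ".join(seq))
```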
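Slide 12 mentions the differentiable CTC (Connectionist Temporal Classification) loss. CTC adds a blank symbol and sums the probability of every frame-level alignment that collapses (merge repeats, then drop blanks) to the target transcript; the forward algorithm computes that sum efficiently, and its negative log is the loss. The sketch below is a plain-Python illustration on a toy 4-frame output over the labels {blank, a, b}, with a brute-force enumeration as a cross-check. It works in probability space for readability, whereas production implementations such as PyTorch's `torch.nn.CTCLoss` work in log space for numerical stability.

```python
import itertools
import math

BLANK = 0  # index of the CTC blank symbol


def collapse(path):
    """CTC collapse rule: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for label in path:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return out


def ctc_prob(probs, target):
    """P(target | x) summed over all alignments with the CTC forward algorithm.

    probs[t][k] is the network's softmax output for label k at frame t.
    """
    ext = [BLANK]
    for label in target:
        ext += [label, BLANK]  # [a, b] -> [_, a, _, b, _]
    T, S = len(probs), len(ext)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]  # start with a blank ...
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]  # ... or with the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s] + (alpha[t - 1][s - 1] if s >= 1 else 0.0)
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]
    # Valid paths end on the last label or on the trailing blank.
    return alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)


def brute_force_prob(probs, target):
    """Same quantity by enumerating every path (exponential; for checking only)."""
    total = 0.0
    for path in itertools.product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path) == list(target):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


def greedy_decode(probs):
    """Best-path decoding: argmax label per frame, then collapse."""
    return collapse([max(range(len(p)), key=p.__getitem__) for p in probs])


# Toy softmax outputs: 4 frames over {0: blank, 1: 'a', 2: 'b'}.
probs = [
    [0.1, 0.8, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.2, 0.2],
]
target = [1, 2]
p = ctc_prob(probs, target)
print("CTC loss:", -math.log(p))
print("matches brute force:", math.isclose(p, brute_force_prob(probs, target)))
print("greedy decode:", greedy_decode(probs))  # [1, 2], i.e. "ab"
```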
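Slide 13 does not say which self-supervised objective is used. As one well-known example, wav2vec 2.0 masks parts of the latent speech representation and trains the network to identify the true latent for a masked frame among distractors with a contrastive (InfoNCE-style) loss. The sketch below computes that loss for a single masked position; the 2-D vectors and the temperature value are illustrative assumptions, not Voxist's setup.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def contrastive_loss(context, positive, distractors, temperature=0.1):
    """InfoNCE-style loss for one masked frame.

    It is the negative log of the softmax probability given to the true latent
    (`positive`) among all candidates, scored by cosine similarity divided by
    a temperature.
    """
    candidates = [positive] + list(distractors)
    logits = [cosine(context, c) / temperature for c in candidates]
    m = max(logits)  # subtract the max before exp() for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


# Toy 2-D vectors (real models use far more dimensions).
context = [1.0, 0.0]                # network output at a masked time step
true_latent = [0.9, 0.1]            # latent of the frame that was masked
others = [[0.0, 1.0], [-1.0, 0.0]]  # distractors sampled from other frames

good = contrastive_loss(context, true_latent, others)
bad = contrastive_loss(context, others[0], [true_latent, others[1]])
print(f"loss when the context matches the true frame: {good:.4f}")
print(f"loss when it matches a distractor instead:    {bad:.4f}")
```

Minimizing this loss needs no transcripts, which is why such pre-training reduces the annotated data required; a small labeled set is then used for fine-tuning.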
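Slide 14 says the lexicon and language model are built for the target domain from client data, without saying how they are combined with a general model. One standard option, assumed here, is linear interpolation, P(w) = λ·P_domain(w) + (1 − λ)·P_general(w), with λ tuned on held-out domain text. The sketch uses unigram models and toy corpora (a hypothetical law-firm client, echoing the B2B clients on slide 3) to stay short; the same mixing applies to n-gram conditional probabilities.

```python
from collections import Counter


def unigram_model(corpus):
    """Maximum-likelihood unigram model estimated from a list of sentences."""
    counts = Counter(w for line in corpus for w in line.split())
    total = sum(counts.values())
    return lambda w: counts[w] / total


def interpolate(p_domain, p_general, lam):
    """Linear interpolation: P(w) = lam * P_domain(w) + (1 - lam) * P_general(w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    return lambda w: lam * p_domain(w) + (1 - lam) * p_general(w)


# Hypothetical corpora: generic text vs. transcripts from a law-firm client.
general = ["the meeting is tomorrow", "call me tomorrow"]
domain = ["the plaintiff filed the motion", "the court denied the motion"]

p = interpolate(unigram_model(domain), unigram_model(general), lam=0.7)
print(p("motion"))    # frequent in client data, absent from the generic text
print(p("tomorrow"))  # still covered by the general model
```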
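The figures on slide 15 are word error rates: the minimum number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. A minimal sketch using the standard Levenshtein dynamic program over words:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words into the
    # first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


# One deletion ("me") and one insertion ("morning") over 5 reference words.
print(wer("please call me back tomorrow", "please call back tomorrow morning"))  # 0.4
```

Because insertions are counted, WER can exceed 1.0 (100%). Benchmarks also normalize case and punctuation before scoring, a step omitted here.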
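Slide 17 lists video subtitles among the current features. Assuming the ASR returns timed segments as (start, end, text) tuples (a hypothetical format, not Voxist's API), turning them into the widely used SubRip (.srt) subtitle format takes only a few lines: each cue is a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text and a blank line.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments):
    """Build SubRip text from hypothetical (start_s, end_s, text) segments."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)


segments = [
    (0.0, 2.5, "Hello and welcome."),
    (2.5, 61.04, "Today we talk about speech to text."),
]
print(to_srt(segments))
```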