From Speech to Knowledge
Latest Updates and Experiences in Launching Local Language Tools
Who am I?
Karel Bourgois
• 20+ years in Telecom
• Entrepreneur
• Ecosystem
Le Voice Lab
Voxist voicemail since 2016
Confidential and Proprietary. Copyright (c) by Voxist. All Rights Reserved.
Main Features
✓ Custom greetings
✓ Speech to text
Products
Clients
✓ B2C: tens of thousands of users, over 5% paying
✓ B2B: consulting firms, law firms, entrepreneurs…
2022
Business Model
Telco voicemail apps have low ratings
Le Voice Lab
[Diagram: Le Voice Lab ecosystem. Layers: DATA (private, public, augmented), ENGINES, and SERVICES (vocal assistants, emotions, voice ID, translation, subtitles, …), exposed through unified voice-related APIs (ASR, TTS, NLP, ...) and a marketplace. Participants: corporates, labs, government, open source.]
APIs in the Cloud & On-premise
Current Features
✓ Transcriptions in French & English
✓ Punctuation
✓ Speaker separation (diarization)
Coming soon
➢ Spanish, Portuguese, German, Italian
➢ TTS: create your own assistant voices
➢ Real-time translation
Products
Clients
✓ French vocal assistant manufacturer
✓ Le Voice Lab
Distributors
✓ OVH
✓ Eden.ai
Why Now
Traditional ASR approach
This solution splits the optimization of the ASR problem into 3 components
• Acoustic Model: a neural network transduces signal frames into a sequence of phonemes (triphones), using EM techniques + Lattice-Free MMI (maximum mutual information)
• Phonetic Lexicon: provides the decomposition of words into basic acoustic units
• Language Model: an n-gram model, estimating probabilities from frequencies
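As a rough illustration of the language model component, the sketch below estimates bigram probabilities purely from counts. The function name `train_bigram_lm`, the sentence markers `<s>` and `</s>`, and the toy corpus are assumptions made for this example; production n-gram models also apply smoothing, which is left out here.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Estimate P(word | previous word) as a relative frequency (no smoothing)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> are assumed sentence boundary markers.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    def prob(prev, word):
        # Unseen histories or bigrams get probability 0.0.
        if history_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob


# Toy corpus, invented for illustration only.
corpus = ["call me back", "call me tomorrow", "call the office"]
prob = train_bigram_lm(corpus)
print(prob("call", "me"))  # 2 of the 3 bigrams that start with "call"
print(prob("me", "back"))  # 1 of the 2 bigrams that start with "me"
```

The zero probability for unseen word pairs is one reason such models need large amounts of in-domain text.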
Traditional ASR issues
Large annotated datasets required
Traditional ASR requires annotated data for:
1. Acoustic modeling: large amounts of audio with the corresponding texts, and even phonemes
2. Lexicon creation: all the ways of pronouncing the same phonemes / words
⇒ This also requires very specific linguistic skills
This is the approach of ASR toolkits like Kaldi, HTK, Sphinx, Julius and RASR, which were created before E2E solutions were available
(Kaldi's main contributor, Daniel Povey, now works at Xiaomi in China on a new E2E ASR engine called k2)
New ASR approaches
End-to-End Neural Networks (E2E)
Predict a sequence of characters directly from speech using a neural network and a differentiable CTC loss
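To make the character-level output concrete, here is a minimal sketch of greedy CTC decoding: take the most likely label per frame, merge consecutive repeats, then drop the blank label. It shows only the collapse rule that CTC relies on, not the loss computation. The blank symbol `_`, the three-symbol alphabet and the frame probabilities are assumptions invented for this example.

```python
BLANK = "_"  # assumed symbol for the CTC blank label


def ctc_greedy_decode(frame_probs, alphabet):
    """Pick the best label per frame, merge repeats, then remove blanks."""
    best_path = [
        alphabet[max(range(len(probs)), key=probs.__getitem__)]
        for probs in frame_probs
    ]
    decoded = []
    previous = None
    for label in best_path:
        # A label is emitted only when it differs from the previous frame
        # and is not the blank.
        if label != previous and label != BLANK:
            decoded.append(label)
        previous = label
    return "".join(decoded)


alphabet = [BLANK, "h", "i"]
frame_probs = [
    [0.10, 0.80, 0.10],  # h
    [0.20, 0.70, 0.10],  # h again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # i
    [0.10, 0.10, 0.80],  # i again, merged
]
print(ctc_greedy_decode(frame_probs, alphabet))  # hi
```

A blank between two identical labels keeps them separate, which is how doubled letters survive the merge step.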
Advantages of new ASR approach
Self-Supervised techniques
The idea is to learn the language model directly from speech:
- You need much less annotated data
- Less specialized linguistic skills
- No phonetic lexicons
Voxist hybrid approach
Self-Supervised & Domain Specific
Lexicon and Language Model created for the target domain using client data
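The slides do not say how the domain-specific language model is combined with the self-supervised model. One common option, shown here purely as an assumption, is to rescore an n-best list of hypotheses with a domain model. The function `rescore`, the unigram model, the weight and all scores below are hypothetical and chosen for brevity.

```python
import math


def rescore(nbest, domain_unigram_probs, lm_weight=0.5, floor=1e-6):
    """Sort hypotheses by acoustic log-score plus weighted domain LM log-score."""

    def lm_logprob(text):
        # Words missing from the domain model get a small floor probability.
        return sum(
            math.log(domain_unigram_probs.get(word, floor))
            for word in text.split()
        )

    scored = [
        (acoustic + lm_weight * lm_logprob(text), text)
        for text, acoustic in nbest
    ]
    return [text for _, text in sorted(scored, reverse=True)]


# Hypothetical recognizer output: (hypothesis, acoustic log-score).
nbest = [
    ("the patient has a new monia", -4.0),
    ("the patient has pneumonia", -4.5),
]
# Hypothetical unigram probabilities estimated from client (medical) text.
domain_probs = {"the": 0.2, "patient": 0.1, "has": 0.1, "pneumonia": 0.05}
print(rescore(nbest, domain_probs)[0])  # the patient has pneumonia
```

In this toy case the domain vocabulary outweighs the slightly better acoustic score of the first hypothesis.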
Voxist Results
Models          WER (%) on 40h (GigaSpeech)
Google          18.9
Kaldi           14.9
MS              12.4
Pika            12.3
ESPnet          10.3
WeNet           10.6
Voxist basic    10.2
Voxist hybrid   9.8
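For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. The sketch below is a plain implementation with an invented sentence pair. It does not reproduce the text normalization or scoring setup behind the figures above, which the slides do not describe.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Invented example: one word dropped out of five reference words.
print(word_error_rate("please call me back tomorrow", "please call me tomorrow"))  # 20.0
```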
Voxist tech can also bypass ASR and get Intents directly
Self-supervised applied to SLU
Video to text & knowledge management
Current Features
✓ Video indexing and semantic search
✓ Video subtitles
Coming soon
➢ Audio search without ASR
➢ Multimodal sentiment analysis
➢ Auto-translate
Products
What Next?
A Telco Vocal Assistant?
• ASR + TTS
• Conversational agent
• Noise reduction / speech enhancement
All in the cloud-native mobile core networks of tomorrow…
Products
Karel Bourgois, Founder
karel@voxist.com
@bourgois
