Developments of Swahili
resources for an ASR system

Hadrien Gelas (1,2), Laurent Besacier (2), François Pellegrino (1)
(1) Laboratoire DDL, CNRS - Université de Lyon, France
(2) LIG, CNRS - Université Joseph Fourier Grenoble, France
Outline
1. Swahili introduction
2. ASR resources
3. System results
Swahili?
Between 40M and 100M speakers
Only 2% are native speakers (between 800k and 5M)
98% are non-natives
Large area of East Africa
Spoken in more than 9 countries
Official language of 5 nations
[Map: Swahili language area]
Internet penetration rate (%)
Africa: 13.5
Asia: 26.2
World average: 32.7
Middle East: 35.6
Latin America / Caribbean: 39.5
Europe: 61.3
Oceania / Australia: 67.5
North America: 78.6
Internet population growth (%), 2000-2011
Africa: 2988.4
Asia: 789.6
World average: 528.1
Middle East: 2244.8
Latin America / Caribbean: 1205.1
Europe: 376.4
Oceania / Australia: 214
North America: 152.6
Swahili and IT services
But not yet Swahili online resources
Bantu
family




333
Swahili features for ASR
Rich morphology: noun classes, agreement systems, complex verbs
Non-tonal
Roman script
ASR resources
Acoustic models, pronunciation dictionary, language models -> text output
ASR resources
Needs text corpus
Text corpus (M words)
Sawa corpus: 2
[Getao and Miriti]: 5
Helsinki corpus: 12
Our corpus: 28 (collected from 16 news websites)
Rich morphology in
Swahili
English   They will not tell you

Swahili   hawatakuambieni

Segm.     ha-wa-ta-ku-ambi-e-ni

Gloss     NEG-SM2-FUT-OM2-tell-FIN-PL
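
The segmentation and gloss lines above pair up one morpheme per label. A minimal Python sketch of that alignment, assuming the hyphen is the only separator (the function name `align` is introduced here for illustration only):

```python
# Align each morpheme of the slide's example with its gloss label.
segmentation = "ha-wa-ta-ku-ambi-e-ni"
gloss = "NEG-SM2-FUT-OM2-tell-FIN-PL"


def align(segmentation, gloss):
    """Return (morpheme, label) pairs; both strings are hyphen-separated."""
    morphs = segmentation.split("-")
    labels = gloss.split("-")
    if len(morphs) != len(labels):
        raise ValueError("segmentation and gloss have different lengths")
    return list(zip(morphs, labels))


pairs = align(segmentation, gloss)
for morph, label in pairs:
    print(f"{morph:<6}{label}")
```

A single Swahili word here carries seven morphemes, which is why word-level vocabularies grow so quickly.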
Rich morphology for ASR (Type OOV %)
Word-65k: 19.17
Word-200k: 12.46
Word-400k: 10.28
High OOV rates
To reach a larger lexical coverage, we used an unsupervised approach (Morfessor) to segment words into sub-word units
Morf-65k: 11.36
Morf-200k: 1.61
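
A type OOV rate of this kind can be computed as the share of distinct test units that are missing from the lexicon. The sketch below is a hedged illustration in Python: the helper names (`top_k_lexicon`, `type_oov_rate`) and the toy tokens are assumptions made here, not the authors' tooling or data.

```python
from collections import Counter


def top_k_lexicon(train_tokens, k):
    """Keep the k most frequent units of the training text as the lexicon."""
    return {unit for unit, _ in Counter(train_tokens).most_common(k)}


def type_oov_rate(lexicon, test_tokens):
    """Percentage of distinct test units (types) not found in the lexicon."""
    test_types = set(test_tokens)
    if not test_types:
        raise ValueError("test set is empty")
    unseen = test_types - set(lexicon)
    return 100.0 * len(unseen) / len(test_types)


# Toy data (made up for illustration only).
train_tokens = ["na", "na", "na", "wa", "wa", "ya", "kwa"]
test_tokens = ["na", "wa", "ya", "kwa", "na"]

lexicon = top_k_lexicon(train_tokens, k=2)
oov = type_oov_rate(lexicon, test_tokens)
print(f"Type OOV: {oov:.2f}%")
```

The same function applies to sub-word units: segment both training and test text first, then count types over the resulting units.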
ASR resources
Needs unit pronunciation
Pronunciation dictionary
65k most frequent units (words or sub-words)
+
Grapheme-to-phoneme script taking advantage of the regularity of Swahili spelling
BUT…
Issue with English words, proper names and acronyms!
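
A rule-based grapheme-to-phoneme script for a regular orthography can be sketched in a few lines of Python. This is only an illustration: the digraph list and the function name `g2p` are assumptions made here, not the authors' actual rules or phone set.

```python
# Assumed digraphs that map to a single phone; the real script may differ.
DIGRAPHS = ("ch", "sh", "ny", "th", "dh", "gh")


def g2p(unit):
    """Spell a unit out as phones: digraphs first, then single letters."""
    unit = unit.lower()
    phones = []
    i = 0
    while i < len(unit):
        pair = unit[i:i + 2]
        if pair in DIGRAPHS:
            phones.append(pair)
            i += 2
        elif unit[i].isalpha():
            phones.append(unit[i])
            i += 1
        else:
            i += 1  # skip apostrophes, hyphens and other symbols
    return phones


print("shule", " ".join(g2p("shule")))
print("games", " ".join(g2p("games")))
```

The second call shows the issue raised on the slide: an English word such as "games" is spelled out letter by letter, which does not match how it is pronounced.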
Pronunciation dictionary
Nearly 9% of the units in the 65k lexicon are found in the CMU English dictionary

Words in 65k dictionary:   games     g a m e s
Words in CMU:              games     G EY M Z

1. Identical word found in both dictionaries
2. Mapping to Swahili phones
3. Add as a variant: games(2) g e y m z
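
The three steps above can be sketched in Python as follows. Only the four phone mappings visible in the slide's example (G, EY, M, Z) are included; a full CMU-to-Swahili table, the stress-digit handling and all function names are assumptions made for this illustration.

```python
# Partial mapping taken from the slide's example; a full table is assumed.
CMU_TO_SWAHILI = {"G": ["g"], "EY": ["e", "y"], "M": ["m"], "Z": ["z"]}


def map_cmu(phones):
    """Step 2: map CMU phones to Swahili phones, dropping stress digits."""
    mapped = []
    for phone in phones:
        mapped.extend(CMU_TO_SWAHILI[phone.rstrip("012")])
    return mapped


def add_english_variants(lexicon, cmu):
    """Steps 1 and 3: for words found in both, add the mapped variant."""
    for word, prons in lexicon.items():
        if word in cmu:
            variant = map_cmu(cmu[word])
            if variant not in prons:
                prons.append(variant)
    return lexicon


def format_entries(lexicon):
    """Write entries as 'word phones', numbering variants word(2), word(3)."""
    lines = []
    for word in sorted(lexicon):
        for n, pron in enumerate(lexicon[word], start=1):
            label = word if n == 1 else f"{word}({n})"
            lines.append(f"{label} {' '.join(pron)}")
    return lines


lexicon = {"games": [["g", "a", "m", "e", "s"]]}
cmu = {"games": ["G", "EY1", "M", "Z"]}
entries = format_entries(add_english_variants(lexicon, cmu))
print("\n".join(entries))
```

Keeping the letter-by-letter entry alongside the mapped one lets the decoder choose whichever pronunciation fits the audio.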
ASR resources
Needs audio data and matching transcriptions
Audio corpus
Main constraint for us!
It is a time-consuming and expensive task.
Read speech corpus (1st solution)
Transcriptions are directly available and the task is easy to prepare
BUT…
May not be natural enough, and speakers willing to record must be found
3h30 collected this way
Crowdsourcing transcriptions (2nd solution)
Amazon’s Mechanical Turk: tasks can be posted online and anyone can be paid to do them.
Advantages: good enough quality for acoustic models; possibility to find transcribers
Drawbacks: completion rate lower than for English; ethical issues
Only a test: 1h30 of the read speech corpus transcribed this way
Collaborative transcriptions (3rd solution)
Corpus to transcribe: web broadcast news (available online with good enough quality)
Collaboration with a Kenyan institute
1. A 1st acoustic model (AM) is trained using the read speech corpus (1st set AM)
2. A 2hrs set is automatically segmented and filtered (2hrs set preparation)
3. The 2hrs set is transcribed using our 1st set AM
4. The 2hrs set is sent to the Taji Institute for correction
5. After correction, data are added to the training corpus and a new AM is trained (2nd set AM)
6. The cycle is repeated up to the 6th set AM: 12 hours were transcribed
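
The iterative process can be summarised as a loop. The Python sketch below only illustrates the control flow: `train_am`, `decode` and `correct` are hypothetical callables standing in for the real training, recognition and human-correction steps, and each batch is assumed to be already segmented and filtered.

```python
def bootstrap(read_speech, broadcast_sets, train_am, decode, correct):
    """Grow the training corpus one corrected set at a time.

    read_speech:    list of (audio, transcription) pairs
    broadcast_sets: list of batches of untranscribed audio segments
    """
    training = list(read_speech)
    am = train_am(training)  # 1st set AM, from read speech only
    for batch in broadcast_sets:
        # Automatic transcription with the current acoustic model.
        hypotheses = [(audio, decode(am, audio)) for audio in batch]
        # Human correction of the automatic transcription.
        corrected = [(audio, correct(audio, hyp)) for audio, hyp in hypotheses]
        training.extend(corrected)
        am = train_am(training)  # next AM, trained on the larger corpus
    return am, training


# Toy stand-ins so the loop can be run end to end (hypothetical).
def toy_train(corpus):
    return {"n_utts": len(corpus)}


def toy_decode(am, audio):
    return f"hyp:{audio}"


def toy_correct(audio, hyp):
    return hyp.replace("hyp:", "ref:")


final_am, corpus = bootstrap(
    [("r1", "ref:r1"), ("r2", "ref:r2"), ("r3", "ref:r3")],
    [["b1", "b2"], ["b3", "b4"]],
    toy_train, toy_decode, toy_correct,
)
print(final_am, len(corpus))
```

Because each new model has seen more in-domain data, its automatic transcriptions should need less correction on the following set.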
Collaborative transcriptions
[Figure: time spent (hours) against character accuracy rate (%) for the 1st to 6th sets]
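
A character accuracy rate is commonly derived from the character-level edit distance between the corrected transcription and the automatic one. The Python sketch below uses that common definition; the slides do not give the exact formula the authors used, so treat it as an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def char_accuracy(ref, hyp):
    """Character accuracy rate in percent, relative to the reference."""
    if not ref:
        raise ValueError("reference is empty")
    return 100.0 * (1 - edit_distance(ref, hyp) / len(ref))


# Toy example: one missing character out of six.
car = char_accuracy("habari", "habri")
print(f"Character accuracy: {car:.1f}%")
```

Swapping characters for words in the same computation gives the word-level counterpart.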
System results (WER)
Acoustic models, pronunciation dictionary, language models -> text output
Asante! (Thank you!)

    hadrien.gelas@univ-lyon2.fr

    laurent.besacier@imag.fr

    françois.pellegrino@univ-lyon2.fr
