Do we need linguistic knowledge for speech technology applications in African languages?

© Justus Roux


  1. AFLAT 2010 – Malta
      Copyright JC Roux
  2. Introduction
  3. Aims
  4. Multidisciplinary nature of our activities
  5. Interaction between linguists and engineers / computer scientists
      “Every time I fire a linguist, the performance of our speech recognition system goes up.”
  6. Striking a balance
  7. Factors
  8. Linguistic knowledge
      Demonstrate how the development of technologies has an impact on the type of linguistic knowledge needed for particular applications
      Focus: Two approaches towards speech synthesis
        • Unit selection / Concatenative synthesis (Rule driven)
        • HMM-based generation synthesis (Data driven)
  9. Unit selection / Concatenative synthesis
      Widely accepted and approved technique
      Festival Speech Synthesis System (Black et al. 1998)
      FestVox (Black & Lenzo, 2000)
      Rule based: e.g. prosodic models based on detailed data on duration, stress, intonation - major languages of the world
      Normally excellent (domain specific) quality of speech
  10. Concatenative synthesis in a tone language
      Consider the following two words in isiXhosa where tonal movement takes place when the diminutive suffix /-ana/ is invoked:
        (1) úmfúndìsì (teacher)
        (2) úmfúndìsì + ana > úmfùndísànà (HL > LH)
      In order to generate a speech version of example (2), consider the following (oversimplified) description of the process:
  11. Language Component
      TEXT INPUT
        umfundisana
      MORPHOLOGICAL COMPONENT
        u + mu + fund + is + ana
        # ú + mù + fúnd + ìs + ana #
      LEXICON / TONAL ASSIGNMENT
      GRAPHEME PHONEME CONVERSION
        # u + mu + fund + is + ana #
      NORMALISATION
        # ú + m_ + fúnd + ìs + ana #
        # ú + m + fúnd + ìs + ana #
      PHONOLOGICAL RULES
        V -> Ø / m_+ [ V root
        Nas -> [+ syllabic] / # V__ + C
      PHONETIC FORM
        [úmfundísa:na]
      PROSODIC RULES
        Penultimate vowel lengthening
        Tonal assignment HL > LH
        # ú + m + fúnd + ìs + a:na #
        # ú + m + fund + ís + a:na #
  12. Technical Component: Speech Generation
      SPEECH DATABASE of PRE-RECORDED UNITS
        è bèsètèlèbé.., ù bùsùtùlùbú.., fúfùfáfà… (diphones, triphones………)
      UNIT SELECTION PROCESS
        Selection of applicable units from the speech database to match the required phonetic form
      SETS OF POTENTIAL CANDIDATES
        ú ndísa: m fu na
        [ ] [ ] [ ] [ ] [ ] [ ]
        [ ] [ ] [ ] [ ] [ ] [ ]
      CONCATENATION
        ú m fundísa: na
      SMOOTHING OF JUNCTURES
      SPOKEN FORM
        [úmfundísa:na]
  13. Availability of resources for rule based synthesis (1)
      Morphology?
        UNISA morphology project for Southern African languages of the Bantu group – great progress - implementation level
      Phonology?
        Various studies on aspects of the phonologies of African languages within different theories – theoretical models per se have limited implementation value
      Phonetics?
        Mainly impressionistic descriptions, very few quantitative studies in formats that are suitable for speech technology applications
        Experimental phonetic data mainly represent laboratory based speech – read speech - lacking real world authenticity
  14. Availability of resources for rule based synthesis (2)
      Prosody - Intonation?
      Pronunciation dictionaries
        Mapping: orthography to pronunciation > manual / G2P; progress with the Lwazi project of Meraka, but problem remains – basic tonal representation
      Zerbian & Barnard (2008): Phonetics of intonation in South African Bantu languages
        Emphasises the need for quantitative acoustic data on intonational processes.
  15. Nature of tonal data in Southern Bantu
      Impressionistic descriptions (interpretations of researcher)
      Examples of inconsistencies: isiXhosa – the same speaker – three different researchers (1969, 1973 & 1992) with different tone marking for the same items!
      [Detailed discussions in Roux 1991, 1995a, 1995b, 2001, 2003]
  16. Questions
      Which rules are to be used for implementation in a rule-based speech synthesis system?
      Alternatively:
        When will reliable (tonal) rules be available in a useable format for speech synthesis, particularly in African tone languages?
      But technologies change and give rise to new approaches to linguistic data – and to the definition of linguistic knowledge
  17. HMM-based Speech Synthesis System (HTS)
      2002: First version released following pioneering work of Keiichi Tokuda
      http://www.sp.nitech.ac.jp/~tokuda/
  18. Technical detail on HMM synthesis for less resourced languages
      Described in:
        Roux, JC & Visagie, AS. 2007. Data-driven approach to rapid prototyping Xhosa speech synthesis. Proceedings of the 6th ISCA Speech Synthesis Workshop, Bonn, Germany, pp. 143-147.
        Maia, R, Zen, H, Tokuda, K, Kitamura, T & Resende, FGV. 2003. Towards the development of a Brazilian Portuguese Text-to-Speech system based on HMM. Eurospeech, Geneva, pp. 2465-68.
  19. Types of knowledge involved (1)
  20. Types of knowledge involved (2)
  21. Types of knowledge involved (3)
      Point is:
        Fine grained phonetic data / tonological rules are not necessarily required for generating plausible intelligible speech – as long as the carriers of that information are present in the text data (and corresponding speech recordings) used for training the system
  22. Characteristics of text-to-speech (TTS) systems
  23. Examples
      isiXhosa
        • based on 43 minutes of actual recorded speech
        • no tonal information included
        • 3 339 words
      SA English
        • based on 140 minutes of actual recorded speech
      Kodwa ndiyayithanda lo nto
  26. Kubonisa (uku…)
  27. English
  28. Application of HTS for TTS development in under resourced languages
  29. African language TTS embedded on mobile devices
      ISIXHOSA TEXT-TO-SPEECH CONVERTER [Embedded into the system]
      [Also English TTS]
      ISIXHOSA TEXTS [English texts]
        • Job training material / manuals
        • Literacy and numeracy training material
        • Terminologies / definitions
        • Multilingual Communication Phrases in different environments, etc
      [REFERENCE TOOL]
      SPEECH: ISIXHOSA / ENGLISH
  30. Implications for (phonetic) knowledge development
  31. Pragmatic considerations: sustainability in African context?
  32. In conclusion
  33. References
      Black, A. 2006. Multilingual speech synthesis. In Schultz, T & Kirchhoff, K (eds) Multilingual Speech Processing. Amsterdam: Elsevier.
      Louw, JA, Davel, M & Barnard, E. 2005. A general-purpose isiZulu speech synthesiser. South African Journal of African Languages, 25(2): 92-100.
      Roux, JC. 1991. On the integration of phonetics and phonology. South African Journal of Linguistics, 11: 34-52.
      Roux, JC. 1995a. Prosodic data and phonological analyses in Zulu and Xhosa. South African Journal of African Languages, 15(1): 19-28.
      Roux, JC. 1995b. On the perception and production of tone in Xhosa. South African Journal of African Languages, 15(4): 196-203.
  34. References (2)
      Roux, JC. 2001. Comments on “Zulu tonology and its relationship to other Nguni languages” by Cassimjee and Kisseberth. Proceedings of the Symposium Cross-Linguistic Studies of Tonal Phenomena. Ed. S Kaji. Tokyo. pp. 361-367.
      Roux, JC. 2003. On the perception and description of tone in the Sotho and Nguni languages. Proceedings of the Symposium Cross-Linguistic Studies of Tonal Phenomena: historical development, phonetics of tone and descriptive studies. Ed. S Kaji. Tokyo. pp. 155-176.
      Zerbian, S & Barnard, E. 2008. Phonetics of intonation in South African Bantu languages. Southern African Linguistics and Applied Language Studies, 26(2): 235-254.
  36. Mobile Platforms
  37. Why mobile platform?
      “Africa has become the fastest growing mobile market in the world with mobile penetration in the region ranging from 100% to 30%.”
      http://whiteafrican.com/2008/08/01/2007-african-mobile-phone-statistics
  39. Why speech output?
      • Speech is a natural way of communication - hence, in this context, it provides access to information in a language of choice to non-literates as well as literates.
      • Acutely aware of the low literacy rate in many parts of Africa.
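
The rule cascade in the language component (slide 11) can be sketched in code. The following Python fragment is a minimal illustration under strong assumptions: the word arrives already segmented into tone-marked morphs, the class prefix sits at index 1, the root at index 2 and the first suffix at index 3, and low tone is left unmarked in the output, as in the slide's phonetic form. All function names are hypothetical and the rules cover only this one example; this is not the author's implementation.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent: high tone
GRAVE = "\u0300"  # combining grave accent: low tone
VOWELS = set("aeiou")


def strip_tone(s):
    """Return s without tone marks (decomposed form, base letters only)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if c not in (ACUTE, GRAVE))


def delete_prefix_vowel(morphs):
    """V -> Ø after m in the class prefix: 'mù' becomes 'm' (toy version)."""
    out = list(morphs)
    if strip_tone(out[1]) == "mu":
        out[1] = "m"
    return out


def shift_tone(morphs):
    """HL > LH over root + suffix: the high tone moves from root to suffix."""
    out = list(morphs)
    root = unicodedata.normalize("NFD", out[2])
    suffix = unicodedata.normalize("NFD", out[3])
    if ACUTE in root and GRAVE in suffix:
        out[2] = strip_tone(out[2])  # root loses its high tone
        base = strip_tone(out[3])
        i = next(k for k, c in enumerate(base) if c in VOWELS)
        marked = base[: i + 1] + ACUTE + base[i + 1:]
        out[3] = unicodedata.normalize("NFC", marked)
    return out


def lengthen_penult(word):
    """Mark the penultimate vowel as long by appending ':' to it."""
    chars = list(unicodedata.normalize("NFD", word))
    vowel_positions = [k for k, c in enumerate(chars) if c in VOWELS]
    if len(vowel_positions) < 2:
        return word
    k = vowel_positions[-2] + 1
    while k < len(chars) and unicodedata.combining(chars[k]):
        k += 1  # skip any tone mark attached to that vowel
    chars.insert(k, ":")
    return unicodedata.normalize("NFC", "".join(chars))


def phonetic_form(morphs):
    morphs = delete_prefix_vowel(morphs)     # phonological rule
    morphs = shift_tone(morphs)              # prosodic rule: tone
    return lengthen_penult("".join(morphs))  # prosodic rule: length


form = phonetic_form(["ú", "mù", "fúnd", "ìs", "ana"])
print(form)
```

The slide lists lengthening before tonal assignment; in this example the two prosodic rules do not interact, so the order chosen here gives the same form. The point of the sketch is that every step depends on hand-written linguistic rules, which is exactly the resource the later slides describe as scarce.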

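Slide 11 lists grapheme-to-phoneme conversion as one step, and slide 14 notes that G2P mapping still lacks a basic tonal representation. A first step in a rule-based G2P module is to split the orthographic word into graphemes, matching multi-letter graphemes before single letters. The sketch below assumes a small, illustrative subset of multi-letter graphemes (`MULTIGRAPHS`); that list and the function name `segment` are hypothetical and nowhere near a complete inventory.

```python
# Illustrative subset only; a real inventory would be considerably larger.
MULTIGRAPHS = ["tsh", "nd", "ng", "ny", "th", "kh", "ph", "hl", "dl", "sh"]


def segment(word, multigraphs=MULTIGRAPHS):
    """Split a word into graphemes using greedy longest-match."""
    ordered = sorted(multigraphs, key=len, reverse=True)
    units = []
    i = 0
    while i < len(word):
        for g in ordered:
            if word.startswith(g, i):
                units.append(g)
                i += len(g)
                break
        else:
            # No multi-letter grapheme starts here: take a single letter.
            units.append(word[i])
            i += 1
    return units


print(segment("umfundisana"))
```

The resulting grapheme sequence carries no tonal information, because standard orthography does not mark tone. That is the gap the slide points to: tone has to come from a lexicon or from rules if the system is rule based.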
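
The unit selection step on slide 12 is commonly framed as a shortest-path search: each candidate unit has a target cost (how well it matches the required segment), each pair of adjacent candidates has a join cost (how smoothly they concatenate), and the cheapest sequence is found by dynamic programming. The sketch below assumes that framing. The `Unit` fields, the pitch-based join cost, and all numbers are invented for illustration; they do not come from the slides or from any particular toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    label: str          # segment the recording realises
    uid: str            # hypothetical identifier of the recording
    f0_start: float     # pitch at the left edge, in Hz
    f0_end: float       # pitch at the right edge, in Hz
    target_cost: float  # mismatch with the required phonetic form


def join_cost(left, right):
    """Toy join cost: pitch discontinuity at the juncture, scaled down."""
    return abs(left.f0_end - right.f0_start) / 10


def select_units(candidates):
    """Return (total cost, cheapest unit sequence) over all candidate sets."""
    # best[i][j] = (cost of cheapest path ending in candidates[i][j], backpointer)
    best = [[(u.target_cost, None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + u.target_cost, back))
        best.append(row)
    # Trace the cheapest path back from the last position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return total, path


candidates = [
    [Unit("ú", "A", 170, 180, 1.0), Unit("ú", "B", 160, 150, 0.0)],
    [Unit("m", "C", 175, 170, 0.5), Unit("m", "D", 150, 120, 2.0)],
    [Unit("fu", "E", 165, 160, 0.0), Unit("fu", "F", 120, 115, 1.0)],
]
total, path = select_units(candidates)
print(total, [u.uid for u in path])
```

In this toy data the first unit with the lowest target cost (`B`) is not selected, because it joins poorly with what follows. For a tone language this matters: the candidates must already carry the right pitch movement, so the required tonal form has to be known before selection starts.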