Linguistic knowledge
Aim: demonstrate how the development of technologies affects the type of linguistic knowledge needed for particular applications
Focus: two approaches to speech synthesis
  Unit selection / concatenative synthesis (rule driven)
  HMM-based synthesis (data driven)
Unit selection / Concatenative synthesis
Widely accepted and well-established technique
Festival Speech Synthesis System (Black et al. 1998)
FestVox (Black & Lenzo, 2000)
Rule based: e.g. prosodic models built on detailed data on duration, stress and intonation, available for the major languages of the world
Speech quality is normally excellent (domain specific)
Concatenative synthesis in a tone language
Consider the following two words in isiXhosa, where tonal movement takes place when the diminutive suffix /-ana/ is added:
(1) úmfúndìsì (teacher)
(2) úmfúndìsì + ana > úmfùndísànà (HL > LH)
To generate a speech version of example (2), consider the following (oversimplified) description of the process:
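The tone change in example (2) can be made explicit by reading one tone per vowel off the diacritics (acute = H, grave = L). This is a minimal sketch, assuming the words are given in Unicode with tone written as accents; tone_pattern is a hypothetical helper, not part of any existing toolkit:

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent: high tone (H)
GRAVE = "\u0300"  # combining grave accent: low tone (L)
VOWELS = set("aeiou")

def tone_pattern(word: str) -> str:
    """Return one tone letter per vowel: H (acute), L (grave), '-' (unmarked)."""
    # NFD splits precomposed letters such as 'ú' into 'u' + combining accent
    decomposed = unicodedata.normalize("NFD", word)
    tones = []
    for i, ch in enumerate(decomposed):
        if ch.lower() in VOWELS:
            nxt = decomposed[i + 1] if i + 1 < len(decomposed) else ""
            if nxt == ACUTE:
                tones.append("H")
            elif nxt == GRAVE:
                tones.append("L")
            else:
                tones.append("-")
    return "".join(tones)

print(tone_pattern("úmfúndìsì"))    # HHLL
print(tone_pattern("úmfùndísànà"))  # HLHLL
```

The second and third vowels (the stem fund-is) change from HL to LH, which is the shift the rule-based system must reproduce.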
Language Component
Text input: umfundisana
Morphological component: # u + mu + fund + is + ana #
Grapheme-phoneme conversion: # u + mu + fund + is + ana #
Lexicon / tonal assignment: # ú + mù + fúnd + ìs + ana #
Normalisation and phonological rules:
  V → Ø / m_+ [ V root: # ú + m_ + fúnd + ìs + ana # > # ú + m + fúnd + ìs + ana #
  Nas → [+syllabic] / # V__ + C
Prosodic rules:
  Penultimate vowel lengthening: # ú + m + fúnd + ìs + a:na #
  Tonal assignment HL > LH: # ú + m + fund + ís + a:na #
Phonetic form: [úmfundísa:na]
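The ordered rules of the language component can be sketched as string rewrites applied one after another. This is a minimal sketch under stated assumptions: the notation is decomposed Unicode (acute = H, grave = L), "+" marks morpheme boundaries and "#" word edges; all function names are hypothetical; the HL > LH rule is hard-wired to a single root + suffix boundary as in this example; and the syllabic-nasal rule is not modelled because it leaves no trace in this notation.

```python
import re
import unicodedata

ACUTE, GRAVE = "\u0301", "\u0300"  # combining marks for high / low tone

def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

def vowel_deletion(form: str) -> str:
    # V -> Ø after prefix m: drop the vowel (and its tone mark) in "mu +"
    return re.sub(r"m[aeiou][\u0300\u0301]? \+", "m +", form)

def penultimate_lengthening(form: str) -> str:
    # Lengthen the second-to-last vowel by inserting ":" after it (and its tone mark)
    vowels = list(re.finditer(r"[aeiou][\u0300\u0301]?", form))
    if len(vowels) < 2:
        return form
    end = vowels[-2].end()
    return form[:end] + ":" + form[end:]

def tone_shift_hl_to_lh(form: str) -> str:
    # HL > LH across one morpheme boundary: the high root vowel becomes low
    # (written unmarked) and the following low suffix vowel becomes high
    return re.sub(r"([aeiou])\u0301([a-z]*) \+ ([a-z]*?)([aeiou])\u0300",
                  r"\1\2 + \3\4" + ACUTE, form, count=1)

# Hypothetical rule ordering, following the slide
RULES = [vowel_deletion, penultimate_lengthening, tone_shift_hl_to_lh]

def derive(underlying: str) -> str:
    form = nfd(underlying)
    for rule in RULES:
        form = rule(form)
    # Strip boundary symbols to obtain the phonetic form
    surface = form.replace("#", "").replace("+", "").replace(" ", "")
    return "[" + nfc(surface) + "]"

print(derive("# ú + mù + fúnd + ìs + ana #"))  # [úmfundísa:na]
```

Every rule here must be written and ordered by hand, which is exactly the kind of tonal knowledge the following slides question.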
Technical Component: Speech Generation
Speech database of pre-recorded units: è bèsètèlèbé.., ù bùsùtùlùbú.., fúfùfáfà… (diphones, triphones, …)
Unit selection process: selection of applicable units from the speech database to match the required phonetic form
Sets of potential candidates: several recorded candidates each for ú, m, fu, ndísa:, na
Concatenation: ú m fundísa: na
Smoothing of junctures
Spoken form: [úmfundísa:na]
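Choosing one unit from each candidate set is commonly framed as a lowest-cost path search combining a target cost (how well a candidate matches the required form) and a join cost (how well neighbouring units fit at the juncture). This is a minimal Viterbi sketch, assuming each unit is described only by hypothetical F0 values; real systems such as Festival use many more features, and Unit, target_cost, join_cost and select_units are illustrative names only:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Unit:
    label: str       # phonetic label of the recorded unit, e.g. "ndísa:"
    mean_f0: float   # hypothetical mean F0 (Hz) of the recording
    start_f0: float  # hypothetical F0 (Hz) at the left edge
    end_f0: float    # hypothetical F0 (Hz) at the right edge

def target_cost(unit: Unit, target_f0: float) -> float:
    # Distance from the pitch that the prosodic rules ask for
    return abs(unit.mean_f0 - target_f0)

def join_cost(left: Unit, right: Unit) -> float:
    # Pitch mismatch at the juncture; large values mean audible discontinuities
    return abs(left.end_f0 - right.start_f0)

def select_units(candidates: List[List[Unit]], targets: List[float]) -> List[Unit]:
    """Viterbi search: one candidate per position, minimising target + join costs."""
    # best[i][j] = (cumulative cost, index of best predecessor) for candidates[i][j]
    best: List[List[Tuple[float, Optional[int]]]] = [
        [(target_cost(u, targets[0]), None) for u in candidates[0]]
    ]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(u, targets[i]), back))
        best.append(row)
    # Trace back from the cheapest candidate at the last position
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]

# Usage with two hypothetical candidates each for "ú" and "m"
a0, a1 = Unit("ú", 200.0, 200.0, 200.0), Unit("ú", 150.0, 150.0, 150.0)
b0, b1 = Unit("m", 120.0, 200.0, 120.0), Unit("m", 120.0, 150.0, 120.0)
chosen = select_units([[a0, a1], [b0, b1]], [200.0, 120.0])
print([(u.label, u.start_f0) for u in chosen])  # [('ú', 200.0), ('m', 200.0)]
```

Whatever the cost functions, the targets they compare against still come from the rule-based language component.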
Availability of resources for rule based synthesis (1)
Morphology? The UNISA morphology project for Southern African languages of the Bantu group: great progress, at implementation level
Phonology? Various studies on aspects of the phonologies of African languages within different theories; the theoretical models per se have limited implementation value
Phonetics? Mainly impressionistic descriptions; very few quantitative studies in formats suitable for speech technology applications
Experimental phonetic data mainly represent laboratory-based read speech, lacking real-world authenticity
Availability of resources for rule based synthesis (2)
Prosody: intonation?
Pronunciation dictionaries: mapping orthography to pronunciation, manually or by G2P (grapheme-to-phoneme conversion); progress with the Lwazi project of the Meraka Institute, but a problem remains: basic tonal representation
Zerbian & Barnard (2008), Phonetics of intonation in South African Bantu languages, emphasise the need for quantitative acoustic data on intonational processes.
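The tone problem in pronunciation dictionaries can be illustrated with a lookup that falls back to G2P. This is a minimal sketch, assuming a hypothetical one-entry lexicon (taken from the example word úmfúndìsì) and a hypothetical, incomplete grapheme table; because standard orthography does not mark tone, a word not in the lexicon comes back with its tone unknown:

```python
from typing import List, Tuple

# Hypothetical manually verified entry, with tone marked on each vowel
LEXICON = {"umfundisi": "ú m f ú n d ì s ì"}

# Hypothetical subset of two-letter graphemes; all other letters map to themselves
MULTI = {"hl": "ɬ", "ny": "ɲ", "sh": "ʃ"}

def g2p(word: str) -> List[str]:
    """Greedy left-to-right letter-to-sound conversion (two-letter graphemes first)."""
    phones, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in MULTI:
            phones.append(MULTI[pair])
            i += 2
        else:
            phones.append(word[i])
            i += 1
    return phones

def pronounce(word: str) -> Tuple[List[str], bool]:
    """Return (phones, tone_known): dictionary first, G2P fallback without tone."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word].split(), True
    return g2p(word), False  # orthography carries no tone marks

print(pronounce("umfundisi"))    # tone-marked phones, True
print(pronounce("umfundisana"))  # unmarked phones, False
```

Every word outside the hand-built lexicon still needs a tonal representation from somewhere, which is the gap noted above.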
Nature of tonal data in Southern Bantu
Impressionistic descriptions (the researcher's interpretations)
Example of inconsistency: isiXhosa, the same speaker, three different researchers (1969, 1973 and 1992), with different tone markings for the same items!
[Detailed discussions in Roux 1991, 1995a, 1995b, 2001, 2003]
Questions
Which rules are to be used for implementation in a rule-based speech synthesis system?
Alternatively: when will reliable (tonal) rules be available in a usable format for speech synthesis, particularly for African tone languages?
However, technologies change and give rise to new approaches to linguistic data, and to the definition of linguistic knowledge.
HMM-based Speech Synthesis System (HTS)
2002: first version released, following the pioneering work of Keiichi Tokuda (http://www.sp.nitech.ac.jp/~tokuda/)
Technical detail on HMM synthesis for less resourced languages
Described in:
Roux, JC & Visagie, AS. 2007. Data-driven approach to rapid prototyping Xhosa speech synthesis. Proceedings of the 6th ISCA Speech Synthesis Workshop, Bonn, Germany, pp. 143-147.
Maia, R, Zen, H, Tokuda, K, Kitamura, T & Resende, FGV. 2003. Towards the development of a Brazilian Portuguese Text-to-Speech system based on HMM. Eurospeech, Geneva, pp. 2465-2468.
Types of knowledge involved (3)
The point is: fine-grained phonetic data and tonological rules are not necessarily required for generating plausible, intelligible speech, as long as the carriers of that information are present in the text data (and the corresponding speech recordings) used to train the system.
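In a data-driven system, tone enters as one more context feature attached to each phone in the training labels; the trained models learn how that context is realised acoustically, so no hand-written tone rule is needed. This is a minimal sketch, assuming a simplified label format loosely modelled on HTK-style triphone names (left-centre+right) with a hypothetical "/T:" tone field; actual HTS full-context labels carry many more fields:

```python
import unicodedata
from typing import List

def tone_of(phone: str) -> str:
    # H for acute, L for grave, x when the transcription carries no tone mark
    d = unicodedata.normalize("NFD", phone)
    if "\u0301" in d:
        return "H"
    if "\u0300" in d:
        return "L"
    return "x"

def base(phone: str) -> str:
    # Phone symbol with tone diacritics removed
    d = unicodedata.normalize("NFD", phone)
    return "".join(c for c in d if not unicodedata.combining(c))

def context_labels(phones: List[str]) -> List[str]:
    """Hypothetical 'left-centre+right/T:tone' labels, padded with silence."""
    padded = ["sil"] + phones + ["sil"]
    labels = []
    for i in range(1, len(padded) - 1):
        left, centre, right = base(padded[i - 1]), base(padded[i]), base(padded[i + 1])
        labels.append(f"{left}-{centre}+{right}/T:{tone_of(padded[i])}")
    return labels

print(context_labels(["ú", "m", "f"]))
# ['sil-u+m/T:H', 'u-m+f/T:x', 'm-f+sil/T:x']
```

If the tone field is empty ("x") throughout, the models can only learn tonal behaviour to the extent that it is predictable from the remaining context in the recordings.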
Characteristics of text-to-speech (TTS) systems
Application of HTS for TTS development in under resourced languages
African language TTS embedded on mobile devices
isiXhosa text-to-speech converter [embedded into the system] [also English TTS]
isiXhosa texts [English texts]:
  Job training material / manuals
  Literacy and numeracy training material
  Terminologies / definitions
  Multilingual communication phrases in different environments, etc.
[Reference tool]
Speech: isiXhosa / English
Implications for (phonetic) knowledge development
Pragmatic considerations: sustainability in African context?
References
Black, A. 2006. Multilingual speech synthesis. In Schultz, T & Kirchhoff, K (eds), Multilingual Speech Processing. Amsterdam: Elsevier.
Louw, JA, Davel, M & Barnard, E. 2005. A general-purpose isiZulu speech synthesiser. South African Journal of African Languages, 25(2): 92-100.
Roux, JC. 1991. On the integration of phonetics and phonology. South African Journal of Linguistics, 11: 34-52.
Roux, JC. 1995a. Prosodic data and phonological analyses in Zulu and Xhosa. South African Journal of African Languages, 15(1): 19-28.
Roux, JC. 1995b. On the perception and production of tone in Xhosa. South African Journal of African Languages, 15(4): 196-203.
References (2)
Roux, JC. 2001. Comments on "Zulu tonology and its relationship to other Nguni languages" by Cassimjee and Kisseberth. In Kaji, S (ed), Proceedings of the Symposium Cross-Linguistic Studies of Tonal Phenomena. Tokyo, pp. 361-367.
Roux, JC. 2003. On the perception and description of tone in the Sotho and Nguni languages. In Kaji, S (ed), Proceedings of the Symposium Cross-Linguistic Studies of Tonal Phenomena: historical development, phonetics of tone and descriptive studies. Tokyo, pp. 155-176.
Zerbian, S & Barnard, E. 2008. Phonetics of intonation in South African Bantu languages. Southern African Linguistics and Applied Language Studies, 26(2): 235-254.
Why a mobile platform?
"Africa has become the fastest growing mobile market in the world with mobile penetration in the region ranging from 100% to 30%." http://whiteafrican.com/2008/08/01/2007-african-mobile-phone-statistics
Why speech output?
Speech is a natural means of communication; in this context it gives non-literate as well as literate users access to information in a language of their choice. We are acutely aware of the low literacy rate in many parts of Africa.