Do we need linguistic knowledge for speech technology applications in African languages?
© Justus Roux

Transcript

  • 1. AFLAT 2010 – Malta
    Copyright JC Roux
  • 2. Introduction
  • 3. Aims
  • 4. Multidisciplinary nature of our activities
  • 5. Interaction between linguists and engineers / computer scientists
    “Every time I fire a linguist, the performance of our speech recognition system goes up.”
  • 6. Striking a balance
  • 7. Factors
  • 8. Linguistic knowledge
    Demonstrate how the development of technologies has an impact on the type of linguistic knowledge needed for particular applications
    Focus: Two approaches to speech synthesis
    Unit selection / Concatenative synthesis (Rule driven)
    HMM-based generation synthesis (Data driven)
  • 9. Unit selection / Concatenative synthesis
    Widely accepted and approved technique
    Festival Speech Synthesis System (Black et al. 1998)
    FestVox (Black & Lenzo, 2000)
    Rule based: e.g. prosodic models built on detailed data on duration, stress and intonation, as available for the major languages of the world
    Normally excellent (domain specific) quality of speech
  • 10. Concatenative synthesis in a tone language
    Consider the following two words in isiXhosa where tonal movement takes place when the diminutive suffix /-ana/ is invoked:
    (1) úmfúndìsì (teacher)
    (2) úmfúndìsì + ana > úmfùndísànà   HL > LH
    In order to generate a speech version of example (2) consider the following (oversimplified) description of the process:
  • 11. Language Component
    TEXT INPUT
    umfundisana
    MORPHOLOGICAL COMPONENT
    u + mu + fund + is + ana
    # ú + mù + fúnd + ìs + ana #
    LEXICON / TONAL ASSIGNMENT
    GRAPHEME PHONEME CONVERSION
    # u + mu + fund + is + ana #
    NORMALISATION
    # ú + m_ + fúnd + ìs + ana #
    # ú + m + fúnd + ìs + ana #
    PHONOLOGICAL RULES
    V -> Ø / m_+ [ V root
    Nas -> [+ syllabic] / # V__ + C
    PHONETIC FORM
    [úmfundísa:na]
    PROSODIC RULES
    Penultimate vowel lengthening
    Tonal assignment HL > LH
    # ú + m + fúnd + ìs + a:na #
    # ú + m + fund + ís + a:na #
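The rule-driven derivation on this slide can be illustrated with a minimal Python sketch. It is not the author's implementation: the `Morph` class, the function names and the exact conditions under which each rule fires are assumptions made for illustration, and the conditioning environments written on the slide are not reproduced in full. High tone is rendered with an acute accent and low tone is left unmarked, as in the slide's phonetic form.

```python
import unicodedata
from dataclasses import dataclass

VOWELS = "aeiou"
ACUTE = "\u0301"  # combining acute accent, used here to mark a high tone


@dataclass
class Morph:
    text: str  # segment without tone marks
    tone: str  # "H", "L", or "" when unspecified


def delete_prefix_vowel(morphs):
    """Assumed simplification: the prefix 'mu' loses its vowel when another morph follows."""
    out = []
    for i, m in enumerate(morphs):
        if m.text == "mu" and i + 1 < len(morphs):
            out.append(Morph("m", ""))  # syllabic nasal, tone left unspecified
        else:
            out.append(Morph(m.text, m.tone))
    return out


def lengthen_penultimate_vowel(morphs):
    """Mark the second-to-last vowel of the word as long with ':'."""
    out = [Morph(m.text, m.tone) for m in morphs]
    positions = [(i, j) for i, m in enumerate(out)
                 for j, ch in enumerate(m.text) if ch in VOWELS]
    if len(positions) >= 2:
        i, j = positions[-2]
        out[i].text = out[i].text[:j + 1] + ":" + out[i].text[j + 1:]
    return out


def shift_hl_to_lh(morphs, suffix="ana"):
    """Assumed trigger: before the given suffix, the first adjacent H L pair becomes L H."""
    out = [Morph(m.text, m.tone) for m in morphs]
    if out and out[-1].text.replace(":", "") == suffix:
        for i in range(len(out) - 2):
            if out[i].tone == "H" and out[i + 1].tone == "L":
                out[i].tone, out[i + 1].tone = "L", "H"
                break
    return out


def render(morphs):
    """Join the morphs, placing an acute accent on the first vowel of each H-toned morph."""
    parts = []
    for m in morphs:
        text = m.text
        if m.tone == "H":
            for j, ch in enumerate(text):
                if ch in VOWELS:
                    text = text[:j + 1] + ACUTE + text[j + 1:]
                    break
        parts.append(text)
    return unicodedata.normalize("NFC", "".join(parts))


underlying = [Morph("u", "H"), Morph("mu", "L"), Morph("fund", "H"),
              Morph("is", "L"), Morph("ana", "")]
surface = render(shift_hl_to_lh(lengthen_penultimate_vowel(delete_prefix_vowel(underlying))))
print(surface)
```

Every step depends on a hand-written rule and a tone-marked lexicon entry, which is why the reliability of the tonal descriptions matters so much for this approach.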
  • 12. Technical Component : Speech Generation
    SPEECH DATABASE of PRE-RECORDED UNITS:
    è bèsètèlèbé.., ù bùsùtùlùbú.., fúfùfáfà… (diphones, triphones………)
    UNIT SELECTION PROCESS
    Selection of applicable units from the speech database to match the required phonetic form
    SETS OF POTENTIAL CANDIDATES
    ú ndísa: m fu na
    [ ] [ ] [ ] [ ] [ ] [ ]
    [ ] [ ] [ ] [ ] [ ] [ ]
    CONCATENATION
    ú m fundísa: na
    SMOOTHING OF JUNCTURES
    SPOKEN FORM [úmfundísa:na]
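The unit selection step can be sketched as a search over the sets of candidates. The sketch below is a generic dynamic-programming formulation with a target cost and a join cost, not the specific procedure used by the author or by Festival. The unit identifiers, cost values and pitch figures are invented for illustration, and the join cost is assumed to be the pitch mismatch at the boundary between two units.

```python
def select_units(candidates):
    """Pick one candidate per position so that total target + join cost is minimal.

    candidates: one list per target unit; each candidate is a tuple
    (unit_id, target_cost, start_pitch, end_pitch). All values are hypothetical.
    """
    # best[i][k] = (cumulative cost, back-pointer) for candidate k at position i
    best = [[(cand[1], None) for cand in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for cand in candidates[i]:
            options = [
                (best[i - 1][k][0] + abs(prev[3] - cand[2]) + cand[1], k)
                for k, prev in enumerate(candidates[i - 1])
            ]
            row.append(min(options))  # cheapest way to reach this candidate
        best.append(row)

    # Trace back from the cheapest final candidate
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][k][0])
        k = best[i][k][1]
    return path[::-1]


example_candidates = [
    [("u_a", 0.0, 120, 130), ("u_b", 0.0, 120, 180)],
    [("m_a", 0.0, 135, 140), ("m_b", 0.5, 180, 150)],
    [("fu_a", 0.0, 150, 110), ("fu_b", 0.0, 200, 110)],
]
chosen = select_units(example_candidates)
print(chosen)
```

In this toy example the search prefers `u_b`, `m_b` and `fu_a`, because their boundaries match exactly even though `m_b` carries a small target cost. Smoothing of the junctures would follow as a separate signal-processing step.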
  • 13. Availability of resources for rule based synthesis (1)
    Morphology?
    UNISA morphology project for Southern African languages of the Bantu group – great progress - implementation level
    Phonology ?
    Various studies on aspects of the phonologies of African languages within different theories – theoretical models per se have limited implementation value
    Phonetics?
    Mainly impressionistic descriptions; very few quantitative studies in formats that are suitable for speech technology applications
    Experimental phonetic data mainly represent laboratory based speech – read speech - lacking real world authenticity
  • 14. Availability of resources for rule based synthesis (2)
    Prosody - Intonation?
    Pronunciation dictionaries
    Mapping orthography to pronunciation: manual or G2P (grapheme-to-phoneme); progress with the Lwazi project of Meraka, but the problem of basic tonal representation remains
    Zerbian & Barnard (2008): Phonetics of intonation in South African Bantu languages
    Emphasises the need for quantitative acoustic data on intonational processes.
  • 15. Nature of tonal data in Southern Bantu
    Impressionistic descriptions (Interpretations of researcher)
    Examples of inconsistencies: isiXhosa – the same speaker – three different researchers (1969, 1973 & 1992) with different tone markings for the same items!
    [Detailed discussions in Roux, 1991, 1995 (a) (b), 2001, 2003]
  • 16. Questions
    Which rules are to be used for implementation in a rule-based speech synthesis system?
    Alternatively:
    When will reliable (tonal) rules be available in a useable format for speech synthesis, particularly in African tone languages?
    But, technologies change and give rise to new approaches to linguistic data – and the definition of linguistic knowledge
  • 17. HMM-based Speech Synthesis System (HTS)
    2002: First version released, following the pioneering work of Keiichi Tokuda
    http://www.sp.nitech.ac.jp/~tokuda/
  • 18. Technical detail on HMM synthesis for less-resourced languages
    Described in
    Roux, JC & Visagie, AS. 2007. Data-driven approach to rapid prototyping Xhosa speech synthesis, Proceedings of the 6th ISCA Speech Synthesis Workshop, Bonn, Germany, pp 143-147
    Maia, R, Zen, H, Tokuda, K, Kitamura, T & Resende, FGV. 2003. Towards the development of a Brazilian Portuguese Text-to-Speech system based on HMM, Eurospeech, Geneva, pp 2465-68
  • 19. Types of knowledge involved (1)
  • 20. Types of knowledge involved (2)
  • 21. Types of knowledge involved (3)
    Point is:
    Fine-grained phonetic data and tonological rules are not necessarily required for generating plausible, intelligible speech – as long as the carriers of that information are present in the text data (and corresponding speech recordings) used for training the system
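One way to picture the data-driven alternative is to look at the kind of context description that can be derived from plain text alone. The sketch below builds a simplified context label for every symbol in a word (left neighbour, symbol, right neighbour, position in the word). It is only an illustration: it treats graphemes as stand-ins for phones, the label layout is invented here and is not the HTS label format, and the function name is hypothetical. No tone marks are supplied; in a trained system any tonal behaviour would have to be learned from the recordings paired with such contexts.

```python
def context_labels(word):
    """Return one simplified context label per symbol: prev-cur+next@position/length."""
    padded = ["#"] + list(word) + ["#"]  # '#' marks the word boundary
    labels = []
    for i in range(1, len(padded) - 1):
        labels.append(
            f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}@{i}/{len(word)}"
        )
    return labels


suffix_labels = context_labels("ana")
print(suffix_labels)
```

For the suffix "ana" this yields three labels, each describing a symbol together with its neighbours and its position, with no linguistic rule involved.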
  • 22. Characteristics of text-to-speech (TTS) systems
  • 23. Examples
    isiXhosa
    • based on 43 minutes of actual recorded speech
    • 24. no tonal information included
    • 25. 3 339 words
    SA English
    • based on 140 minutes of actual recorded speech
  • Kodwa ndiyayithanda lo nto
  • 26. Kubonisa (uku…)
  • 27. English
  • 28. Application of HTS for TTS development in under-resourced languages
  • 29. African language TTS embedded on mobile devices
    ISIXHOSA
    TEXT-TO-SPEECH CONVERTER [Embedded into the system]
    [Also English TTS]
    ISIXHOSA TEXTS
    [English texts]
    Job training material / manuals
    Literacy and numeracy training material
    Terminologies / definitions
    Multilingual Communication Phrases in different environments, etc
    [REFERENCE TOOL]
    SPEECH : ISIXHOSA / ENGLISH
  • 30. Implications for (phonetic) knowledge development
  • 31. Pragmatic considerations: sustainability in African context?
  • 32. In conclusion
  • 33. References
    Black, A. 2006. Multilingual speech synthesis. In Schultz, T & Kirchhoff, K (eds) Multilingual Speech Processing. Amsterdam: Elsevier.
    Louw, JA, Davel, M & Barnard, E. 2005. A general-purpose isiZulu speech synthesiser. South African Journal of African Languages, 25(2): 92-100.
    Roux, JC. 1991. On the integration of phonetics and phonology. South African Journal of Linguistics, 11: 34-52.
    Roux, JC. 1995a. Prosodic data and phonological analyses in Zulu and Xhosa. South African Journal of African Languages, 15(1): 19-28.
    Roux, JC. 1995b. On the perception and production of tone in Xhosa. South African Journal of African Languages, 15(4): 196-203.
  • 34. References (2)
    Roux, JC. 2001. Comments on “Zulu tonology and its relationship to other Nguni languages” by Cassimjee and Kisseberth. Proceedings of the Symposium Cross-Linguistic Studies of Tonal Phenomena. Ed. S Kaji. Tokyo. pp. 361-367.
    Roux, JC. 2003. On the perception and description of tone in the Sotho and Nguni languages. Proceedings of the Symposium Cross-Linguistic Studies of Tonal Phenomena: historical development, phonetics of tone and descriptive studies. Ed. S Kaji. Tokyo. pp. 155-176.
    Zerbian, S & Barnard, E. 2008. Phonetics of intonation in South African Bantu languages. Southern African Linguistics and Applied Language Studies, 26(2): 235-254.
  • 35.
  • 36. Mobile Platforms
  • 37. Why mobile platform?
    “Africa has become the fastest growing mobile market in the world with mobile penetration in the region ranging from 100% to 30%.”
    http://whiteafrican.com/2008/08/01/2007-african-mobile-phone-statistics
  • 38.
  • 39. Why speech output?
    Speech is a natural way of communicating; in this context, it gives non-literates as well as literates access to information in a language of their choice.
    Acutely aware of the low literacy rate in many parts of Africa.