Improvement in Quality of Speech associated with Braille codes - A Review


Anurag Jain, Nupur Prakash and S.S. Agrawal
School of Information Technology, Guru Gobind Singh Indraprastha University, Delhi, India
Centre for Development of Advanced Computing, Noida, India

Published in: Technology

  1. 1. Improvement in Quality of Speech associated with Braille codes - A Review. Anurag Jain, Nupur Prakash: School of Information Technology, Guru Gobind Singh Indraprastha University, Delhi, India. S.S. Agrawal: Centre for Development of Advanced Computing, Noida, India
  2. 2. Abstract <ul><li>Reviews the technologies used to improve the quality of speech </li></ul><ul><li>Broad technologies covered </li></ul><ul><ul><li>Natural Language Processing (NLP) </li></ul></ul><ul><ul><li>Text-to-Speech synthesis (TTS) </li></ul></ul><ul><li>Summarizes future extensions of the existing technology </li></ul><ul><li>Benefits to Persons with Visual Impairment (PVIs) </li></ul>
  3. 3. Introduction <ul><li>For PVIs </li></ul><ul><ul><li>According to a WHO report </li></ul></ul><ul><ul><ul><li>40-45 million people are sightless </li></ul></ul></ul><ul><ul><ul><li>135 million people have low vision </li></ul></ul></ul><ul><ul><li>This population is known as PVIs </li></ul></ul><ul><ul><li>They are unable to benefit from IT because suitable technology is not available </li></ul></ul><ul><ul><li>Regional-language-based information systems are required </li></ul></ul>
  4. 4. Introduction… <ul><li>Major focus so far on </li></ul><ul><ul><li>English - the most used and spoken language </li></ul></ul><ul><ul><li>Spanish - simple grapheme-to-phoneme rules </li></ul></ul><ul><ul><li>Very little work on Indian languages </li></ul></ul><ul><li>Development of a system having </li></ul><ul><ul><li>User-friendly environment </li></ul></ul><ul><ul><li>Excellent quality of speech </li></ul></ul>
  5. 5. Introduction… <ul><li>Two ways for PVIs to understand text </li></ul><ul><ul><li>Braille-coded text </li></ul></ul><ul><ul><ul><li>Transliterating each file is a laborious task </li></ul></ul></ul><ul><ul><li>Speech system with correct pronunciation </li></ul></ul><ul><ul><ul><li>Better than transliteration </li></ul></ul></ul>
  6. 6. Introduction… <ul><li>For correct pronunciation and quality of speech- areas reviewed </li></ul><ul><ul><li>NLP </li></ul></ul><ul><ul><li>TTS </li></ul></ul><ul><ul><li>Braille code </li></ul></ul>
  7. 7. Introduction… <ul><li>(Q system) PVI = (N + D) * B </li></ul><ul><ul><li>where </li></ul></ul><ul><ul><ul><li>(Q system) PVI - quality of the speech system used by PVIs </li></ul></ul></ul><ul><ul><ul><li>N - phases of NLP </li></ul></ul></ul><ul><ul><ul><li>D - techniques/phases of TTS using Digital Signal Processing (DSP) </li></ul></ul></ul><ul><ul><ul><li>B - language structure of Braille codes / Indian languages </li></ul></ul></ul>
  8. 8. Introduction… <ul><li>B is a constant term </li></ul><ul><li>So improving Q system becomes the task of </li></ul><ul><ul><li>Improving NLP techniques </li></ul></ul><ul><ul><li>Improving TTS techniques </li></ul></ul>
  9. 9. Review Work <ul><li>Under NLP </li></ul><ul><ul><li>Schwa Deletion </li></ul></ul><ul><ul><li>Word Sense Disambiguation </li></ul></ul><ul><li>Under TTS </li></ul><ul><ul><li>Prosody Parser </li></ul></ul><ul><ul><li>Synthesis strategies </li></ul></ul>
  10. 10. Schwa Deletion <ul><li>Schwa deletion is a problem in languages such as </li></ul><ul><ul><li>French </li></ul></ul><ul><ul><li>Dutch </li></ul></ul><ul><ul><li>English </li></ul></ul><ul><ul><li>Indian languages </li></ul></ul><ul><li>In French and Dutch, schwa deletion is optional and depends on the speaker and the context </li></ul>
  11. 11. Schwa Deletion… <ul><li>[Fourgereon et al., 97] studied the relationship between schwa deletion in French and the neutralization of lexical distinctions </li></ul><ul><li>[Travel et al., 99] suggested that schwa deletion is optional in French </li></ul><ul><li>[Narsimhan et al., 2001] developed computational models that incorporate Ohala’s work (1983) </li></ul><ul><li>[M. Choudhary et al., 2002] proposed a rule-based schwa deletion algorithm for Hindi </li></ul>
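To make the idea of rule-based schwa deletion concrete, here is a minimal Python sketch of a single rule: dropping the word-final inherent schwa in a Hindi word, as in "rāma" being pronounced "rām". It is not the algorithm of any of the works cited above. The romanized phoneme symbols, the `VOWELS` set, and the function name `delete_final_schwa` are assumptions made for illustration, and word-medial deletion is not handled at all.

```python
# Illustrative only: romanized phoneme symbols, not a real Hindi phone set.
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}
SCHWA = "a"  # the inherent vowel carried by a bare consonant letter


def delete_final_schwa(phonemes):
    """Drop a word-final inherent schwa when another vowel remains.

    Handles only the word-final case; medial schwas are left untouched.
    """
    if len(phonemes) < 2 or phonemes[-1] != SCHWA:
        return list(phonemes)
    if phonemes[-2] in VOWELS:
        # The final "a" follows a vowel, so it is not an inherent schwa
        # attached to a consonant; leave the word alone.
        return list(phonemes)
    # Keep the schwa in words such as "na", where it is the only vowel.
    has_other_vowel = any(p in VOWELS for p in phonemes[:-1])
    return list(phonemes[:-1]) if has_other_vowel else list(phonemes)


print(delete_final_schwa(["r", "aa", "m", "a"]))  # word-final schwa removed
print(delete_final_schwa(["n", "a"]))             # only vowel, so it is kept
```

A practical system would need further rules for medial schwas and for exceptions, which is where the rule-based and computational approaches listed above come in.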
  12. 12. Word Sense Disambiguation (WSD) <ul><li>Automatic disambiguation of word senses has long been a matter of great interest </li></ul><ul><li>Applications that require it </li></ul><ul><ul><li>Machine translation </li></ul></ul><ul><ul><li>Information retrieval and hypertext navigation </li></ul></ul><ul><ul><li>Intelligent search engines using a Natural Language Interface (NLI) </li></ul></ul><ul><ul><li>Content and thematic analysis </li></ul></ul><ul><ul><li>Grammatical analysis </li></ul></ul><ul><ul><li>Speech processing </li></ul></ul><ul><ul><li>Text processing </li></ul></ul>
  13. 13. Word Sense Disambiguation… <ul><li>WSD was first suggested by [Warren Weaver 55], who proposed looking at a small window of context around a word. </li></ul><ul><li>[Ng et al., 97] provides a more focused review from a machine learning perspective. </li></ul><ul><li>[Wilks et al., 96] describes corpus-based experiments. </li></ul><ul><li>[Ide et al., 98] gives a comprehensive review of the history and current state of WSD. </li></ul>
  14. 14. Word Sense Disambiguation… <ul><li>More recently, translation-based approaches have been proposed that carry multiple interpretations through the system and use context to disambiguate between the alternatives in the final stage of the process, where knowledge can be exploited to the fullest. </li></ul>
  15. 15. Prosody Parser <ul><li>Punctuation marks, pauses, word boundaries, etc. carry structure in text; speech produced by any TTS system must convey the same phenomena. </li></ul>
  16. 16. Prosody Parser <ul><li>[Gregory et al. 2004] and [Kahn et al. 2004] incorporated lexical and syntactic features as word tokens in their parsing models. </li></ul><ul><li>[Collins 2000] proposed a model for parse reranking. </li></ul><ul><li>[Charniak and Johnson 2001] proposed a model that obtains prosodic cues by filtering out edit regions. </li></ul><ul><li>[Shriberg 1994] analyzed conversational speech to identify prosodic features and provided a good understanding of speech repairs. </li></ul><ul><li>The models of [Core and Schubert 1999], [Charniak and Johnson 2001], and [Engel et al. 2002] focus on speech repairs. </li></ul>
  17. 17. Synthesis Strategy <ul><li>Two main classes of TTS systems have emerged, distinguished by their synthesis strategy </li></ul><ul><ul><li>Rule-based synthesis - uses a generative model of the phonation mechanism. </li></ul></ul><ul><ul><li>Concatenation-based synthesis - speech signals may be encoded by speech models, which are needed so that the selected acoustic units can be concatenated to match the text. </li></ul></ul>
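The "small window around a word" idea can be sketched in a few lines of Python. The example below picks the sense whose cue words overlap most with the neighbouring words. This is a deliberately simplified overlap count, not the method of any work cited above, and the sense inventory, cue words, and function names are hypothetical.

```python
def context_window(tokens, index, size=2):
    """Return up to `size` tokens on each side of tokens[index]."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


def pick_sense(tokens, index, senses, size=2):
    """Choose the sense whose cue words overlap most with the window.

    `senses` maps a sense label to a set of cue words (hypothetical data).
    On a tie, the sense listed first wins.
    """
    window = set(context_window(tokens, index, size))
    best_sense, best_overlap = None, -1
    for sense, cues in senses.items():
        overlap = len(window & cues)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


tokens = "he sat on the bank of the river".split()
senses = {
    "finance": {"money", "loan", "deposit"},
    "riverside": {"river", "water", "shore"},
}
# "bank" is at index 4; a window of 3 words reaches "river".
print(pick_sense(tokens, 4, senses, size=3))
```

For a speech system the same mechanism matters for homographs, where the chosen sense determines the pronunciation.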
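One very simple way to carry punctuation over into speech is to turn each punctuation mark into a pause after the preceding word. The Python sketch below does only that. The pause durations in `PAUSE_MS` are arbitrary illustrative values, not figures from the reviewed literature, and a real prosody module would also model pitch, stress, and duration.

```python
import re

# Assumed pause lengths in milliseconds; illustrative values only.
PAUSE_MS = {",": 200, ";": 300, ":": 300, ".": 500, "?": 500, "!": 500}


def annotate_pauses(text):
    """Return (word, pause_ms) pairs; the pause follows the word."""
    result = []
    # Split into runs of non-punctuation characters and single marks.
    for token in re.findall(r"[^\s,;:.?!]+|[,;:.?!]", text):
        if token in PAUSE_MS:
            if result:  # punctuation before any word is ignored
                word, pause = result[-1]
                result[-1] = (word, max(pause, PAUSE_MS[token]))
        else:
            result.append((token, 0))
    return result


print(annotate_pauses("Hello, world. Ready?"))
```

The output of such a step could be passed to the synthesizer as silence durations between words.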
  18. 18. Synthesis Strategy… <ul><li>Methods of representation and concatenation of acoustic units- </li></ul><ul><ul><li>Time domain–Pitch Synchronous Overlap Add (TD-PSOLA) </li></ul></ul><ul><ul><li>MBROLA also known as MBR-PSOLA </li></ul></ul><ul><ul><li>Sinusoidal model </li></ul></ul><ul><ul><li>Linear Prediction based Concatenation (LPC) </li></ul></ul>
  19. 19. Synthesis Strategy… <ul><li>TD-PSOLA - pitch-synchronous “analysis” and synthesis of speech. </li></ul><ul><li>MBROLA - resynthesizes the voiced parts of the speech database with constant phase and constant pitch. </li></ul><ul><li>Sinusoidal models - synthesis that makes use of an estimator of glottal closure instants. </li></ul><ul><li>LPC-based methods - the modified speech signal is degraded if the interaction between the excitation signal and the vocal tract filter is not taken into account. </li></ul>
  20. 20. Synthesis Strategy… <ul><li>[Jean Lavoche 2000] presents an improved technique based on concatenating non-overlapping short-term signals generated by the inverse Fourier transform. </li></ul><ul><li>[Darragh 2001] describes concatenative synthesis based on a harmonic model, which addresses the inconsistency between the prosody of stored units and that of the target utterance, as well as mismatches at unit boundaries. </li></ul>
  21. 21. Discussion <ul><li>For schwa deletion, speech systems can be extended with a few modifications, such as an online Hindi lexicon. </li></ul><ul><li>Segmentation is really the most important contribution of prosody; further improvements are therefore very likely. </li></ul><ul><li>Developing rules that bring naturalness to synthetic speech is a continuing process and is largely language dependent. A lot of work still needs to be done, especially for high-quality synthesis of diphthongs, glides, consonant clusters, and prosodic variations. </li></ul>
  22. 22. Conclusion and future work <ul><li>Since PVIs cannot see the text and its exact expression, a high level of correctness is required, along with rich prosodic cues. </li></ul><ul><li>Designing a speech synthesis system for PVIs requires a primary focus on speech quality. </li></ul><ul><li>Incorporating all of the techniques mentioned above in a single system is a tedious job, but it is useful for PVIs. </li></ul><ul><li>The system must support features such as </li></ul><ul><ul><li>Reading text with rich prosodic cues </li></ul></ul><ul><ul><li>Identifying punctuation and word boundaries </li></ul></ul><ul><ul><li>Identifying the correct sense of a word at the sentence level </li></ul></ul>
  23. 23. Acknowledgment <ul><li>The authors wish to thank Guru Gobind Singh Indraprastha University, Delhi (India), C-DAC Noida (India), and Indian Institute of Technology, Delhi (India) for their support in making their central libraries available for obtaining research papers from journals and conference proceedings, and all the visually handicapped users who gave valuable suggestions for analyzing the existing system. </li></ul>
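The overlap-add step shared by the PSOLA family can be illustrated with a small Python sketch. It uses fixed-hop overlap-add with a Hann window, which is a simplification: TD-PSOLA proper centres its windows on pitch marks and re-spaces them to change pitch or duration, and that part is not shown. The function names and the frame length are assumptions, and `frame_len` is assumed to be even.

```python
import math


def hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to one."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def overlap_add(signal, frame_len=8):
    """Window the signal into 50%-overlapping frames and add them back.

    With an unchanged hop, every sample covered by two frames is
    reconstructed exactly; only the two half-frame edges are attenuated.
    """
    hop = frame_len // 2
    window = hann(frame_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        for i in range(frame_len):
            out[start + i] += signal[start + i] * window[i]
    return out


signal = [float(i % 5) for i in range(32)]
rebuilt = overlap_add(signal)
# Interior samples (indices 4..27 here) match the input.
print(max(abs(rebuilt[i] - signal[i]) for i in range(4, 28)))
```

Changing where the windowed frames are placed before summing, rather than keeping the hop fixed, is what allows pitch and duration to be modified.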
  24. 24. THANK YOU