Proceedings of the International Conference on Emerging Trends in Engineering and Management (ICETEM14)
30 – 31, December 2014, Ernakulam, India
DEVICE FOR TEXT TO SPEECH PRODUCTION AND TO
BRAILLE SCRIPT
Hima Pradeep V, Jeevan K M, Miji Jacob
Department of Electronics and Communication Engineering, Sree Narayana Gurukulam College of Engineering,
Kadayiruppu, Kolencherry, India
ABSTRACT
Writing is a very effective means of communicating our thoughts to other people, and we use the scripts of a
language to put those thoughts on paper. People without the sense of vision use a different, tactile script known as
Braille, named after its inventor, Louis Braille. It is unlike the scripts that sighted people use for writing. The methods
currently available for blind and deaf people to communicate are few, and all have serious drawbacks. These people
depend almost entirely on Braille and on audio recordings, and the supply of audio recordings is limited. Here we
devise a system that takes an image of the text and converts it into speech, and we propose a system that takes an
image of the text and converts it into Braille script. We hope that this system will help bridge the communication gap
between sighted and non-sighted people. In this system MATLAB is used to process the image and speech signals.
Keywords: Braille Script, Deaf Person, Image Acquisition, Threshold Value, Text-to-Speech.
1. INTRODUCTION
Learning is a difficult task for a person who is blind and deaf. The methods currently available for blind and
deaf people to communicate are few, and all have serious drawbacks. Braille is a widely used means of communication
for blind or partially sighted people. It is built from six or eight dots arranged in a fixed matrix, called a cell. Every dot
can be set or cleared, giving 64 combinations (2^6) in six-dot Braille and 256 combinations (2^8) in eight-dot Braille,
counting the blank cell. All dots of a Braille page should fall on the intersections of an orthogonal grid. When text is
printed on both sides of the sheet (recto-verso), the grid of the verso text is shifted so that its dots fall between the
recto dots. Braille has a low information density: an average page of 25 x 29 cm holds 32 characters per line and
27 lines per page. A typical dot has a diameter of 1.8 mm.
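The combination counts follow from each dot being independently set or cleared, so a cell with n dots has 2^n patterns, the blank cell included. The short Python sketch below enumerates them to check the figures; the function name is ours and is not part of the proposed system.

```python
from itertools import product

def cell_patterns(dots):
    """Return every set/cleared pattern for a cell with `dots` dot positions."""
    # Each dot is either cleared (0) or set (1), so there are 2**dots patterns.
    return list(product((0, 1), repeat=dots))

six_dot = cell_patterns(6)
eight_dot = cell_patterns(8)
print(len(six_dot), len(eight_dot))
```

The all-zero pattern is the blank cell, which is why the usable number of distinct raised patterns is one less than the total.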
This paper presents a solution that makes the learning process easier for a blind and deaf person. Not all
textbooks are available in Braille script, and audio recordings do not exist for all textbooks either. The system takes an
image of the content of the textbook and reproduces it as sound for persons who are only blind, and as Braille script for
persons who are both blind and deaf.
The remainder of this paper is organized as follows: Section 2 describes the block diagrams of the proposed
solution, Section 3 describes the software implementation and results, and Section 4 concludes the paper.
INTERNATIONAL JOURNAL OF ELECTRONICS AND
COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)
ISSN 0976 – 6464(Print)
ISSN 0976 – 6472(Online)
Volume 5, Issue 12, December (2014), pp. 174-179
© IAEME: http://www.iaeme.com/IJECET.asp
Journal Impact Factor (2014): 7.2836 (Calculated by GISI)
www.jifactor.com
2. DESCRIPTION OF THE PROPOSED SYSTEM
Fig. 1 shows the basic block diagram of the device that converts text to speech. The image of the text is
captured by a camera during image acquisition. Contrast is adjusted using an image enhancement technique, and
filtering is applied for noise reduction. The edges in the image are then determined with edge detection methods, which
gives the boundaries of the text, and the image is cropped. The text in the image is segmented into separate letters, and
each extracted letter is compared with the letters previously stored in the system for character recognition. A correlation
matching technique is used for this purpose. The corresponding letter is played. The letters obtained are then separated
into words. A threshold value is set for spaces: if the value obtained is greater than the threshold, it is considered a
letter, otherwise a space, and in this way the words are separated. The text-to-speech (TTS) synthesizer starts with the
words in the text, converts each word into speech one by one, and concatenates the results. Thus the voice is produced
from the text.
[Figure 1, output block: Speaker]
Figure 1: Block diagram for text to speech production
Fig. 2 shows the proposed block diagram of the device that converts text to Braille script. The stages from
image acquisition to separation of words are the same as in Fig. 1: the image of the text is captured by the camera,
contrast is adjusted by image enhancement, noise is reduced by filtering, the boundaries are found by edge detection and
the image is cropped, the text is segmented into separate letters, each letter is recognised by correlation matching against
the letters previously stored in the system, and the letters are separated into words using the space threshold. The
recognised characters are sent to the Graphical User Interface (GUI) on the PC. The American Standard Code for
Information Interchange (ASCII) value of the character to be read can be sent wirelessly from the PC to the
microcontroller using the CC2500 radio frequency (RF) transceiver module. The ASCII value received from the PC can
be converted to the corresponding Braille code using a conversion algorithm. This conversion program can be written in
Embedded C and stored in the microcontroller. The output of the microcontroller can be taken from the general purpose
input/output pins of the development board in the form of voltages, each either 0 V or 5 V.
A six-bit number in binary/hexadecimal form, corresponding to the Braille code of the character, is thus
obtained at the output of the microcontroller. The output of the six input/output pins can then be given to a tactile
display made of six solenoids that represent the dots of a Braille character; the device will have only a single Braille
cell. A touchpad can be interfaced to the device so that the user can navigate through the textbook using gestures such as
forward strokes, backward strokes, and up or down movements.
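The ASCII-to-Braille conversion can be illustrated with a lookup table. The paper proposes writing this step in Embedded C on the microcontroller; the Python sketch below only illustrates the logic. It assumes the standard six-dot patterns for the lowercase letters and the space, and it assumes that dot n drives bit n-1 of the six-bit output. That bit order, the function names and the pin ordering are our own assumptions, not details given in the paper.

```python
# Dots raised for the letters a-j in standard six-dot Braille (dots numbered 1-6).
BASE = {"a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
        "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5), "i": (2, 4), "j": (2, 4, 5)}

def build_table():
    """Build the letter -> dots table from the a-j patterns."""
    table = dict(BASE)
    for base, letter in zip("abcdefghij", "klmnopqrst"):
        table[letter] = BASE[base] + (3,)        # k-t: a-j plus dot 3
    for base, letter in zip("abcde", "uvxyz"):
        table[letter] = BASE[base] + (3, 6)      # u, v, x, y, z: a-e plus dots 3 and 6
    table["w"] = (2, 4, 5, 6)                    # w does not follow the pattern
    table[" "] = ()                              # blank cell
    return table

TABLE = build_table()

def braille_bits(ascii_code):
    """Map an ASCII code to a six-bit value; assumed bit order: dot n -> bit n-1."""
    dots = TABLE[chr(ascii_code).lower()]        # KeyError for unsupported characters
    value = 0
    for dot in dots:
        value |= 1 << (dot - 1)
    return value

def pin_levels(value):
    """Voltage (0 or 5) on each of the six output pins, pin 0 carrying dot 1."""
    return [5 if (value >> bit) & 1 else 0 for bit in range(6)]

print(format(braille_bits(ord("c")), "06b"), pin_levels(braille_bits(ord("c"))))
```

Each 5 V pin would energise the solenoid of the corresponding dot. Digits, punctuation and contractions are left out of this sketch.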
[Figure 1 block labels: Camera, Image acquisition, Image enhancement, Filtering, Edge detection, Character
segmentation, Character recognition, Separation of words, Text-to-speech conversion]
Figure 2: Block diagram for text to Braille script
2.1. Camera
The camera used here is an ordinary low-cost webcam. The advantage of a webcam is that it can be interfaced
very easily and can take pictures in real time. However, a camera of higher resolution is preferred for better
results.
2.2. Image acquisition
MATLAB provides the Image Acquisition Toolbox for getting image signals from a video device. For image
capture, the configured device must have a supporting adaptor and must be compatible with the system resolution and
colour patterns. A video object is initialized, and the images are captured at the desired intervals after the required
parameters are set.
2.3. Image Enhancement
This step improves the quality of the digital image. Contrast is adjusted by histogram equalization, which is
performed with the MATLAB command histeq. The input must be a grayscale image.
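To make the contrast adjustment concrete, the sketch below applies the textbook form of histogram equalization to a small grayscale image stored as a list of rows. It is written in plain Python for illustration and is not a reproduction of what the MATLAB command does internally; the function name and the sample image are hypothetical.

```python
def equalize(image, levels=256):
    """Histogram-equalize a grayscale image given as rows of integers in 0..levels-1."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution of the grey levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:
        # Only one grey level is present, so there is no contrast to stretch.
        return [row[:] for row in image]
    # Map each level so that the cumulative distribution becomes roughly linear.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in image]

sample = [[100, 100], [110, 120]]     # low-contrast image, hypothetical values
stretched = equalize(sample)
print(stretched)
```

The darkest level present is mapped to 0 and the brightest to 255, so a narrow range of grey values is spread over the full scale.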
2.4. Filtering
The technique of median filtering is used. A median filter slides a window over the image and replaces each
pixel with the median intensity in the window. The median filter is an example of non-linear filtering and is often used to
remove noise. It is very widely used in digital image processing because, under certain conditions, it preserves edges
while removing noise.
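A minimal Python sketch of a median filter is given below to show both properties: an isolated noisy pixel is removed, while a sharp step edge is left unchanged. The window size of 3 x 3 and the replication of border pixels are assumptions of this sketch; the paper does not state the settings used.

```python
def median_filter(image, size=3):
    """Apply a size x size median filter; border pixels are replicated (assumed)."""
    h, w = len(image), len(image[0])
    r = size // 2
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            # Collect the window, clamping coordinates at the image border.
            window = [image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            out[y][x] = sorted(window)[len(window) // 2]
    return out

# A flat region with one impulse ("salt") pixel in the middle.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
# A vertical step edge between a dark and a bright region.
step = [[0, 0, 0, 255, 255, 255] for _ in range(5)]

print(median_filter(noisy)[2][2])
print(median_filter(step) == step)
```

A mean filter of the same size would smear the impulse over its neighbours and blur the step, which is why the median is preferred for this kind of noise.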
2.5. Edge Detection
This is the image processing step in MATLAB. First, the edges in the image are determined with edge
detection methods, which gives the boundaries of the text. The image is cropped to these boundaries, contrast
enhancement is performed if needed, and the image is then resized.
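The sequence of finding edges and then cropping to the boundaries can be sketched as follows. The paper does not name the edge detector, so this Python sketch assumes a simple central-difference gradient with a fixed threshold; the function names, the threshold of 50 and the test image are all hypothetical.

```python
def edge_map(image, threshold=50):
    """Mark pixels whose gradient magnitude (|gx| + |gy|) exceeds the threshold."""
    h, w = len(image), len(image[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            edges[y][x] = abs(gx) + abs(gy) > threshold
    return edges

def bounding_box(edges):
    """Return (top, bottom, left, right) of all edge pixels, or None if there are none."""
    rows = [y for y, row in enumerate(edges) if any(row)]
    cols = [x for x in range(len(edges[0])) if any(row[x] for row in edges)]
    if not rows:
        return None
    return rows[0], rows[-1], cols[0], cols[-1]

def crop(image, box):
    """Cut the image down to the inclusive bounding box."""
    top, bottom, left, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]

# White page (255) with a dark 2 x 2 mark standing in for printed text.
page = [[255] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        page[y][x] = 0

box = bounding_box(edge_map(page))
cropped = crop(page, box)
print(box, len(cropped), len(cropped[0]))
```

The box is one pixel wider than the mark on each side because the central difference responds on both sides of an intensity step.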
[Figure 2 block labels: Camera, Image acquisition, Image enhancement, Filtering, Edge detection, Character
segmentation, Character recognition, Separation of words, GUI on PC, CC 2500 transceiver module, ASCII to Braille
conversion algorithm, Microcontroller, Solenoids, Touchpad]
2.6. Character Segmentation
Segmentation is the partition of an image into several components. It is an important part of practically any
automated image recognition system, because it is at this stage that the objects of interest are extracted for further
processing such as description or recognition. In practice, segmenting an image means classifying each image pixel as
belonging to one of the image parts.
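One simple way to split a line of printed text into characters is a column projection: columns that contain no ink separate neighbouring characters. The paper does not state which segmentation method is used, so the Python sketch below is only an assumed illustration. It expects a binary image in which 1 marks ink, and the function names are ours.

```python
def column_runs(binary):
    """Return (first, last) column index of every run of columns that contain ink."""
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    runs, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                       # a character begins
        elif not ink and start is not None:
            runs.append((start, x - 1))     # a blank column ends the character
            start = None
    if start is not None:
        runs.append((start, width - 1))
    return runs

def segment(binary):
    """Cut the image into one sub-image per run of inked columns."""
    return [[row[a:b + 1] for row in binary] for a, b in column_runs(binary)]

line = [[1, 1, 0, 0, 1, 0, 1, 1],
        [1, 0, 0, 0, 1, 0, 0, 1]]
print(column_runs(line))
print([len(piece[0]) for piece in segment(line)])
```

Touching or overlapping characters would need a more elaborate method, since they leave no blank column between them.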
2.7. Character Recognition
The extracted image of each character is compared with the images previously stored in the system for character
recognition. A correlation matching technique is used for this purpose. The corresponding letter is played.
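Correlation matching can be sketched as follows: the extracted character is compared with every stored template, and the template with the highest correlation coefficient is taken as the recognised letter. The paper does not give the exact correlation measure, so the sketch assumes the two-dimensional correlation coefficient of equally sized images. The 3 x 3 templates are hypothetical toy glyphs, far smaller than real stored letters.

```python
from math import sqrt

def correlation(a, b):
    """2-D correlation coefficient between two images of the same size."""
    fa = [p for row in a for p in row]
    fb = [p for row in b for p in row]
    if len(fa) != len(fb):
        raise ValueError("images must have the same size")
    mean_a = sum(fa) / len(fa)
    mean_b = sum(fb) / len(fb)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(fa, fb))
    den = sqrt(sum((x - mean_a) ** 2 for x in fa) * sum((y - mean_b) ** 2 for y in fb))
    return num / den if den else 0.0

def recognise(glyph, templates):
    """Return the name of the template that correlates best with the glyph."""
    return max(templates, key=lambda name: correlation(glyph, templates[name]))

# Hypothetical stored templates (1 = ink).
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
# An "I" with one extra noisy pixel in the bottom-right corner.
noisy_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]

print(recognise(noisy_i, TEMPLATES))
```

Because the templates and the extracted character must have the same size, the segmented character has to be resized to the template size before matching.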
2.8. Separation of words
Here the letters obtained are separated into words. A threshold value is set for spaces: if the value obtained is
greater than the threshold, it is considered a letter, otherwise a space, and in this way the words are separated.
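The paper does not say which quantity is compared with the threshold. One common variant, assumed in the Python sketch below, measures the width of the blank gap between neighbouring characters and inserts a space when the gap is wider than the threshold. Note that this variant tests blank width rather than ink, so its comparison runs the opposite way to the rule worded above. The column positions and the threshold of 3 pixels are hypothetical.

```python
def join_words(letters, runs, gap_threshold):
    """Join recognised letters, inserting a space where the blank gap is wide.

    letters: recognised characters, left to right
    runs: (first_column, last_column) of each character in the image
    gap_threshold: gaps wider than this many blank columns separate words (assumed rule)
    """
    if len(letters) != len(runs):
        raise ValueError("one column run is needed per letter")
    pieces = []
    for i, letter in enumerate(letters):
        if i > 0:
            gap = runs[i][0] - runs[i - 1][1] - 1   # blank columns between characters
            if gap > gap_threshold:
                pieces.append(" ")
        pieces.append(letter)
    return "".join(pieces)

letters = list("HIYOU")
runs = [(0, 4), (6, 10), (20, 24), (26, 30), (32, 36)]
sentence = join_words(letters, runs, gap_threshold=3)
print(sentence)
```

In practice the threshold would be chosen relative to the character width, since the spacing scales with the font size and camera distance.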
2.9. Text to Speech Conversion
A text-to-speech (TTS) synthesizer starts with the words in the text, converts each word into speech one by one,
and concatenates the results. The task of a TTS system is thus a complex one that involves mimicking what human
readers do. The Windows Speech Application Programming Interface is used here.
3. SOFTWARE IMPLEMENTATION
The whole system is implemented in the MATLAB environment. The image quality must be reasonably good to
obtain reliable output. Text-to-speech conversion follows the word-by-word procedure of Section 2.9 and uses the
Windows Speech Application Programming Interface (SAPI), an API developed by Microsoft to allow the use of speech
recognition and speech synthesis within Windows applications. Third-party companies can produce their own speech
recognition and text-to-speech engines, or adapt existing engines, to work with SAPI. Here we use the default sampling
frequency of 16000 Hz. The speed can be set between -10 and +10; the normal speed is zero. Thus the text can be
converted to speech. The proposed system for converting text to Braille script can be implemented using the GUI.
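The speed setting can be illustrated with a short Python sketch. The system itself calls SAPI from MATLAB; here we assume instead that the SAPI voice object is reached through the third-party pywin32 package on Windows, which is not mentioned in the paper. The helper names are ours. Only the clamping function runs on every platform; the speaking function needs Windows with pywin32 installed.

```python
def clamp_rate(rate):
    """Limit a requested speed to the -10..+10 range, where 0 is the normal speed."""
    return max(-10, min(10, int(rate)))

def speak_words(words, rate=0):
    """Speak the words one by one through SAPI (Windows only, requires pywin32)."""
    import win32com.client                      # third-party, imported only when needed
    voice = win32com.client.Dispatch("SAPI.SpVoice")
    voice.Rate = clamp_rate(rate)
    for word in words:
        voice.Speak(word)                       # returns after the word has been spoken

print(clamp_rate(25), clamp_rate(-3))
```

On a Windows machine, `speak_words(["text", "to", "speech"], rate=-2)` would read the three words slightly slower than normal.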
3.1. Simulation Windows
Figure 3: Window to select the mode
Figure 4: Window to preview the image
Figure 5: Window to capture the image
Figure 6: Window to process the image
Here the captured image is processed, and the text is converted to speech by the TTS synthesizer.
4. CONCLUSION
The device is a considerable improvement over currently available text-to-speech devices. In particular, the
device is easy to use, with little or no training needed in most situations. The speaking speed can be set so that all
people can hear the sound clearly. Trainers can easily train blind and deaf people, who can thus pursue their studies
more easily. The implementation of text to Braille script can be done using solenoids. With slight modification, the
system can also be used by people with speech impairments to communicate over the telephone.
