Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.253-258 Speech to Text Conversion using Android Platform B. Raghavendhar Reddy#1, E. Mahender *2 #1 Department of Electronics Communication and Engineering Aurora’s Technological and Research Institute Parvathapur, Uppal, Hyderabad, India. 2Asst. Professor Aurora’s Technological and Research Institute Parvathapur, Uppal, Hyderabad, India.AbstractFor the past several decades, designers have remains speech. Market for smart mobile phonesprocessed speech for a wide variety of provides a number of applications with speechapplications ranging from mobile recognition implementation. Googles Voice Actionscommunications to automatic reading machines. and recently iphones Siri are applications thatSpeech recognition reduces the overhead caused enable control of a mobile phone using voice, suchby alternate communication methods. Speech has as calling businesses and contacts, sending texts andnot been used much in the field of electronics and email, listening to music, browsing the web, andcomputers due to the complexity and variety of completing common tasks. Both Siri and Voicespeech signals and sounds. However, with Actions require an active connection to a network inmodern processes, algorithms, and methods we order to process requests and most of Androidcan process speech signals easily and recognize phones can run on a 4G network which is faster thanthe text. In this project, we are going to develop the 3G network that the iPhone runs on. There isan on-line speech-to-text engine. The system also an issue of availability, Voice Actions areacquires speech at run time through a available on all Android devices above Android 2.2,microphone and processes the sampled speech to but Siri is available only for owners of the iPhonerecognize the uttered text. The recognized text 4S. The Siris advantage is that it can act on a widecan be stored in a file. We are developing this on variety of phrases and requests and can understandandroid platform using eclipse workbench. Our and learn from natural language, whereas Googlesspeech-to-text system directly acquires and Voice Actions can be operated only by using veryconverts speech to text. It can supplement other specific voice commands. In this work we havelarger systems, giving users a different choice for developed an application for sending SMS messagesdata entry. A speech-to-text system can also which uses Googles speech recognition engine. Theimprove system accessibility by providing data main goal of application Voice SMS is to allow userentry options for blind, deaf, or physically to input spoken information and send voice messagehandicapped users. Voice SMS is an application as desired text message. The user is able todeveloped in this work that allows a user to manipulate text message fast and easy without usingrecord and convert spoken messages into SMS keyboard, reducing spent time and effort. In thistext message. User can send messages to the case speech recognition provides alternative toentered phone number. Speech recognition is standard use of key board for text input, creatingdone via the Internet, connecting to Googles another dimension in modern communications.server. The application is adapted to inputmessages in English. Speech recognition for II. ANDROIDVoice uses a technique based on hidden Markov Android is a software environment formodels (HMM - Hidden Markov Model). It is mobile devices that includes an operating system,currently the most successful and most flexible middleware and key applications [1]. In 2005approach to speech recognition. Google took over company Android Inc., and two years later, in collaboration with the group the OpenKeywords: HMM, Android Os, DVM, Speech Handset Alliance, presented Android operatingRecognition, Intents system (OS). Main features of Android operating system are:I. INTRODUCTION  Enables free download of development Mobile phones have become an integral environment for application development.part of our Everyday life, causing higher demands  Free use and adaptation of operatingfor content that can be used on them. Smart phones system to manufacturers of mobileoffer customer enhanced methods to interact with devices.their phones but the most natural way of interaction 253 | P a g e
  2. 2. B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.253-258  Equality of basic core applications and construction of applications: activity, intent, service additional applications in access to and the content provider. An activity is the main resources. element of every application and simplified  Optimized use of memory and automatic description defines it as a window that users see on control of applications which are being their mobile device. The application can have one or executed. more activities. Main activity is the one that is used  Quick and easy development of as startup. The transition between the activities is applications using development tools and carried out in a way that launched activity calls a rich database of software libraries. new activity. Each activity as a separate component  High quality of audiovisual content, it is is implemented with inheritance of Activity class. possible to use vector graphics, and most During the execution of applications, activities are audio and video formats. added to the stack, currently running activity is on  Ability to test applications on most the top of the stack. computing platforms, including Windows, An intent is a message used to run the Linux… activities, services, or recipient’s multicast. An The Android operating system (OS) intent can contain the name of the components youarchitecture is divided into 5 layers (fig. 1.). The need to run, the action which is necessary toapplication layer of Android OS is visible to end execute, the address of stored data needed to run theuser, and consists of user applications. The component, and component type. A service is aapplication layer includes basic applications which component that runs in the background to performcome with the operating system and applications long running operations or to perform work forwhich user subsequently takes. All applications are remote processes. One service can link multiplewritten in the Java programming language. applications and service is executed until aFramework is extensible set of software components connection with all applications is done. A contentused by all applications in the operating system. The provider manages a shared set of application data.next layer represents the libraries, written in the C Data can be stored in the file system, a SQLiteand C + + programming languages, and OS accesses database, on the web, or any other persistent storagethem via framework. location which application can access [1].Dalvik Virtual Machine (DVM), forms the main Through the content provider, other applications canpart of the executive system environment. Virtual query or even modify the data (if the contentmachine is used to start the core libraries written in provider allows it).the Java III. SPEECH RECOGNITION Speech recognition for application Voice SMS is done on Google server, using the HMM algorithm. HMM algorithm is briefly described in this part. Process involves the conversion of acoustic speech into a set of words and is performed by software component. Accuracy of speech recognition systems differ in vocabulary size and confusability, speaker dependence vs. independence, modality of speech (isolated, discontinuous, or continuous speech, read or spontaneous speech), task and language constraints [2]. Speech recognition system can be divided into several blocks: feature extraction, acoustic models database which is built based on the training data, dictionary, language model and the speech recognition algorithm. Analog speech signal mustFig1: Android Architecture first be sampled on time and amplitude axes, or programming language. Unlike Java’s digitized. Samples of speech signal are analyzed invirtual machine, which is based on the stack, DVM even intervals. This period is usually 20 ms becausebases on registry structure and it is intended for signal in this interval is considered stationary.mobile devices. The last architecture layer of Speech feature extraction involves the formation ofAndroid operating system is kernel based on Linux equally spaced discrete vectors of speechOS, which serves as a hardware abstraction layer. characteristics. Feature vectors from trainingThe main reasons for its use are memory database are used to estimate the parameters ofmanagement and processes, security model, network acoustic models. Acoustic model describessystem and the constant development of systems. properties of the basic elements that can beThere are four basic components used in recognized. The basic element can be a phoneme for 254 | P a g e
  3. 3. B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.253-258continuous speech or word for isolated words output sequence. It is defined by recursive relation.recognition. During the search, n-best word sequences are Dictionary is used to connect acoustic generated using acoustic models and a languagemodels with vocabulary words. Language model model.reduces the number of acceptable wordcombinations based on the rules of language and IV. MAIN PARTS OF THE PROJECTstatistical information from different texts. Speech A. Voice Recognition Activity classrecognition systems, based on hidden Markov Voice Recognition Activity is startupmodels are today most widely applied in modern activity defined as launcher in AndroidManifest.xmltechnologies. They use the word or phoneme as a file. REQUEST_CODE is static integer variable,unit for modeling. The model output is hidden declared on the beginning of activity and used toprobabilistic functions of state and can’t be confirm response when engine for speechdeterministically specified. State sequence through recognition is started. REQUEST_CODE hasmodel is not exactly known. Speech recognition positive value. Results of recognition are saved insystems generally assume that the speech signal is a variable declared as ListView type.realization of some message encoded as a sequence Method onCreate is called when activity isof one or more symbols [3]. To effect the reverse initiated.operation of recognizing the underlying symbol This is where the most initialization goes:sequence given a spoken utterance, the continuous setContentView (R.layout.voice_recognition) isspeech waveform is first converted to a sequence of used to inflate the user interface defined in res >equally spaced discrete parameter vectors. Vectors layout > voice_recognition.xml, andof speech characteristics consist mostly of MFC findViewById(int) to programmatically interact(Mel Frequency Cepstral) coefficients [4], with widgets in the user interface. In this methodstandardized by the European Telecommunications there is also a check whether mobile phone, onStandards Institute for speech recognition. The which application is installed, has speechEuropean Telecommunications Standards Institute recognition possibility. Package Manager is class forin the early 2000s defined a standardized MFCC retrieving various kinds of information related to thealgorithm to be used in mobile phones [5]. Standard application packages that are currently installed onMFC coefficients are constructed in a few simple the device. FunctiongetPackageManager() returnssteps. A short-time Fourier analysis of the speech PackageManager instance to find global packagesignal using a finite-duration window (typically information. Using this class, we can detect if the20ms) is performed and the power spectrum is phone has a possibility for speech recognition. If acomputed. Then, variable bandwidth triangular mobile device doesn’t have one of many Google’sfilters are placed along the perceptually motivated applications which integrate speech recognition,mel frequency scale and filter bank energies are further work of this application Voice SMS will becalculated from the power spectrum. disabled and message on the screen will be Magnitude compression is employed using “Recognizer not present”. Recognition process isthe logarithmic function. Finally, auditory spectrum done trough one of Google’s speech recognitionthus obtained is decorrelated using the DCT and applications. If recognition activity is present userfirst (typically 13) coefficients represent the can start the speech recognition by pressing on theMFCCs. button and thus launching startActivityForResult These vectors of speech characteristics are (Intent intent, int requestCode). The applicationcalled observations and used in further calculations. uses startActivityForResult() to broadcast an intentTo develop an acoustic model, it is necessary to that requests voice recognition, including an extradefine states. Continuous speech recognition, each parameter that specifies one of two languagestate represents one phoneme. Under the concept of models. Intent is defined with intent.putExtratraining we mean the determination of probabilities (RecognizerIntent.of transition from one state to another andprobabilities of observations. Iterative Baum-Welch EXTRA_LANGUAGE_MODEL,procedure is used for training [6]. The process is RecognizerIntent.LANGUAGE_MODEL_FREE_Frepeated until a certain convergence criterion is ORM).reached, for example good accuracy in terms of The voice recognition application thatsmall changes of estimated parameters, in two handles the intent processes the voice input, thensuccessive iterations. In continuous speech the passes the recognized string back to Voice SMSprocedure is performed for each word in complex application by calling the onActivityResult()HM model. Once states, observations and transition callback. In this method we manage result ofmatrix for HMM are defined, the decoding (or activity startActivityForResult. Received resultrecognition) can be performed. Decoding represents Data is stored in ListView variable and shown onfinding of most likely sequence of hidden states the screen (Fig. 3.).using Viterbi algorithm, according to the observed 255 | P a g e
  4. 4. B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.253-258 the data associated with the string "key" which in this case is selected text from recognition Fig3. Testing Figure 4. Interface for sending SMS activity. The text is entered in the space for writing messages and displayed on the screen. By clicking the Send SMS button application checks whether the message and the number of recipient are entered to perform sending of message. When cursor is positioned in the space for recipient number from contacts, button attribute visibility is changed from default gone to visible. Pressing the button the command that allows you to enter the contact numbers. label Intent forContacts new Intent = (Intent.ACTION_PICK,Fig2: Enables search after clicking image button ContactsContract.Contacts.CONTENT_URI) startActivityForResult (forContacts, REQUEST_CODE) is executed. After selecting desired contact, message can be sent. C. XML files Application consists of two different interfaces. When the user runs application screen is defined in voice_recognition.xml. The linear arrangement of elements allows adding widget one below another. Width and height are defined with fill_parent attribute, which means to be equal as parent (in this case the screen). The second interface, defined within sms.xml file, is displayed when the user chooses one of offered messages.Fig3: Processes and gives text output AndroidManifest.xml realizes installing and launching applications on the mobile device. Every application must have an AndroidManifest.xml file Main Activity (with precisely that name) in its root directory. The manifest presents essential information about the application to the Android system, information the system must have before it can run any of the applications code. It defines the activities and permissions required for the application. Permissions are used to remove restrictions that prevent access to certain pieces of code or data on mobile device. Each permit is defined by a special label. If application needs data with distrait if must seek the necessary authorization. The syntax is: <uses-permission android:name= "Unique name of permission"> </ uses-permission>. Every activity must be named in manifest. Name is used as a parameter that is passed to the constructor of intent which is used to start desired activity. Activities can have different <category> attribute. VoiceRecognitionActivity.java is the main class and its onCreate method is executed atFig4: Interface for sending SMS application startup. The category tag is specially defined String that describes this feature of activityB. SMS class with keyword Launcher while other classes areIn onCreate method label setContentView marked with keyword Default.(R.layout.sms) opens new user interface. Command AndroidManifest.xml has to include permission tofinal String message1 = send and receive messages, to access Internet andgetIntent().getExtras().GetString ("Key") retrieves 256 | P a g e
  5. 5. B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.253-258permission to access the list of contacts from the and become more accessible to everyone. We havephonebook. been given a possibility to manage mobile devicesPermissions are: without installing complex software for speech_ android.permission.INTERNET. processing, resulting in memory savings. A lack of_ android.permission.SEND_SMS. application Voice SMS is its adaption only for_ android.permission.RECEIVE_SMS. English language, and the need for permanent_ android.permission.READ_CONTACTS. Internet connection. The objective for the future is development of models and databases for multiple languages whichV. APPLICATION FUNCTIONALITY could create a foundation for everyday use of thisPRINCIPLE technology worldwide. The main goal of application Application Voice SMS integrates direct is to allow sending of text messages based onspeech input enabling user to record spoken uttered voice messages. Testing has showninformation as text message, and send it as SMS simplicity of use and high accuracy of datamessage. After application has been started display processing. Results of recognition give user optionon mobile phone shows button which initiate voice to select the most accurate result. Further work isrecognition process. When speech has been detected planned to implement the model of speechapplication opens connection with Google’s server recognition for different language.and starts to communicate with it by sending blocksof speech signal. Simultaneously the figure of LITERATUREwaveform is generated on the screen. Speech [1] “Android developers”,recognition of the received signal is preformed on http://developer.android.comserver. Google has accumulated a very large [2] J. Tebelskis, Speech Recognition usingdatabase of words derived from the daily entries in Neural Networks, Pittsburgh: School ofthe Google search engines well as the digitalization Computer Science,Carnegie Mellonof more than 10 million books in Google Book University, 1995.Search project. The database contains more than 230 [3] S. J. Young et al., "The HTK Book",billion words. If we use this kind of speech Cambridge University Engineeringrecognizer it is very likely that our voice is stored on Department, 2006.Google’s servers. This fact provides continuous [4] K. Davis, and Mermelstein, P.,increase of data used for training, thus improving "Comparison of parametric representationaccuracy of the system. for monosyllabic word recognition in When process of recognition is over, user continuously spoken sentences", IEEEcan see the list of possible statements. Process can Trans. Acoust., Speech, Signal Process.be repeated clicking on the button Image Button. vol. 28, no. 4, pp. 357–366,1980.Pressing the most accurate option, selected result is [5] ETSI, "Speech Processing, Transmissionentered into interface for writing SMS messages. and Quality Aspects (STQ); Distributed Interface for writing SMS messages has all speech recognition; Front-end featurestandard features. User can correct text and input extraction algorithm; Compressionrecipient number in the empty textbox. Button algorithms.", Technical standard ES 201Contacts opens interface with contact numbers from 108, v1.1.3, 2003.mobile phone and enables user to chose phone [6] L. E. Baum, T. Petrie, G. Soules, and N.number on which message will be sent after Weiss, "A maximization techniquepressing button Send. Customer receives response in occurring in the statistical analysis ofa little cloud (toast) if message has been sent. probabilistic functions of Markov chains", Ann. Math. Statist., vol. 41, no. 1, pp. 164–VI. CONCLUSION 171, 1970. With the development of software andhardware capabilities of mobile devices, there is an REFERENCESincreased need for device-specific content, what [1] C. Allauzen, M. Riley, J. Schalkwyk, W.resulted in market changes. Speech recognition Skut, and M. Mohri. OpenFst: Atechnology is of particular interest due to the direct general and efficient weighted _nite-statesupport of communications between human and transducer library. Lecture Notes incomputers. Computer Science, 4783:11, 2007. Using the speech recognizer, which works [2] M. Bacchiani, F. Beaufays, J. Schalkwyk,over the Internet, allows much faster data M. Schuster, and B. Strope. Deployingprocessing. Another advantage is the much larger GOOG-411: Early lessons in data,databases that are used. Many believe that this mode measurement, and testing. In Proceedingsis a big step for speech recognition technology. The of ICASSP, pages 5260{5263, 2008.accuracy of the system has significantly increased 257 | P a g e
  6. 6. B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.253-258[3] MJF Gales. Semi-tied full-covariance matrices for hidden Markov models. 1997.[4] B. Harb, C. Chelba, J. Dean, and G. Ghemawhat. Back-o_ Language Model Compression. 2009.[5] H. Hermansky. Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America, 87(4):1738{1752, 1990.[6] Maryam Kamvar and Shumeet Baluja. A large scale study of wireless search behavior: Google mobile search. In CHI, pages 701{709, 2006.[7] S. Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. In IEEE Transactions on Acoustics, Speech and Signal Processing, volume 35, pages 400{01, March 1987.[8] D. Povey, D. Kanevsky, B. Kingsbury, B. Ramabhadran, G. Saon, and K. Visweswariah. Boosted MMI for model and feature-space discriminative training. In Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2008.[9] R. Sproat, C. Shih, W. Gale, and N. Chang. A stochastic _nite-state word-segmentation algorithm for Chinese. Computational linguistics,22(3):377{404, 1996.[10] C. Van Heerden, J. Schalkwyk, and B. Strope. Language modeling for what-with- where on GOOG-411. 2009. WEB SITES Minimalist GNU for Windows, 06/01/2010, http://www.mingw.org/ Code Blocks, The open source, cross platform, IDE, 06/01/2010, http://www.codeblocks.org/ Java JDK SE 1.6 update 18, 02/04/2010, http://java.sun.com/javase/ ECLIPSE GALILEO Release 3.5.2, 02/04/2010, http://www.eclipse.org/galileo/ Java sound API documentation, 11/04/2010, http://java.sun.com/j2se/1.5.0/docs/guide/s ound/programmer_guide/contents.html Android 2.2 SDK release 6, http://developer.android.com/sdk/index.ht ml Android 2.2 SDK reference page, http://developer.android.com/reference/pac kages.html 258 | P a g e