42 128-1-pb



Self Organizing Markov Map for Speech and Gesture Recognition

Ms. Nutan D Sonwane, Prof. S. A. Chhabria, Dr. R. V. Dharaskar

ISSN: 2277-9043. International Journal of Advanced Research in Computer Science and Electronics Engineering, Volume 1, Issue 2, April 2012. All Rights Reserved © 2012 IJARCSEE.

Abstract: Gesture and speech based human computer interaction is attracting attention across areas such as pattern recognition and computer vision. This research finds many applications in multimodal HCI, robotics control, and sign language recognition. This paper presents a head and hand gesture as well as speech recognition system for human computer interaction (HCI). Such a vision based system demonstrates the capability of a computer to understand and respond to hand and head gestures, and to speech in the form of sentences. The recognition system consists of two main modules: 1. Gesture recognition and 2. Speech recognition. Gesture recognition consists of the phases i. image capturing, ii. feature extraction of the gesture, iii. gesture modeling (direction, position, generalized). Speech recognition consists of the phases i. taking voice signals, ii. spectral coding, iii. unit matching (best matching unit, BMU), iv. lexical decoding, v. syntactic and semantic analysis. Compared with many existing algorithms for gesture and speech recognition, SOM provides flexibility and robustness in noisy environments. The detection of gestures is based on discrete predesignated symbol sets, which are manually labeled during the training phase. The gesture-speech correlation is modelled by examining co-occurring speech and gesture patterns. This correlation can be used to fuse gesture and speech modalities for edutainment applications (e.g. video games, 3-D animations) where natural gestures of talking avatars are animated from speech. A speech driven gesture animation example has been implemented for demonstration.

Keywords: Gesture recognition, Human computer interaction, speech recognition, self organizing map and Markov model

I INTRODUCTION

This paper presents a head and hand gesture as well as speech recognition system for human computer interaction (HCI). Such a vision based system demonstrates the capability of a computer to understand and respond to hand and head gestures and to speech in the form of sentences. The recognition system consists of four modules: 1. Manual Module, 2. Head Tracker, 3. Hand Recognition, 4. Voice Recognition, which together handle various symbolic gesture commands and voice commands. Gesture recognition consists of the phases i. image capturing, ii. feature extraction of the gesture, iii. gesture modeling (direction, position, generalized). Speech recognition consists of the phases i. taking voice signals, ii. spectral coding, iii. unit matching (best matching unit, BMU), iv. lexical decoding, v. syntactic and semantic analysis. Compared with many existing algorithms for gesture and speech recognition, SOMM (Self Organizing Markov Map) provides flexibility and robustness in noisy environments. The approach combines a Self Organizing Map (SOM) with a Markov Model. Its most effective application is the development of robust and user-friendly interfaces for human-machine interaction, since gesture and speech are a natural and powerful way of communication.

The Principal Component Analysis (PCA) approach describes a method for gesture recognition. It is a classical feature extraction technique widely used in the field of pattern recognition and computer vision [1], and a classical statistical technique for analyzing the covariance structure of multivariate data. Gesture recognition using the PCA algorithm involves two phases:
• Training Phase
• Recognition Phase
Support Vector Machines are another commonly used classifier.

The Self-Growing and Self-Organized Neural Gas (SGONG) network [2] is an unsupervised neural classifier. It clusters the input data so that the distance between data items within the same class (intra-cluster variance) is small and the distance between data items from different classes (inter-cluster variance) is large. The final number of classes is determined by the SGONG during the learning process. [3] describes a self organizing map (SOM) method for speech recognition. [4] describes a modular, layered method based on the Hidden Markov Model (HMM).

The SOMM architecture for gesture recognition fuses separate component models, all of which are based on hand trajectory. The approach combines Self Organizing Maps and Markov Models [5] for gesture trajectory classification, using the trajectory of the hand segment and the direction of motion during a gesture. This classification scheme transforms a gesture representation from a series of coordinates and movements into a symbolic form and builds probabilistic models on these transformed representations.

Automatic speech recognition [6] is a process by which a machine identifies speech. The machine takes a human utterance as input and returns a string of words, phrases, or continuous speech in the form of text as output. Both modalities matter because gesture and speech are a natural and powerful way of communication [2][3][4][6].

Figure 1: Symbolic Hand Gesture
II SELF ORGANIZING MAP

A self-organizing map (SOM), or self-organizing feature map [3], is a type of artificial neural network trained with unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps differ from other artificial neural networks in that they use a neighborhood function to preserve the topological properties of the input space. Training builds the map from input examples; it is a competitive process, also called vector quantization. Mapping then automatically classifies a new input vector. A self-organizing map consists of components called nodes or neurons. SOM training has three stages: 1) initialization, 2) getting the best matching unit, 3) scaling the neighbors.

Step I: Initialization

Initialize the map of weight vectors; each weight vector is given random values for its data. Before training, initial values are given to the prototype vectors. The SOM is very robust with respect to the initialization, but a properly chosen initialization allows the algorithm to converge faster to a good solution. Typically one of the following three initialization procedures is used:

1. Random initialization, where the weight vectors are initialized with small random values.
2. Sample initialization, where the weight vectors are initialized with random samples drawn from the input data set.
3. Linear initialization, where the weight vectors are initialized in an orderly fashion along the linear subspace spanned by the two principal eigenvectors of the input data set. The eigenvectors can be calculated using the Gram-Schmidt procedure.

In SOM Toolbox, random and linear initializations have been implemented. Random initialization takes random values from the d-dimensional cube defined by the minimum and maximum values of the variables. Linear initialization selects a mesh of points from the d-dimensional min-max cube of the training data. The axes of the mesh are the eigenvectors corresponding to the m greatest eigenvalues of the training data.

Step II: Get best matching unit

Go through all the weight vectors and calculate the distance from each weight to the chosen sample vector. The weight with the shortest distance is the winner. If more than one weight has the same shortest distance, the winning weight is chosen randomly among them. The most common distance measure is the Euclidean distance. The distances are calculated and compared over the entire map, and the weight with the shortest distance to the sample vector is the winner, the best matching unit (BMU). The square root is not computed in the program, as a speed optimization.

Step III: Scale neighbors

1) Determining Neighbors

There are two parts to scaling the neighboring weights: determining which weights are considered neighbors, and determining how much each weight can become more like the sample vector. The neighbors of a winning weight can be determined using a number of different methods. Some use concentric squares, others hexagons.

2) Learning

Learning in the self-organizing map causes different parts of the network to respond similarly to certain input patterns. The second part of scaling the neighbors is the learning function. The winning weight is rewarded by becoming more like the sample vector. The neighbors also become more like the sample vector. An attribute of this learning process is that the farther a neighbor is from the winning vector, the less it learns. The amount a weight can learn decreases over time, and the rate of this decrease can be set. Here a Gaussian function is used. It returns a value between 0 and 1, and each neighbor is then changed using the parametric equation. In the first iteration, the best matching unit gets a t of 1 from its learning function, so its weight comes out of this process with exactly the same values as the randomly selected sample.

III HIDDEN MARKOV MODEL

A hidden Markov model (HMM) is a statistical Markov model [4] in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered the simplest dynamic Bayesian network. In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states.

The parameters of a hidden Markov model are of two types:
1. Transition probabilities
2. Emission probabilities (also known as output probabilities)

The transition probabilities control the way the hidden state at time t is chosen given the hidden state at time t − 1. The hidden state space is assumed to consist of one of N possible values, modeled as a categorical distribution. This means that for each of the N possible states that a hidden variable at time t can be in, there is a transition probability from this state to each of the N possible states of the hidden variable at time t + 1, for a total of N² transition probabilities. (Note, however, that the set of transition probabilities for transitions from any given state must sum to 1, meaning that any one transition probability can be determined once the others are known, leaving a total of N(N − 1) transition parameters.)
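To make the two parameter types concrete, the following is a minimal Python sketch under stated assumptions: the two hidden states, the three output tokens, and every probability value are hypothetical and do not come from the paper. It stores the transition and emission probabilities as row-wise tables, checks that each row sums to 1, counts the N(N − 1) free transition parameters, and generates a token sequence from the hidden states.

```python
import random

# Hypothetical 2-state model; state names, tokens, and probabilities are
# illustrative assumptions, not values from the paper.
STATES = ["rest", "move"]
TOKENS = ["left", "right", "stop"]

# Transition probabilities: TRANSITION[i][j] = P(state j at t+1 | state i at t).
TRANSITION = [
    [0.7, 0.3],
    [0.4, 0.6],
]

# Emission probabilities: EMISSION[i][k] = P(token k | state i).
EMISSION = [
    [0.1, 0.1, 0.8],
    [0.45, 0.45, 0.1],
]


def rows_sum_to_one(matrix, tol=1e-9):
    """Every row of a probability table must sum to 1."""
    return all(abs(sum(row) - 1.0) < tol for row in matrix)


def free_transition_parameters(n_states):
    """Each of the N rows has N entries, but only N - 1 of them are free."""
    return n_states * (n_states - 1)


def sample_sequence(length, rng):
    """Generate hidden states and the visible tokens they emit."""
    state = rng.randrange(len(STATES))  # uniform initial state (assumption)
    hidden, observed = [], []
    for _ in range(length):
        hidden.append(STATES[state])
        token = rng.choices(range(len(TOKENS)), weights=EMISSION[state])[0]
        observed.append(TOKENS[token])
        # Choose the next hidden state from the current state's row.
        state = rng.choices(range(len(STATES)), weights=TRANSITION[state])[0]
    return hidden, observed
```

Calling `sample_sequence(10, random.Random(0))` returns two lists of length 10; only the second list would be visible to an observer of a hidden Markov model.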
Hidden Markov models can model complex Markov processes where the states emit the observations according to some probability distribution. One example is the Gaussian distribution; in such a hidden Markov model the output of each state is represented by a Gaussian distribution. An HMM uses several techniques to solve problems: 1) the forward and backward algorithms, 2) the Viterbi algorithm and the posterior algorithm, 3) the Baum-Welch algorithm.

III ALGORITHM

Figure 2: Self Organizing Map

Kohonen Algorithm:
Step 1. Randomize the weight vectors of the map's nodes.
Step 2. Grab an input vector.
Step 3. Traverse each node in the map:
  i) Use the Euclidean distance formula to find the similarity between the input vector and the node's weight vector.
  ii) Track the node that produces the smallest distance (this node is the best matching unit, BMU).
Step 4. Update the nodes in the neighborhood of the BMU by pulling them closer to the input vector.
Step 5. Increase t and repeat from Step 2.

The Markov model involves several algorithms. The Viterbi algorithm is used to find the most likely sequence of hidden states, called the Viterbi path. The Baum-Welch algorithm is used to find the set of state transition and output probabilities for a sequence.
Step 1. The (potentially) occupied state at time t is called qt.
Step 2. A state can be referred to by its index, e.g. qt = j.
Step 3. One event corresponds to one state.

At each time t, the occupied state outputs ("emits") its corresponding output. A Markov model is a generator of events. Each event is discrete and has a single output. In a typical finite-state machine, actions occur at transitions, but in most Markov models, actions occur at each state. In a speech recognition system, training takes as input a large number of speech utterances along with their transcriptions into phonemes and outputs the speech models for the phonemes.

IV HARDWARE COMPONENT AS WHEELCHAIR ROBOT

A wheelchair robot moves according to the commands given to it through various kinds of symbolic gesture and voice commands. The system passes symbolic gesture commands as input to the hardware, which moves accordingly. The wheelchair robot is made up of the following hardware components:

a) Microcontroller
  i. 2K bytes of Flash
  ii. 128 bytes of RAM
  iii. 15 I/O lines
  iv. Two 16-bit timer/counters
  v. A five vector two-level interrupt architecture
  vi. A full duplex serial port
  vii. A precision analog comparator
  viii. On-chip oscillator and clock circuitry
b) Other devices: DC motor, TX-RX antenna, USB to serial connector, battery

Figure 3: Wheelchair Robot

Figure 4: Internal circuit of Robot
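The five steps of the Kohonen algorithm listed in the ALGORITHM section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rectangular grid, the exponential decay of the learning rate and radius, and the names `train_som`, `find_bmu`, `rate0`, and `radius0` are all hypothetical. The update is written as the parametric blend `(1 - t) * weight + t * sample`, so a node whose learning value t is 1 takes on the sample's values exactly.

```python
import math
import random


def squared_distance(a, b):
    # Squared Euclidean distance; the square root is skipped because it does
    # not change which node is closest.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_bmu(weights, sample):
    # weights maps (row, col) grid positions to weight vectors.
    # Ties go to the first node found here (the text breaks ties randomly).
    return min(weights, key=lambda pos: squared_distance(weights[pos], sample))


def train_som(samples, rows, cols, iterations, rng, rate0=1.0, radius0=None):
    dim = len(samples[0])
    radius0 = radius0 or max(rows, cols) / 2.0
    # Step 1: randomize the weight vector of every map node.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(iterations):
        # Assumed schedule: learning rate and radius decay exponentially.
        decay = math.exp(-t / iterations)
        rate, radius = rate0 * decay, radius0 * decay
        # Step 2: grab an input vector.
        sample = rng.choice(samples)
        # Step 3: traverse the map and track the best matching unit.
        bmu = find_bmu(weights, sample)
        # Step 4: pull nodes near the BMU toward the input vector; the
        # Gaussian gives values in (0, 1] that shrink with grid distance.
        for pos, w in weights.items():
            grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
            influence = rate * math.exp(-grid_d2 / (2.0 * radius ** 2))
            for i in range(dim):
                w[i] = (1.0 - influence) * w[i] + influence * sample[i]
        # Step 5: t increases with the loop and the process repeats.
    return weights
```

With `rate0=1.0`, the best matching unit in the first iteration has an influence of exactly 1, which matches the behavior described for the learning function in Section II.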
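The Viterbi algorithm named in the ALGORITHM section can likewise be sketched in a few lines of Python. The model below is a hypothetical two-state example with three observable tokens; the probabilities are illustrative assumptions, and the function name `viterbi` and its argument layout are chosen here for clarity rather than taken from the paper. States and tokens are referred to by index, as in qt = j.

```python
def viterbi(observations, initial, transition, emission):
    """Return the most likely hidden state path and its probability.

    observations: non-empty list of token indices.
    initial[j]: probability of starting in state j.
    transition[i][j]: probability of moving from state i to state j.
    emission[j][k]: probability that state j emits token k.
    """
    n_states = len(initial)
    # best[t][j]: probability of the most likely path that ends in state j
    # after the first t + 1 observations.
    best = [[initial[j] * emission[j][observations[0]] for j in range(n_states)]]
    back = []  # back[t][j]: predecessor of state j on that best path
    for obs in observations[1:]:
        prev = best[-1]
        row, pointers = [], []
        for j in range(n_states):
            i_best = max(range(n_states), key=lambda i: prev[i] * transition[i][j])
            pointers.append(i_best)
            row.append(prev[i_best] * transition[i_best][j] * emission[j][obs])
        best.append(row)
        back.append(pointers)
    # Trace the Viterbi path backwards from the most likely final state.
    state = max(range(n_states), key=lambda j: best[-1][j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, max(best[-1])


# Hypothetical model: 2 hidden states, 3 observable tokens.
INITIAL = [0.6, 0.4]
TRANSITION = [[0.7, 0.3], [0.4, 0.6]]
EMISSION = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

path, prob = viterbi([0, 1, 2], INITIAL, TRANSITION, EMISSION)
```

For the token sequence `[0, 1, 2]` this model decodes to the state path `[0, 0, 1]`. For long sequences the products become very small, so a practical version would usually work with log probabilities.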
V MODULES

i. Manual mode
ii. Head gesture
iii. Hand gesture
iv. Voice recognition

All of these modules are included in one system, which moves according to the order of the commands given in the form of gesture and speech.

Speech and Gesture Recognition:

Figure 5: Speech and Gesture Recognition

I. Manual mode

Figure 6: Manual mode

II. Head Gesture III. Hand Gesture

Figure 7: Head Gesture / Hand Gesture

Figure 8: Speech Recognition

These are the outputs of the individual modules, each of which performs work according to the command.

CONCLUSION

The proposed system includes both approaches, speech as well as gesture recognition. The system will take input in the form of speech signals and gestures as hand and head coordinates. The system will also use a wheelchair as the hardware device for interaction with the system.

REFERENCES

[1] Soloman Raju Kota, J.L. Raheja, Ashutosh Gupta, Archna Rathi, Shashikant Sharma, "Principal component analysis for Gesture recognition using systemC," 2009 International Conference on Advances in Recent Technologies in Communication and Computing, 2009 IEEE.
[2] Yean Choon Ham, Yu Shi, "Developing a Smart Camera for Gesture Recognition in HCI Applications," The 13th IEEE International Symposium on Consumer Electronics (ISCE2009). 978-1-4244-2976-9/09/$25.00 ©2009 IEEE.
[3] E. Stergiopoulou and N. Papamarkos, "A New Technique For Hand Gesture Recognition." 1-4244-0481-9/06/ ©2006 IEEE.
[4] Anjali Kalra, Sarbjeet Singh, Sukhvinder Singh, "Speech Recognition," International Journal of Computer Science and Network Security, Vol. 10, 2010.
[5] George Caridakis, Kostas Karpouzis, Athanasios Drosopoulos, Stefanos Kollias, "SOMM: Self organizing Markov map for gesture recognition," Pattern Recognition Letters 31, 2010.
[6] Wu Song-Lin, Cui Rong-Yi, "Human Behavior Recognition Based on Sitting Postures," 2010 International Symposium on Computer, Communication, Control and Automation. 978-1-4244-5567-6/10/ ©2010 IEEE.
[7] Jagdish Lal Raheja, Radhey Shyam, "Real Time Robotic Hand Control Using Hand Gesture." 978-0-7695-3977-5/10 ©2010 IEEE.
[8] Chetan A. Burande, Raju M. Tugnayat, Nitin K. Choudhary, "Advanced Recognition Techniques for Human Computer Interaction." 978-1-4244-5586-7/10. 2010 IEEE.
[9] Shuai Jin, Guang-ming Lu, Jian-xun Luo, Wei-dong Chen, Xiao-xiang Zheng, "SOM-based Hand Gesture Recognition for Virtual Interactions," IEEE International Symposium on Virtual Reality Innovation, 2011.
[10] G.R.S. Murthy, R.S. Jadon, "Hand gesture recognition using neural network," 2nd International Advance Computing Conference, 2010.
[11] M. Ajallooeian, A. Borji, B. N. Araabi, M. Nili Ahmadabadi, H. Moradi, "Fast Hand Gesture Recognition based on Saliency Maps: An Application to Interactive Robotic Marionette Playing," The 18th IEEE International Symposium on Robot and Human Interactive Communication, Toyama, Japan, Sept. 27-Oct. 2, 2009. 978-1-4244-5081-7/09/ ©2009 IEEE.
[12] Wei-Hua Andrew Wang, Chun-Liang Tung, "Dynamic Hand Gesture Recognition Using Hierarchical Dynamic Bayesian Networks Through Low-Level Image Processing," Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July 2008. 978-1-4244-2096-4/08 ©2008 IEEE.
[13] Sridhar P. Arjunan, Dinesh K. Kumar, School of Electrical and Computer Engineering, "Recognition of facial movements and hand gestures using surface Electromyogram (sEMG) for HCI based applications." 0-7695-3067-2/07 ©2007 IEEE.
[14] T. Nakano, T. Mori, M. Nagata, and A. Iwata, "A Cellular-Automaton-Type Image Extraction Algorithm and Its Implementation Using an FPGA." 0-7803-7690-0/02/$17.00 ©2002 IEEE.
First Author: Ms. Nutan D. Sonwane, IV sem MTech [CSE], G.H. Raisoni College of Engineering, Nagpur, R.T.M.N.U, Nagpur

Second Author: Prof. S.A. Chhabria, HOD [IT] Department, G.H. Raisoni College of Engineering, Nagpur, R.T.M.N.U, Nagpur

Third Author: Dr. R.V. Dharaskar, Director of Matoshri Pratishthans Group of Institutions, MPGI Integrated Campus, Nanded, India, S.R.T.M Nanded University