Interactive Speech Based Games for Autistic Children with Asperger Syndrome


Published in: Technology, Health & Medicine

Amal Alqahtani, Nouf Jaafar, Nourah Alfadda
Information Technology Department, King Saud University, Riyadh, KSA

Abstract

Many computer users have varying physical or mental abilities, among them people with autism. Different approaches are therefore needed in technical applications, especially for people with special needs who have difficulty using many applications. The main problem for children with autism arises when they try to contact and communicate with others: they find it difficult to articulate their thoughts and have no suitable way to make the people around them understand what they need. One way to cope with this problem is to use technology to help this user population. In this paper, we describe the objectives of the Interactive Speech Based Games (ISBG) project, give an overview of the system, and then review the techniques used in the project and how they are integrated with each other.

Categories and Subject Descriptors
K.3.1 [Computer Uses in Education]: Computer assisted instruction. H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems.

General Terms
Design, Human Factors.

Keywords
ISBG, AI, Asperger Syndrome, multimodal application, Puzzle game, PECS system.

1. Introduction

Nowadays, artificial intelligence based technologies are increasingly used to improve traditional applications or to develop new ones. These technologies are described as "Enhanced Computing Technologies" [1] because they allow the development of beneficial applications through the use of natural interfaces, the extraction of meaningful information, and/or the creation of adaptive systems that are more reactive to the environment.

Interactive Speech Based Games (ISBG) is a system devoted to providing appropriate support to autistic children with Asperger Syndrome. Asperger Syndrome is one of the autism spectrum disorders, which are characterized by difficulties in social interaction and communication, together with repetitive behavior. Unlike children with other autism spectrum disorders, children with Asperger Syndrome do not have linguistic and cognitive problems.

The key idea behind this project is to create a multimodal application that integrates speech technology in an attempt to extend the user experience. The application is multimodal in the sense that it lets users choose the appropriate input method: speech, text, or point and click.

In addition, the project includes a web site application and two desktop applications, along with bridges that link the desktop applications to the web application. By integrating Microsoft Speech Technology, two games have been implemented, namely the Puzzle game and PECS. The speech technologies used are speech recognition and speech synthesis.

2. Motivations of the ISBG project
The motivations behind the use of speech can be summarized as follows [1]:

  1. Using speech enables the development of more intuitive and natural interfaces for the user.
  2. A very large base of users can use the application.
  3. Multimodal applications increase user satisfaction.
  4. Using speech allows hands-free access to applications, which suits handicapped individuals as well as busy ones whose hands are not free.
  5. Speech based applications provide appropriate support for the treatment of persons suffering from specific disabilities.

Figure 1. The overview of ISBG system.

3. Microsoft technologies

3.1. Introduction

The project was designed using Microsoft technologies, namely Microsoft Speech technology and the Microsoft OLE DB Provider for SQL Server. These technologies are used, respectively, to develop speech enabled applications and to ensure the connection with remote databases.

3.2. Microsoft Speech technology

Microsoft developed the Speech Application Programming Interface (SAPI) to make speech more attainable and helpful for a large number of applications. SAPI decreases the code required for an application to use speech recognition and text-to-speech [2].

3.2.1. Overview of Speech API. SAPI provides the basic standard interfaces and functionality of speech technology, which allow the programmer to create an application and integrate it with speech recognition [3]. SAPI consists of two components (see Figure 2): the Application Programming Interface (API), and the Device Driver Interface (DDI) that speech engines implement. The API reduces the time required to create such an intelligent application and provides an abstraction that hides many low-level details of the implementation of this technology. The DDI works with the API to make speech synthesis and speech recognition engines more convenient to use, by removing many implementation details such as multi-threading and audio device management [4].

There are two types of SAPI engines. The first is the text-to-speech (TTS) engine, which converts text strings into synthesized spoken audio. The second is the speech recognizer, which recognizes human speech audio and converts it into text strings or files [3]. A speech recognition application has a set of predefined words that help the speech engine recognize the speech sent from the application better and faster. These predefined words are called a grammar [3].

Figure 2. The overview of SAPI [3].

3.2.2. Speech Recognition.

API for speech recognition engine

• Types of speech recognition engines (ISpRecognizer) [3]:

  1. Shared recognizer:
  The main purpose of this type of engine is to allow sharing with other speech recognition applications; this is the type that we used.

  2. InProc speech recognition engine:
  The InProc speech recognition engine is more appropriate for large server applications that run alone on a system and for which performance is required.

• ISpRecognizer components:

  1. ISpRecoContext:
  ISpRecoContext is the main interface for speech recognition; it receives a notification when speech recognition events occur in the application [3].

  2. ISpRecoGrammar:
  This object is created from the recognition context returned by the CreateRecoContext method. It defines the set of words that will be recognized once the ISpRecoContext object has been created and is receiving notifications. The grammar can be a dictation grammar or a command and control grammar [3].

DDI for Speech Recognition (Engine-Level Interfaces)

The speech recognition engine uses the DDI (speech recognition manager) to receive audio from SAPI and to return the recognition results [3]. Two interfaces are involved: ISpSREngine and ISpSREngineSite. SAPI communicates with the speech engine through the ISpSREngine interface. The main recognition process is carried out by the RecognizeStream method of the ISpSREngine interface. This method tells the speech recognition engine to start recognition processing and to send the results back to the SAPI application [5].

During the execution of the RecognizeStream method, SAPI calls the SetSite method of the ISpSREngine interface. This method gives the engine a pointer to the ISpSREngineSite interface, through which the engine communicates with SAPI [5].

Once an engine has recognized a phrase, it sends a notification to the application that is the source of that phrase, and then calls the AddEvent method [5]. This method adds an event for the speech recognition engine; the stream position passed into AddEvent indicates the point in the audio stream after which the engine is seeking recognitions [5]. When the engine gets the final recognition, it calls the Recognition method of ISpSREngineSite with an SPRECORESULTINFO structure in which the hypothesis flag is not set [5]. In this way the SAPI DDI allows an engine to have a single thread executing between SAPI and the engine [5].

3.2.3. Speech Synthesis

API for text-to-speech

Text-to-speech operations can be performed synchronously or asynchronously through the ISpVoice Component Object Model (COM) interface. ISpVoice can convert a text string or a text file into audio, and it can also play audio files. "An ISpVoice object forwards events back to the application when the corresponding audio data has been rendered to the output device" [3].

DDI of speech synthesis

There are two text-to-speech engine interfaces at the DDI level: ISpTTSEngine and ISpTTSEngineSite (see Figure 3). The main interface of SAPI speech synthesis is ISpTTSEngine, whose primary method is Speak; SAPI calls this method to convert text into audio [6].

• The Speak method works with two main inputs [6]:
  1. A linked list of text fragments to render.
  2. A pointer to the ISpVoice object, used to queue events and write output data.

• SAPI has a free-threaded architecture, with the following consequences for TTS engines [6]:
  - SAPI calls the TTS engine object on a single thread.
  - SAPI ensures that parameter validation and thread synchronization have been performed properly before calling a TTS engine.

Figure 3: Main objects of TTS

3.3. The Microsoft OLE DB Provider for SQL Server

3.3.1. The problem. One of the needs related to data management is to move the data from its original
containing system into some type of database management system (DBMS), but this approach is costly and redundant. A further need is to be able to access data held within a DBMS as well as data held in any other type of information container. OLE DB is a Microsoft technology created to address this issue [8].

3.3.2. What is OLE DB? OLE DB (Object Linking and Embedding, Database; sometimes written as OLEDB or OLE-DB) is "a set of Component Object Model (COM) interfaces" that allow applications to access data stored in various data sources, and that also give applications the ability to implement database services. OLE DB supports much of the functionality of a DBMS, which enables a data source to share its data [8].

3.3.3. OLE DB Providers Overview. The architecture of OLE DB is split into two main components. The first is the consumer, the application that uses OLE DB. The second is the provider, the software component that implements the OLE DB interface and provides data to the consumer [9].

Providers are in turn split into two categories: service providers and data providers. A service provider encapsulates a service by producing data (as a provider) and consuming data (as a consumer) through OLE DB interfaces. A data provider owns its data and does not depend on other providers to supply data to the consumer (SQL Server is an example) [9]. This architecture is shown in the following figure:

Figure 4: Architecture of OLE DB [9]

3.3.4. The Microsoft OLE DB Provider for SQL Server. The Microsoft OLE DB Provider for SQL Server provides an OLE DB interface to Microsoft® SQL Server™ 2000 databases, allowing ActiveX Data Objects (ADO) to access Microsoft SQL Server directly, as in the previous figure. With this provider, an application can access data in a remote SQL Server [10].

4. Exploratory survey

A qualitative study was conducted to assess the usefulness of an interactive multimodal application using speech for autistic children, especially those with Asperger syndrome. The sample of respondents consisted mainly of specialists working in autism centers and parents of children. The questionnaire was published on a specialized website for autism and Asperger patients. Thirty people responded to the survey.

• Description:
• The questionnaire contains two types of questions:
  - Closed questions, which require specific answers on issues related to the use of speech within a computational tool dedicated to autistic children.
  - Open questions, to get more feedback from the respondents about the project and the methods that could be used.

a. Closed questions:

  i. The first question explored whether autistic children accept the use of computers and modern technologies.
  Results indicate that the majority of respondents agreed with the statement (66%), 27% indicated that it sometimes helps, and only 7% disagreed with the view that technology is helpful for autistic children.

  ii. The second question explored whether it is possible to develop the speech and pronunciation skills of autistic children by using speech based technologies. Results indicate that the majority of respondents agreed with the statement (66%), 27% indicated that it sometimes helps, and only 7% disagreed with the view that speech based technologies are beneficial for autistic children.

  iii. The third question explored whether it is possible to further enhance the speech skills of autistic children through continuous training during the day, with their specialist at the centre and with their parents at home. Results indicate that the majority of respondents agreed with the statement (87%), 13% indicated that it sometimes helps, and no respondent disagreed.

  iv. The fourth question explored whether the communication unit in autism centers uses speech based software. Results indicate that the largest group of respondents agreed with the statement (40%), 33% indicated that they did not know, and 27% disagreed with the view that speech based technologies are used in the communication unit of their autism centers.

  v. The fifth question explored whether it is possible to enrich the vocabulary of autistic children by displaying pictures together with the pronunciation of what they refer to. Results indicate that the majority of respondents agreed with the statement (93%), 7% indicated that they did not know, and no respondent disagreed with the view that displaying pictures with pronunciation is helpful for autistic children.

  vi. The sixth question explored whether sounds and songs help in developing the communication skills of autistic children. Results indicate that the majority of respondents agreed with the statement (80%), 20% indicated that it sometimes helps, and no respondent disagreed with the view that sounds and songs are helpful for autistic children.

  vii. The seventh question explored whether playing the puzzle game, which is the most popular game in the autism world, using speech contributes to developing the speech and pronunciation skills of autistic children. Results indicate that the majority of respondents agreed with the statement (67%), 20% indicated that they did not know, and 13% disagreed.

b. Open question:

  - How can the communication skills of autistic children be developed using computers?

  • By designing online sites as well as programs for autistic children, such as cartoon movies.
  • By offering appropriate educational programs with built-in voice and image, and with a gradation in these programs commensurate with the child.
  • The program should be not deaf and not speaking with a powerful psychotropic.
  • By viewing pictures with their names, saying the names clearly, making the child repeat the word for the picture, asking the child about them, and correcting the mistakes he or she makes.
  • Through the design of educational programs using the names, sounds and forms of animals, birds, fruits and other things around the child.
  • Through the establishment of programs animated by voices, graphical images and simulations, with sounds and graphics combined with each other, and the civil aviation pictures of things in the child's daily life
    integrated with familiar voices of the children's parents at home.
  • Insert the voices of teachers, showing pictures, and record the voices of the parents.
  • Use of programs for pronunciation and speech recognition.
  • Flash programs supported with pictures and the pronunciation of names are an appealing way.
  • Using the link between images and the words of lyrics (meaning); there should be simple songs that are fun for autistic children.

5. Conclusion

This project uses speech technology to create a multimodal application, which is essential for enabling interaction with human-machine interfaces for people with cognitive impairments. With this technology, the system can respond to voice input as well as to traditional input methods. With the voice-enabled feature, we can serve not only regular users who use a keyboard or mouse, but also enable people with disabilities such as Asperger Syndrome to connect easily to the website and download the games. The key contribution of this system is therefore a more intuitive interactive application for our target population, which reduces some accessibility obstacles.

6. Acknowledgments

We acknowledge the support of the College of Computer Science at King Saud University and the support of the Information Technology Department. We would like to express our sincerest thanks to our advisor, Dr. Souham Meshoul, for her guidance and support in achieving this project. Special thanks to Dr. Areej Al-Wabil for reviewing a draft of this manuscript.

7. References

[1] Rea, S. M. (2005). Building Intelligent .NET Applications. Amsterdam, Addison-Wesley.
[2] MSDN Library. Microsoft Speech Technologies Developer Center, Microsoft Corporation. Retrieved Sep 5, 2011, from:
[3] MSDN Library. Microsoft Speech API 5.3 Speech API Overview. Retrieved Sep 12, 2011, from: us/library/ms720151(VS.85).aspx
[4] MSDN Library. SAPI Overview. Retrieved Sep 12, 2011, from:
[5] MSDN Library. Speech Recognition API and DDI. Retrieved Sep 10, 2011, from:
[6] MSDN Library. Speech Synthesis API and DDI. Retrieved Sep 5, 2011, from:
[7] MSDN Library. TTS Engine Vendor Porting Guide. Retrieved Sep 7, 2011, from: us/library/ms717037(v=vs.85).aspx
[8] MSDN Library. Microsoft OLE DB. Retrieved Sep 7, 2011, from: us/library/ms722784(v=VS.85).aspx
[9] MSDN Library. OLE DB Providers Overview. Retrieved Sep 6, 2011, from: us/library/ms709836(VS.85).aspx
[10] MSDN Library. OLE DB Provider for SQL Server. Retrieved Sep 6, 2011, from: us/library/aa213282(SQL.80).aspx