InterSpeech 2012
13th Annual Conference of the International Speech Communication Association
September 9-13, 2012 | Portland, Oregon
ISSN: 1990-9770

Computer-Assisted Language Learning (CALL) Systems

Overview

Computer-assisted language learning (CALL) provides an effective learning environment in which students can practice interactively with multimedia content, either under the supervision of teachers or at their own pace in self-directed learning. Advances in speech and language technologies have opened new perspectives for CALL systems, such as automatic pronunciation assessment and simulated conversational-style lessons. CALL is also regarded as one of the new and promising applications of speech analysis, recognition, and synthesis. CALL covers a variety of aspects, including segmental, prosodic, and lexical features. Modeling non-native speech so as to correctly segment and recognize utterances while detecting the errors they contain poses a number of challenges in speech processing. Assessing the intelligibility of non-native speech, or the proficiency of non-native speakers, is also an important issue. In this tutorial, we give an overview of these issues and current solutions. The tutorial is mainly targeted at speech researchers and engineers interested in CALL, but also at those engaged in language teaching or learning technology.

First, we review speech recognition technologies for pronunciation learning, specifically pronunciation evaluation and error detection. Statistical approaches to these problems are formulated, and then acoustic and pronunciation modeling of non-native speech is described.
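One widely used statistical formulation of pronunciation evaluation is the Goodness of Pronunciation (GOP) score of Witt and Young, which compares, frame by frame, the likelihood of the canonical phone against the best competing phone. The sketch below is a minimal illustration under the assumption that per-frame phone log-likelihoods (e.g., from forced alignment) are already available; the function name and the toy numbers are illustrative, and this is not necessarily the exact formulation used in the tutorial.

```python
def gop_score(frame_loglik_target, frame_loglik_all_phones):
    """GOP for one phone segment: average per-frame difference between
    the log-likelihood of the intended (canonical) phone and that of the
    best-matching phone from the full phone inventory.

    frame_loglik_target: list of per-frame log-likelihoods of the
        canonical phone.
    frame_loglik_all_phones: per-frame lists of log-likelihoods over
        the whole phone inventory.
    """
    n = len(frame_loglik_target)
    assert n == len(frame_loglik_all_phones) and n > 0
    total = 0.0
    for ll_target, ll_inventory in zip(frame_loglik_target,
                                       frame_loglik_all_phones):
        # <= 0 by construction; closer to 0 means the canonical phone
        # is competitive with the best alternative.
        total += ll_target - max(ll_inventory)
    return total / n

# Toy example: 3 frames, a 3-phone inventory, invented log-likelihoods.
target = [-2.0, -1.5, -2.5]
inventory = [[-2.0, -4.0, -5.0], [-1.5, -3.0, -4.5], [-3.0, -2.0, -6.0]]
score = gop_score(target, inventory)  # -0.5/3, i.e. about -0.167
```

In practice the score is thresholded per phone to flag mispronunciations, with thresholds tuned on annotated non-native speech.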
Unlike conventional non-native speech recognition, error detection capability is required in CALL, so an effective error prediction scheme is vitally important. Next, we address prosodic modeling and evaluation, such as duration, stress, and tones, and then the use of speech synthesis technologies, including re-synthesis and morphing. After the review of basic component technologies, we introduce a number of practical CALL systems that have been developed as commercial products or deployed in classrooms, including those in our universities. The majority of them focus on learning English as a second language (ESL), but some deal with other languages such as Japanese and Chinese. We also review databases of non-native speech, which are necessary for developing CALL systems.

Outline

1. Introduction and Overview (Kawahara)
   Review of the history and categories of CALL systems.
2. Segmental aspect and speech recognition technology (Kawahara)
   2.1. Speech analysis for CALL
   2.2. Segmentation of non-native speech
   2.3. Error detection of non-native speech
   2.4. Scoring of non-native speech
   2.5. Acoustic model for non-native speech
   2.6. Pronunciation model for non-native speech
   2.7. Discriminative modeling
3. Prosodic aspect (Minematsu)
   3.1. Prosodic deviations found in non-native pronunciation
   3.2. Duration modeling & evaluation
   3.3. Stress and tone modeling & evaluation
   3.4. Intonation modeling & evaluation
4. Speech synthesis technology for CALL (Minematsu)
   4.1. Text-to-speech for CALL
   4.2. Re-synthesis for CALL
   4.3. Morphing for CALL
5. Practical CALL systems (Kawahara)
   Review of major CALL systems that have been developed and deployed for learning English and other languages.
6. Database for CALL (Minematsu)
   Review of major databases of non-native speech, which are critical resources in developing CALL systems.

Short Biographies

Tatsuya Kawahara is a professor at the Academic Center for Computing and Media Studies and an affiliated professor at the School of Informatics, Kyoto University. He has also been an invited researcher at ATR and NICT, and was a visiting researcher at Bell Laboratories from 1995 to 1996. He has published more than 200 technical papers on speech recognition, spoken language processing, and spoken dialogue systems. He has managed several speech-related projects, including the free speech recognition engine Julius (http://julius.sourceforge.jp/) and the automatic transcription system for the Japanese Parliament (Diet). From 2003 to 2006, he was a member of the IEEE SPS Speech Technical Committee, and since 2011 he has been secretary of the IEEE SPS Japan Chapter. He was general chair of the IEEE Automatic Speech Recognition & Understanding Workshop (ASRU 2007), and has also served as tutorial chair of INTERSPEECH 2010 and local arrangements chair of ICASSP 2012. He is an editorial board member of the Elsevier journal
Computer Speech and Language, ACM Transactions on Speech and Language Processing, and APSIPA Transactions on Signal and Information Processing. He is a senior member of the IEEE.

E-mail: email@example.com
Webpage: http://www.ar.media.kyoto-u.ac.jp/members/kawahara/

Nobuaki Minematsu is an associate professor at the Graduate School of Information Science and Technology, the University of Tokyo. He was a visiting researcher at the Royal Institute of Technology (KTH), Sweden, from 2002 to 2003. He has a very wide range of interests in speech communication, spanning science and engineering, and has published more than 200 scientific and technical papers, including conference papers, on speech analysis, speech perception, speech recognition, speech synthesis, language learning systems, and related topics. He was a member of the organizing committees of Speech Prosody 2004, L2WS 2010, and INTERSPEECH 2010. Since 2006, he has been a member of SLaTE (the ISCA SIG on Speech and Language Technology in Education), and since 2011 he has been treasurer of the IEEE SPS Japan Chapter. He has also served as an editorial board member of the Acoustical Society of Japan, the Institute of Electronics, Information and Communication Engineers, and the Information Processing Society of Japan.