Instant speech translation 10BM60080 - VGSOM


  • INSTANT SPEECH TRANSLATION By SATHIYASEELAN M 10BM60080 I Year M.B.A VGSOM, IIT Kharagpur
  • Index
    1. Abstract
    2. Instant Speech Translation – Eliminating Language Barriers
    3. System Requirements
    3.1. Speech Recognition
    3.2. Language Parsing
    3.3. Translation
    4. Applications and their Business Potential
    4.1. Mobile Applications and Services
    4.2. Voice Interface Devices with Local Language support
    4.3. Data Entry Applications – in Multiple Languages
    4.4. e-Learning
    4.5. Business Applications
    5. Key Players
    6. Challenges Ahead
    7. Conclusion
    8. References
  • 1. Abstract

    With the current pace of globalization, every industry needs to look beyond geographical borders. Indian IT firms provide services to Japanese, Korean and other foreign clients, and they invest heavily in foreign-language training programs. An application that provides instant translation would not only cut these costs but also help gather requirements more precisely and in less time. Instant speech translation (IST) has wide applications in other industries as well. In a country like India, where numerous vernacular languages are in use, IST can be used in many ways in day-to-day life. There is huge potential for IST applications on mobile phones. Major players such as Google, Microsoft and IBM have already come up with prototypes for these kinds of applications; Google Translator is one early example. Many more such applications will be in our gadgets soon. This paper elaborates on a few such applications and their business potential.

    2. Instant Speech Translation – Eliminating Language Barriers

    Internet and mobile services have reached even remote villages. Rural markets are now considered significant in countries like China and India. Breaking language barriers will further open up these markets for international business. Knowledge anywhere, in any form, should be used for the growth of humanity. We should create opportunities for those who want to learn and share knowledge in their own native languages, and instant speech translation will create a platform for them. This could reveal many things that are not yet known to the world. In “The Hitchhiker’s Guide to the Galaxy”, the Babel fish, a fictitious animal, performs instant translation when placed in the ear. Suppose such an application existed on a mobile phone: if I call a person in Japan and speak in English, the application would translate my speech into Japanese before it is transmitted through the telecom service provider.
    This will eliminate language boundaries and create a truly connected world.

    3. System Requirements

    “We think speech-to-speech translation should be possible and work reasonably well in a few years’ time. Clearly, for it to work smoothly, you need a combination of high-accuracy machine translation and high-accuracy voice recognition, and that’s what we’re working on. If you look at the progress in machine translation and corresponding advances in voice recognition, there has been huge progress recently.” - Franz Och, Google’s head of translation services

    To develop an instant speech translation application, we need robust speech recognition and machine translation systems. The following figure depicts the basic blocks of an instant speech translation system.
  • Fig. Basic Functional Blocks of Instant Speech Translation

    3.1. Speech Recognition

    Speech-recognition and dictation technology has made great leaps forward in recent years, although it is not yet perfect. The Word Error Rate (WER) has come down drastically in the recent past.

    Fig. Word Error Rate of Speech Recognition Systems over Years. Source: Communications of the ACM, http://cacm.acm.org/
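Word Error Rate is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recogniser's output, divided by the number of words in the reference. The sketch below illustrates that standard definition; the function name `word_error_rate` and the sample sentences are illustrative assumptions, not taken from any system discussed in this paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of five.
print(word_error_rate("call a person in japan", "call the person in japan"))
```

A WER of 0.0 means the transcript matches the reference exactly; values can exceed 1.0 when the recogniser inserts many extra words.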
  • Speech recognition has achieved good usability, and there is a sudden surge in speech-controlled devices. Even Microsoft Vista had speech recognition capabilities; the feature turned out to be a failure, but basic commands did work in it. A system that merely listens and guesses is not going to take this forward. Robust speech recognition technology is a crucial part of instant speech translation. The main problem such systems face is understanding the nuances of a user’s enunciation and voice patterns. When a system is used over a period of time, it could learn these patterns and reduce its recognition error rate. Mobile phones will have an advantage over other gadgets here, since a mobile phone is mostly used by a single user and users can hardly avoid using it. Mobile phones may also soon recognise a user’s natural, free-style speech. Speech recognition systems can be customized to a particular user by having the user utter a predefined set of commands or words, which helps the system recognize its master’s voice patterns. In the early stages of development, this customization could be done with the help of a professional.

    3.2. Language Parsing

    Programs cannot parse human sentences as easily as they parse mathematical expressions, because there is substantial ambiguity in the structure of human language. Some linguistic analysis is needed to extract the relevant information. The language parser splits the raw text into understandable word units, selects the correct form and class for each word that can have more than one interpretation, and identifies the head words of a sentence. The information produced by the language parser is passed to the machine translation engine for further processing. A set of protocols should be defined for communication between different languages. For example, Indian languages generally use a SUBJECT-OBJECT-VERB pattern, whereas English generally uses a SUBJECT-VERB-OBJECT pattern.
    The language parser’s role is to provide a parsed language stream that can be easily interpreted by translators.

    3.3. Translation

    The machine translator translates a parsed input language stream into a well-defined output language stream. The translation abides by the set of protocols defined for communication between a given set of languages.
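The word-order difference mentioned in Section 3.2 gives a simple picture of what a per-language "protocol" might contain. The following is a minimal sketch under strong assumptions: the input is already tagged by hand with the roles S, V and O (a real parser would have to produce these tags), the `WORD_ORDER` table and the `reorder` function are hypothetical names, and the language code "hi" merely stands in for a subject-object-verb language. No actual word translation is performed.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, role), where role is "S", "V" or "O"

# Hypothetical protocol table: constituent order for each language.
WORD_ORDER = {
    "en": ("S", "V", "O"),  # English: subject-verb-object
    "hi": ("S", "O", "V"),  # stand-in for a subject-object-verb language
}


def reorder(tokens: List[Token], target_lang: str) -> List[str]:
    """Rearrange role-tagged words into the target language's order."""
    order = WORD_ORDER[target_lang]
    # sorted() is stable, so words inside one constituent keep
    # their original relative order.
    ranked = sorted(tokens, key=lambda token: order.index(token[1]))
    return [word for word, _ in ranked]


# Hand-tagged example sentence (the tags are assumed parser output).
parsed = [("the", "S"), ("passenger", "S"), ("wants", "V"),
          ("a", "O"), ("ticket", "O")]
print(reorder(parsed, "hi"))
# ['the', 'passenger', 'a', 'ticket', 'wants']
```

A real translation engine would combine such reordering with word-sense selection and lexical translation; this sketch isolates only the ordering step.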
  • Fig. Machine Translation

    4. Applications and their Business Potential

    IST applications have great business potential. Various players are almost ready to roll out these services on various types of gadgets.

    4.1. Mobile Applications and Services

    IST as a service: Instant speech translation will have many applications on mobile phones. It is practically impossible for a single IST service provider to cover all languages and their various colloquial forms. Hence the service provider can expose Application Programming Interfaces (APIs) so that interested third parties can develop language support and sell it back to the IST service provider. This will become a viable business model once regional-language enthusiasts start getting involved. The IST service provider can bill users based on usage. Such services can be launched in collaboration with telecom service providers.
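The service model described above, with third-party language support plugged in through an API and users billed by usage, could be organised roughly as follows. This is only an illustrative sketch: the class `ISTService`, its methods, the per-word billing unit and the language codes are all assumptions made for the example, and `str.upper` is a dummy stand-in for a real translation function.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

Translator = Callable[[str], str]


class ISTService:
    """Hypothetical IST provider that hosts third-party language packs."""

    def __init__(self, rate_per_word: float) -> None:
        self.rate_per_word = rate_per_word
        self._packs: Dict[Tuple[str, str], Translator] = {}
        self._usage: Dict[str, int] = defaultdict(int)

    def register_pack(self, source: str, target: str,
                      translator: Translator) -> None:
        """API through which a third party contributes a language pair."""
        self._packs[(source, target)] = translator

    def translate(self, user: str, source: str, target: str,
                  text: str) -> str:
        """Translate text and record the words used for billing."""
        try:
            translator = self._packs[(source, target)]
        except KeyError:
            raise LookupError(f"no language pack for {source}->{target}")
        self._usage[user] += len(text.split())
        return translator(text)

    def bill(self, user: str) -> float:
        """Usage-based charge: words translated times the per-word rate."""
        return self._usage.get(user, 0) * self.rate_per_word


service = ISTService(rate_per_word=0.5)
service.register_pack("en", "ta", str.upper)  # dummy translator
print(service.translate("user-1", "en", "ta", "where is the station"))
print(service.bill("user-1"))
```

Keeping the registry keyed by language pair lets each contributed pack be added or replaced independently, which matches the idea of regional-language enthusiasts supplying support for their own languages.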
  • Fig. A Model of IST Services on mobile

    IST as a product: These services can also be packaged into a product. However, an application that supports near-perfect translation will be heavy, so in the initial stages the user’s preferred language packs can be bundled into a product and sold to the user.

    Fig. Users interacting through an IST application on mobile

    The service model will suit Indian languages, while the product model will suit international languages like Japanese. The service model will facilitate the wide spread of these applications and will also bring various players into the market.
  • IST applications can also be used on other types of gadgets, such as the iPod and iPad. A few basic applications are already available in the App Store, for example Jibbigo Voice Translation.

    Fig. Screenshot of Jibbigo Application on iPod

    IST Development Standards

    To facilitate easy development and learning, a set of standards needs to be established, similar to HTML in web design. Just as XML and JSON are used for machine-readable data sharing, VOXML (Voice XML) can be used for these types of applications.

    4.2. Voice Interface Devices with Local Language support

    Voice interface devices that support local languages will soon be in use. For example, local people could interact with a railway information kiosk by speaking their own language. Instant speech translation will play a vital role in these types of interfaces, and IST applications can sit at the front end of such devices. This will also resolve queries faster than a traditional key-entry enquiry system. Most voice-driven applications currently support mainly English, and the same is the case with the Windows 7 operating system. An IST application at the front end can translate local-language speech input into English, which can then be processed by the speech recognition systems supported by various operating systems.
  • Fig. Various blocks in a Railway Information Kiosk that supports regional languages through speech (blocks: local language speech input; IST applications; English command / query generator; normal processing done in a railway information kiosk)

    4.3. Data Entry Applications – in Multiple Languages

    IST applications can help with data entry in multiple languages. This could assist in translating legal documents into various languages. We have witnessed many court proceedings being delayed for lack of documents in regional languages. Our Government also invests a lot in translating various documents into regional languages. In the years to come, Microsoft Word may offer options to view translated versions while typing. This could cut down the costs and time involved in such activities.

    4.4. e-Learning

    Advances in computing and bandwidth have brought the benefits of traditional classroom education into the distance-learning environment. IST will take this a step further by removing the language barriers that impede the sharing of ideas and knowledge. The figure below depicts the schema of an e-classroom that uses IST.
  • Fig. IST Applications supporting Distance Learning in Various Languages

    IST applications could be used in webcasting in a similar way.

    4.5. Business Applications

    IST applications could also help business enterprises interact with customers located in different geographies. IST will help in understanding customer requirements in a short span of time. Users’ contributions to IST applications are crucial: they can provide suggestions to improve the translations produced by the application. Credits can be given to regular users who provide valuable suggestions. This will encourage local participation, which would ultimately help improve the quality of service provided by IST applications.

    The applications of IST discussed here are just the tip of the iceberg. We will see many more such applications in the future, once IST applications are usable in real time. IST could then be extended to many sensitive areas such as health care and defence.
  • 5. Key Players

    Google was the first company to announce that it was working on speech-to-speech translation for mobile phones. The latest translation app available on Google Android is Babylon, which gives dictionary results in 75 different languages as well as full-text translation in over 12 languages.

    Apple is working with IBM to roll out a speech-to-speech translator for iPhones. IBM and Apple are already working closely on a few applications that will run on the iPhone and iPad. IBM has been working on translation software and machine translation for years. In fact, it created MASTOR and the SMT (Statistical Machine Translation) technology that many other translation applications use.

    Microsoft has built-in speech recognition support in its operating systems. It recently demonstrated German-English translation of a conversation between two Microsoft employees. It has made no official announcements on projects pertaining to instant speech translation.

    Videos of instant speech translation applications by other major players such as AT&T, NEC and ATR can be found on YouTube. Nespole, Babylon, Verbmobil and MATRIX are a few well-known speech translation systems developed by players in this field. Extensive research projects are under way to improve the usability of speech translation systems. PDA manufacturers could collaborate with these application developers to accelerate such projects, which would also help them gain an upper hand over their competitors.

    6. Challenges Ahead

    Only a system that works well in a real-time environment will succeed in the long run. Numerous hurdles need to be crossed to reach a perfect real-time IST. One of them is high-accuracy speech recognition, which depends heavily on the quality of the input speech. Acoustic degradation produced by additive noise is an obstacle to reaching the desired accuracy. In real-world use, the user is not going to be in a noise-free environment.
    Hence an IST application should be intelligent enough to separate the user’s voice from the noise in the environment. In the future, IST applications are also expected to be intelligent enough to capture the user’s mood. A monotonous voice from an IST application will soon bore the user. A customisable voice will make IST applications more expressive and friendly. Adding phonemes to the computerised voice will bring it nearer to a human voice.
  • Industry should work in collaboration with research communities to resolve these hurdles and achieve human-like performance.

    7. Conclusion

    Speech and text translation applications are already being used in a variety of forms on a number of devices. To attain human-like performance, we must continue to invest in research. Along with speech, other sensory user inputs can be integrated with IST applications to attain human-like performance. Once that is achieved, instant speech translation will soon spread to devices like the TV. It would not be a surprise if text on the web gets replaced by audio and video in the future “glocal” world.
  • 8. References
    1. Ivan Ho, Hajime Kiyohara, Akira Sugimoto, and Kazuo Yana, “Enhancing Global and Synchronous Distance Learning and Teaching by Using Instant Transcript and Translation”, Hosei University Research Institute, California.
    2. http://mashable.com/2010/02/08/speech-to-speech/
    3. http://domino.research.ibm.com/comm/research.nsf/pages/r.uit.innovation.html
    4. http://technology.timesonline.co.uk/tol/news/tech_and_web/personal_tech/article7017831.ece
    5. http://blog.gts-translation.com/2010/03/02/microsoft-demos-speech-to-speech-translator/
    6. http://www.jibbigo.com/website/index.php
    7. http://cacm.acm.org/magazines/2004/1/6588-challenges-in-adopting-speech-recognition