Machine Translation
  •  Statistical machine translation (SMT) can be defined as the process of maximizing the probability of a sentence s in the source language matching a sentence t in the target language.
  • The knowledge-based approach is very labour-intensive, time-consuming, and expensive. Even after decades of work, the systems generally provide no more than the basic idea of a document’s meaning. However, until recently, knowledge-based systems were still preferred by many researchers, who contended that the statistical approach was too simple to handle a task as complex as translation effectively. In addition, statistical systems require fast processors and large amounts of RAM, which were not readily and inexpensively available until several years ago.
  • To continue research and development in any field, it is critical to know the current state of the technology rather than reinvent the wheel. Existing translation engines are explained in the following slides.
  • Even as technology opens up e-commerce opportunities, companies must overcome language barriers to reach new potential customers and business partners. For example, many companies have decided to develop Web sites in the languages of the countries in which their customers and partners live.
  • Artificial intelligence: building machines to automate tasks requiring intelligent behaviour. Natural language processing (NLP) is the sub-domain of artificial intelligence concerned with the task of developing programs possessing some capability of ‘understanding’ a natural language in order to achieve some specific goal. Machine translation (MT) is a subfield of natural language processing that involves automatic translation of sentences from one natural language to another.
  • Machine translation, also known as automatic translation, is the process of translating one human language into another. A machine translation system can be thought of as a compiler: a compiler translates a high-level programming language such as C++ or Java into a low-level language such as assembly or machine language. The key difference is that the grammar of a natural language like English or Hindi is much more complex than the grammar of a programming language.
  • Interest in MT:
    • Commercial interest: the U.S. has invested in MT for intelligence purposes; MT is popular on the web (it is the most used of Google’s special features); the EU spends more than $1 billion on translation costs each year; (semi-)automated translation could lead to huge savings.
    • Academic interest: MT is one of the most challenging problems in NLP research; it requires knowledge from many NLP sub-areas, e.g., lexical semantics, parsing, morphological analysis, statistical modeling; being able to establish links between two languages allows for transferring resources from one language to another.
  • The sentence construction is parallel, but the meanings are entirely different: the first is a figure of speech involving a metaphor and the second is a literal description. The identical words in the two sentences, “flies” and “like”, are used in different grammatical categories.

Transcript

  • 1. An Overview of Machine Translation A Presentation by:
  • 2. Outline
    • Introduction: a brief introduction to translation technology; interest in MT; problems involved in machine translation
    • Translation Technology: knowledge-based systems; statistical machine translation systems; rule-based vs. statistical MT
    • Current State of Machine Translation in Use
    • Personal Speech-to-Speech Translators
  • 3. Introduction
    • These factors have increased both the demand for translation services and interest in computerized translation technology.
    • Some industry observers say machine translation, a largely experimental technology that has been around since the late 1950s, is now ready to become commercially viable.
  • 4. Definition. NLP: the sub-domain of artificial intelligence concerned with the task of developing programs possessing some capability of ‘understanding’ a natural language in order to achieve some specific goal. Understanding: a transformation from one representation (the input text) to another (an internal representation).
  • 5. Introduction. Machine translation: the use of computers to translate from one language to another. It is one of the oldest dreams of NLP, AI, and CS (first system in 1954).
  • 6. Why Machine Translation? Cheap, universal access to the world’s online information regardless of original language. (That’s the goal.)
  • 7. Interest in MT
    • Commercial interest: U.S. has invested in MT; MT is popular on the web; EU spends more than $1 billion on translation; (semi-)automated translation
    • Academic interest: challenging problems in NLP research; requires knowledge from many NLP sub-areas (lexical semantics, parsing, morphological analysis, statistical modeling); transferring resources from one language to another
  • 8. Problems Involved in Machine Translation. Ambiguity, syntactic irregularity, multiple word meanings, and the influence of context are the main problems faced by MT systems. A classic example is illustrated in the following pair of sentences: “Time flies like an arrow.” “Fruit flies like an apple.”
  • 9. How can a machine understand these differences? “Get the cat with the gloves.”
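The ambiguity in “Time flies like an arrow” can be made concrete by counting the part-of-speech readings a machine has to choose between. The sketch below is illustrative only: the `TOY_LEXICON` entries and the tag names are assumptions made up for this example, not the output of any real tagger.

```python
from itertools import product

# Toy lexicon (assumed for illustration): each word lists the parts of
# speech it could plausibly take when seen out of context.
TOY_LEXICON = {
    "time": ["NOUN", "VERB"],
    "fruit": ["NOUN"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREP"],
    "an": ["DET"],
    "arrow": ["NOUN"],
    "apple": ["NOUN"],
}

def tag_sequences(sentence):
    """Return every combination of tags the toy lexicon allows for the sentence."""
    words = sentence.lower().rstrip(".").split()
    options = [TOY_LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]

readings = tag_sequences("Time flies like an arrow.")
print(len(readings))  # how many candidate tag sequences the lexicon permits
for reading in readings[:2]:
    print(reading)
```

Even a five-word sentence produces several candidate tag sequences under this toy lexicon, and a translation system has to settle on one of them before it can choose target-language words.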
  • 11. Translation Technology. There are two kinds of machine translation: knowledge-based systems and statistical machine translation. Knowledge-based systems: traditional translation technology takes a knowledge-based approach. These expert systems, used by vendors such as Fujitsu, Logos, and Systran, translate documents by converting words and grammar directly from one language into another.
  • 12. Knowledge-based systems: how they work. Knowledge-based systems rely on programmers to enter various languages’ vocabulary and syntax information into databases. The programmers then write lists of rules that describe the possible relationships between a language’s parts of speech. The software, which can run on a high-powered PC, analyzes a document and examines the rules for both the text’s language and the target … [Figure captions: “Hmm, every time he sees ‘banco’, he either types ‘bank’ or ‘bench’ … but if he sees ‘banco de…’, he always types ‘bank’, never ‘bench’…”; “Man, this is so boring.”; “Translated documents”.]
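A minimal sketch of the knowledge-based idea, assuming a hand-entered lexicon plus hand-written context rules. The “banco” / “banco de” rule comes from the slide's figure; every other entry, and the names `LEXICON`, `CONTEXT_RULES`, and `translate_word_by_word`, are hypothetical and chosen only for illustration.

```python
# Hand-entered lexicon: each source word maps to its possible translations,
# default first. (Toy Spanish-to-English entries, assumed for illustration.)
LEXICON = {
    "banco": ["bench", "bank"],
    "el": ["the"],
    "de": ["of"],
    "españa": ["Spain"],
    "verde": ["green"],
}

# Hand-written context rules: (word, following word) -> forced translation.
CONTEXT_RULES = {
    ("banco", "de"): "bank",
}

def translate_word_by_word(sentence):
    """Translate each word in place, applying a context rule when one matches."""
    words = sentence.lower().split()
    output = []
    for i, word in enumerate(words):
        next_word = words[i + 1] if i + 1 < len(words) else None
        if (word, next_word) in CONTEXT_RULES:
            output.append(CONTEXT_RULES[(word, next_word)])
        elif word in LEXICON:
            output.append(LEXICON[word][0])
        else:
            output.append(word)  # unknown words pass through unchanged
    return " ".join(output)

print(translate_word_by_word("el banco de España"))
print(translate_word_by_word("el banco verde"))
```

The second call keeps the Spanish word order (“bench green”), which hints at why such systems need many more hand-written rules, here for reordering, before the output reads properly.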
  • 13. Statistical machine translation systems. Rather than using the knowledge-based system’s direct word-by-word translation techniques, statistical approaches translate documents by statistically analyzing entire phrases and, over time, “learning” how various languages work. How it works: statistical systems start with minimal dictionary and language resources. Users must then train the system before they can work with it on extensive translations. During the training, researchers feed the system documents for which they already have accurate human translations. The system then uses its resources to guess at the documents’ meanings.
  • 14. Statistical machine translation systems. Statistical systems generally work by dividing documents into N-grams, where N is the number of words in a phrase, usually three. N-grams are statistical translation’s building blocks. Analyzing N-grams helps improve translation accuracy and performance because, while a word by itself may have many definitions, it has far fewer potential meanings when used as part of a phrase.
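Dividing text into N-grams can be sketched in a few lines. This is a minimal illustration that assumes whitespace tokenization; the function name `ngrams` is a hypothetical helper, and real systems tokenize far more carefully.

```python
from collections import Counter

def ngrams(text, n=3):
    """Split text on whitespace and return its overlapping n-word phrases."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

trigrams = ngrams("time flies like an arrow")
print(trigrams)

# Counting N-grams over a corpus is the raw material for the statistics.
counts = Counter(ngrams("the bank of spain and the bank of england"))
print(counts[("the", "bank", "of")])
```

Here “flies” appears only inside phrases such as “time flies like”, which is the narrower context the slide refers to.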
  • 15. Statistical machine translation systems. [Figure labels: “Books in English”; “Same books, in Farsi”; “Machine Learning Magic”; “P(F|E) model”.] Statistical machine translation (SMT) can be defined as the process of maximizing the probability of a sentence s in the source language matching a sentence t in the target language. We call collections of text stored in two languages parallel corpora or parallel texts.
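One common way to make the “maximizing the probability” definition concrete is the noisy-channel formulation, which picks the target sentence t that maximizes P(s | t) · P(t). The sketch below assumes that formulation. The function name `best_translation` and all probability values are invented for illustration, and in a real system both models would be estimated from parallel and monolingual corpora.

```python
import math

def best_translation(source, candidates, translation_model, language_model):
    """Return the candidate t maximizing P(source | t) * P(t), in log space."""
    def score(target):
        return (math.log(translation_model[(source, target)])
                + math.log(language_model[target]))
    return max(candidates, key=score)

# Toy numbers (assumptions, not measured from any corpus).
translation_model = {            # P(source | target)
    ("banco", "bank"): 0.6,
    ("banco", "bench"): 0.9,
}
language_model = {               # P(target)
    "bank": 0.05,
    "bench": 0.01,
}

choice = best_translation("banco", ["bank", "bench"],
                          translation_model, language_model)
print(choice)
```

With these made-up numbers “bank” wins (0.6 × 0.05 against 0.9 × 0.01) even though “bench” has the higher translation probability, because the language model considers “bank” the more likely target word.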
  • 16. Statistical machine translation systems. Statistical machine translation systems, which statistically analyze entire phrases and “learn” how various languages work, frequently work with other types of systems to improve output quality. The lexicon system provides translated words and their variations. The alignment system ensures that phrases from the source language are converted to the proper phrases and presented in the proper order in the target language. The language system performs a morphological analysis of individual words or a syntactic analysis of sentences and thereby produces translations that read properly.
  • 17. Rule-Based vs. Statistical MT
    • Rule-based MT
      • Very labour-intensive, time-consuming, and expensive
      • Rules can be based on lexical or structural transfer
      • Each program must be customized for each language pair it works with
      • Pro: firm grip on complex translation phenomena
      • Con: time-consuming, expensive, and often very labour-intensive, leading to a lack of robustness
    • Statistical MT
      • Mainly word- or phrase-based translations
      • Translations are learned from actual data
      • In general, the more data is provided for learning, the higher the quality of translation
      • Pro: translations are learned automatically
      • Con: difficult to model complex translation phenomena
  • 18. Current State of Machine Translation in Use. Google Translate is a service provided by Google Inc. to translate a section of text, or a webpage, into another language, with limits on the number of paragraphs, or range of technical terms, translated. For some languages, users are asked for alternative translations, such as for technical terms, to be included in future updates to the translation process. Google Translate is based on an approach called statistical machine translation.
  • 19. Current State of Machine Translation in Use (cont.). SYSTRAN: SYSTRAN’s methodology is a sentence-by-sentence approach, concentrating on individual words and their dictionary data, then on the parse of the sentence unit, followed by the translation of the parsed sentence. AltaVista’s Babel Fish: Babel Fish is a web-based application developed by AltaVista (now part of Yahoo!) which automatically translates text or web pages from one of several languages into another. The translation technology for Babel Fish is provided by SYSTRAN, whose technology also powers a number of other sites and portals.
  • 20. Current State of Machine Translation in Use (cont.). Language Weaver is a Los Angeles, California-based company that was founded in 2002 by the University of Southern California’s Kevin Knight and Daniel Marcu to commercialize a statistical approach to automatic language translation and natural language processing, now known globally as statistical machine translation software (SMTS). Language Weaver’s statistically based translation software is an instance of a recent advance in automated translation. Microsoft’s translation service, provided as part of its Windows Live services, allows users to translate texts or entire web pages into different languages. Computer-related texts are translated by Microsoft’s own statistical machine translation technology for eight supported languages.
  • 21. Personal Speech-to-Speech Translators
    • One of the newest research areas in machine translation is the personal speech-to-speech translator. People on business or personal trips could use these devices to translate on the fly. Speech-to-speech translation, which is still in the experimental stage, is a complex process requiring speech-recognition technology that converts speech to text, machine translation of the text, and then text-to-speech conversion.
    • IBM is working on the handheld multilingual automatic speech-to-speech translator (Mastor), which uses a hybrid statistical/knowledge-base engine to translate the content. Mastor tries to determine the general meaning of a phrase, rather than its exact translation. This approach requires less database capacity, which makes it more suitable for small devices.
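The three-stage process described on this slide (speech recognition, machine translation of the text, text-to-speech) can be sketched as a simple function pipeline. This is not how Mastor or any real product is implemented: the stage functions below are stand-ins in which bytes play the role of audio and a one-entry phrase table plays the role of the translation engine, and all names are hypothetical.

```python
def make_speech_translator(recognize, translate, synthesize):
    """Chain the three stages: speech -> text -> translated text -> speech."""
    def run(audio):
        source_text = recognize(audio)
        target_text = translate(source_text)
        return synthesize(target_text)
    return run

# Stand-in stages (assumptions for illustration only).
PHRASES = {"good morning": "buenos días"}

def fake_recognize(audio):
    # Pretend the audio bytes already contain the spoken words.
    return audio.decode("utf-8")

def fake_translate(text):
    # Look the phrase up; pass unknown phrases through unchanged.
    return PHRASES.get(text.lower(), text)

def fake_synthesize(text):
    # Pretend encoding the text produces audio.
    return text.encode("utf-8")

translator = make_speech_translator(fake_recognize, fake_translate, fake_synthesize)
print(translator(b"good morning").decode("utf-8"))
```

Keeping the stages as separate functions mirrors the slide's description: each stage can be replaced independently, and an error in an early stage is carried into every later one.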
  • 22. Looking Ahead. Because of ongoing demand for better translation systems, research money will continue to flow into the field. In addition, companies are likely to develop and release more commercial products.
  • 23. Questions? http://www.youtube.com/watch?v=jZCecsdlM7Q