Machine Translation



Brief overview of the various types of machine translation, the benefits of using a machine translation solution; includes translation samples and resources.

Dispelling the myths of machine translation
tcworld, August 2008

It is not surprising that myths, half-truths, and misunderstandings abound regarding machine translation: it seems as if the experience most players in the translation field have with this technology does not go beyond toying a little with one of the free online translation tools. Almost every week, I come across an article informing its readers either that machine translation is and always will be a complete waste of time, or that machine translation, while being a waste of time today, might actually be useful some time in the distant future. In the hope of setting the record straight, here is a closer look at some of the most common myths about machine translation.

[Photo: Vasiliy Koval]
By Uwe Muegge

Myth: Machine translation simply does not work

With free online translation services available all over the web, anyone can run a text through a machine translation (MT) engine and then share the results with the public as proof of the fact that machine translation is capable of little more than the most rudimentary rough translations (gisting) and, of course, of providing nearly endless entertainment.

The main problem with these ‘tests’ is that using any of the free online translation environments gives only a glimpse of the true power of a full-fledged professional machine translation system. For example, the typical online translation service does not allow users to select a subject field or provide user terminology, let alone set stylistic preferences. In fact, many, if not most, of the free text translation tools support no translation parameters other than the specification of the language pair and the source text. No wonder that the translations these machine translation websites produce can be so ridiculously off target.

[Figure: Example of a German>English machine translation from the author’s website]

Fact: Machine translation improves the productivity and consistency of human translators

Whenever new source text for a project is created, that text will have to be translated at some point. Even when you work in what is considered a state-of-the-art globalization environment, i.e. an integrated content management/translation workflow system, you will end up with a certain percentage of low match/no match sentences. In a well-planned and well-managed globalization project where writers, as well as software developers, use a comprehensive project glossary as well as a style guide aimed at easy readability/comprehensibility, the low/no match sentences can be pre-translated in a machine translation system before being edited by human translators.

Benefits of machine-generated pre-translation:
• Translators always have a proposal to work with instead of starting each new translation from scratch. A representative case study recently conducted at Symantec indicates that the productivity of human translators can double when unknown sentences are pre-translated in a machine translation system [1].
• While most translations will require some editing and many even rewriting, it is fair to expect that a considerable percentage of machine-generated translations turn out to be perfect (this is especially true for short instructions, headings, legends, and the like).
• At a minimum, key terms will be translated correctly and consistently. And not only that: in most cases these terms will also be inflected correctly and appear in the correct singular or plural form (try to do that with your translation memory!).

Fact: Machine translation enables the translation of material that would otherwise not be translated

Very few organizations, if any, currently translate all materials that would benefit from translation into all the languages spoken by all of their current or future customers. The primary reason for this is that for many types of documents, especially in the after-sales domain, the budget is simply not available for large-scale human translation. A number of organizations are using machine translation solutions for making large volumes of text available to their global customers in their local language without involving any human translators in the process. The Microsoft Knowledge Base, which contains more than 200,000 documents in English, is a well-known example of a text repository where the number of machine-translated documents by far exceeds the number of those translated by humans.

[Figure: German search page for the Microsoft Knowledge Base with machine translation option enabled]

Myth: Machine translation systems can only handle word-for-word translation

The belief that machine translation is basically limited to the sequential substitution of words in the source language with words in the target language is as widely held as it is wrong. All popular machine translation systems, including the free online translation services such as systransoft.com, …, and windowsli…, employ highly sophisticated algorithms that are the result of years of research and development.

Fact: There is not one but many very different machine translation technologies that are all capable of producing excellent translation results in the right environment

Machine translation has been around for more than 50 years, and during this half century a wide range of MT technologies have evolved, e.g. dictionary-based, rules-based, example-based, and statistical, plus countless hybrid forms. Here is a brief discussion of the three machine translation technologies that are most relevant for commercial applications today.
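The naive ‘sequential substitution of words’ that the word-for-word myth has in mind is easy to sketch, and it is essentially what the direct (dictionary-driven) approach described in this article does. The Python sketch below is purely illustrative and is not taken from any MT product: the tiny English>German glossary is a hypothetical example, multi-word entries are matched longest-first so that phrase-for-phrase substitution works, and unknown tokens such as numbers and units are copied through unchanged. The output keeps the source word order and applies no grammar at all, which is why this approach only suits standardized, telegraphic text such as weather reports.

```python
import re

# Hypothetical English>German glossary, for illustration only.
# Keys are tuples of lower-case source tokens so that multi-word
# phrases can be substituted as a single unit.
GLOSSARY: dict[tuple[str, ...], list[str]] = {
    ("winds",): ["Wind"],
    ("from",): ["aus"],
    ("southerly", "direction"): ["südlicher", "Richtung"],
    ("speed", "reaching"): ["Geschwindigkeit", "bis"],
}
MAX_PHRASE = max(len(key) for key in GLOSSARY)


def tokenize(text: str) -> list[str]:
    # Words (including tokens such as "km/h") or single punctuation marks.
    return re.findall(r"[\w/]+|[^\w\s]", text)


def direct_translate(text: str) -> str:
    """Replace source words/phrases with target words in the same order.

    No parsing, no reordering, no inflection: tokens not found in the
    glossary (numbers, units, punctuation) are copied through unchanged.
    """
    tokens = tokenize(text)
    output: list[str] = []
    i = 0
    while i < len(tokens):
        # Greedy longest match: try the longest glossary phrase first.
        for n in range(min(MAX_PHRASE, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in GLOSSARY:
                output.extend(GLOSSARY[key])
                i += n
                break
        else:
            # No glossary entry starts here: copy the token as-is.
            output.append(tokens[i])
            i += 1
    # Re-attach punctuation to the preceding word.
    return re.sub(r"\s+([,.;:])", r"\1", " ".join(output))


print(direct_translate("winds from southerly direction, speed reaching 55 km/h"))
# prints: Wind aus südlicher Richtung, Geschwindigkeit bis 55 km/h
```

Matching the phrase ‘southerly direction’ as a unit is what lets the sketch produce the correctly inflected ‘südlicher Richtung’ without any grammar rules; a word-by-word lookup of ‘southerly’ alone could not know which German ending to use.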
Rules-based Machine Translation

Rules-based machine translation, also known as transfer machine translation, is the dominant MT paradigm today. Systran, Babelfish, promt, to name just a few, are all rules-based systems.

Rules-based MT systems use a three-stage translation process:
1. Analysis: Parses the source sentence to create a tree of the syntactic structure of that sentence.
2. Transfer: Converts the syntactic tree for the source language into the corresponding tree for the target language.
3. Generation: Populates the target tree with corresponding words to create a sentence in the target language.

Benefits of rules-based machine translation include:
• Mature, proven technology that can be implemented quickly and at relatively low cost.
• Many commercial systems available covering many language combinations.
• Highly customizable through dictionary and style settings (some systems also support the customization of the rules base).

Rules-based machine translation systems have been in use in commercial settings for many years, e.g. at Autodesk, Daimler, and the European Commission’s Translation Service. The two primary challenges for rules-based MT are, first, that the rules base of any system is by necessity limited, meaning that for best results authors need to adjust their writing style, and second, that while commercial rules-based machine translation packages are available for dozens of language combinations, many languages are still not covered.

Statistical Machine Translation

Statistical machine translation (SMT) is getting a lot of media attention these days, especially after Microsoft announced that it is using a proprietary SMT system to translate its huge Knowledge Base document repository [2] and Google won a large-scale machine translation evaluation contest with its statistical machine translation engine [3].

Statistical machine translation systems typically consist of two major components:
• Translation Model: Generates translation proposals based on corresponding word sequences in aligned source and target training data.
• Language Model: Selects the best translation proposal based on training data in the target language only.

The good news about statistical machine translation is that once an SMT system has been trained on customer-specific data, this is the MT technology that typically produces the highest translation quality. On the flip side, that training effort requires a substantial body of existing translations: Language Weaver, the leading vendor of statistical machine translation systems, recommends a bilingual corpus of two million words or more per language pair. Because of the demanding training requirements, combined with the fact that statistical machine translation systems tend to have a higher sticker price than some of the rules-based systems, this MT technology is primarily used by government agencies (the intelligence community in particular) and large corporations.

Direct Machine Translation

In its most primitive form, the only thing a direct machine translation system does is to replace the words in the source language with words in the target language, in the same sequence and without any linguistic analysis or processing. The only resource direct machine translation uses is a bilingual dictionary, which is why this MT technology is also known as dictionary-driven machine translation.

Due to this rather unsophisticated technology, direct machine translation has been considered obsolete for many years, and there are hardly any commercial products available that use direct MT. Despite its limited capabilities, I strongly believe that direct machine translation still has a place in today’s arsenal of automated translation tools. For a number of common real-world applications, word-for-word or phrase-for-phrase substitution is all that is required for successful translation. Think of domains where both vocabulary and syntax are standardized, as is the case with weather reports, financial profiles, and many e-commerce applications.

In one recent implementation, Medtronic, a large medical device manufacturer, used direct machine translation to translate a large product database into multiple languages [4]. Human translation was not an option for this project because
of cost and, yes, quality concerns (an analysis of previous human translation projects indicated an unacceptably high error rate among numeric values such as product numbers and dimensions). Also, initial tests had shown that both translation memories and rules-based machine translation systems produced poor results with text that has the following characteristics:
– little or no repetition on the sentence level;
– high repetition on the word/phrase level;
– telegraphic/elliptic style, e.g. ‘winds from southerly direction, speed reaching 55 km/h’, ‘American Technology Associates (AMTA) strong buy, Avion (AVIO) market outperform’, or ‘plate 2456dr15 right-angled, slotted, 15 ea’.

This type of translation project is most definitely among those that any self-respecting human translator could easily do without. And since direct machine translation does not require human post-editing in a best-case scenario, using MT in this kind of environment might for once be welcomed by translators (who would hate to do these translations themselves) and translation buyers (who would love the idea of almost instant, almost free translations).

Myth: Machine translation is only for large organizations

Yes, it is true: if you read any success stories about machine translation, they typically come from the Caterpillars, Microsofts, and Symantecs of this world. But that is true for many, if not most, emerging technologies. It is also true that some of the most powerful machine translation systems in use today are the result of multi-million dollar research and development programs only corporate giants can afford. But that does not mean you have to spend big bucks to deploy a machine translation solution.

Fact: Being both affordable and user-friendly, many machine translation packages are available for even the smallest of businesses, including freelancers

If you do a little research, you will find that many commercial machine translation packages are in the same price range as their translation memory counterparts, and that is mostly true both for workstation solutions for single users and for client-server solutions for many users. And the secret is out that while corporate and small-business versions may differ in many ways, the core translation engine is typically the same in both products. In other words: in terms of out-of-the-box translation quality, there is generally little if any difference between the 1,000 dollar professional version and the 50,000 dollar corporate version of a given machine translation product.

In addition, the developers of commercial machine translation systems have invested heavily in making their products as intuitive to use as possible. In fact, I would even say that it is easier, and certainly faster, to produce your first translation with a typical MT product than it is with the typical translation memory tool.

A few more facts to consider:
• Many low-priced machine translation products feature a built-in translation memory (TM) module to improve the efficiency of the post-editing process (‘never correct the same mistake twice’), and a few MT systems like promt Expert offer seamless integration with the SDL Trados translation memory system.
• A number of translation tools vendors that cater to small and mid-sized companies, such as Across, offer TM-MT system bundles and/or MT integration via API.
• User education and MT system customization (e.g. building dictionaries), which are major factors in achieving the best possible translation results, are often easier to accomplish in smaller organizations than in larger ones.

The bottom line

Since its inception, machine translation has been a highly controversial technology, and it will probably continue to be so for some time. Much of this controversy is based on false assumptions about what machine translation can do and who might benefit from using this type of technology.

Let me say it loud and clear: in general, the commercial machine translation systems available today cannot replace human translators, especially when those MT systems are operated by users who have no linguistic background. However, when the goal is to improve the efficiency of the human translation process or to create comprehensible translations in environments where human translation is not an option, and when these systems are operated by trained and motivated translation professionals, then machine translation is and has been a very powerful solution.

Uwe Muegge is the corporate terminologist at Medtronic, a manufacturer of medical technology. He serves in ISO Technical Committee 37 SC3 Computer Applications in Terminology and teaches Terminology Management and Computer-Assisted Translation at the Monterey Institute of International Studies in Monterey, California.

Sources:
[1] Systran Software Inc. 2007. Systran Case Study: Symantec. Systran Software Inc. Web site. [Online] 2007. [Cited: June 6, 2008.] …dies/2007.12.Symantec.pdf.
[2] Microsoft Corporation. 2008. Machine Translation - Home. Microsoft Corporation Web site. [Online] 2008. [Cited: June 6, 2008.] http://…aspx.
[3] National Institute of Standards and Technology. 2006. NIST 2006 Machine Translation Evaluation Official Results. National Institute of Standards and Technology Web site. [Online] November 1, 2006. [Cited: June 6, 2008.] …doc/mt06eval_official_results.html.
[4] Muegge, Uwe. 2006. Fully Automatic High Quality Machine Translation of Restricted Text: A Case Study. Proceedings of the Twenty-eighth International Conference on Translating and the Computer. London: The Association of Information Management (Aslib), 2006. ISBN 978-0-85142-5.