Machine Translation (MT) & Computer-Assisted Translation (CAT)
Machine Translation
• Introduction to MT systems
• Generations of MT systems
• Different types of MT systems
• Construction of MT systems
• Knowledge representation
• Knowledge processing
• MT engines
• New directions of MT systems
• Evaluation of MT & CAT systems
Machine Translation:
Introduction
• Machine translation (MT) is a long-term scientific dream
of enormous social, political and commercial importance.
• It was one of the earliest applications suggested for
digital computers, but turning this dream into reality has
turned out to be much harder than expected.
• Despite various problems and difficulties, some degree
of machine translation is now a daily reality and it is likely
that in the future, the bulk of routine technical and
business translation will be done with some kind of
machine translation tools.
Machine Translation:
History
• The history of MT research has gone through a number
of phases in which certain frameworks have dominated.
• First generation: from the late 1960s the syntactic orientation
was dominant, with syntactic transfer approaches.
• Second generation: in the 1980s the AI orientation was popular
and more attention was paid to semantics.
• Third generation: from the 1990s the corpus-based model with
example-based methodologies has been the focus of much translation
activity (e.g. old versions of electronic dictionaries).
• Fourth generation: from the 2000s research on spoken translation has
developed into a major focus of MT activity (e.g. latest versions
of electronic dictionaries).
• Last ten years: research on Computer-Assisted Translation CAT
has developed into a major focus of translation activity.
How To Construct an
MT system
Knowledge Representation
• Different kinds of knowledge are generally needed for
machine translation and must be represented in such a
way that they can be processed automatically by MT systems:
• Knowledge of the source language
• Knowledge of the target language
• Knowledge of the various correspondences between source
language and target language (at least knowledge of how
individual words can be translated)
• Knowledge of the culture, social conventions, etc.
• Etc.
• Several kinds of linguistic knowledge are usually
distinguished:
• Phonological knowledge
• Morphological knowledge
• Syntactic knowledge
• Semantic knowledge
Knowledge Representation:
Dictionary
• The central and largest component of an MT
system is the dictionary.
• The size and quality of the dictionary limit the
scope and coverage of a system and the quality of
translation that can be expected.
• “Electronic dictionaries” of MT must at least
represent the information we can find in “paper
dictionaries” in an appropriate fashion.
Knowledge Representation:
Paper Dictionaries
Knowledge Representation:
Electronic Dictionaries
• Entries in an MT monolingual dictionary are equivalent to
collections of attributes and values, like the following:
• lex=button, cat=n, ntype=common, number=sing, human=no,
concrete=yes.
• lex=button, cat=v, vtype=main, finite=, person=, number=.
• Entries can be implemented as records in a database.
• Entries in an MT bilingual dictionary are generally
represented by translation rules, like the following:
Button → زر / برعم / زرر / زود ... etc.
This allows the replacement of certain source-language-oriented
information with corresponding target-language information.
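As a rough illustration of the attribute-value entries and translation rules above, the sketch below stores monolingual entries as records and bilingual rules as a lookup table. The names `MONOLINGUAL`, `BILINGUAL`, `lookup` and `translate_word` are hypothetical, and the split of the Arabic candidates between the noun and verb readings is an assumption made for the example, not something the slide specifies.

```python
# Monolingual entries: one attribute-value record per reading of a word.
# Attribute names follow the slide; empty strings mark unspecified values.
MONOLINGUAL = [
    {"lex": "button", "cat": "n", "ntype": "common", "number": "sing",
     "human": "no", "concrete": "yes"},
    {"lex": "button", "cat": "v", "vtype": "main", "finite": "",
     "person": "", "number": ""},
]

# Bilingual entries as translation rules keyed by (lex, cat).
# ASSUMPTION: which Arabic candidate belongs to which category is a guess
# made for illustration only.
BILINGUAL = {
    ("button", "n"): ["زر", "برعم"],
    ("button", "v"): ["زرر", "زود"],
}


def lookup(lex, cat=None):
    """Return all monolingual records for `lex`, optionally filtered by category."""
    return [entry for entry in MONOLINGUAL
            if entry["lex"] == lex and (cat is None or entry["cat"] == cat)]


def translate_word(lex, cat):
    """Return the candidate target-language words for a source word and category."""
    return BILINGUAL.get((lex, cat), [])
```

Keying the bilingual table on both the word and its category is one simple way to let the monolingual analysis narrow down the translation candidates.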
Knowledge Representation:
Morphology
• Morphology is concerned with the internal structure of words
and how words can be formed.
• MT & CAT systems need a morphological component that
can recognize different word formation processes:
• Inflection: a word form is derived from another word form while keeping
the same part of speech or category: walk → walks
• Describe regular inflections by general rules, like:
(lex=walks, cat=v, +finite, person=3rd, number=sing, tense=pres) → V+s
• Describe irregular inflections by explicit rules, like:
(lex=be, cat=v, +finite, person=3rd, number=sing, tense=pres) → is
• Derivation: a word of a different category is derived from another
word or word stem by application of a process involving stems and
affixes: grammar → grammatical, arrive → arrival
• Regular derivational processes can be described by rules
• Irregular derivations can be handled simply by listing all derived words
• Compounding: a new word or unit is formed by combination of two
or more words
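The contrast between general rules for regular inflection and explicit rules for irregular forms can be sketched as follows. This is a minimal illustration, not a full morphological component: the function name `inflect` and the table `IRREGULAR` are hypothetical, and spelling changes (such as watch → watches) are deliberately ignored.

```python
# Explicit rules: irregular forms are simply listed.
IRREGULAR = {
    ("be", "3rd", "sing", "pres"): "is",
}


def inflect(lex, person, number, tense):
    """Return the finite verb form for the given features."""
    key = (lex, person, number, tense)
    if key in IRREGULAR:
        # An explicit rule takes priority over the general rule.
        return IRREGULAR[key]
    if (person, number, tense) == ("3rd", "sing", "pres"):
        # General rule from the slide: V+s (no spelling adjustments here).
        return lex + "s"
    # Assumption: every other feature combination uses the base form.
    return lex
```

Checking the irregular table first means that listing a new exception never requires changing the general rule.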
Knowledge Representation:
Syntax and Grammars
• Syntax is concerned with how sentences can be made up out of words.
• To describe syntax, a grammar (set of rules) is generally used in MT & CAT.
• A grammar has to represent the division of a sentence into its constituent
parts and the categorization of these parts as nominal, verbal, and so on;
programmers and developers write such grammars in consultation with
linguists.
• Consider that in English “a sentence consists of a noun phrase followed by an
auxiliary verb followed by a verb phrase. A noun phrase consists of …etc”. We
can represent this knowledge by the following grammar:
• S → NP (AUX) VP
• NP → (DET) (ADJ) N PP*
• VP → V (NP) PP*
• PP → P NP
• N → user | printer
• V → clean
• AUX → should
• DET → the | a
• P → with
• “a user should clean the printer” is a sentence in the above grammar
Knowledge Representation:
Meaning
• Knowledge about the meaning of sentences is an
important part of the translation process and allows MT &
CAT systems to produce better results.
• Three useful kinds of knowledge relating to the meaning
can be distinguished:
• Semantic knowledge: meaning of words and sentences
independently of the context they appear in.
• Pragmatic knowledge: meaning of expressions in situations
• Real world or common sense knowledge
• It is useful to represent these kinds of knowledge in MT &
CAT systems in order to increase their performance.
Accomplishing this goal has proved to be the most difficult
task in developing MT & CAT systems.
Knowledge Processing
• We now give an idea of how knowledge can be
manipulated automatically by MT systems
• This can be done in two stages: parsing and generation
• Parsing is the process of taking an input string of expressions and
producing representations appropriate to the translation
• Generation is the process of taking an appropriate
representation and producing the corresponding sentence
• A graphical representation will be used for parsing and
generation processes. However, the internal
representations are lists (very useful data structures).
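To make the idea of list-based internal representations concrete, the sketch below writes one possible parse of “the user should clean the printer” as nested Python lists and regenerates the sentence from it. The exact shape of the tree and the names `TREE`, `leaves` and `generate` are assumptions for illustration; in a real MT system, generation would work on a representation that has already been mapped to the target language.

```python
# Each node is a list: [label, child, child, ...]; a leaf child is a word.
TREE = ["S",
        ["NP", ["DET", "the"], ["N", "user"]],
        ["AUX", "should"],
        ["VP",
         ["V", "clean"],
         ["NP", ["DET", "the"], ["N", "printer"]]]]


def leaves(tree):
    """Collect the words at the leaves of a nested-list tree, left to right."""
    _label, *children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def generate(tree):
    """Produce the sentence that corresponds to a representation."""
    return " ".join(leaves(tree))
```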
Knowledge Processing:
Parsing
• The task of a parser is to take a formal grammar and a
sentence and
• Check whether the sentence is indeed grammatical
• Show how the words are combined into phrases
• Different parsing methods exist and are subdivided into
two categories: the Top-Down parsing method and the
Bottom-Up parsing method.
• Examples of parsing using the grammar defined in the
previous sections and the sentence “the user should clean
the printer” are given below.
Parsing: Bottom-Up algorithm
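A bottom-up parse starts from the words and combines them into larger phrases until a sentence node is reached. The following is a minimal shift-reduce sketch under a simplifying assumption: it uses only the rule variants needed for the example sentence (S → NP AUX VP, NP → DET N, VP → V NP), because a greedy reducer cannot handle the optional elements of the full grammar without backtracking. The names `RULES`, `CATEGORY`, `reduce_once` and `shift_reduce` are hypothetical.

```python
# Simplified rule set (assumption): no optional or repeated elements.
RULES = [
    ("S", ("NP", "AUX", "VP")),
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
]
CATEGORY = {"the": "DET", "a": "DET", "user": "N", "printer": "N",
            "clean": "V", "should": "AUX", "with": "P"}


def reduce_once(stack):
    """Replace the top of the stack by a phrase node if some rule matches."""
    for lhs, rhs in RULES:
        n = len(rhs)
        if len(stack) >= n and tuple(node[0] for node in stack[-n:]) == rhs:
            stack[-n:] = [[lhs] + stack[-n:]]
            return True
    return False


def shift_reduce(sentence):
    """Return a nested-list parse tree, or None if no S covers the sentence."""
    stack = []
    for word in sentence.lower().split():
        # Shift: push the word together with its lexical category.
        stack.append([CATEGORY.get(word, "UNKNOWN"), word])
        # Reduce: combine nodes on top of the stack as long as a rule applies.
        while reduce_once(stack):
            pass
    if len(stack) == 1 and stack[0][0] == "S":
        return stack[0]
    return None
```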
Parsing: Top-Down algorithm
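A top-down parse starts from S and expands rules until the words of the sentence are reached. The sketch below is a small top-down recognizer for the grammar given on the syntax slide, including its optional elements (written here with the marker "?") and repeated elements ("*"). It only checks grammaticality and does not build a tree. The encoding of the rules and the names `GRAMMAR`, `LEXICON`, `ends` and `is_sentence` are assumptions made for this example.

```python
# Each right-hand side is a list of (symbol, marker) pairs:
# "" = required, "?" = optional, "*" = zero or more.
GRAMMAR = {
    "S": [("NP", ""), ("AUX", "?"), ("VP", "")],
    "NP": [("DET", "?"), ("ADJ", "?"), ("N", ""), ("PP", "*")],
    "VP": [("V", ""), ("NP", "?"), ("PP", "*")],
    "PP": [("P", ""), ("NP", "")],
}
LEXICON = {
    "N": {"user", "printer"},
    "V": {"clean"},
    "AUX": {"should"},
    "DET": {"the", "a"},
    "P": {"with"},
    "ADJ": set(),  # the slide lists no adjectives
}


def ends(symbol, words, start):
    """Return every position where `symbol` can end if it begins at `start`."""
    if symbol in LEXICON:
        if start < len(words) and words[start] in LEXICON[symbol]:
            return {start + 1}
        return set()
    positions = {start}
    for child, marker in GRAMMAR[symbol]:
        next_positions = set()
        for pos in positions:
            if marker == "*":
                # Zero or more repetitions: follow the child until no new
                # end position is found.
                reached, frontier = {pos}, {pos}
                while frontier:
                    step = set()
                    for p in frontier:
                        step |= ends(child, words, p)
                    frontier = step - reached
                    reached |= frontier
                next_positions |= reached
            else:
                next_positions |= ends(child, words, pos)
                if marker == "?":
                    next_positions.add(pos)  # the element may be skipped
        positions = next_positions
    return positions


def is_sentence(text):
    """Check whether the whole word string can be derived from S."""
    words = text.lower().split()
    return len(words) in ends("S", words, 0)
```

Tracking sets of end positions rather than a single position lets the recognizer explore every way of using or skipping an optional element without explicit backtracking.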
Latest Engines in MT:
• Speech Recognition MT: applying to MT
techniques which have been highly successful in
Automatic Speech Recognition.
• Computer-assisted Translation: the idea is to
collect a bilingual corpus of translation pairs and
then use a best-match algorithm to find the
closest example to the source phrase in question,
e.g. Trados, Wordfast, etc.
What is a CAT Tool?
• CAT stands for "Computer-Aided Translation". The
terms "Translation Memory" and "TM" are sometimes
used to refer to the same type of tool. A CAT tool is a
computer program that helps a translator to work
efficiently.
This is achieved through three main functions:
• A CAT tool breaks texts into segments (sentences or
sentence fragments) and presents the segments in a
convenient way, to make translating easier and faster. In
some tools, for example Trados, each segment is
presented in a special box, and the translation can be
entered in another box right below the source text.
• The translation of each segment is saved together with the
source text. Source text and translation are always treated
and presented as a translation unit (TU). You can return to a
segment at any time to check the translation. There are
special functions which help to navigate through the text and
to find segments which need to be translated or revised
(quality control).
• The main function of a CAT tool is to save the translation units
in a database, called a translation memory, so that they can be
re-used for any other text, or even in the same text, through
special "search" features. The search functions of CAT tools
can also find segments which do not match 100%. This saves
time and effort and helps the translator to use consistent
terminology.
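The three functions described above (segmentation, storing translation units, and searching for exact or partial matches) can be sketched in a few lines. This is an illustrative toy, not how Trados or any other CAT tool is implemented: the names `segment` and `TranslationMemory`, the regular-expression segmentation rule, the similarity measure from Python's `difflib`, and the default threshold of 0.75 are all assumptions.

```python
import re
from difflib import SequenceMatcher


def segment(text):
    """Split text into segments at '.', '!' or '?' followed by whitespace.

    A naive rule: abbreviations such as 'e.g.' would be split incorrectly.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


class TranslationMemory:
    """Stores translation units (source segment -> target segment)."""

    def __init__(self):
        self.units = {}

    def add(self, source, target):
        self.units[source] = target

    def search(self, source, threshold=0.75):
        """Return (score, stored_source, target) for the best match, or None.

        A score of 1.0 is a 100% match; lower scores are partial matches.
        """
        best = None
        for stored, target in self.units.items():
            score = SequenceMatcher(None, source.lower(), stored.lower()).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, stored, target)
        return best


if __name__ == "__main__":
    tm = TranslationMemory()
    # Illustrative English -> French unit.
    tm.add("Clean the printer.", "Nettoyez l'imprimante.")
    for seg in segment("Clean the printer. Clean the scanner."):
        print(seg, "->", tm.search(seg, threshold=0.6))
```

Returning the stored source segment along with the score lets the translator see how a partial match differs from the segment being translated.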
Evaluation of MT & CAT Systems
• The evaluation of MT & CAT systems is a complex task. This
is not only because many different factors are involved, but
also because measuring translation performance is itself
difficult.
• Clarity: a traditional way of assessing the quality of translation is
to assign scores to output sentences.
• Accuracy: it is important to check whether the meaning of the
source is preserved in the translation.
• Error Analysis: tries to establish how seriously errors affect the
translation output.
• Test Suite: running the system on a large corpus of test texts will
reveal different possible problems.
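The test-suite idea can be illustrated with a tiny harness that runs a translation function over source/reference pairs and collects the failures for error analysis. Everything here is a toy assumption: the function names, the three English-Spanish pairs, and above all the exact-match comparison, which is far cruder than the human scoring of clarity and accuracy described above.

```python
def run_test_suite(translate, test_suite):
    """Run `translate` on every source sentence and collect the failures.

    Returns (share of exact matches, list of (source, expected, actual)).
    """
    failures = []
    for source, expected in test_suite:
        actual = translate(source)
        if actual != expected:
            failures.append((source, expected, actual))
    if not test_suite:
        return 0.0, failures
    return 1 - len(failures) / len(test_suite), failures


# Toy system under test: translates only the phrases it knows and
# returns anything else unchanged.
TOY_LEXICON = {"hello": "hola", "thank you": "gracias"}


def toy_translate(source):
    return TOY_LEXICON.get(source, source)


TEST_SUITE = [
    ("hello", "hola"),
    ("thank you", "gracias"),
    ("good night", "buenas noches"),
]

if __name__ == "__main__":
    score, failures = run_test_suite(toy_translate, TEST_SUITE)
    print(f"exact matches: {score:.0%}")
    for source, expected, actual in failures:
        print(f"FAILED {source!r}: expected {expected!r}, got {actual!r}")
```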
How to start using Trados?
Steps to follow for creating, opening and exporting
a translation memory, and further basic features
of the software.
Note that these steps
correspond to SDL Trados 2006, so some menus
may be different in other Trados versions.
To create a translation memory:
1. Go to Windows / Start / All Programs / SDL International /
SDL Trados 2006 / Translator’s Workbench. The software will
start running and will request the user name.
2. Go to File / New.
3. A window will appear in which you choose the source and
target languages by clicking on Add…. Then, click on Create….
4. A window will appear in which you enter the name for
the TM and browse to the location where it should be saved.
Note:
Next time you open Translator’s Workbench, the last memory
used will be opened by default.
To open a translation memory:
There are two ways of opening a translation memory:
1. Double-click the icon of the TM you wish to open.
2. Alternatively, open Trados Translator’s Workbench and
go to File / Open. A window will be displayed, where you
have to look for the TM you want to open, and once found,
click on it.
• The Trados TM will provide an existing equivalent sentence in
the target language (TL) if it matches 100%.
• The Trados TM will provide suggested words or phrases in
different colors if the equivalent sentence does not match
100%.
• Select from the possible suggestions offered by the TM.
• Confirm the suggestions offered by the TM or simply type your
own words or phrases.
• Press Ctrl + Enter to confirm and move on to the next
sentence.
• Once you have finished translating the whole text, click File / Save As
and rename the file to save it.
Some pieces of advice:
• Don’t press Enter when you are inside a
translation unit, since this can break it.
• Using the commands from the keyboard
speeds up the job.
• If you have any problems go to Help/Help
Topics in Translator’s Workbench.
Creating a Multiterm Termbase to Use in
SDL Trados Studio
TWO IMPORTANT NOTES BEFORE YOU GET STARTED:
1. Multiterm is a separate program; it is not part of Trados or
Studio. It needs to be downloaded and installed separately, and
it appears as a standalone program in your SDL folder in your All
Programs list in Windows. If you don't see it there, make sure to
go to your SDL account and download and install the program
from the My Downloads page.
2. Termbases cannot be created in Trados or Studio. The "Create
New Termbase" link you see on the SDL Trados main page and the
"Terminology Management" button on the Studio home page are
merely links that will take you to Multiterm, if it is installed on
your computer.
Creating a simple Multiterm termbase
Multiterm can be as simple or as complex as you want it to be.
In this example, the simplest kind of termbase will be created:
source term = target term.
No other index fields will be included.
1. Open SDL Multiterm Desktop, go to File, then select Create
Termbase and save your termbase in the dialog box that opens.
Click Next on Steps 1 to 5 of the Termbase Wizard, choose your
languages, then click Finish.
The termbase has now been created, but it is empty.
To manually add terms, click on the Terms tab on the bottom
left of your screen, then press F3 or click on the Add New Entry icon
right under the Edit menu. You will see the Entry screen, as
shown below.
Double-click on the little box next to the pencil icon and enter
the term for each entry.
Press F12 to save the changes.
The term is now part of your termbase and therefore will be
available when you use the termbase in Studio.
This concludes the basics of termbase creation.
