Phrase linguistic classification and generalization for improving statistical machine translation (Hiroshi Matsumoto)
De Gispert, Adrià. "Phrase linguistic classification and generalization for improving statistical machine translation." Proceedings of the ACL Student Research Workshop. Association for Computational Linguistics, 2005.
1. The document discusses the relationship between Hilbert systems (H) and natural deduction systems (N), showing how H maps to N via lambda abstraction.
2. It introduces Martin-Löf type theory (ML-ITT) and explains how the propositions-as-types principle allows proofs to be represented as terms. ML-ITT can interpret both H and N through this correspondence.
3. Several works are cited that explore how ML-ITT can be viewed as an interpretation of set theory through universes, and how the Curry-Howard correspondence and lambda calculus are fundamental to ML-ITT.
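As a minimal illustration of this correspondence (the axiom and terms below are standard textbook examples, not drawn from the cited works): under propositions-as-types, a Hilbert axiom such as K is witnessed by a lambda term, and modus ponens becomes function application.

```latex
% Hilbert axiom K, read as a type, is inhabited by a lambda term:
\[ \mathsf{K} : A \to (B \to A) \qquad \text{witnessed by} \qquad \lambda x.\, \lambda y.\, x \]
% Modus ponens corresponds to function application:
\[ \frac{f : A \to B \qquad a : A}{f\, a : B} \]
```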
Neural Models for Information Retrieval
Bhaskar Mitra, Nick Craswell
(Submitted on 3 May 2017)
Neural ranking models for information retrieval (IR) use shallow or deep neural networks to rank search results in response to a query. Traditional learning to rank models employ machine learning techniques over hand-crafted IR features. By contrast, neural models learn representations of language from raw text that can bridge the gap between query and document vocabulary. Unlike classical IR models, these new machine learning based approaches are data-hungry, requiring large scale training data before they can be deployed. This tutorial introduces basic concepts and intuitions behind neural IR models, and places them in the context of traditional retrieval models. We begin by introducing fundamental concepts of IR and different neural and non-neural approaches to learning vector representations of text. We then review shallow neural IR methods that employ pre-trained neural term embeddings without learning the IR task end-to-end. We introduce deep neural networks next, discussing popular deep architectures. Finally, we review the current DNN models for information retrieval. We conclude with a discussion on potential future directions for neural IR.
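As a hedged illustration of the "shallow" methods mentioned above, the sketch below ranks documents by the cosine similarity between averaged pre-trained term embeddings of the query and of each document. The tiny embedding table is an invented stand-in for vectors from word2vec, GloVe, or similar.

```python
# Sketch of a shallow neural IR method: rank documents by cosine
# similarity between averaged term embeddings of query and document.
import numpy as np

embeddings = {  # hypothetical pre-trained term vectors
    "neural": np.array([0.9, 0.1, 0.0]),
    "ranking": np.array([0.7, 0.3, 0.1]),
    "retrieval": np.array([0.6, 0.4, 0.2]),
    "cooking": np.array([0.0, 0.2, 0.9]),
}

def text_vector(text):
    """Average the embeddings of in-vocabulary terms."""
    vecs = [embeddings[t] for t in text.lower().split() if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

query = "neural retrieval"
docs = ["neural ranking retrieval", "cooking"]
ranked = sorted(docs, key=lambda d: cosine(text_vector(query), text_vector(d)), reverse=True)
print(ranked)  # the IR-related document ranks first
```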
This document discusses a study that integrated multiple rule-based machine translation (RBMT) engines into a hybrid system using Moses. The RBMT outputs are word-aligned, turned into phrase tables, and concatenated with the Moses phrase table; the tuning process then adjusts weights for the additional feature columns contributed by the RBMT entries. Results showed BLEU score improvements from combining rule-based and data-driven approaches into a hybrid machine translation system.
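A minimal sketch of the concatenation step, assuming the usual Moses " ||| "-separated phrase-table format with the feature scores in the third field; the file names and exact feature layout are assumptions, not details from the paper. Moses takes the log of each score, so a constant of e (about 2.718) acts as a binary "on" indicator that tuning can weight.

```python
# Concatenate an RBMT-derived phrase table with the Moses one, adding
# two origin-indicator feature columns for tuning to weight.

def read_phrase_table(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n").split(" ||| ")  # [src, tgt, scores, ...]

def combine(moses_path, rbmt_path, out_path):
    with open(out_path, "w", encoding="utf-8") as out:
        for fields in read_phrase_table(moses_path):
            fields[2] += " 2.718 1"   # Moses-origin indicator on, RBMT-origin off
            out.write(" ||| ".join(fields) + "\n")
        for fields in read_phrase_table(rbmt_path):
            fields[2] += " 1 2.718"   # Moses-origin indicator off, RBMT-origin on
            out.write(" ||| ".join(fields) + "\n")

# Hypothetical file names for illustration:
# combine("moses.phrase-table", "rbmt.phrase-table", "combined.phrase-table")
```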
This document summarizes a paper that explores relearning a rule-based machine translation (RBMT) system using statistical methods. It compares the performance of the original SYSTRAN RBMT system, a relearnt statistical model of SYSTRAN called SYSTRAN Relearnt, and a baseline statistical model called SYSTRAN Relearnt-0. The models are trained without parallel corpora by using SYSTRAN translations. Evaluation shows SYSTRAN Relearnt achieves 5 BLEU points more than the baseline by using a real English language model and tuning set. An error analysis of 100 sentences identifies error types common to the systems, such as missing words, extra words, and incorrect translation choices.
This paper proposes a method for example-based machine translation that combines syntactic transfer with statistical models. The method uses transfer rules to construct the target language syntactic tree structure from the source language. It then uses a statistical generation module to select the best word sequence based on language and translation models. The method is evaluated on a travel domain corpus, with the combined approach outperforming a baseline of example-based transfer alone in terms of BLEU, NIST and human evaluation.
The document summarizes an English-Japanese example-based machine translation system developed by Microsoft Research (MSR-MT) that uses abstract linguistic representations. MSR-MT combines rule-based and statistical techniques with example-based transfer. It first parses sentence pairs into logical forms (LFs) and then extracts mappings between the LFs to create a bilingual knowledge base. New sentences are translated by matching their LFs to the knowledge base. An evaluation found MSR-MT performed comparably to a commercial system on a technical domain, suggesting example-based MT can achieve good results using semantic representations and alignment rules.
The document summarizes the BLEU method for automatically evaluating machine translation systems. BLEU calculates n-gram precision between a candidate translation and multiple reference translations, with modifications to address the weaknesses of plain precision. It combines the average logarithm of the modified n-gram precisions with a brevity penalty for candidate translations shorter than the references. Evaluation tests on multiple translation systems found that BLEU scores reliably distinguished system quality and correlated well with human judgements.
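A minimal sketch of BLEU's core computation for a single candidate against one reference, showing the clipped ("modified") n-gram precisions, their geometric mean over n = 1..4, and the brevity penalty; real BLEU is corpus-level and supports multiple references.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # clipping stops a candidate from being rewarded for
        # repeating a reference word many times
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        log_prec_sum += math.log(max(clipped, 1e-9) / total)
    # brevity penalty fires only when the candidate is shorter
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```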
This document describes a statistical approach to machine translation. It discusses using probability to determine the most likely source sentence S given a target sentence T. It presents methods for computing language model probabilities, translation probabilities, and searching for the optimal S. Two pilot experiments are described to estimate parameters for the translation model from bilingual text data. Evaluation of the second experiment classified the decoded sentences as exact, alternate, different, wrong, or ungrammatical compared to the reference translations.
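The search described above is the standard noisy-channel formulation of statistical MT, stated here in modern notation:

```latex
% Fundamental equation: choose the source sentence S that maximizes its
% language-model prior times the translation likelihood of the observed T.
\[
\hat{S} \;=\; \arg\max_{S} \Pr(S \mid T) \;=\; \arg\max_{S} \Pr(S)\,\Pr(T \mid S)
\]
```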
Approach to Japanese-English Automatic Translation by Susumu Kuno (Hiroshi Matsumoto)
1. The document describes a machine translation system for translating Japanese text to another language.
2. It involves automatic input editing, segmentation, syntactical analysis, and output editing with transformation.
3. The system handles characteristics of Japanese text, such as the absence of spaces between words and the use of kanji characters, by segmenting the text into components and replacing kanji with word tokens (a segmentation sketch follows below).
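A minimal sketch of one simple way to segment spaceless text, using greedy longest-match against a dictionary; the lexicon and strategy here are illustrative assumptions, not the procedure of Kuno's system.

```python
# Greedy longest-match segmentation of spaceless text into word tokens.
DICTIONARY = {"私", "は", "学生", "です"}  # hypothetical lexicon

def segment(text):
    tokens, i = [], 0
    while i < len(text):
        # try the longest dictionary entry starting at position i
        for j in range(len(text), i, -1):
            if text[i:j] in DICTIONARY:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character becomes its own token
            i += 1
    return tokens

print(segment("私は学生です"))  # ['私', 'は', '学生', 'です']
```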
4. Pre-editing:
• the input is marked to indicate prefixes, suffixes, and similar elements
Control:
• to control the vocabulary and grammatical structure of the input, e.g. by reordering or substituting words
Sublanguage:
• an MT system specialized for a specific field
5. Bilingual systems:
• Designed for two particular languages
• Unidirectional:
Source Language (SL) → Target Language (TL)
• Bidirectional:
SL ↔ TL
• Most are unidirectional
Multilingual systems:
• Designed for more than a single pair of languages
• Most are bidirectional
10. In Analysis:
Morphology:
• Identification of word endings (e.g. -ing, -ed) and word compounds (e.g. airport, housetop)
Syntax:
• Identification of phrase structures and dependency/subordination relations
Semantics:
• Resolution of lexical and structural ambiguities
In Synthesis (generation):
Semantics:
• Selection of appropriate, compatible lexical and structural forms
Syntax:
• Generation of the required phrase and sentence structures
Morphology:
• Generation of correct word forms
12. The idea:
• to reuse examples of already existing translations as the basis for a new translation
• the principle that a sentence similar to a previously translated sentence should be translated in the same way
• a sentence is translated by finding the translation of a closely similar sentence and imitating it
Three stages:
• Matching: matching the input against the example database
• Alignment: identifying the corresponding translation fragments
• Recombination: recombining these fragments (a sketch of the matching stage follows below)
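A minimal sketch of the Matching stage only, assuming a toy translation memory and a simple word-overlap similarity; the alignment and recombination stages are omitted.

```python
# Match an input sentence against a toy example database and reuse the
# closest example's translation as the starting point.
examples = {  # hypothetical translation memory: source -> target
    "he bought a book": "kare wa hon o katta",
    "she read a letter": "kanojo wa tegami o yonda",
}

def similarity(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)  # Jaccard word overlap

def match(sentence):
    return max(examples, key=lambda src: similarity(src, sentence))

best = match("he bought a letter")
print(best, "->", examples[best])  # closest example and its translation
```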
13. Steps:
• First, align the phrases, word groups and individual words of the parallel texts
• Calculate the probabilities of correspondence between words in the SL and TL (see the sketch below)
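A minimal sketch of estimating such word-correspondence probabilities with a few EM iterations, in the spirit of IBM Model 1; the two-sentence parallel corpus is invented for illustration.

```python
# Estimate t(target_word | source_word) by expectation-maximization.
from collections import defaultdict

corpus = [("the house".split(), "das haus".split()),
          ("the book".split(), "das buch".split())]

t = defaultdict(lambda: 0.25)  # uniform initialization

for _ in range(10):
    count = defaultdict(float)
    total = defaultdict(float)
    for sl, tl in corpus:
        for f in tl:
            norm = sum(t[(f, e)] for e in sl)
            for e in sl:
                frac = t[(f, e)] / norm  # expected alignment count
                count[(f, e)] += frac
                total[e] += frac
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("haus", "house")], 3))  # converges toward 1.0
```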