Katsuhito Sudoh - 2015 Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2015


Association for Computational Linguistics

This poster summarizes research on Chinese-to-Japanese patent machine translation using syntactic pre-ordering. It presents two dependency-based pre-ordering approaches: rule-based pre-ordering, which reorders Chinese sentences into head-final order, and data-driven pre-ordering, which learns to rank a head word and its modifiers. In the official evaluation the rule-based system is better than the data-driven one and comparable to the tree-to-string (T2S) baseline. Pre-ordering is characterized as a deterministic approximation of T2S that is efficient at some cost in accuracy. Data-driven pre-ordering remains challenging because of noisy word alignment and non-parallelism, and context and domain awareness remain open issues for patent MT.


Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2015
Katsuhito Sudoh and Masaaki Nagata, NTT Communication Science Laboratories, Japan
Sections: Overview, Syntactic Analysis, Rule-based pre-ordering, Data-driven pre-ordering, SMT setup, Results, Conclusion
Overview
Dependency-based pre-ordering for Zh-Ja MT
- Patent-adapted in-house dependency parser
- Two types of pre-ordering:
  * Rule-based, Head Final Chinese (Han+ 2012)
  * Data-driven, Learning to Rank (Yang+ 2012)
- The rule-based system is better and is comparable to the tree-to-string (T2S) baseline
References:
Han, Dan et al., Head Finalization Reordering for Chinese-to-Japanese MT, Proc. SSST-6 (2012)
Hoshino, Sho et al., Discriminative Preordering Meets Kendall’s tau Maximization, Proc. ACL (2015)
Isozaki, Hideki et al., HPSG-Based Preprocessing for English-to-Japanese Translation, ACM TALIP Vol. 11 No. 3 (2012)
Suzuki, Jun et al., An Empirical Study of Semi-supervised Structured Conditional Models for Dependency Parsing, Proc. EMNLP (2009)
Suzuki, Jun et al., 拡張ラグランジュ緩和を用いた同時自然言語解析法 [A joint natural language analysis method using augmented Lagrangian relaxation], Proc. NLP (2012) [in Japanese]
Yang, Nan et al., A Ranking-based Approach to Word Reordering for SMT, Proc. ACL (2012)
SMT setup
Standard Moses phrase-based MT
- MGIZA word alignment, symmetrized with grow-diag-final-and (symal)
- Kneser-Ney phrase-table score smoothing
- Word 5-gram LM with Kneser-Ney smoothing
- Distortion limit: 9 (chosen from 0, 3, 6, 9)
- Weights chosen over 5 independent MERT runs
Results
- Rule-based pre-ordering is comparable to the T2S baseline
- Rule-based is better than data-driven
Conclusion
Pre-ordering is a deterministic approximation of T2S
--- efficient, with some loss in accuracy
--- directions: forest-based pre-ordering, pre-ordering lattice
Rule-based pre-ordering works robustly
--- due to the head-final nature of Japanese
Data-driven pre-ordering is still challenging
--- difficulty in word alignment, non-parallelism
--- constituent or dependency structures?
Remaining patent MT issues:
- Context awareness (consistency)
- Domain awareness (lexical choice)
Rule-based pre-ordering
Reordering into head-final order, as in Japanese
(En-Ja: Isozaki+ 2012, Zh-Ja: Han+ 2012)
Base rule: move a head word after its modifiers
Exceptions (placed after their head words):
AS (aspect particle), SP (sentence-final particle),
PU (punctuation), CC (coordinating conjunction),
IJ (interjection), “不” (negation), “等” (“etc.”)
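The base rule and its exceptions can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Token` class, the `head_finalize` helper, and the dependency arcs given to the example sentence 我不看电视 (all three words attached to 看) are assumptions made for the sketch. It also assumes that modifiers keep their original relative order.

```python
from dataclasses import dataclass

# Tags and words that stay AFTER their head, following the exception list above.
EXCEPTION_POS = {"AS", "SP", "PU", "CC", "IJ"}
EXCEPTION_WORDS = {"不", "等"}


@dataclass
class Token:
    index: int  # 1-based position in the original sentence
    form: str   # surface form
    pos: str    # part-of-speech tag
    head: int   # index of the head token, 0 for the root


def is_exception(tok):
    """True if the token must be placed after its head word."""
    return tok.pos in EXCEPTION_POS or tok.form in EXCEPTION_WORDS


def head_finalize(tokens):
    """Return the tokens in head-final order (hypothetical helper)."""
    children = {}
    for tok in tokens:
        children.setdefault(tok.head, []).append(tok)

    def visit(tok):
        kids = children.get(tok.index, [])
        out = []
        # Ordinary modifiers come first, in their original relative order.
        for kid in kids:
            if not is_exception(kid):
                out.extend(visit(kid))
        # The head follows its modifiers (the base rule).
        out.append(tok)
        # Exceptional dependents are placed after the head.
        for kid in kids:
            if is_exception(kid):
                out.extend(visit(kid))
        return out

    ordered = []
    for root in children.get(0, []):
        ordered.extend(visit(root))
    return ordered


# Assumed analysis of the example sentence: every word depends on the verb.
sentence = [
    Token(1, "我", "PN", 3),
    Token(2, "不", "AD", 3),
    Token(3, "看", "VV", 0),
    Token(4, "电视", "NN", 3),
]
reordered = head_finalize(sentence)
order = [tok.index for tok in reordered]
print(order, "".join(tok.form for tok in reordered))
```

Under these assumptions the verb 看 moves behind 我 and 电视, and 不 stays after the verb, which mirrors Japanese word order.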
Data-driven pre-ordering
Reordering by reranking a head & its modifiers
(Yang+ 2012)
- Implemented with Ranking SVM
  * Features:
    - surface/POS (head & modifier)
    - head surface/POS (h & m)
    - modifier surface/POS (head)
    - span surfaces/POSs (modifier)
    - relative position (h & m)
  * Reordering oracles are determined by maximizing Kendall’s tau criterion (Hoshino+ 2015)
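To make the oracle criterion concrete, the sketch below computes Kendall's tau for a sequence of target-side positions and picks, by brute force, the ordering of a head and its modifiers that maximizes it. It is a simplified illustration and not the procedure of Hoshino+ 2015: the function names and the alignment positions in `target_pos` are hypothetical, and each source word is assumed to align to exactly one distinct target position.

```python
from itertools import combinations, permutations


def kendall_tau(ranks):
    """Kendall's tau of a sequence of distinct target positions.

    +1.0 means the sequence is already monotone (target order),
    -1.0 means it is fully reversed.
    """
    pairs = list(combinations(range(len(ranks)), 2))
    if not pairs:
        return 1.0
    concordant = sum(1 for i, j in pairs if ranks[i] < ranks[j])
    return 2.0 * concordant / len(pairs) - 1.0


def oracle_order(units, target_pos):
    """Ordering of a head and its modifiers that maximizes tau.

    Brute force over permutations, which is fine for the handful of
    units under a single head. With distinct positions this is the same
    as sorting by target position.
    """
    best = max(
        permutations(units),
        key=lambda p: kendall_tau([target_pos[u] for u in p]),
    )
    return list(best)


# Hypothetical alignment of each Chinese word to a Japanese word position.
target_pos = {"我": 1, "电视": 2, "看": 3, "不": 4}
source = ["我", "不", "看", "电视"]

tau_before = kendall_tau([target_pos[w] for w in source])
oracle = oracle_order(source, target_pos)
tau_after = kendall_tau([target_pos[w] for w in oracle])
print(tau_before, oracle, tau_after)
```

A ranking model such as Ranking SVM would then be trained to reproduce these oracle orderings from the features listed above. With automatic word alignments the positions are noisy, which is one reason the oracles themselves can be unreliable.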
Rule-based:
Pros: stability, domain independence (?)
Cons: effort for rule management
Data-driven:
Pros: no special effort, target adaptability
Cons: instability, noisy automatic word alignment
Syntactic Analysis
[Word segmentation & POS tagging]
- Joint sequential labeling (Suzuki+ 2012)
[Dependency parsing (untyped)]
- Second-order graph-based parsing
[Semi-supervised learning] (Suzuki+ 2009)
- Labeled: 31K sentences (news), 35K sentences (patents)
- Unlabeled: 9GB (news), 100GB (patents)
Table 1: Performance in Chinese syntactic analysis (accuracy, F0 / UAS)
Word seg.   POS     Dep.
0.927       0.855   0.927
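For readers unfamiliar with the measures, the sketch below shows how an unlabeled attachment score (UAS) and a word-segmentation F-measure are commonly computed. It assumes the F column is the usual balanced F-measure over word spans. The function names and the toy inputs are hypothetical and unrelated to the figures reported in Table 1.

```python
def uas(gold_heads, pred_heads):
    """Unlabeled attachment score: share of tokens with the correct head."""
    if len(gold_heads) != len(pred_heads) or not gold_heads:
        raise ValueError("head lists must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)


def word_spans(words):
    """Convert a segmentation into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f(gold_words, pred_words):
    """Balanced F-measure over word spans of two segmentations."""
    gold, pred = word_spans(gold_words), word_spans(pred_words)
    matched = len(gold & pred)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example: one wrong head out of four, one over-segmented word.
score_uas = uas([3, 3, 0, 3], [3, 3, 0, 1])
score_f = segmentation_f(["我", "不", "看", "电视"], ["我", "不", "看", "电", "视"])
print(score_uas, round(score_f, 3))
```

Untyped dependency parsing, as used here, is naturally scored with UAS because there are no arc labels to compare.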
[Figure: dependency trees of 我不看电视 (我/PN 不/AD 看/VV 电视/NN) illustrating Head Finalization and the Exception rule; resulting word order: 1 4 3 2]
Table 2: Official evaluation results
Systems: BL PBMT, Rule-based, Data-driven, BL T2S
Human: n/a, 20.75, 16.25, 8.00
RIBES: 0.781, 0.814, 0.822, 0.812
BLEU: 0.382, 0.394, 0.406, 0.399

