Abstract of Aaron Han’s Presentation
The main topic of this presentation is the evaluation of machine translation. With the rapid development of machine translation (MT), MT evaluation has become increasingly important for telling whether systems are actually improving. Traditional human judgments are time-consuming and expensive. On the other hand, the existing automatic MT evaluation metrics have several weaknesses:
– they perform well on certain language pairs but poorly on others, which we call the language-bias problem;
– they use either no linguistic information (leading to low correlation with human judgments) or too many linguistic features (making them difficult to replicate), which we call the extremism problem;
– they rely on incomplete factors (e.g. precision only).
To address the existing problems, he has developed several automatic evaluation metrics:
– Design tunable parameters to address the language-bias problem;
– Use concise linguistic features for the linguistic extremism problem;
– Design augmented factors.
Experiments on the ACL-WMT corpora show that the proposed metrics yield higher correlation with human judgments. The metrics have been published at top international conferences, e.g. COLING and MT SUMMIT. Since evaluation is closely related to similarity measurement, this work can be extended to other areas, such as information retrieval, question answering, and search.
A brief introduction to some of his other research will also be given, covering Chinese named entity recognition, word segmentation, and multilingual treebanks, published in the Springer LNCS and LNAI series. Suggestions and comments are much appreciated, and opportunities for further collaboration are very welcome.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding - Young Seok Kim
Review of paper
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
ArXiv link: https://arxiv.org/abs/1810.04805
YouTube Presentation: https://youtu.be/GK4IO3qOnLc
(Slides are written in English, but the presentation is done in Korean)
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation with Reinforced Factors - Lifeng (Aaron) Han
Presentation PPT in MT SUMMIT 2013.
Language-independent Model for Machine Translation Evaluation with Reinforced Factors
International Association for Machine Translation, 2013
Authors: Aaron Li-Feng Han, Derek Wong, Lidia S. Chao, Yervant Ho, Yi Lu, Anson Xing, Samuel Zeng
Proceedings of the 14th biennial International Conference of Machine Translation Summit (MT Summit 2013). Nice, France. 2 - 6 September 2013. Open tool https://github.com/aaronlifenghan/aaron-project-hlepor (Machine Translation Archive)
Presentation of "Challenges in transfer learning in NLP" from Madrid Natural Language Processing Meetup Event, May, 2019.
https://www.meetup.com/es-ES/Madrid-Natural-Language-Processing-meetup/
Practical related work in repository: https://github.com/laraolmos/madrid-nlp-meetup
Over the last two years, the field of Natural Language Processing (NLP) has witnessed the emergence of transfer learning methods and architectures which significantly improved upon the state of the art on nearly every NLP task.
The wide availability and ease of integration of these transfer learning models are strong indicators that these methods will become a common tool in the NLP landscape as well as a major research direction.
In this talk, I'll present a quick overview of modern transfer learning methods in NLP and review examples and case studies on how these models can be integrated and adapted in downstream NLP tasks, focusing on open-source solutions.
Website: https://fwdays.com/event/data-science-fwdays-2019/review/transfer-learning-in-nlp
TARGETED ADVERSARIAL EXAMPLES FOR BLACK BOX AUDIO SYSTEMS - Rohan Taori, Amog Kamsetty - GeekPwn Keen
Youtube: https://www.youtube.com/watch?v=ofPPObIXdpI
The application of deep recurrent networks to audio transcription has led to impressive gains in automatic speech recognition (ASR) systems. Many have demonstrated that small adversarial perturbations can fool deep neural networks into incorrectly predicting a specified target with high confidence. Current work on fooling ASR systems has focused on white-box attacks, in which the model architecture and parameters are known. In this paper, we adopt a black-box approach to adversarial generation, combining genetic algorithms and gradient estimation to solve the task. We achieve an 89.25% targeted attack similarity after 3000 generations while maintaining 94.6% audio file similarity.
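The gradient-estimation half of such a black-box attack can be illustrated with a toy zeroth-order optimizer: since the model's internals are hidden, the loss is queried at perturbed inputs and a finite-difference estimate stands in for backpropagation. This is a minimal sketch on a stand-in quadratic loss, not the paper's actual audio pipeline; all names and values here are illustrative.

```python
import random

def estimate_gradient(f, x, eps=1e-3, n_coords=None):
    """Zeroth-order gradient estimate of f at x via central finite
    differences; optionally sample only a few coordinates per step,
    as black-box attacks do to limit the number of model queries."""
    idx = range(len(x)) if n_coords is None else random.sample(range(len(x)), n_coords)
    grad = [0.0] * len(x)
    for i in idx:
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grad[i] = (f(xp) - f(xm)) / (2 * eps)  # two loss queries per coordinate
    return grad

# Stand-in for the (unknown) model loss on the target transcription:
# squared distance to a hidden target perturbation.
target = [0.5, -1.0, 2.0]
loss = lambda x: sum((a - b) ** 2 for a, b in zip(x, target))

x = [0.0, 0.0, 0.0]
for _ in range(200):  # signed-gradient descent using only loss queries
    g = estimate_gradient(loss, x)
    x = [xi - 0.05 * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]
```

In the paper's setting the genetic algorithm supplies coarse exploration and this kind of estimated gradient refines promising candidates; the sketch shows only the latter step.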
Rohan Taori (@rtaori13 on Twitter) is an undergraduate at UC Berkeley studying EECS, with an interest in machine learning and AI. He heads the educational division of Machine Learning at Berkeley and is also a researcher at BAIR (Berkeley AI Research).
Amog Kamsetty is an undergraduate studying EECS at UC Berkeley, with an interest in both machine learning and systems. He is involved with Machine Learning @ Berkeley and is currently pursuing research at UC Berkeley RISE Lab.
[slide] A Compare-Aggregate Model with Latent Clustering for Answer Selection - Seoul National University
CIKM 2019
In this paper, we propose a novel method for the sentence-level answer-selection task, one of the fundamental problems in natural language processing. First, we explore the effect of additional information by adopting a pretrained language model to compute the vector representation of the input text and by applying transfer learning from a large-scale corpus. Second, we enhance the compare-aggregate model by proposing a novel latent clustering method to compute additional information within the target corpus and by changing the objective function from listwise to pointwise. To evaluate the performance of the proposed approaches, experiments are performed with the WikiQA and TRECQA datasets. The empirical results demonstrate the superiority of our proposed approach, which achieves state-of-the-art performance on both datasets.
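The listwise-to-pointwise change in the objective can be made concrete with a small sketch: a pointwise objective scores each candidate answer independently with binary cross-entropy, while a listwise objective normalizes scores over the whole candidate list. The functions below are illustrative, not the authors' implementation.

```python
import math

def pointwise_loss(scores, labels):
    """Pointwise objective: independent binary cross-entropy per
    candidate answer (label 1 = correct answer for the question)."""
    eps = 1e-12
    sig = [1 / (1 + math.exp(-s)) for s in scores]
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(sig, labels)) / len(scores)

def listwise_loss(scores, labels):
    """Listwise objective: softmax over the whole candidate list,
    cross-entropy against the correct answer."""
    m = max(scores)                      # stabilise the softmax
    exp = [math.exp(s - m) for s in scores]
    z = sum(exp)
    return -sum(math.log(e / z) for e, y in zip(exp, labels) if y)

scores, labels = [2.0, -1.0, 0.5], [1, 0, 0]
pw, lw = pointwise_loss(scores, labels), listwise_loss(scores, labels)
```

Both losses push the correct candidate's score up; the pointwise form additionally penalises every incorrect candidate on an absolute scale, which is the property the paper exploits.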
LEPOR: an augmented machine translation evaluation metric - Thesis PPT Lifeng (Aaron) Han
Machine translation (MT) has developed into one of the hottest research topics in the natural language processing (NLP) literature. One important issue in MT is how to evaluate an MT system reasonably and tell whether the translation system makes an improvement or not. Traditional manual judgment methods are expensive, time-consuming, unrepeatable, and sometimes show low agreement. On the other hand, the popular automatic MT evaluation methods have several weaknesses. Firstly, they tend to perform well on language pairs with English as the target language, but poorly when English is the source. Secondly, some methods rely on many additional linguistic features to achieve good performance, which makes a metric hard to replicate and apply to other language pairs. Thirdly, some popular metrics use incomplete factors, which results in low performance on some practical tasks.
In this thesis, to address these problems, we design novel MT evaluation methods and investigate their performance on different languages. Firstly, we design augmented factors to yield highly accurate evaluation. Secondly, we design a tunable evaluation model in which the weighting of factors can be optimized according to the characteristics of each language. Thirdly, in the enhanced version of our methods, we design concise linguistic features using POS to show that our methods can yield even higher performance when using external linguistic resources. Finally, we report the practical performance of our metrics in the ACL-WMT workshop shared tasks, which shows that the proposed methods are robust across different languages.
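The idea of a tunable evaluation model, where factor weights are re-optimized per language pair, can be sketched as a weighted harmonic mean of simple factors (a length penalty and a unigram precision/recall harmonic mean). This is a deliberately simplified illustration in the spirit of the LEPOR family, not the exact published formula; the factor definitions and weights are illustrative.

```python
import math
from collections import Counter

def length_penalty(hyp_len, ref_len):
    # Symmetric brevity/verbosity penalty (the published enhanced
    # length penalty differs in detail; this is a simplification).
    if hyp_len == ref_len:
        return 1.0
    return math.exp(1 - max(hyp_len, ref_len) / min(hyp_len, ref_len))

def precision_recall(hyp, ref):
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    return overlap / len(hyp), overlap / len(ref)

def tunable_score(hyp, ref, w_lp=1.0, w_pr=1.0):
    """Weighted harmonic mean of factors; w_lp / w_pr are the tunable
    weights that can be re-optimised for each language pair."""
    lp = length_penalty(len(hyp), len(ref))
    p, r = precision_recall(hyp, ref)
    hpr = 2 * p * r / (p + r) if p + r else 0.0
    factors, weights = [lp, hpr], [w_lp, w_pr]
    if min(factors) == 0:
        return 0.0
    return sum(weights) / sum(w / f for w, f in zip(weights, factors))

ref = "the cat sat on the mat".split()
hyp = "the cat sat on a mat".split()
score = tunable_score(hyp, ref, w_lp=1.0, w_pr=9.0)
```

Raising `w_pr` makes the metric behave more like an adequacy measure, while raising `w_lp` punishes length mismatches more, which is the kind of per-language tuning the thesis describes.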
The NLP muppets revolution! @ Data Science London 2019
video: https://skillsmatter.com/skillscasts/13940-a-deep-dive-into-contextual-word-embeddings-and-understanding-what-nlp-models-learn
event: https://www.meetup.com/Data-Science-London/events/261483332/
In this presentation we discuss several concepts, including word representation using SVD as well as neural network-based techniques. In addition, we cover core concepts such as cosine similarity and atomic versus distributed representations.
While academic research increasingly focuses on integrating deep learning approaches into machine translation, also called Neural Machine Translation, and shows promising and exciting results, the resulting systems still have important pragmatic limitations compared to the current generation of translation engines. We will discuss how SYSTRAN is integrating these new techniques into production systems, the results and benefits for end users, and our vision for the next versions.
TSD2013. AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFORMATION - Lifeng (Aaron) Han
Proceedings of the 16th International Conference of Text, Speech and Dialogue (TSD 2013). Plzen, Czech Republic, September 2013. LNAI Vol. 8082, pp. 121-128. Volume Editors: I. Habernal and V. Matousek. Springer-Verlag Berlin Heidelberg 2013. Open tool https://github.com/aaronlifenghan/aaron-project-hlepor
Becoming a Tech-Savvy Translator and Interpreter in the Digital Age - BrauerTraining.com
I believe that learning technology is equivalent to learning another language. Technology in itself is a whole separate language that we need to learn in order to perform in the digital age.
Let's suppose we are language interpreters working from English to French. If we were to become ASL interpreters in that language combination, we would first have to fully learn American Sign Language. But that is not enough, because we would also need to learn how it differs from French Sign Language. The same goes for technology. We need to learn the skills as if we were learning ASL plus FSL techniques. We need not only to learn about the technology but, more importantly, to PRACTICE with it to acquire the skills needed to WORK with it. That takes time and money, and we need to be ready and available to make that investment. Technology is no longer an option; it is a requirement of the Digital Age, at least in the world of business.
In the past 20 years, the world became interconnected, creating the need to deliver content in multiple languages at all points of contact. Digital technologies caused tectonic changes in the language services industry, impacting translators and interpreters, who now need to revamp their knowledge/abilities to remain relevant in the Digital Age. They need to “upgrade” their skills and become tech savvy.
There is a need for change: mostly a change in understanding and subsequent behavior on the part of translators and interpreters with regard to the future of the industry, and these are the most difficult changes to make.
Translators and interpreters need to start investing time and money to update their skills and so become an integral part of this evolving industry. We have been cut off from the most important conversations about our own future. Many of us are afraid of the new technologies because there is as yet no clear answer to the question "what's in it for me?". We need to become part of the equation going forward. If translators and interpreters do not learn, quickly and swiftly, to use 21st-century technologies, we may not survive as a viable profession.
Becoming a tech-savvy translator and interpreter is the most efficient way to tap into a short-term opportunity to transform current knowledge and experience into useful and valuable skills that may help fuel a new generation of translators and interpreters that respond to the new challenges faced by the Digital Age.
Many translators and interpreters have lost sight of the changes that have occurred in the "means of production" of the goods and services we deliver. In a world of increased competition and shrinking profit margins, translators and interpreters need to understand the investments (in time AND money) they must make in software, training, and processes to catch up with the demand for multilingual content "immediately".
Translators and interpreters need to stop being suspicious of innovations in technology.
Collection evaluation techniques for academic libraries - ALISS
Sally Halper, Lead Content Specialist - Business & Management, British Library. An excellent introduction to some really good practical qualitative and quantitative tools including White's brief tests. A bibliography of further readings is also provided.
Subject: English 18
Translation and Editing Text
Topic: Techniques in Translation
Techniques in Translation
1. Computer assisted
2. Machine translation
3. Subtitling
4. Editing/Post-editing
1. COMPUTER-ASSISTED
Computer-assisted translation is also called 'computer-aided translation' or 'machine-aided human translation'. It is a form of translation wherein a human translator creates a target text with the assistance of a computer program; the machine supports the human translator.
What is Computer Aided Translation?
Computer aided translation (also called computer assisted translation) is a system in which a human translator uses a computer in the translation process.
Humans and computers each have their strengths and weaknesses. The idea of computer aided translation (CAT) software is to make the most of the strengths of people and computers.
Translation performed solely by computers ("machine translation") has very poor quality. Meanwhile, no human can translate as fast as a computer can. By using a CAT tool, however, you can gain some of the speed, consistency, and memory benefits of the computer, without sacrificing the high quality of human translation.
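The translation-memory core of a CAT tool can be sketched in a few lines: each new segment is fuzzily matched against previously translated pairs, and a high-scoring hit is offered to the human translator instead of retranslating from scratch. This is a minimal sketch using Python's difflib; the segments, memory format, and 70% threshold are illustrative.

```python
from difflib import SequenceMatcher

# A tiny translation memory: previously translated segment pairs.
tm = [
    ("The printer is out of paper.", "L'imprimante n'a plus de papier."),
    ("Turn off the printer.",        "Éteignez l'imprimante."),
    ("Close the document.",          "Fermez le document."),
]

def best_match(segment, memory, threshold=0.7):
    """Return (score, source, translation) for the closest TM entry,
    or None when nothing clears the fuzzy-match threshold (commercial
    CAT tools commonly use cut-offs around 70-75%)."""
    scored = [(SequenceMatcher(None, segment.lower(), src.lower()).ratio(), src, tgt)
              for src, tgt in memory]
    score, src, tgt = max(scored)
    return (score, src, tgt) if score >= threshold else None

hit = best_match("Turn off the printer!", tm)
```

A near-identical segment returns a high-scoring hit for the translator to confirm or post-edit, which is where the speed and consistency gains of CAT come from.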
Translation Skills: Theory and practice
The theoretical base should include general information regarding the translator's workshop and the issues one should be familiar with.
*Internet
The role of the internet as a source of information is worth discussing. It is important to use translations which have been on the market for some time and are recognized by other people. This is where the internet becomes very useful, for it allows us to search for given information (google.com, yahoo.com, altavista.com, etc.), use online dictionaries and corpora, or compare different language versions of the same site (e.g. Wikipedia, the free encyclopedia, with its ability to switch between languages defining a given notion - www.wikipedia.org). Google itself is a powerful tool, since it not only searches for information on webpages but also indexes *.doc and *.pdf files stored on servers, allowing us to browse through their contents in search of a context.
*Software
A successful translator needs to know how to handle various computer applications in his/her work. That is why basic software used to compress and decompress files (WinZip, WinRAR) should be mentioned, along with readers for PDF and multimedia files (images, audio). Last, word processors are usually the first application that leads people to use a computer for their work. They offer spell checking, standard layouts, and the ability to set characters in bold, italics, or underline. We can save documents so they can be used again, and we can print them.
It is important to mention CAT tool, how the
Assessment of student learning must be directly connected to the learning objectives of your course. You should make these connections clear to students in your syllabus.
What makes Japanese companies more progressive than others? It actually lies in their employee centered way of management and utmost dedication to Quality.
Machine translation is an easy tool for translating text from one language to another. You've probably used it. But do you know what machine translation really is? Or when you should or shouldn't use it? Navigate through this presentation to learn more!
TSD2013 PPT. AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFORMATION - Lifeng (Aaron) Han
Publisher: Springer-Verlag Berlin Heidelberg, 2013
Authors: Aaron Li-Feng Han, Derek F. Wong, Lidia S. Chao, Yervant Ho
Proceedings of the 16th International Conference of Text, Speech and Dialogue (TSD 2013). Plzen, Czech Republic, September 2013. LNAI Vol. 8082, pp. 121-128. Volume Editors: I. Habernal and V. Matousek. Springer-Verlag Berlin Heidelberg 2013. Open tool https://github.com/aaronlifenghan/aaron-project-hlepor
LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields - Lifeng (Aaron) Han
LP&IIS 2013 Presentation PPT. Authors: Aaron Li-Feng Han, Derek Fai Wong and Lidia Sam Chao
In Proceeding of International Conference of Language Processing and Intelligent Information Systems. M.A. Klopotek et al. (Eds.): IIS 2013, LNCS Vol. 7912, pp. 57–68, 17 - 18 June 2013, Warsaw, Poland. Springer-Verlag Berlin Heidelberg 2013
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors - Lifeng (Aaron) Han
"LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors"
Publisher: Association for Computational Linguistics, December 2012
Authors: Aaron Li-Feng Han, Derek F. Wong and Lidia S. Chao
Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, pages 441–450, Mumbai, December 2012. Open tool https://github.com/aaronlifenghan/aaron-project-lepor
MT SUMMIT13. Language-independent Model for Machine Translation Evaluation with Reinforced Factors - Lifeng (Aaron) Han
Authors: Aaron Li-Feng Han, Derek Wong, Lidia S. Chao, Yervant Ho, Yi Lu, Anson Xing, Samuel Zeng
Proceedings of the 14th biennial International Conference of Machine Translation Summit (MT Summit 2013) pp. 215-222. Nice, France. 2 - 6 September 2013. Open tool https://github.com/aaronlifenghan/aaron-project-hlepor (Machine Translation Archive)
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date overview - Lifeng (Aaron) Han
Starting from the 1950s, Machine Translation (MT) has been tackled with a range of scientific approaches, from rule-based methods, example-based and statistical models (SMT), to hybrid models, and in very recent years neural models (NMT).
While NMT has achieved a huge quality improvement over conventional methodologies, by taking advantage of the huge amount of parallel corpora available from the internet and recently developed computational power at an acceptable cost, it still struggles to achieve real human parity in many domains and most language pairs, if not all of them.
Along the long road of MT research and development, quality evaluation metrics have played a very important role in MT advancement and evolution.
In this tutorial, we overview traditional human judgement criteria, automatic evaluation metrics, unsupervised quality estimation models, and the meta-evaluation of these evaluation methods. Among these, we also cover very recent work in the MT evaluation (MTE) field that takes advantage of large pre-trained language models to customise automatic metrics towards exactly the deployed language pairs and domains. In addition, we introduce statistical confidence estimation of the sample size needed for human evaluation in real-practice simulation.
Natural language processing for requirements engineering: ICSE 2021 Technical Briefing - alessio_ferrari
These are the slides for the technical briefing given at ICSE 2021 by Alessio Ferrari, Liping Zhao, and Waad Alhoshan.
It covers RE tasks to which NLP is applied, an overview of a recent systematic mapping study on the topic, and a hands-on tutorial on using transfer learning for requirements classification.
Please find the links to the colab notebooks here:
https://colab.research.google.com/drive/158H-lEJE1pc-xHc1ISBAKGDHMt_eg4Gn?usp=sharing
https://colab.research.google.com/drive/1B_5ow3rvS0Qz1y-KyJtlMNnmgmx9w3kJ?usp=sharing
https://colab.research.google.com/drive/1Xrm0gNaa41YwlM5g2CRYYXcRvpbDnTRT?usp=sharing
Machine translation from English to Hindi - Rajat Jain
Machine translation is a part of natural language processing. The algorithm suggested is a word-based algorithm. We have done translation from English to Hindi.
submitted by
Garvita Sharma,10103467,B3
Rajat Jain,10103571,B6
WMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester - Lifeng (Aaron) Han
Pre-trained language models (PLMs) often take advantage of the monolingual and multilingual datasets that are freely available online to acquire general or mixed domain knowledge before deployment into specific tasks. Extra-large PLMs (xLPLMs) have been proposed very recently to claim supreme performance over smaller-sized PLMs in tasks such as machine translation (MT). These xLPLMs include Meta-AI's wmt21-dense-24-wide-en-X (2021) and NLLB (2022). In this work, we examine whether xLPLMs are absolutely superior to smaller-sized PLMs in fine-tuning toward domain-specific MT. We use two in-domain datasets of different sizes: commercial automotive in-house data and clinical shared task data from the ClinSpEn2022 challenge at WMT2022. We choose the popular Marian Helsinki as the smaller-sized PLM and two massive-sized Mega-Transformers from Meta-AI as xLPLMs.
Our experimental investigation shows that: 1) on the smaller-sized in-domain commercial automotive data, the xLPLM wmt21-dense-24-wide-en-X indeed shows much better evaluation scores using the SACREBLEU and hLEPOR metrics than the smaller-sized Marian, even though its rate of score increase after fine-tuning is lower than Marian's; 2) when fine-tuning on the relatively larger, well-prepared clinical data, the xLPLM NLLB tends to lose its advantage over the smaller-sized Marian on two sub-tasks (clinical terms and ontology concepts) under the ClinSpEn-offered metrics METEOR, COMET, and ROUGE-L, and loses to Marian outright on Task-1 (clinical cases) on all official metrics, including SACREBLEU and BLEU; 3) metrics do not always agree with each other on the same tasks using the same model outputs; 4) clinic-Marian ranked No. 2 on Task-1 (via SACREBLEU/BLEU) and Task-3 (via METEOR and ROUGE) among all submissions.
Measuring Uncertainty in Translation Quality Evaluation (TQE) - Lifeng (Aaron) Han
From the point of view of both human translators (HT) and machine translation (MT) researchers, translation quality evaluation (TQE) is an essential task. Translation service providers (TSPs) have to deliver large volumes of translations which meet customer specifications under harsh constraints on required quality level, tight time-frames, and costs. MT researchers strive to make their models better, which also requires reliable quality evaluation. While automatic machine translation evaluation (MTE) metrics and quality estimation (QE) tools are widely available and easy to access, existing automated tools are not good enough, and human assessment by professional translators (HAP) is often chosen as the gold standard \cite{han-etal-2021-TQA}.
Human evaluations, however, are often accused of low reliability and agreement. Is this caused by subjectivity, or is statistics at play? How can we avoid checking the entire text, making TQE more efficient in terms of cost and effort, and what is the optimal sample size of the translated text for reliably estimating the translation quality of the entire material? This work carries out such motivated research to correctly estimate the confidence intervals \cite{Brown_etal2001Interval} as a function of the sample size of translated text, e.g. the number of words or sentences, that needs to be processed in the TQE workflow step for a confident and reliable evaluation of overall translation quality.
The methodology we apply in this work draws on Bernoulli Statistical Distribution Modeling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
Reference: S Gladkoff, I Sorokina, L Han, A Alekseeva. 2022. Measuring Uncertainty in Translation Quality Evaluation (TQE). LREC2022. arXiv preprint arXiv:2111.07699
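The sample-size arithmetic behind this kind of confidence estimation can be sketched with the standard normal approximation for a Bernoulli proportion (e.g. the fraction of acceptable-quality segments). The paper's interval construction (cf. Brown et al.) may differ, so treat this as an illustrative back-of-the-envelope with hypothetical names.

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation 95% confidence interval for a Bernoulli
    proportion estimated from n sampled translation units."""
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

def sample_size(margin, p_hat=0.5, z=1.96):
    """Smallest n whose interval half-width is <= margin;
    p_hat = 0.5 gives the worst (largest) case."""
    return math.ceil(z * z * p_hat * (1 - p_hat) / margin ** 2)

# How many sampled segments to pin overall quality to within +-5%?
n = sample_size(0.05)          # worst-case n = 385
lo, hi = wald_interval(0.9, n) # interval around an observed 90% pass rate
```

The key practical point the abstract makes falls out directly: the required sample depends on the target margin, not on the size of the whole translated material, so checking the entire text is unnecessary.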
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professional Post-Editing - Lifeng (Aaron) Han
Traditional automatic evaluation metrics for machine translation have been widely criticized by linguists due to their low accuracy, lack of transparency, focus on language mechanics rather than semantics, and low agreement with human quality evaluation. Human evaluations in the form of MQM-like scorecards have always been carried out in real industry settings by both clients and translation service providers (TSPs). However, traditional human translation quality evaluations are costly to perform, go into great linguistic detail, raise issues as to inter-rater reliability (IRR), and are not designed to measure the quality of worse-than-premium translations.
In this work, we introduce \textbf{HOPE}, a task-oriented and \textit{\textbf{h}}uman-centric evaluation framework for machine translation output based \textit{\textbf{o}}n professional \textit{\textbf{p}}ost-\textit{\textbf{e}}diting annotations. It contains only a limited number of commonly occurring error types, and uses a scoring model with a geometric progression of error penalty points (EPPs) that reflects error severity for each translation unit.
The initial experimental work, carried out on English-Russian MT outputs on marketing content from a highly technical domain, reveals that our evaluation framework is effective in reflecting MT output quality with regard to both overall system-level performance and segment-level transparency, and that it increases the IRR for error type interpretation.
The approach has several key advantages: the ability to measure and compare less-than-perfect MT output from different systems, the ability to indicate human perception of quality, immediate estimation of the labor effort required to bring MT output to premium quality, lower cost and faster application, as well as higher IRR. Our experimental data is available at \url{https://github.com/lHan87/HOPE}.
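The geometric progression of EPPs can be sketched as follows; the base of 2 and the 1–4 severity scale here are illustrative assumptions, not the framework's published constants:

```python
def segment_epp(error_severities, base: int = 2) -> int:
    """Total error penalty points (EPPs) for one translation unit.
    Penalties grow geometrically with severity: base ** (severity - 1),
    so a severe error outweighs several minor ones."""
    return sum(base ** (s - 1) for s in error_severities)

# Hypothetical segment with one minor (severity 1) and one critical (severity 4) error:
assert segment_epp([1, 4]) == 1 + 8
```

The design choice is that a single critical error should dominate the segment score, which a linear penalty scale cannot guarantee.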
Meta-evaluation of machine translation evaluation methods — Lifeng (Aaron) Han
Cite: Lifeng Han. 2021. Meta-evaluation of machine translation evaluation methods. In Metrics2021 Tutorial Track/type: Workshop on Informetric and Scientometric Research (SIG-MET), ASIS&T. October 23–24.
Apply Chinese radicals into neural machine translation: deeper than character level — Lifeng (Aaron) Han
LPRC 2018: Limerick Postgraduate Research Conference
Lifeng Han and Shaohui Kuang. 2018. Apply Chinese radicals into neural machine translation: Deeper than character level. ArXiv pre-print https://arxiv.org/abs/1805.01565v1
Chinese Character Decomposition for Neural MT with Multi-Word Expressions — Lifeng (Aaron) Han
ADAPT seminar series. June 2021
Research papers @ NoDaLiDa 2021 (the 23rd Nordic Conference on Computational Linguistics) & the COLING 2020 MWE-LEX workshop.
Bonus takeaway: the AlphaMWE multilingual corpus with MWEs.
Build Moses on an Ubuntu (64-bit) system in VirtualBox, recorded by Aaron (v2, longer) — Lifeng (Aaron) Han
Build Moses Statistical Machine Translation system with Ubuntu
Tree to tree Machine Translation with Universal phrase tagset. https://github.com/aaronlifenghan/A-Universal-Phrase-Tagset
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with Syntactic Dependency Features and Semantic Re-Ranking — Lifeng (Aaron) Han
ADAPT Centre & Detection of Verbal Multi-Word Expressions via Conditional Random Fields with Syntactic Dependency Features and Semantic Re-Ranking @ DLSS2017 Bilbao.
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations — Lifeng (Aaron) Han
In this work, we present the construction of multilingual parallel corpora annotated with multiword expressions (MWEs). The MWEs include verbal MWEs (vMWEs) as defined in the PARSEME shared task, i.e. terms with a verb as head. The annotated vMWEs are also bilingually and multilingually aligned manually. The languages covered include English, Chinese, Polish, and German. Our original English corpus is taken from the PARSEME shared task in 2018. We performed machine translation of this source corpus, followed by human post-editing and annotation of target MWEs. Strict quality control was applied to limit errors: each MT output sentence first received manual post-editing and annotation, followed by a second round of manual quality rechecking. One of our findings during corpus preparation is that the accurate translation of MWEs presents challenges to MT systems. To facilitate further MT research, we present a categorisation of the error types encountered by MT systems in performing MWE-related translation. To acquire a broader view of MT issues, we selected four popular state-of-the-art MT models for comparison, namely Microsoft Bing Translator, GoogleMT, Baidu Fanyi and DeepL MT. Because of the noise removal, translation post-editing and MWE annotation by human professionals, we believe our AlphaMWE dataset will be an asset for cross-lingual and multilingual research, such as MT and information extraction. Our multilingual corpora are available as open access at github.com/poethan/AlphaMWE
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing — Lifeng (Aaron) Han
Invited presentation at the NLP lab of Soochow University, about my NLP journey and the ADAPT Centre. The NLP part covers Machine Translation Evaluation, Quality Estimation, Multiword Expression Identification, Named Entity Recognition, Word Segmentation, Treebanks, and Parsing.
A deep analysis of Multi-word Expression and Machine Translation — Lifeng (Aaron) Han
A deep analysis of Multi-word Expression and Machine Translation. Faculty research open day. DCU, Dublin. 2019.
Covering MWE identification, MT with radicals, and MTE.
Quality Estimation for Machine Translation Using the Joint Method of Evaluation Criteria and Statistical Modeling — Lifeng (Aaron) Han
This is a short presentation for the poster of the WMT13 shared task. The paper introduces our participation in the WMT13 shared tasks on Quality Estimation for machine translation without using reference translations. We submitted results for Task 1.1 (sentence-level quality estimation), Task 1.2 (system selection) and Task 2 (word-level quality estimation). In Task 1.1, we used an enhanced version of the BLEU metric, without reference translations, to evaluate translation quality. In Task 1.2, we utilized the probabilistic Naïve Bayes (NB) model as a classification algorithm, with features borrowed from traditional evaluation metrics. In Task 2, to take contextual information into account, we employed a discriminative undirected probabilistic graphical model, the conditional random field (CRF), in addition to the NB algorithm. Training experiments on past WMT corpora showed that the designed methods yielded promising results, especially the statistical CRF and NB models. The official results show that our CRF model achieved the highest F-score of 0.8297 in the binary classification of Task 2.
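The reported F-score can be reproduced from binary confusion counts with a minimal computation (a generic sketch, not the official task scorer):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1: harmonic mean of precision and recall over binary labels,
    here the word-level 'good'/'bad' classification of Task 2."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only (not the shared-task data):
assert abs(f1_score(8, 2, 2) - 0.8) < 1e-9
```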
CUHK intern PPT. Machine Translation Evaluation: Methods and Tools
1. Aaron L.-F. Han
Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory
University of Macau, Macau S.A.R., China
2013.08 @ CUHK, Hong Kong
Email: hanlifengaaron AT gmail DOT com
Homepage: http://www.linkedin.com/in/aaronhan
2. The importance of machine translation (MT) evaluation
Automatic MT evaluation metrics introduction
1. Lexical similarity
2. Linguistic features
3. Metrics combination
Designed metric: LEPOR Series
1. Motivation
2. LEPOR Metrics Description
3. Performances on international ACL-WMT corpora
4. Publications and Open source tools
Other research interests and publications
3. • Eager communication among people of different nationalities
– Promotes translation technology
• Rapid development of machine translation
– Machine translation (MT) began as early as the 1950s (Weaver, 1955)
– Big progress since the 1990s due to the development of computers (storage capacity and computational power) and the enlarged bilingual corpora (Marino et al. 2006)
4. • Some recent works in MT research:
– Och (2003) presents MERT (Minimum Error Rate Training) for log-linear SMT
– Su et al. (2009) use the Thematic Role Templates model to improve translation
– Xiong et al. (2011) employ the maximum-entropy model, etc.
– Data-driven methods, including example-based MT (Carl and Way, 2003) and statistical MT (Koehn, 2010), became the main approaches in the MT literature.
5. • How well do the MT systems perform, and do they make progress?
• Difficulties of MT evaluation:
– Language variability means there is no single correct translation
– Natural languages are highly ambiguous, and different languages do not always express the same content in the same way (Arnold, 2003)
6. • Traditional manual evaluation criteria:
– Intelligibility (measuring how understandable the sentence is)
– Fidelity (measuring how much information the translated sentence retains compared to the original), by the Automatic Language Processing Advisory Committee (ALPAC) around 1966 (Carroll, 1966)
– Adequacy (similar to fidelity), fluency (whether the sentence is well-formed and fluent) and comprehension (improved intelligibility), by the Defense Advanced Research Projects Agency (DARPA) of the US (White et al., 1994)
7. • Problems of manual evaluation:
– Time-consuming
– Expensive
– Unrepeatable
– Low agreement (Callison-Burch et al., 2011)
9. • Precision-based:
BLEU (Papineni et al., 2002 ACL)
• Recall-based:
ROUGE (Lin, 2004 WAS)
• Precision and recall:
Meteor (Banerjee and Lavie, 2005 ACL)
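The precision family above can be illustrated with BLEU's clipped unigram count (a minimal sketch of the precision component only, not full BLEU with higher-order n-grams and the brevity penalty):

```python
from collections import Counter

def clipped_unigram_precision(candidate: str, reference: str) -> float:
    """BLEU-style modified unigram precision: each candidate word's count
    is clipped by its count in the reference before dividing by the
    candidate length, so repeating a matched word earns no extra credit."""
    cand, ref = candidate.split(), reference.split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    return clipped / len(cand)

# "the the the" vs "the cat": only one "the" is credited.
assert abs(clipped_unigram_precision("the the the", "the cat") - 1/3) < 1e-9
```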
10. • Word-order based:
NKT_NSR (Isozaki et al., 2010 EMNLP), Port (Chen et al., 2012 ACL), ATEC (Wong et al., 2008 AMTA)
• Word-alignment based:
AER (Och and Ney, 2003 J.CL)
• Edit-distance based:
WER (Su et al., 1992 Coling), PER (Tillmann et al., 1997 EUROSPEECH), TER (Snover et al., 2006 AMTA)
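The edit-distance family can be illustrated with a minimal WER implementation (a generic sketch; published WER/PER/TER tools add normalization, and TER adds block shifts):

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """WER: word-level Levenshtein distance divided by reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

assert word_error_rate("the cat sat", "the cat sat") == 0.0
```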
11. • Language model:
LM-SVM (Gamon et al., 2005 EAMT)
• Shallow parsing:
GLEU (Mutton et al., 2007 ACL), TerrorCat (Fishel et al., 2012 WMT)
• Semantic roles:
Named entity, morphological, synonymy, paraphrasing, discourse representation, etc.
12. • MTeRater-Plus (Parton et al., 2011 WMT)
– Combines BLEU, TERp (Snover et al., 2009) and Meteor (Banerjee and Lavie, 2005; Lavie and Denkowski, 2009)
• MPF & WMPBleu (Popovic, 2011 WMT)
– Arithmetic mean of F-score and BLEU score
• SIA (Liu and Gildea, 2006 ACL)
– Combines the advantages of n-gram-based metrics and loose-sequence-based metrics
13. • LEPOR: an automatic machine translation evaluation metric considering Length Penalty, Precision, n-gram Position difference Penalty and Recall.
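The way these factors compose can be sketched as below. This is a simplified illustration of the factor structure only: the unigram matching, the first-match alignment for the position penalty, and the default weights are simplifications, not the exact published algorithm.

```python
import math

def lepor_sketch(candidate: str, reference: str,
                 alpha: float = 1.0, beta: float = 1.0) -> float:
    """Illustrative composition of LEPOR-style factors:
    score = LP * NPosPenal * Harmonic(alpha*R, beta*P)."""
    cand, ref = candidate.split(), reference.split()
    c, r = len(cand), len(ref)

    # Length penalty: punish both shorter and longer candidates
    # (unlike BLEU's one-sided brevity penalty)
    lp = 1.0 if c == r else math.exp(1 - r / c) if c < r else math.exp(1 - c / r)

    # Crude unigram matching
    matched = [w for w in cand if w in ref]
    if not matched:
        return 0.0
    precision, recall = len(matched) / c, len(matched) / r

    # n-gram position difference penalty: average normalized position
    # difference of matched words, mapped through exp(-NPD)
    diffs = [abs((i + 1) / c - (ref.index(w) + 1) / r)
             for i, w in enumerate(cand) if w in ref]
    npd_penalty = math.exp(-sum(diffs) / c)

    harmonic = (alpha + beta) / (alpha / recall + beta / precision)
    return lp * npd_penalty * harmonic

# An identical candidate and reference score 1.0:
assert abs(lepor_sketch("the cat sat", "the cat sat") - 1.0) < 1e-9
```

Note how word-order errors reduce the score even when precision and recall are perfect, which is the role of the position difference penalty.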
14. • Weaknesses in existing metrics:
– Perform well on certain language pairs but weakly on others, which we call the language-bias problem;
– Consider no linguistic information (leading the metrics to correlate poorly with human judgments) or too many linguistic features (making them difficult to replicate), which we call the extremism problem;
– Use incomprehensive factors (e.g. BLEU focuses on precision only).
– What to do?
15. • To address some of the existing problems:
– Design tunable parameters to address the language-bias problem;
– Use concise or optimized linguistic features for the linguistic extremism problem;
– Design augmented factors.
24. • Example: employment of linguistic features
Fig. 5. Example of n-gram POS alignment
Fig. 6. Example of NPD calculation
25. • Combination with linguistic features:
• hLEPOR_final = (1 / (w_hw + w_hp)) × (w_hw · hLEPOR_word + w_hp · hLEPOR_POS)   (11)
• hLEPOR_POS and hLEPOR_word use the same algorithm on the POS sequence and the word sequence, respectively.
26. • When there are multiple references:
• Select the alignment that results in the minimum NPD score.
Fig. 7. N-gram alignment with multiple references
27. • How reliable is the automatic metric?
• Evaluation criteria for evaluation metrics:
– Human judgments are currently the gold standard to approach.
• Correlation with human judgments:
– System-level correlation
– Segment-level correlation
29. • Segment-level Kendall's tau correlation:
• τ = (num concordant pairs − num discordant pairs) / (total pairs)   (14)
• The segment unit can be a single sentence or a fragment that contains several sentences.
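The tau in (14) can be computed directly over segment pairs (a minimal sketch; ties here count toward the total but toward neither direction):

```python
def kendall_tau(metric_scores, human_scores) -> float:
    """Segment-level Kendall's tau: (concordant - discordant) / total pairs,
    where a pair is concordant when the metric and the human judgments
    rank the two segments in the same order."""
    assert len(metric_scores) == len(human_scores)
    concordant = discordant = total = 0
    n = len(metric_scores)
    for i in range(n):
        for j in range(i + 1, n):
            m = metric_scores[i] - metric_scores[j]
            h = human_scores[i] - human_scores[j]
            if m * h > 0:
                concordant += 1
            elif m * h < 0:
                discordant += 1
            total += 1
    return (concordant - discordant) / total

# Perfect agreement gives tau = 1, full disagreement gives -1:
assert kendall_tau([1, 2, 3], [10, 20, 30]) == 1.0
assert kendall_tau([1, 2, 3], [30, 20, 10]) == -1.0
```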
36. • LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors
– Aaron L.-F. Han, Derek F. Wong and Lidia S. Chao. Proceedings of COLING 2012: Posters, pages 441–450, Mumbai, India.
• Language-independent Model for Machine Translation Evaluation with Reinforced Factors
– Aaron L.-F. Han, Derek Wong, Lidia S. Chao, Liangye He, Yi Lu, Junwen Xing, Xiaodong Zeng. Proceedings of MT Summit 2013. Nice, France.
39. • Language-independent MT evaluation, LEPOR:
https://github.com/aaronlifenghan/aaron-project-lepor
• MT evaluation with linguistic features, hLEPOR:
https://github.com/aaronlifenghan/aaron-project-hlepor
• English-French phrase tagset mapping and application in unsupervised MT evaluation, HPPR:
https://github.com/aaronlifenghan/aaron-project-hppr
• Unsupervised English-Spanish MT evaluation, EBLEU:
https://github.com/aaronlifenghan/aaron-project-ebleu
• Projects homepage: https://github.com/aaronlifenghan
40. • My research interests:
– Natural Language Processing
– Signal Processing
– Machine Learning
– Artificial Intelligence
– Pattern Recognition
• My past research works:
– Machine Translation Evaluation, Word Segmentation, Entity Recognition, Multilingual Treebanks
41. • Other publications:
• A Description of Tunable Machine Translation Evaluation Systems in WMT13 Metrics Task
– Aaron Li-Feng Han, Derek Wong, Lidia S. Chao, Yi Lu, Yervant Ho, Yiming Wang, Zhou Jiaji. Proceedings of the ACL 2013 Eighth Workshop on Statistical Machine Translation (ACL-WMT 2013), 8-9 August 2013. Sofia, Bulgaria.
– ACL-WMT13 Metrics Task:
Our metrics are language independent
English vs. other (French, Spanish, Czech, German, Russian)
Can perform on both system level and segment level
The official results show our metrics have advantages compared to others.
42. • Quality Estimation for Machine Translation Using the Joint Method of Evaluation Criteria and Statistical Modeling
– Aaron Li-Feng Han, Yi Lu, Derek F. Wong, Lidia S. Chao, Yervant Ho, Anson Xing. Proceedings of the ACL 2013 Eighth Workshop on Statistical Machine Translation (ACL-WMT 2013), 8-9 August 2013. Sofia, Bulgaria.
– ACL-WMT13 Quality Estimation Task (no reference translations):
Task 1.1: sentence-level EN-ES quality estimation
Task 1.2: system selection, EN-ES, EN-DE, new
Task 2: word-level QE, EN-ES, binary classification, multi-class classification, new
We design a novel EN-ES POS tagset mapping and the metric EBLEU in Task 1.1.
We explore Naïve Bayes and Support Vector Machines in Task 1.2.
We achieve the highest F1 score in Task 2 using Conditional Random Fields.
43. Designed POS tagset mapping of Spanish (TreeTagger) to the universal tagset (Petrov et al., 2012)
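Such a mapping is essentially a lookup table from fine-grained tags to the 12 universal tags. The fragment below is a hypothetical illustration; the tags shown are examples, not the paper's full mapping:

```python
# Hypothetical fragment of a fine-grained-to-universal POS mapping
# (Spanish TreeTagger tags on the left, universal tags on the right):
SPANISH_TO_UNIVERSAL = {
    "NC": "NOUN",    # common noun
    "ADJ": "ADJ",    # adjective
    "ADV": "ADV",    # adverb
    "ART": "DET",    # article
    "CARD": "NUM",   # cardinal number
    "PREP": "ADP",   # preposition
}

def to_universal(tags):
    """Map a fine-grained tag sequence to the universal tagset,
    falling back to 'X' for tags outside the mapping."""
    return [SPANISH_TO_UNIVERSAL.get(t, "X") for t in tags]

assert to_universal(["ART", "NC", "ADJ"]) == ["DET", "NOUN", "ADJ"]
```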
49. • Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation
– Aaron Li-Feng Han, Derek F. Wong, Lidia S. Chao, Yervant Ho, Shuo Li, Lynn Ling Zhu. In GSCL 2013. LNCS Vol. 8105. Volume Editors: Iryna Gurevych, Chris Biemann and Torsten Zesch.
– German Society for Computational Linguistics (oral presentation):
To facilitate future research in unsupervised induction of syntactic structures
We design a French-English phrase tagset mapping
We propose a universal phrase tagset
Phrase tags extracted from the French Treebank and the English Penn Treebank
Explore the employment of the proposed mapping in unsupervised MT evaluation
54. • A Study of Chinese Word Segmentation Based on the Characteristics of Chinese
– Aaron Li-Feng Han, Derek F. Wong, Lidia S. Chao, Yervant Ho, Lynn Ling Zhu, Shuo Li. Accepted. In GSCL 2013. LNCS Vol. 8105. Volume Editors: Iryna Gurevych, Chris Biemann and Torsten Zesch.
– German Society for Computational Linguistics (poster paper):
No word boundaries in Chinese expressions
Chinese word segmentation is a difficult problem
Word segmentation is crucial to word alignment in machine translation
We discuss the characteristics of Chinese and design optimized features
We formalize some problems and issues in Chinese word segmentation
56. • Automatic Machine Translation Evaluation with Part-of-Speech Information
– Aaron Li-Feng Han, Derek F. Wong, Lidia S. Chao, Yervant Ho. In TSD 2013. Plzen, Czech Republic. LNAI Vol. 8082, pp. 121-128. Volume Editors: I. Habernal and V. Matousek. Springer-Verlag Berlin Heidelberg.
– Text, Speech and Dialogue 2013 (oral presentation):
We explore the unsupervised machine translation evaluation method
We design the hLEPOR algorithm for the first time
We explore POS usage in unsupervised MT evaluation
Experiments are performed on English vs. French and German
58. • Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics
– Aaron Li-Feng Han, Derek Fai Wong and Lidia Sam Chao. In Proceedings of LP&IIS. M.A. Klopotek et al. (Eds.): IIS 2013, LNCS Vol. 7912, pp. 57–68, Warsaw, Poland. Springer-Verlag Berlin Heidelberg.
– Intelligent Information Systems 2013 (oral presentation):
Named entity recognition is important in IR, MT, text analysis, etc.
Chinese named entity recognition is more difficult due to the lack of word boundaries
We compare the performance of different algorithms: NB, CRF, SVM, ME
We analyse the characteristics of personal, location and organization names respectively
We show the performance of different features and select the optimized ones.
60. • Ongoing and further works:
– The combination of translation and evaluation, tuning the translation model using evaluation metrics
– Evaluation models from the perspective of semantics
– The exploration of unsupervised evaluation models, extracting features from source and target languages
61. • Strictly speaking, evaluation work is closely related to similarity measuring. So far I have employed it only in MT evaluation, but these works can be further developed in other areas:
– information retrieval
– question answering
– search
– text analysis
– etc.
62. Q and A
Thanks for your attention!
Aaron L.-F. Han, 2013.08
63. • 1. Weaver, Warren.: Translation. In William Locke and A. Donald Booth, editors,
• Machine Translation of Languages: Fourteen Essays. John Wiley and Sons, New
• York, pages 15-23 (1955)
• 2. Marino B. Jose, Rafael E. Banchs, Josep M. Crego, Adria de Gispert, Patrik Lambert,
• Jose A. Fonollosa, Marta R. Costa-jussa: N-gram based machine translation,
• Computational Linguistics, Vol. 32, No. 4. pp. 527-549, MIT Press (2006)
• 3. Och, F. J.: Minimum Error Rate Training for Statistical Machine Translation. In
• Proceedings of (ACL-2003). pp. 160-167 (2003)
• 4. Su Hung-Yu and Chung-Hsien Wu: Improving Structural Statistical Machine Translation
• for Sign Language With Small Corpus Using Thematic Role Templates as
• Translation Memory, IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE
• PROCESSING, VOL. 17, NO. 7, SEPTEMBER (2009)
• 5. Xiong D., M. Zhang, H. Li: A Maximum-Entropy Segmentation Model for Statistical
• Machine Translation, Audio, Speech, and Language Processing, IEEE Transactions
• on, Volume: 19, Issue: 8, 2011 , pp. 2494- 2505 (2011)
• 6. Carl, M. and A. Way (eds): Recent Advances in Example-Based Machine Translation.
• Kluwer Academic Publishers, Dordrecht, The Netherlands (2003)
64. • 7. Koehn, P.: Statistical Machine Translation. Cambridge University Press (2010)
• 8. Arnold, D.: Why translation is difficult for computers. In Computers and Translation: A translator's guide. Benjamins Translation Library (2003)
• 9. Carroll, J. B.: An experiment in evaluating the quality of translation. In Pierce, J. (Chair), Languages and machines: computers in translation and linguistics. A report by the Automatic Language Processing Advisory Committee (ALPAC), Publication 1416, Division of Behavioral Sciences, National Academy of Sciences, National Research Council, pp. 67-75 (1966)
• 10. White, J. S., O'Connell, T. A., and O'Mara, F. E.: The ARPA MT evaluation methodologies: Evolution, lessons, and future approaches. In Proceedings of AMTA 1994, pp. 193-205 (1994)
• 11. Su, K.-Y., Wu, M.-W. and Chang, J.-S.: A New Quantitative Quality Measure for Machine Translation Systems. In Proceedings of the 14th International Conference on Computational Linguistics, pp. 433-439, Nantes, France, July (1992)
65. • 12. Tillmann, C., Vogel, S., Ney, H., Zubiaga, A., and Sawaf, H.: Accelerated DP Based Search For Statistical Translation. In Proceedings of the 5th European Conference on Speech Communication and Technology (EUROSPEECH 97) (1997)
• 13. Papineni, K., Roukos, S., Ward, T. and Zhu, W. J.: BLEU: a method for automatic evaluation of machine translation. In Proceedings of ACL 2002, pp. 311-318, Philadelphia, PA, USA (2002)
• 14. Doddington, G.: Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the Second International Conference on Human Language Technology Research (HLT 2002), pp. 138-145, San Diego, California, USA (2002)
• 15. Turian, J. P., Shen, L. and Melamed, I. D.: Evaluation of machine translation and its evaluation. In Proceedings of MT Summit IX, pp. 386-393, New Orleans, LA, USA (2003)
• 16. Banerjee, S. and Lavie, A.: METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of ACL-WMT, pp. 65-72, Prague, Czech Republic (2005)
66. • 17. Denkowski, M. and Lavie, A.: Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. In Proceedings of ACL-WMT, pp. 85-91, Edinburgh, Scotland, UK (2011)
• 18. Snover, M., Dorr, B., Schwartz, R., Micciulla, L. and Makhoul, J.: A study of translation edit rate with targeted human annotation. In Proceedings of AMTA, pp. 223-231, Boston, USA (2006)
• 19. Chen, B. and Kuhn, R.: AMBER: A modified BLEU, enhanced ranking metric. In Proceedings of ACL-WMT, pp. 71-77, Edinburgh, Scotland, UK (2011)
• 20. Bicici, E. and Yuret, D.: RegMT system for machine translation, system combination, and evaluation. In Proceedings of ACL-WMT, pp. 323-329, Edinburgh, Scotland, UK (2011)
• 21. Shawe-Taylor, J. and Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press (2004)
• 22. Wong, B. T.-M. and Kit, C.: Word choice and word position for automatic MT evaluation. In Workshop: MetricsMATR of AMTA, short paper, Waikiki, Hawai'i, USA (2008)
67. • 23. Isozaki, H., Hirao, T., Duh, K., Sudoh, K., and Tsukada, H.: Automatic evaluation of translation quality for distant language pairs. In Proceedings of EMNLP 2010, pp. 944-952, Cambridge, MA (2010)
• 24. Talbot, D., Kazawa, H., Ichikawa, H., Katz-Brown, J., Seno, M. and Och, F.: A Lightweight Evaluation Framework for Machine Translation Reordering. In Proceedings of the Sixth ACL-WMT, pp. 12-21, Edinburgh, Scotland, UK (2011)
• 25. Song, X. and Cohn, T.: Regression and ranking based optimisation for sentence level MT evaluation. In Proceedings of ACL-WMT, pp. 123-129, Edinburgh, Scotland, UK (2011)
• 26. Popovic, M.: Morphemes and POS tags for n-gram based evaluation metrics. In Proceedings of ACL-WMT, pp. 104-107, Edinburgh, Scotland, UK (2011)
• 27. Popovic, M., Vilar, D., Avramidis, E. and Burchardt, A.: Evaluation without references: IBM1 scores as evaluation metrics. In Proceedings of ACL-WMT, pp. 99-103, Edinburgh, Scotland, UK (2011)
• 28. Petrov, S., Barrett, L., Thibaux, R., and Klein, D.: Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st ACL, pp. 433-440, Sydney, July (2006)
68. • 29. Callison-Burch, C., Koehn, P., Monz, C. and Zaidan, O. F.: Findings of the 2011 Workshop on Statistical Machine Translation. In Proceedings of ACL-WMT, pp. 22-64, Edinburgh, Scotland, UK (2011)
• 30. Callison-Burch, C., Koehn, P., Monz, C., Peterson, K., Przybocki, M. and Zaidan, O. F.: Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation. In Proceedings of ACL-WMT, pp. 17-53, PA, USA (2010)
• 31. Callison-Burch, C., Koehn, P., Monz, C. and Schroeder, J.: Findings of the 2009 Workshop on Statistical Machine Translation. In Proceedings of ACL-WMT, pp. 1-28, Athens, Greece (2009)
• 32. Callison-Burch, C., Koehn, P., Monz, C. and Schroeder, J.: Further meta-evaluation of machine translation. In Proceedings of ACL-WMT, pp. 70-106, Columbus, Ohio, USA (2008)
• 33. Avramidis, E., Popovic, M., Vilar, D., Burchardt, A.: Evaluate with Confidence Estimation: Machine ranking of translation outputs using grammatical features. In Proceedings of the Sixth Workshop on Statistical Machine Translation (ACL-WMT), pp. 65-70, Edinburgh, Scotland, UK (2011)