WMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
Lifeng (Aaron) Han
Pre-trained language models (PLMs) often take advantage of monolingual and multilingual datasets that are freely available online to acquire general or mixed-domain knowledge before deployment into specific tasks. Extra-large PLMs (xLPLMs) have recently been proposed, claiming superior performance over smaller-sized PLMs on tasks such as machine translation (MT). These xLPLMs include Meta-AI's wmt21-dense-24-wide-en-X (2021) and NLLB (2022). In this work, we examine whether xLPLMs are absolutely superior to smaller-sized PLMs when fine-tuned toward domain-specific MT. We use two in-domain datasets of different sizes: commercial automotive in-house data and clinical shared-task data from the ClinSpEn-2022 challenge at WMT2022. We choose the popular Marian Helsinki as the smaller-sized PLM and two massive-sized Mega-Transformers from Meta-AI as xLPLMs.
Our experimental investigation shows that 1) on the smaller-sized in-domain commercial automotive data, the xLPLM wmt21-dense-24-wide-en-X indeed achieves much better evaluation scores on the SACREBLEU and hLEPOR metrics than the smaller-sized Marian, even though its score increase rate after fine-tuning is lower than Marian's; 2) on the relatively larger-sized, well-prepared clinical fine-tuning data, the xLPLM NLLB tends to lose its advantage over the smaller-sized Marian on two sub-tasks (clinical terms and ontology concepts) under the ClinSpEn-offered metrics METEOR, COMET, and ROUGE-L, and loses entirely to Marian on Task-1 (clinical cases) on all official metrics, including SACREBLEU and BLEU; 3) metrics do not always agree with each other on the same tasks using the same model outputs; 4) clinic-Marian ranked No. 2 on Task-1 (via SACREBLEU/BLEU) and Task-3 (via METEOR and ROUGE) among all submissions.
Measuring Uncertainty in Translation Quality Evaluation (TQE)
Lifeng (Aaron) Han
From the points of view of both human translators (HT) and machine translation (MT) researchers, translation quality evaluation (TQE) is an essential task. Translation service providers (TSPs) have to deliver large volumes of translations that meet customer specifications under harsh constraints on required quality level, time-frames, and costs. MT researchers strive to make their models better, which also requires reliable quality evaluation. While automatic machine translation evaluation (MTE) metrics and quality estimation (QE) tools are widely available and easy to access, existing automated tools are not good enough, and human assessment from professional translators (HAP) is often chosen as the gold standard \cite{han-etal-2021-TQA}.
Human evaluations, however, are often accused of low reliability and agreement. Is this caused by subjectivity, or is statistics at play? How can we avoid checking the entire text and make TQE more efficient from cost and time perspectives, and what is the optimal sample size of translated text needed to reliably estimate the translation quality of the entire material? This work carries out such motivated research to correctly estimate the confidence intervals \cite{Brown_etal2001Interval} depending on the sample size of translated text, e.g. the number of words or sentences, that needs to be processed at the TQE workflow step for a confident and reliable evaluation of overall translation quality.
The methodology we apply in this work draws on Bernoulli Statistical Distribution Modeling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
Reference: S Gladkoff, I Sorokina, L Han, A Alekseeva. 2022. Measuring Uncertainty in Translation Quality Evaluation (TQE). LREC2022. arXiv preprint arXiv:2111.07699
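The Bernoulli/Monte Carlo idea can be sketched in a few lines: treat each checked translation unit as a Bernoulli trial with some error probability, then repeatedly simulate audits of a given sample size to see how much the observed error rate varies. This is a minimal illustrative sketch, not the authors' implementation; the error rate, sample sizes, and the two-standard-deviation band are assumptions for demonstration.

```python
import random
import statistics

def monte_carlo_tqe(population_error_rate, sample_size, n_trials=2000, seed=42):
    """Simulate checking `sample_size` translation units, each a Bernoulli
    trial with the given error probability, and report how the observed
    error rate spreads over repeated simulated audits."""
    rng = random.Random(seed)
    observed = [
        sum(rng.random() < population_error_rate for _ in range(sample_size)) / sample_size
        for _ in range(n_trials)
    ]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    # roughly 95% of observed rates fall within about two standard deviations
    return mean, (mean - 2 * sd, mean + 2 * sd)

# e.g. a 5% true error rate, checking 200 units per audit
mean, interval = monte_carlo_tqe(0.05, 200)
```

Re-running the simulation with a larger sample size narrows the interval, which is exactly the sample-size question the abstract poses.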
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Lifeng (Aaron) Han
Starting from the 1950s, Machine Translation (MT) has been approached through different scientific solutions, from rule-based methods, example-based and statistical models (SMT), to hybrid models and, in very recent years, neural models (NMT).
While NMT has achieved a huge quality improvement over conventional methodologies, by taking advantage of the huge amount of parallel corpora available from the internet and recently developed supercomputing power at an acceptable cost, it still struggles to achieve real human parity in many domains and most language pairs, if not all of them.
Along the long road of MT research and development, quality evaluation metrics have played very important roles in MT advancement and evolution.
In this tutorial, we give an overview of traditional human judgement criteria, automatic evaluation metrics, and unsupervised quality estimation models, as well as the meta-evaluation of these evaluation methods. Among these, we also cover very recent work in the MT evaluation (MTE) field that takes advantage of large pre-trained language models for automatic metric customisation towards the exact deployed language pairs and domains. In addition, we introduce statistical confidence estimation of the sample size needed for human evaluation in a realistic practice simulation.
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
Lifeng (Aaron) Han
Traditional automatic evaluation metrics for machine translation have been widely criticized by linguists due to their low accuracy, lack of transparency, focus on language mechanics rather than semantics, and low agreement with human quality evaluation. Human evaluations in the form of MQM-like scorecards have always been carried out in real industry settings by both clients and translation service providers (TSPs). However, traditional human translation quality evaluations are costly to perform, go into great linguistic detail, raise issues of inter-rater reliability (IRR), and are not designed to measure the quality of worse-than-premium translations.
In this work, we introduce \textbf{HOPE}, a task-oriented and \textit{\textbf{h}}uman-centric evaluation framework for machine translation output based \textit{\textbf{o}}n professional \textit{\textbf{p}}ost-\textit{\textbf{e}}diting annotations. It contains only a limited number of commonly occurring error types and uses a scoring model with a geometric progression of error penalty points (EPPs), reflecting the error severity level, applied to each translation unit.
Initial experimental work carried out on English-Russian MT outputs of marketing content from a highly technical domain reveals that our evaluation framework is quite effective in reflecting MT output quality with regard to both overall system-level performance and segment-level transparency, and that it increases the IRR for error type interpretation.
The approach has several key advantages, such as the ability to measure and compare less-than-perfect MT output from different systems, the ability to indicate human perception of quality, immediate estimation of the labor effort required to bring MT output to premium quality, lower cost and faster application, as well as higher IRR. Our experimental data is available at \url{https://github.com/lHan87/HOPE}
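A geometric progression of error penalty points can be sketched as follows. The base of 2 and the four severity levels (0 = minor through 3 = critical) are illustrative assumptions, not the published HOPE scheme.

```python
# Sketch of a HOPE-style scoring model with geometrically progressing
# error penalty points (EPPs). The base (2) and the four severity levels
# are assumed for illustration only.
SEVERITY_EPP = {severity: 2 ** severity for severity in range(4)}  # {0: 1, 1: 2, 2: 4, 3: 8}

def segment_penalty(error_severities):
    """Sum the penalty points of all errors annotated on one translation unit."""
    return sum(SEVERITY_EPP[s] for s in error_severities)

def system_score(segments):
    """Aggregate penalties over all segments; lower means better MT quality."""
    return sum(segment_penalty(errors) for errors in segments)

# three segments: one with a minor and a major error, one critical, one clean
score = system_score([[0, 2], [3], []])  # 5 + 8 + 0 = 13
```

The geometric scale makes a single critical error outweigh several minor ones, matching the severity-weighting idea described in the abstract.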
Meta-evaluation of machine translation evaluation methods
Lifeng (Aaron) Han
Cite: Lifeng Han. 2021. Meta-evaluation of machine translation evaluation methods. In Metrics2021 Tutorial Track/type: Workshop on Informetric and Scientometric Research (SIG-MET), ASIS&T. October 23–24.
Apply Chinese radicals into neural machine translation: deeper than character...
Lifeng (Aaron) Han
LPRC 2018: Limerick Postgraduate Research Conference
Lifeng Han and Shaohui Kuang. 2018. Apply Chinese radicals into neural machine translation: Deeper than character level. ArXiv pre-print https://arxiv.org/abs/1805.01565v1
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Lifeng (Aaron) Han
ADAPT seminar series. June 2021
Research papers @ NoDaLiDa2021 (the 23rd Nordic Conference on Computational Linguistics) and the COLING 2020 MWE-LEX workshop.
Bonus takeaway: the AlphaMWE multilingual corpus with MWEs.
Build Moses on an Ubuntu (64-bit) system in VirtualBox, recorded by Aaron (v2, longer)
Lifeng (Aaron) Han
Build the Moses Statistical Machine Translation system on Ubuntu.
Tree-to-tree Machine Translation with a Universal Phrase Tagset. https://github.com/aaronlifenghan/A-Universal-Phrase-Tagset
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Lifeng (Aaron) Han
ADAPT Centre & Detection of Verbal Multi-Word Expressions via Conditional Random Fields with Syntactic Dependency Features and Semantic Re-Ranking @ DLSS2017 Bilbao.
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
Lifeng (Aaron) Han
In this work, we present the construction of multilingual parallel corpora with annotation of multiword expressions (MWEs). The MWEs include verbal MWEs (vMWEs), defined in the PARSEME shared task as having a verb as the head of the studied terms. The annotated vMWEs are also bilingually and multilingually aligned manually. The languages covered include English, Chinese, Polish, and German. Our original English corpus is taken from the PARSEME shared task in 2018. We performed machine translation of this source corpus, followed by human post-editing and annotation of target MWEs. Strict quality control was applied to limit errors: each MT output sentence received a first round of manual post-editing and annotation, plus a second round of manual quality rechecking. One of our findings during corpora preparation is that accurate translation of MWEs presents challenges to MT systems. To facilitate further MT research, we present a categorisation of the error types encountered by MT systems when performing MWE-related translation. To acquire a broader view of MT issues, we selected four popular state-of-the-art MT models for comparison, namely Microsoft Bing Translator, GoogleMT, Baidu Fanyi, and DeepL MT. Because of the noise removal, translation post-editing, and MWE annotation by human professionals, we believe our AlphaMWE dataset will be an asset for cross-lingual and multilingual research, such as MT and information extraction. Our multilingual corpora are available as open access at github.com/poethan/AlphaMWE
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
Lifeng (Aaron) Han
Invited presentation at the NLP lab of Soochow University, about my NLP journey and the ADAPT Centre. The NLP part covers Machine Translation Evaluation, Quality Estimation, Multiword Expression Identification, Named Entity Recognition, Word Segmentation, Treebanks, and Parsing.
A deep analysis of Multi-word Expression and Machine Translation
Lifeng (Aaron) Han
A deep analysis of Multi-word Expressions and Machine Translation. Faculty research open day, DCU, Dublin, 2019.
Topics include MWE identification, MT with radicals, and MTE.
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Lifeng (Aaron) Han
This is a short presentation for the poster of the WMT13 shared task. This paper introduces our participation in the WMT13 shared tasks on Quality Estimation for machine translation without using reference translations. We submitted results for Task 1.1 (sentence-level quality estimation), Task 1.2 (system selection), and Task 2 (word-level quality estimation). In Task 1.1, we used an enhanced version of the BLEU metric, without reference translations, to evaluate translation quality. In Task 1.2, we utilized the probabilistic Naïve Bayes (NB) model as a classification algorithm, with features borrowed from traditional evaluation metrics. In Task 2, to take contextual information into account, we employed a discriminative undirected probabilistic graphical model, the Conditional Random Field (CRF), in addition to the NB algorithm. Training experiments on past WMT corpora showed that the methods designed in this paper yielded promising results, especially the statistical CRF and NB models. The official results show that our CRF model achieved the highest F-score, 0.8297, in the binary classification of Task 2.
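To illustrate the Task 1.2 idea of feeding metric-derived features into a Naïve Bayes classifier, here is a tiny Gaussian Naïve Bayes sketch over metric-style features (e.g. an n-gram precision and a length ratio). The feature names and toy data are assumptions for illustration, not the paper's actual feature set or data.

```python
import math
from collections import defaultdict

class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes for system selection from metric-style
    features. Each feature column is modeled per class as a Gaussian."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for features, label in zip(X, y):
            by_class[label].append(features)
        self.stats = {}
        for label, rows in by_class.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(rows) for c in cols]
            # small floor keeps zero-variance features from dividing by zero
            variances = [
                max(sum((v - m) ** 2 for v in c) / len(rows), 1e-9)
                for c, m in zip(cols, means)
            ]
            self.stats[label] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def predict(self, x):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances)
            )
        return max(self.stats, key=log_posterior)

# toy features: [n-gram precision, length ratio] per translation output
model = TinyGaussianNB().fit(
    [[0.80, 1.00], [0.90, 0.95], [0.85, 1.05],   # plausible "good" outputs
     [0.20, 0.60], [0.30, 1.50], [0.25, 0.70]],  # plausible "bad" outputs
    ["good"] * 3 + ["bad"] * 3,
)
```

A new output's metric scores can then be classified with `model.predict([0.88, 1.0])`; the same framing extends to choosing among candidate MT systems.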
Mathematical Modeling for the Real Estate Industry (房地产行业的数学建模)
H. Shu, Zhibo Wang, Aaron L.-F. Han
Team-ID: 99999033
University of Macau, Av. Padre Tomás Pereira, Taipa, Macau, China
In Proceedings of the 8th National Post-Graduate Mathematical Contest in Modeling (NPGMCM 2011): National Second Prize, P. R. China, 2011.