Examining large pre-trained language models for machine translation: What you don't know about it

The document summarizes the results of experiments comparing large pre-trained language models for machine translation. In a machine translation challenge, a smaller Marian model demonstrated better or similar results to much larger pretrained models, contradicting expectations. This suggests that very large models do not necessarily improve translation quality and that current automatic evaluation metrics are limited. Human evaluation remains important for fully assessing machine translation quality.

EXAMINING LARGE
PRE-TRAINED
LANGUAGE MODELS
FOR MACHINE
TRANSLATION:
WHAT YOU DON’T
KNOW ABOUT IT
BIOMEDICAL CLINSPEN
WMT22 CHALLENGE RESULTS
2022
lifeng.han@manchester.ac.uk
serge.gladkoff@logrusglobal.com

The function of MT quality
The quality of MT output depends on the model, the language pair, the quality of the training data, the type of input content, and other, smaller factors.

Maybe extra-large models?
Recently, extra-large MT models have been appearing more and more often, with ever-increasing numbers of parameters and multilingual capabilities.

WMT21 and NLLB
The two most recent extra-large language models
WMT21
• 4.7 billion parameters
https://huggingface.co/facebook/wmt21-dense-24-wide-en-x
NLLB
• 1.3 billion parameters
https://huggingface.co/docs/transformers/model_doc/nllb
Both extra-large pretrained models can only be fine-tuned, and even that requires a supercomputer.
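
A back-of-the-envelope estimate makes the hardware claim more concrete. The sketch below is not from the slides: it assumes full fine-tuning in 32-bit precision with the Adam optimizer, which by a common rule of thumb needs about 16 bytes per parameter (weights, gradients, and two optimizer moments). It ignores activations and batch data, so real usage would be higher. The function name and constants are illustrative.

```python
# Rough memory estimate for holding a model's state in GPU memory.
# Assumption (not from the slides): fp32 weights, and Adam keeping
# gradients plus two moment buffers, i.e. about 16 bytes per parameter.
# Activations and batch data are ignored, so real usage is higher.

BYTES_PER_PARAM_LOAD_FP32 = 4   # weights only
BYTES_PER_PARAM_ADAM_FP32 = 16  # 4 weights + 4 gradients + 8 Adam moments


def memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the memory footprint in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts as quoted on the slide.
models = {"WMT21": 4.7e9, "NLLB": 1.3e9}

for name, params in models.items():
    load = memory_gb(params, BYTES_PER_PARAM_LOAD_FP32)
    tune = memory_gb(params, BYTES_PER_PARAM_ADAM_FP32)
    print(f"{name}: ~{load:.1f} GB to load, ~{tune:.1f} GB to fine-tune")
```

Under these assumptions the larger model needs tens of gigabytes for optimizer state alone, more than a typical single consumer GPU offers. Lower-precision or parameter-efficient fine-tuning would change the figures.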

Experiment
To answer this question, we took part in the WMT2022 Biomed2022 MT challenge, with the aim of training several models and then comparing the results.

Experimental setting
• Preliminary results:

BIOMEDICAL CLINSPEN
WMT22 CHALLENGE RESULTS

Results of the fine-tuning
Clinical-Marian beats Clinical-NLLB in Task 1 (all metrics), Task 2 (METEOR, ROUGE), and Task 3 (METEOR, COMET, ROUGE) on the platform metrics.

Conclusions

Accurate experiment setup and execution
All models were trained on the same data and tested on the same test set.
A lot of attention was given to data preparation and cleaning for fine-tuning. We had already refined these data preparation methods and tools for our commercial aligner product, Paralela (https://paralela.logrusglobal.com/home).

Unexpected result
Marian Helsinki demonstrated BETTER results than both xPLM models!

Insufficient metrics of quality measurement
The COMET scores are clearly incorrect: a quality metric should not exceed 1, so the COMET score has evidently not been normalized correctly. It also does not correspond even proportionally with the other metrics.
Overall, we see that the quality differences between these models cannot be distinguished with current automatic quality metrics. The industry is now in a situation where training has outpaced the ability to evaluate its results. Further work on quality evaluation is needed.
Human evaluation is still the gold standard.

Practicality
Production-wise, training extra-large language models does not justify the cost and effort, because the output quality of smaller models is better, the same, or very close. Consequently, we have reached another plateau of MT quality, with extra-large models not fulfilling the promise of the hype.

THANK YOU
lifeng.han@manchester.ac.uk
serge.gladkoff@logrusglobal.com

BIBLIOGRAPHY
[1] Marcin Junczys-Dowmunt et al. Marian: Fast neural machine translation in C++. In Proceedings of ACL 2018, System Demonstrations.
[2] NLLB Team. No language left behind: Scaling human-centered machine translation, 2022. URL https://arxiv.org/abs/2207.04672.


2. Rationale
• Magical technology to reproduce and generate new translations
• BUT the error rate is far from 0%.


4. The fact: The error rate is never zero.
