NMT Training Project
Lessons Learned
By Jiaqian Wang, Jiaying Lai, Lingge Liu, Qifang Liang
Agenda
• Project background, the overall process and results – Jiaqian
• What went well – Qifang
• What didn’t go so well – Jiaying
• Possible improvements & insights for future projects – Lingge
• Execution plan
• Financial issues
Introduction
• Client: BBC
• Purpose: train an NMT engine for documentary subtitles (Big Cats II)
• Language pair: en-US → zh-CN
• MT training systems: Microsoft Custom Translator, Systran
• Initial goals:
  • Efficiency: PEMT 30% faster than human translation
  • Cost: PEMT 30% cheaper than human translation
  • Quality: PEMT with an acceptable score (≤20 points and zero critical errors) in a sample of 1000 words
Process and Results
Round | Data composition | Microsoft BLEU | Systran BLEU
Round 1 | Planet Earth I, The Hunt, Big Cat Tales; Glossary; Big Cats Episode 3 (tuning data), Big Cats Episodes 1 & 2 (test data) | 11.29 | 25.88
Round 2 | Added Blue Planet II to the training dataset (2576 segments) | 12.02 | 25.81
Round 3 | Added Planet Earth II to the training dataset (2709 segments) | 13.35 | 25.18
Round 4 | Switched Big Cat Tales from the training to the tuning dataset (3196 segments) | 13.76 | 25.24
Round 5 | Removed Planet Earth I & II, and added the TED talk TM to the training dataset (9136 segments) | 9.92 | 24.94
Round 6 | Added the official translation of Animal I to the training dataset (1249 segments) | 13.59 | 24.90
Round 7 | Added the official translation of Animal II to the training dataset | 14.38 | 24.77
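The round-by-round BLEU figures are easier to compare as changes against the previous round. The following is a minimal sketch that uses only the scores listed in the table. The function names are illustrative and not part of either MT platform.

```python
# BLEU scores per round, copied from the Process and Results table.
microsoft = [11.29, 12.02, 13.35, 13.76, 9.92, 13.59, 14.38]
systran = [25.88, 25.81, 25.18, 25.24, 24.94, 24.90, 24.77]


def round_deltas(scores):
    """Return (round_number, change_vs_previous_round) pairs, starting at round 2."""
    return [
        (i + 1, round(scores[i] - scores[i - 1], 2))
        for i in range(1, len(scores))
    ]


def best_round(scores):
    """Return the 1-based round number with the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i]) + 1


def largest_drop(scores):
    """Return the (round_number, change) pair with the most negative change."""
    return min(round_deltas(scores), key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, scores in (("Microsoft", microsoft), ("Systran", systran)):
        print(name, "best round:", best_round(scores))
        print(name, "largest drop:", largest_drop(scores))
```

On these numbers, the Microsoft engine peaks in round 7 and has its largest drop in round 5, while the Systran engine scores highest in round 1.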
Time and cost
Estimated time and costs
Task | Time (hrs) | Cost ($)
File Prep | 12 | 480
Glossary creation | 3 | 120
MT training | 20 | 800
PEMT | 4 | 108
Human Evaluation | 4 | 200
PM fee | 8 | 320
Total | 51 | 2028
Actual time and costs
Task | Time (hrs) | Cost ($)
File Prep | 12 | 480
Glossary creation | 1 | 40
MT training | 14 | 560
PEMT | 4 | 108
Human Evaluation | 4 | 200
PM fee | 8 | 320
Total | 43 | 1708
The following data is based on a sample of 1000 words.
Metric | Human Translation | PEMT
Speed | 300 words/hour | 750 words/hour
Rate | $0.15/word | $0.09/word
Time | 3.3 hours | 1.3 hours
Cost | $150 | $90
Time savings: ~60%
Cost savings: ~40%
Goals check: Time and cost savings (30%): ✔
Goals check: Quality (≤20 points and zero critical errors): ×
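The savings follow directly from the speed and rate figures for the 1000-word sample. The sketch below reproduces that arithmetic. The function and variable names are illustrative. With unrounded times (1000/300 and 1000/750 hours), the time saving works out to 60%.

```python
def pemt_savings(words, ht_speed, pemt_speed, ht_rate, pemt_rate):
    """Return (time_saving, cost_saving) as fractions of the human-translation baseline."""
    ht_hours = words / ht_speed        # human translation time in hours
    pemt_hours = words / pemt_speed    # post-editing time in hours
    ht_cost = words * ht_rate          # human translation cost in dollars
    pemt_cost = words * pemt_rate      # post-editing cost in dollars
    time_saving = (ht_hours - pemt_hours) / ht_hours
    cost_saving = (ht_cost - pemt_cost) / ht_cost
    return time_saving, cost_saving


GOAL = 0.30  # both initial goals ask for a 30% improvement

# Figures from the slide: 300 vs 750 words/hour, $0.15 vs $0.09 per word.
time_saving, cost_saving = pemt_savings(1000, 300, 750, 0.15, 0.09)

if __name__ == "__main__":
    print(f"Time saving: {time_saving:.0%}, goal met: {time_saving >= GOAL}")
    print(f"Cost saving: {cost_saving:.0%}, goal met: {cost_saving >= GOAL}")
```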
Severity (points) | MS R1 | MS R7 | Systran R1 | Systran R7
Minor (1) | 38 | 24 | 29 | 25
Major (5) | 24 | 10 | 9 | 22
Critical (10) | 6 | 13 | 4 | 2
Total score | 201 | 214 | 114 | 157
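The quality score is a weighted sum of error counts (minor = 1, major = 5, critical = 10), checked against the goal of at most 20 points and no critical errors. The sketch below applies that rule to the Systran R1 column. The function names are illustrative, and the pass/fail rule is read from the stated goal rather than taken from a published formula.

```python
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}
MAX_SCORE = 20  # quality goal: at most 20 points per 1000-word sample


def error_score(counts):
    """Weighted sum of error counts, using the severity weights above."""
    return sum(SEVERITY_WEIGHTS[severity] * n for severity, n in counts.items())


def meets_quality_goal(counts):
    """True only if the score is within the limit and there are no critical errors."""
    return error_score(counts) <= MAX_SCORE and counts.get("critical", 0) == 0


# Error counts for Systran round 1, taken from the table.
systran_r1 = {"minor": 29, "major": 9, "critical": 4}

if __name__ == "__main__":
    print("Score:", error_score(systran_r1))
    print("Meets quality goal:", meets_quality_goal(systran_r1))
```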
What went well
• Positive attitude
• Ready to adapt
• Effective communication
During training
• We kept a positive attitude and stayed optimistic despite the low scores.
• Adjustment that helped improve the score: added the official Animal subtitles to the training dataset.
Effective communication & team cooperation
What did not go so well
• During File Prep
• During Training
During File Prep
• Failure to get high-quality data
  • Accepted: .srt/.ass format
  • Not accepted: .idx/.sub format
  • Conversion by Subtitle Edit?
During File Prep (& PEMT)
• Segments out of order (using Memsource)
NMT
→ During PEMT / When assessing quality,
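One possible way to catch out-of-order segments before post-editing is to re-sort subtitle cues by their start time. The following is a minimal sketch for .srt input only, assuming the timestamps themselves are intact. It is not the procedure the team used. The parser is deliberately simplified, and the sample timestamps and line split are invented for illustration.

```python
import re
from datetime import timedelta

# Matches the start of an SRT timing line such as "00:00:02,000 --> 00:00:04,500".
TIMING = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> ")


def parse_srt(text):
    """Parse SRT text into a list of (start, timing_line, subtitle_text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue  # skip blocks without a valid timing line
        h, m, s, ms = (int(g) for g in match.groups())
        start = timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)
        cues.append((start, lines[1].strip(), "\n".join(lines[2:])))
    return cues


def reorder_srt(text):
    """Return SRT text with cues sorted by start time and renumbered from 1."""
    cues = sorted(parse_srt(text), key=lambda cue: cue[0])
    blocks = [
        f"{i}\n{timing}\n{body}"
        for i, (_, timing, body) in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


# Hypothetical sample: two cues stored in the wrong order.
sample = (
    "1\n00:00:05,000 --> 00:00:07,500\nthe thin air is taking its toll\n\n"
    "2\n00:00:02,000 --> 00:00:04,500\nAt these high altitudes,\n"
)

if __name__ == "__main__":
    print(reorder_srt(sample))
```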
During training
• Low BLEU score
  1. Segments fragmented and out of order
  2. TED talk TM: BLEU score dropped significantly
     Reasons:
     1. Irrelevant content
     2. Low quality
  At these high altitudes, the thin air is taking its toll…
  3. Lack of metrics conversion
Improvements & insights for future projects
• NMT feasibility check
  • Is it MT friendly?
  • Does it contain cognitive/emotional/abstract/literary concepts?
  • Cost
• Training data quality check
  • Translation quality
  • File format
• Management
  • Training process education
  • Unified setting standards
  • Unified QA check standards
• Miscellaneous
  • Organized resource file
  • Take immediate actions

Editor's Notes

  1. During training, despite relatively low BLEU scores, we kept a positive attitude. More specifically, the BLEU score dropped significantly in round 5. We quickly traced the drop to the new dataset added in that round and adjusted accordingly. Switching the Big Cat Tales data from training to tuning in round 4 also helped. In rounds 6 and 7, we added a new series, Animal, which has officially translated subtitles. These changes paid off: round 7 had the highest Microsoft BLEU score of all rounds.
  2. Our team also cooperated effectively to build a QA table for PEMT. The error type and severity columns each have a drop-down menu. After a selection is made, the yellow table at the bottom automatically calculates the score for each error type, with a minor error counting as 1 point, a major error as 5, and a critical error as 10. The green table on the right counts the errors at each severity level. We also communicated effectively and agreed on a set of standards, including what constitutes a minor, major, or critical error.
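The two summary tables described in note 2 can be sketched as a simple tally over the logged errors: one total of points per error type, and one count per severity level. The error type names and the sample log below are hypothetical. Only the point values (1, 5, 10) come from the notes.

```python
from collections import Counter

SEVERITY_POINTS = {"minor": 1, "major": 5, "critical": 10}


def summarize(errors):
    """Summarize a list of (error_type, severity) pairs.

    Returns (score_by_type, count_by_severity), mirroring the two summary
    tables: points per error type and number of errors per severity.
    """
    score_by_type = Counter()
    count_by_severity = Counter()
    for error_type, severity in errors:
        score_by_type[error_type] += SEVERITY_POINTS[severity]
        count_by_severity[severity] += 1
    return dict(score_by_type), dict(count_by_severity)


# Hypothetical log of drop-down selections made during post-editing review.
logged = [
    ("terminology", "minor"),
    ("accuracy", "major"),
    ("accuracy", "critical"),
    ("fluency", "minor"),
    ("terminology", "minor"),
]

if __name__ == "__main__":
    by_type, by_severity = summarize(logged)
    print("Score per error type:", by_type)
    print("Errors per severity:", by_severity)
    print("Total score:", sum(by_type.values()))
```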