An Empirical Study:
Post-editing Effort for English to Arabic Hybrid Machine Translation
Hassan Sajjad, Francisco Guzman, Stephan Vogel
Qatar Computing Research Institute, HBKU
Introduction
•  Old Arabic documents
•  Translation of metadata from English to Arabic
Traditional Translation Process
[Diagram labels: Translators, Translation Company, British Library, TM]
Problem
•  Various small documents
•  Little overlap at the sentence/segment level
•  Few translation memory matches
– A lot needs to be translated from scratch
•  Inefficient in both time and cost
Solution: Hybrid Machine Translation
•  100% recall: readily available translations
•  High precision translations
[Diagram: TM + CMT = Hybrid MT]
Hybrid MT combines the benefits of both Translation Memory and Customized MT!
Hybrid MT System
•  Translation Memory (TM)
– First pass: use strict matching to translate known words and phrases
•  Customized Machine Translation (CMT)
– Second pass: translate the remaining text using a machine translation system
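The two passes above can be sketched as follows. This is a minimal illustration, not the authors' system: the phrase-keyed `tm` dictionary, the `cmt_translate` callable, the greedy longest-match strategy, and the `max_len` limit are all assumptions made for the example, and a real pipeline would also have to handle word reordering between English and Arabic.

```python
from typing import Callable, Dict, List, Tuple


def hybrid_translate(
    segment: str,
    tm: Dict[Tuple[str, ...], str],
    cmt_translate: Callable[[str], str],
    max_len: int = 5,
) -> str:
    """First pass: strict TM matches; second pass: MT for the remaining text."""
    tokens = segment.split()
    out: List[str] = []
    pending: List[str] = []  # tokens not covered by any TM entry

    def flush() -> None:
        # Second pass: send the uncovered run of tokens to the MT system.
        if pending:
            out.append(cmt_translate(" ".join(pending)))
            pending.clear()

    i = 0
    while i < len(tokens):
        match_len = 0
        # First pass: look for the longest strict (exact) match starting at i.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in tm:
                match_len = n
                break
        if match_len:
            flush()
            out.append(tm[tuple(tokens[i:i + match_len])])
            i += match_len
        else:
            pending.append(tokens[i])
            i += 1
    flush()
    return " ".join(out)


# Hypothetical usage with placeholder "translations".
toy_tm = {("British", "Library"): "BL_AR"}
toy_cmt = lambda text: "<" + text + ">"
result = hybrid_translate("the British Library catalogue", toy_tm, toy_cmt)
```

In this toy run the TM handles "British Library" and the stand-in MT function receives "the" and "catalogue" separately.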
Aiming higher: Post-editing for Quality
[Diagram: TM + CMT = Hybrid MT, then Post Editors]
•  High quality
•  High consistency
•  Cost and time effective
Customized Machine Translation
•  A statistical machine translation system
– Trained specifically for the domain of the text that needs to be translated
•  General practice
– Use Moses
– Train on the data of the translation memory
– Follow the recipe of a competition-grade system to ensure high quality
English to Arabic CMT
•  The best competition-grade pipeline involves
–  Arabic (de-)tokenization
    •  Splitting morphologically rich words into smaller segments and vice versa
    •  +1.5 BLEU points improvement
–  Arabic (de-)normalization
    •  Mapping different forms of a letter to one form and vice versa
    •  +0.5 BLEU point improvement
This ensures high quality but does not guarantee less frustration for post-editors
Why?
Translation output requires:
•  De-tokenization and de-normalization
•  De-normalization introduces character-level errors
– Frustrating for the post-editor to correct
– Time inefficient
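To see why de-normalization can introduce character-level errors, here is a small sketch of a many-to-one letter mapping. The specific mapping (alef variants, alef maqsura, ta marbuta) is a commonly used normalization scheme and is an assumption here; the slides do not say which mapping the authors applied.

```python
# Assumed normalization table: several letter forms collapse to one form.
NORMALIZATION_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0649": "\u064A",  # alef maqsura          -> ya
    "\u0629": "\u0647",  # ta marbuta            -> ha
})


def normalize(text: str) -> str:
    """Map variant letter forms to a single canonical form (lossy)."""
    return text.translate(NORMALIZATION_MAP)


# Two distinct words that differ only in the form of the initial alef.
word_a = "\u0623\u0646"  # hamza above
word_b = "\u0625\u0646"  # hamza below
collapsed = normalize(word_a) == normalize(word_b)
```

Because both words normalize to the same string, a de-normalizer has to guess which original form was intended, and every wrong guess is a single-character error that the post-editor must find and fix.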
Recommended Practices for CMT of English-Arabic
•  Don’t normalize
But
•  Always tokenize
– Improves coverage of words
– Better translations
Let’s Talk about BL Case Numbers!
We compare:
•  Translation Memory (TM) only
•  Hybrid MT (TM + CMT)
Also:
•  Translator
•  Hybrid MT + Post-editing (PE)
Looking at:
•  Effectiveness
•  Quality
•  Consistency
Data
•  1000 documents
– 90k parallel sentences/segments
– 953 documents for training
    •  489k tokens
– Rest for tuning and testing
Effectiveness of TM
•  Exact match: 50% of segments, BUT COVERS ONLY 7% of words
•  Fuzzy match: 84% of segments, BUT COVERS ONLY 13.5% of words
More than 85% of words still need to be translated!
* Based on an assessment over X documents
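The gap between segment coverage and word coverage can be illustrated with a short sketch. It assumes that a segment counts as covered when its full text is an exact TM match and that words are whitespace-separated; the function name and the toy data are hypothetical, and the slides do not describe how the reported figures were computed.

```python
from typing import Iterable, Set, Tuple


def tm_exact_coverage(
    segments: Iterable[str], tm_sources: Set[str]
) -> Tuple[float, float]:
    """Return (fraction of segments matched, fraction of words matched)."""
    seg_total = seg_hit = word_total = word_hit = 0
    for seg in segments:
        n_words = len(seg.split())
        seg_total += 1
        word_total += n_words
        if seg in tm_sources:  # strict, whole-segment match
            seg_hit += 1
            word_hit += n_words
    if seg_total == 0 or word_total == 0:
        return 0.0, 0.0
    return seg_hit / seg_total, word_hit / word_total


# Toy data: the short segments repeat, the long ones are new.
toy_segments = [
    "Letter",
    "Note",
    "A long report about trade in the Gulf",
    "Another long description of a shipping manifest",
]
seg_cov, word_cov = tm_exact_coverage(toy_segments, {"Letter", "Note"})
```

In the toy data half of the segments match, yet they account for only 2 of 17 words, which is the same pattern the slide reports: matched segments tend to be short.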
Effectiveness of CMT
•  100% of segments AND 99.9% of words translated!
Effectiveness of Hybrid MT
•  High precision
–  TM exact matches
•  High recall
–  CMT to produce high quality translations
Assessing Quality
•  BLEU
–  Compare output to ‘reference’ translation
            Strict    Partial
TM          7.07      21.01
TM + CMT    54.60     48.54
The BLEU score of CMT alone is 53.90
Assessing Quality
•  TER: Translation Error Rate
–  How much effort is needed to get a perfect translation?
–  Compare to ‘reference’ translation
Hybrid MT can improve beyond that!
[Bar chart: percentage of effort required (0% to 100%) for Hybrid MT and TM]
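As a rough illustration of what an edit-based effort measure computes, the sketch below divides the word-level edit distance (insertions, deletions, substitutions) by the reference length. This is a simplification made for the example: full TER also counts block shifts as single edits, which this version does not implement, and the function names are hypothetical.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Levenshtein distance over words, computed row by row."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(
                prev[j] + 1,         # delete a hypothesis word
                curr[j - 1] + 1,     # insert a reference word
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


def simple_ter(hypothesis: str, reference: str) -> float:
    """Edits needed per reference word (no shift operation)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)


score = simple_ter("a b c d", "a x c d")  # one substitution over four words
```

A lower score means fewer edits are needed to turn the system output into the reference, which is why the measure can be read as an estimate of post-editing effort.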
Assessing Quality
•  TER vs. post-editing effort
– Similar effort estimation using post-editing of Hybrid MT
[Bar chart: percentage of effort required (0% to 100%) for PE on Hybrid MT, Hybrid MT, and TM]
* PE is based on an assessment over 4 documents, using a junior translator
Consistency of Hybrid MT
•  We compared Hybrid MT versus a junior translator
•  We measured consistency with reference translations
Hybrid MT is more consistent with reference translations
* Based on an assessment over 4 documents
[Bar chart: overlap with reference translation (0% to 70%) for Hybrid MT and Translator]
Speedup of Hybrid MT
•  We compared Hybrid MT versus a junior translator
Hybrid MT + PE is 30% more efficient
* Based on an assessment over 4 documents
[Bar chart: time taken to translate (mins), axis 0 to 120, for Translator and Hybrid MT + PE]
Conclusion
•  Hybrid MT
– High precision and high recall
•  Hybrid MT plus post-editing
– Efficient in terms of both time and cost
– Improves consistency
•  Customized MT for English-Arabic
– Don’t normalize but always tokenize
References
•  Ahmed Abdelali, Kareem Darwish, Nadir Durrani, and Hamdy Mubarak. Farasa: A Fast and Furious Segmenter for Arabic. In NAACL-2016, San Diego, US.
•  Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. Moses: Open source toolkit for statistical machine translation. In ACL-2007, Prague, Czech Republic.
•  Hassan Sajjad, Francisco Guzman, Preslav Nakov, Ahmed Abdelali, Kenton Murray, Fahad Al Obaidli, and Stephan Vogel. QCRI at IWSLT 2013: Experiments in Arabic-English and English-Arabic Spoken Language Translation. In IWSLT-2013, Heidelberg, Germany.
Thank you
