2013 GALA Miami: Breaking into Latin American Markets on a Small Budget

The Latin American market is composed of a mix of various Spanish dialects. If a company really wants to reach a specific audience in Latin America, it must use the right dialect. But how is it possible to translate marketing materials into four or five Spanish dialects without dramatically increasing costs? This session will discuss how a joint effort to create an MT engine for translating international Spanish into specific Latin American dialects (Spanish for Argentina, Chile, Colombia, Mexico, and Puerto Rico) made this challenge feasible, economical, and replicable.

  1. An MT Case Study: Breaking into Latin American Markets on a Small Budget. María Azqueta (SeproTec) & Diego Bartolomé (tauyou)
  2. Spanish Worldwide. Spanish language:
     • Also known as Castellano.
     • Latin-derived Romance language.
     • Spanish is one of the six official languages of the United Nations and an official language of the European Union.
  3. Spanish Worldwide
  4. Spanish Worldwide. [Bar chart of native speakers, axis 0 to 1,200 million. Languages: Mandarin Chinese, Spanish, English, Hindi/Urdu. Values: 955 million, 407 million, 360 million, 311 million.] Spanish is the second most spoken language by number of native speakers.
  5. Spanish Worldwide
     • For demographic reasons, the percentage of the world’s population that speaks Spanish as a native language is increasing, while the percentage of Chinese and English speakers is decreasing.
     • Within three or four generations, [figure lost in extraction]% of the world’s population will communicate in Spanish.
     • In [year partly lost in extraction; only the digit 5 survives], the United States will be the world’s foremost Spanish-speaking country.
  6. Spanish on the Internet
     • Spanish is the third most widely used language on the Net.
     • The use of Spanish on the Net grew by 807.4% between 2000 and 2011.
     • Spain and Mexico are among the 20 countries with the highest number of internet users.
     • The demand for documents in Spanish is the fourth largest among the world’s languages.
  7. Spanish Worldwide and its Differences. High demand for translations into Spanish. But… is the same Spanish spoken everywhere?
  8. Spanish Worldwide and its Differences. RAE (Royal Spanish Academy):
     • Created in the 18th century, it is widely seen as the arbiter of what is considered standard Spanish.
     • It produces authoritative dictionaries and grammar guides.
     • Although its decisions are not formally binding, they are widely followed in both Spain and Latin America.
  9. Spanish Worldwide and its Differences. Different dialects and many differences:
     • Lexical variations
     • Grammatical differences
     • Idioms
  10. Spanish Worldwide and its Differences. Market trend:
     • ‘Neutral’ or ‘International’ Spanish
     • Latin American Spanish & European Spanish
  11. Why Adapt to the Local Spanish of Each Country? To reach different markets. People are most likely to buy when a product is advertised in their dialect.
  12. Why Adapt to the Local Spanish of Each Country? Client A (Gaming Industry):
     • EN: Take a card from the deck
     • ES: Coge una carta de la baraja
  13. Why Adapt to the Local Spanish of Each Country?
     • ES: Coge una carta de la baraja
     • AR: Agarrá una carta del mazo
     • CL: Toma una carta del naipe
     • CO: Coge una carta de la baraja
     • MX: Saca una carta de la baraja
     • PR: Coge una carta de la baraja
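The per-country variants on slide 13 suggest how a glossary-driven Spanish-to-Spanish conversion might work. The following is a minimal sketch, not the tauyou engine: the `GLOSSARY` layout and the `localize` function are hypothetical names introduced for illustration, and only the phrase pairs come from the slide. Mapping whole phrases (for example "de la baraja" to "del mazo") is one simple way to keep article and noun in agreement.

```python
import re

# Phrase pairs read off slide 13. The dictionary layout is an assumption
# made for this sketch, not the engine's real glossary format.
GLOSSARY = {
    "AR": {"Coge": "Agarrá", "de la baraja": "del mazo"},
    "CL": {"Coge": "Toma", "de la baraja": "del naipe"},
    "CO": {},  # slide 13 keeps the Spain wording for Colombia
    "MX": {"Coge": "Saca"},
    "PR": {},  # slide 13 keeps the Spain wording for Puerto Rico
}


def localize(text: str, country: str) -> str:
    """Replace glossary phrases in one pass for the given country code."""
    entries = GLOSSARY[country]
    if not entries:
        return text
    # Try longer phrases first so multi-word entries win over shorter ones.
    phrases = sorted(entries, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
    )
    return pattern.sub(lambda match: entries[match.group(0)], text)


source = "Coge una carta de la baraja"
for code in ("AR", "CL", "CO", "MX", "PR"):
    print(code, localize(source, code))
```

A single regular-expression pass avoids re-translating text that an earlier replacement produced. A real engine would also need case handling and inflected forms, which this sketch leaves out.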
  14. Why Adapt to the Local Spanish of Each Country? Coger (32 entries), http://rae.es/rae.html
     • 1. tr. Asir, agarrar o tomar. U. t. c. prnl. (transitive: to grasp, grab or take; also used pronominally)
     • 31. intr. vulg. Am. Realizar el acto sexual. (intransitive, vulgar, Americas: to have sexual intercourse)
  15. Advise Clients. If you really want to break into a specific market, you must decide which country you want to target and localize your material for the Spanish dialect spoken in each individual country.
  16. The Main Problems Clients Face
  17. Is there a cost-efficient solution on the market?
  18. tauyou MT Solution at SeproTec
     • Hybrid machine translation since January 2011
     • Languages: EN, ES, PT, GA, FR, IT…
     • Domains: Legal, Technical…
     • Glossaries and forbidden words lists
     • Average translated words per month: 700,000
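Slide 18 mentions forbidden words lists, and slide 14 shows why one is useful: a verb that is neutral in Spain can be vulgar elsewhere. A check of that kind could be sketched as below. The `FORBIDDEN` table and `find_forbidden` function are hypothetical, and the choice of countries is an illustrative assumption based only on where slide 13 replaces "Coge"; it is not the list used in the project.

```python
import re

# Illustrative assumption: forms of "coger" are blocked for the countries
# where slide 13 replaces "Coge". The real lists are not given in the slides.
FORBIDDEN = {
    "AR": {"coge", "coger"},
    "CL": {"coge", "coger"},
    "MX": {"coge", "coger"},
}


def find_forbidden(text: str, country: str) -> list[str]:
    """Return the forbidden words found in text, in order of appearance."""
    banned = FORBIDDEN.get(country, set())
    words = re.findall(r"\w+", text.lower())
    return [word for word in words if word in banned]


print(find_forbidden("Coge una carta de la baraja", "AR"))
print(find_forbidden("Agarrá una carta del mazo", "AR"))
```

Running such a check on the engine output would flag any segment in which a blocked term survived localization.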
  19. Initial Brainstorming. MT from EN > different ES dialects: extensive post-editing would be required.
  20. Final Scope of the Project. Human translation + revision, English > Spanish (Spain). MT of Spanish (Spain) into Spanish from:
     • Argentina
     • Chile
     • Colombia
     • Mexico
     • Puerto Rico
  21. Initial Approach for Latin American MT. Traditional workflow:
     1. Gather translation memories (EN → ES-XX)
     2. Add generic material
     3. Develop engine
     4. Add linguistic pre- and post-processing
     5. Improve quality over time
  22. Drawbacks
     • Varying MT quality: depending on the domain and dialect
     • Initial inconsistencies among dialects: handled with glossaries
     • Medium post-editing effort: could be improved over time
  23. New Approach
     • Translate EN to Standard ES: via standard high-quality human translation
     • Convert Standard ES to Latin American variants: from Spanish to Spanish
     • Better final quality is achieved
  24. Specifications
     • Countries: Argentina, Chile, Colombia, Mexico, Puerto Rico
     • Internal glossaries to handle lexical variations: it corrects discordance
     • Idioms
     • Grammatical differences: it adapts verb tenses
  25. Testing the Prototype Engine
     • Extraction of several texts (fashion, real estate, human resources, automobile)
     • Sent to linguists and/or translators in each target country for localization
     • Performance of the same localizations by the engine
     • Comparison and contrasting of human and machine localization results
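The last step on slide 25, comparing human and machine localizations, could be approximated with a token-level diff. This is a minimal sketch using Python's standard `difflib`; the function name `token_differences` and the sample sentences are assumptions for illustration, and the slides do not say how the comparison was actually carried out.

```python
from difflib import SequenceMatcher


def token_differences(human: str, machine: str) -> list[tuple[str, str]]:
    """List (human, machine) token spans where the two versions disagree."""
    human_tokens = human.split()
    machine_tokens = machine.split()
    matcher = SequenceMatcher(a=human_tokens, b=machine_tokens, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (" ".join(human_tokens[i1:i2]), " ".join(machine_tokens[j1:j2]))
            )
    return differences


# Hypothetical pair: the linguist localized the noun phrase, the engine did not.
human_version = "Agarrá una carta del mazo"
machine_version = "Agarrá una carta de la baraja"
print(token_differences(human_version, machine_version))
```

Each reported pair is only a disagreement, not yet an error on either side; someone still has to decide which version is right for the target country.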
  26. First Bug Report. Human vs. machine:
     • Not all terms were localized
     • Concordance issues (masc./fem.; sing./pl.)
     • Verb tenses for Argentina
     • MT: 7.78% error rate
  27. First Bug Report. Some terms were changed/localized by the engine, but not by the humans. (example) Human error or MT error?
  28. Testing the Prototype Engine. A glossary was created by extracting the terms localized by the linguists/translators. This glossary was then sent to the same people who localized the texts to verify that all the terms were correctly localized and nothing was missing.
  29. Testing the Prototype Engine. The glossary grew by 36.91%!
  30. Testing the Prototype Engine. People can miss things. Although many different variants of Spanish exist, Spanish speakers understand many terms that are foreign to their own dialect when they read them in context, sometimes to the point of accepting them as their own. I believe that this may be due to the phenomenon of globalization and the internet.
  31. Latest Bug Report. MT: 1.21% error rate
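The two error rates reported on slides 26 and 31 can be put side by side with a little arithmetic. The sketch below assumes both rates were measured the same way (the slides do not say whether the unit is words, terms or segments); `relative_reduction` is a helper name introduced here.

```python
def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Percentage by which the error rate fell relative to its starting value."""
    return (before_pct - after_pct) / before_pct * 100


first_report = 7.78   # MT error rate on slide 26, in percent
latest_report = 1.21  # MT error rate on slide 31, in percent

# Absolute drop in percentage points.
print(round(first_report - latest_report, 2))  # 6.57
# Relative drop as a share of the first error rate.
print(round(relative_reduction(first_report, latest_report), 1))  # 84.4
```

Under that assumption, the error rate fell by 6.57 percentage points, or roughly 84% of its initial value.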
  32. Achievements
     • Very little post-editing needed
     • Reduced error rate
     • Shortened deadlines
     • Significant cost reduction
  33. Conclusions. Human localization is not perfect. MT is not perfect either. Combining human and machine translation helps achieve high quality and reduce cost.
  34. Further Work
     • Improving glossaries: through a simple web interface for PE
     • Extending Spanish language coverage: more dialects (Traductor.cervantes.es)
     • Incorporating more languages: English, French and Portuguese
  35. Bibliography
     • Yule, G. (2006). The Study of Language (3rd ed.). Cambridge University Press, New York.
     • RAE
     • Instituto Cervantes
     • http://www.linguapress.com
  36. Thank you for your time! María Azqueta (mazqueta@seprotec.com), Diego Bartolomé (diego.bartolome@tauyou.com)
