Catalan daily goes Catalan


  1. Catalan daily goes Catalan
     LocWorld 2012, A4
     Magí Camps (La Vanguardia), Blanca Vidal (Lucy Software)
  2. [1] Introduction, background
     Newspapers in Catalan: net circulation
     [Chart: axis from 0 to 90,000; values shown: 79,239; 45,309; 31,762; 15,662; 6,779]
     Source: Estudi General de Mitjans (EGM), 2012
  3. Introduction, background
     Results
     • Increase: +4% of copies, +7% of readers
     • Distribution: 57% Spanish, 43% Catalan
  4. Introduction, background
     Why a Catalan version?
     • Celebration of LV’s 130th anniversary
     • Normalization of the use of Catalan
     • Investment to face the crisis
     • Opportunity to consolidate LV’s hegemony
  5. [2] Customer goals
     • To publish two language editions of the same newspaper daily (supplements incl.).
     • Journalists should be able to write in either of the two languages.
     • Neither quality nor distribution timeframes should be affected.
  6. Customer requirements: MT
     • Tailor-made system
     • Complying with LV’s style guide
     • Seamless integration into journalists’ workflow
     • Translation of Hermes XML and InDesign formats
     • Reliability, high availability
     • High performance
  7. [3] Ramp-up phase
     Project set-up
     Work areas
     • MT linguistic improvement/tuning
     • Post-editing preparation
     • MT system set-up and integration
     • MT lexicon training
     Duration
     • 8 months (+ 3 months)
     Staff
     • LV: 10-12 in-house journalists
     • Lucy: 3 computational linguists / lexicographers, 1 software developer
     • Incyta: 2 professional post-editors
     Important! On-site support
  8. Subphases
     Duration: Phase 1: 2 mo; Phase 2: 3 mo; Phase 3: 3 mo; Phase 4: 3 mo
     Tasks, with the number of phases marked x (4 = all phases; for fewer marks the phase is not recoverable from the flattened table)
     Linguistic improvement/tuning
     • Language-type definition: 1
     • Creation of a corpus of real texts: 4
     • Analysis of the translation quality: 4
     • Error reporting (lexicon and grammar errors): 4
     • Linguistic implementation (lex and grammar): 4
     • Pre and post-editing filters: 4
     Post-editing preparation
     • Gathering of MT post-editing guidelines: 1
     • Evaluation of post-editing effort: 2
     • Creation and training of the post-editing team: 1
     Technical set-up
     • System set-up and integration: 1
     • Preparation of XML converters: 1
     Maintenance
     • Lexicon maintenance training: 1
  9. [a] Linguistic tuning
     • Language model
     • Corpus
     • Translation quality (TQ)
     • Analysis and error-reporting
     • Implementation
     • Accomplished improvement data
  10. Linguistic tuning
      Catalan language model
      • no exclusion
      • compliant with standards
      • innovative in terminology
      • dynamic in syntactical structures
      Corpus
      • ES: 500,000 transl. units, 8,300,000 words
      • CA: 250,000 transl. units, 3,000,000 words
  11. Linguistic tuning
      Translation quality
      • Perfect: 74%
      • Minimal post-editing: 24%
      • Medium post-editing: 2%
      Conclusions
      • No specific domains (except Sports)
      • Culture: proper names
      • Opinion: idioms, plays on words
      • Errors not repetitive
      • % style to be post-edited
  12. Linguistic tuning
      Analysis and error reporting
      • Semi-automatic detection of missing words
      • Terminology lists
      • New and different translations, error reporting
      Implementation
      • Proper names [44.5% of the TUs]
      • Idioms
      • Alternatives
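
The "semi-automatic detection of missing words" step can be pictured as a frequency count of corpus words that the MT lexicon does not know, handed to a lexicographer for review. The sketch below is only an illustration under that assumption: the function name, the sample sentences and the tiny lexicon are hypothetical and do not come from the Lucy system.

```python
import re
from collections import Counter


def find_missing_words(corpus_lines, lexicon, min_count=2):
    """Return (word, count) pairs for corpus words absent from the lexicon.

    Hypothetical helper: a real MT lexicon lookup would also handle
    inflection and multi-word entries.
    """
    counts = Counter()
    for line in corpus_lines:
        # \w+ matches Unicode letters, so accented words stay whole
        for token in re.findall(r"\w+", line.lower()):
            if token not in lexicon and not token.isdigit():
                counts[token] += 1
    # Most frequent candidates first; a human decides what enters the lexicon
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


# Made-up sample data for illustration only
corpus = [
    "El Barça guanya la lliga",
    "La lliga comença demà",
    "El Barça juga demà",
]
lexicon = {"el", "la", "guanya", "comença", "juga", "demà"}

for word, n in find_missing_words(corpus, lexicon):
    print(word, n)
```

The frequency threshold keeps one-off typos out of the review list; the "semi" part is the manual check of each candidate before it becomes a lexicon or proper-name entry.
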
  13. Linguistic tuning
      Accomplished improvement data
      • Work in figures
        - 40,000 lexicon entries (20,000 for each transl. direction)
        - Around 440 grammar rules
        - Around 7,200 words in the proper names files (each transl. dir)
      • Non-measurable work
        - Understanding of the MT system
        - Understanding of the newspaper specificities
        - Support in the style guide taking into account MT
      • Improvement
        - ES>CA: 41% diff => 35% better, 4% similar, 2% worse
        - CA>ES: 36% diff => 32% better, 3% similar, 1% worse
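
The improvement figures can be read as a breakdown of the output that changed after tuning. A minimal sketch of that arithmetic follows, assuming "diff" means the share of translations that differ between the untuned and the tuned system; the function name is hypothetical.

```python
def breakdown(better, similar, worse):
    """Split the changed share of the output into better/similar/worse.

    Inputs are percentages of all translations, as given on the slide.
    """
    changed = better + similar + worse
    return {
        "changed": changed,
        "better_share": round(100 * better / changed, 1),
        "similar_share": round(100 * similar / changed, 1),
        "worse_share": round(100 * worse / changed, 1),
    }


for direction, figures in {"ES>CA": (35, 4, 2), "CA>ES": (32, 3, 1)}.items():
    print(direction, breakdown(*figures))
```

Under this reading, roughly 85% of the changed ES>CA output and 89% of the changed CA>ES output improved, which is consistent with the 41% and 36% totals on the slide.
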
  14. [b] Post-editing
  15. Post-editing
      • Metrics on translation volume
      • Metrics on post-editing effort
      • Specificities of the text
      • Post-editors workspace
      • Post-editing resources
      • Error reporting process and tools
      • Post-editing team and profile
  16. Post-editing: metrics
      File: LV_2010-10-27 (= 42,512 words)
      • Total translation units: 2,474
      • Lex/gram post-edition: 464 (18.79%)
      • Style post-edition: 394 (15.96%)
      Conclusions
      • Different sections had different levels of post-editing
      • What style corrections could be avoided?
      • Post-editing speed: 1,000-1,500 words/h
      • Daily volume: 75,000 words
      • New post-editing team: 20 post-editors/12 editors
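
The speed, volume and team-size figures on this slide fit together with simple arithmetic. The sketch below is a rough estimate only: it assumes every word is post-edited at the quoted speed and that the work is split evenly across the 20 post-editors, neither of which the slide states. The function name is hypothetical.

```python
def post_editing_hours(daily_words, words_per_hour, team_size):
    """Return (total post-editor hours, hours per post-editor)."""
    total_hours = daily_words / words_per_hour
    return total_hours, total_hours / team_size


DAILY_WORDS = 75_000   # daily volume from the slide
TEAM_SIZE = 20         # post-editors from the slide

for speed in (1_000, 1_500):  # words per hour, range from the slide
    total, per_person = post_editing_hours(DAILY_WORDS, speed, TEAM_SIZE)
    print(f"{speed} words/h: {total:.1f} h in total, {per_person:.2f} h per post-editor")
```

Under these assumptions the daily edition needs 50 to 75 post-editor hours, or about 2.5 to 3.75 hours per person.
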
  17. Post-editing: resources, workspace
      Post-editors should have language proficiency in their skills, BUT also
      • Be trained on MT post-ed
      • Have an integrated workspace
      • Have resources at a click
      Post-editing guide
      • Bilingual style guide
      • Reference document for training
      Adapt CMS to new workflow
      • New processing status
      • New mark-ups
      Resources on Intranet portal
      • Classified frequent MT errors
      • Links to all reference dictionaries
      • MT portal for any journalist
  18. Post-editing: resources, workspace
      La Vanguardia’s intranet: linguistic portal
  19. Post-editing: error reporting, team
      Error reporting
      • Crucial for continuous improvement
      • Not automated (yet)
      • Provide better support to error reporting
      Definition of post-editing profile and team
      • Proficient in Catalan
      • Journalist background
  20. [c] System integration
      During phase 1: pre-production
      • Pre-production set-up and installation
      • Hermes XML converter
      • Changes in the LT engine to translate InDesign files
      During phase 3: production
      • Production installation
      • Test (load, performance and stress)
      • Performance 500-1,200 w/sec
      • Definition of the final installation size
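
To put the measured throughput in context, the sketch below estimates the raw machine-translation time for one daily edition. It assumes the 75,000-word daily volume quoted in the post-editing metrics and a sustained rate with no queueing, file conversion or network overhead; the function name is hypothetical.

```python
def mt_seconds(words, words_per_second):
    """Raw MT time in seconds at a sustained throughput."""
    return words / words_per_second


DAILY_WORDS = 75_000  # assumed daily volume, taken from the post-editing metrics

for rate in (500, 1_200):  # w/sec, range from the slide
    seconds = mt_seconds(DAILY_WORDS, rate)
    print(f"{rate} w/sec: {seconds:.1f} s ({seconds / 60:.1f} min)")
```

Even at the lower rate, translating the whole edition takes minutes rather than hours, so under these assumptions the time budget is dominated by post-editing and review, not by the MT engine.
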
  21. System integration
      [Diagram labels: Language portal, Hermes, InDesign, Web Service, Production, Pre-production, Maintenance]
      • Production: balanced high performance (HP) and high availability (HA) configuration
      • System requirements: normal Windows Server -> low HW footprint (e.g. Dual Core/Quad 2.5-3 GHz, 2-4 GB RAM running Win Server 2003/2008)
  22. [4] Operation: production process
      Staff
      • 20 post-editors
      • 12 editors
      Effort
      • 30’ linguistic review
      • 10’ journalistic review
      • 70,000 words/day + suppl.
      Timeline
      • Start 5 p.m.
      • First edition 11.30 p.m.
      • Second edition 2.30 a.m.
  23. Operation: production process
  24. [5] Next goals
      Success! Yes. Thanks to
      • Close work and cooperation
      • Three parties involved
      • Time and effort investment
      • Customisation
      Next!
      • How to reduce post-editing effort
      • How to re-use post-edited text
  25. Thank you for your attention
      • Magí Camps, La Vanguardia, mcamps@lavanguardia.es, www.lavanguardia.es
      • Blanca Vidal, Lucy Software Ibérica, blanca.vidal@lucysoftware.com, www.lucysoftware.com
      • Ignasi Navarro, Incyta, Ignasi_navarro@incyta.com, www.incyta.com
