Catalan daily goes Catalan. LocWord 2012, A4. Magí Camps (La Vanguardia), Blanca Vidal (Lucy Software)

  1. Catalan daily goes Catalan
     LocWord 2012, A4
     Magí Camps (La Vanguardia), Blanca Vidal (Lucy Software)
  2. [1] Introduction, background
     Newspapers in Catalan: net circulation (bar chart; newspaper labels not preserved). Values shown: 79,239; 45,309; 31,762; 15,662; 6,779.
     Source: Estudi General de Mitjans (EGM), 2012
  3. Introduction, background: results
     Increase: +4% of copies, +7% of readers
     Distribution: 57% Spanish, 43% Catalan
  4. Introduction, background: why a Catalan version?
     • Celebration of LV’s 130th anniversary
     • Normalization of the use of Catalan
     • Investment to face the crisis
     • Opportunity to consolidate LV’s hegemony
  5. [2] Customer goals
     • To publish two language editions of the same newspaper daily (supplements incl.).
     • Journalists should be able to write in either of the two languages.
     • Neither quality nor distribution timeframes should be affected.
  6. Customer requirements (MT)
     • Tailor-made system
     • Complying with LV’s style guide
     • Seamless integration into journalists’ workflow
     • Translation of Hermes XML and InDesign formats
     • Reliability, high availability
     • High performance
  7. [3] Ramp-up phase: project set-up
     Work areas
     • MT linguistic improvement/tuning
     • Post-editing preparation
     • MT system set-up and integration
     • MT lexicon training
     Duration
     • 8 months (+ 3 months)
     Staff
     • LV: 10-12 in-house journalists
     • Lucy: 3 computational linguists / lexicographers, 1 software developer
     • Incyta: 2 professional post-editors
     Important! On-site support
  8. Subphases
     Tasks, with one x per phase (Phases 1 to 4) in which the task runs
     Linguistic improvement/tuning
       - Language-type definition: x
       - Creation of a corpus of real texts: x x x x
       - Analysis of the translation quality: x x x x
       - Error reporting (lexicon and grammar errors): x x x x
       - Linguistic implementation (lex and grammar): x x x x
       - Pre and post-editing filters: x x x x
     Post-editing preparation
       - Gathering of MT post-editing guidelines: x
       - Evaluation of post-editing effort: x x
       - Creation and training of the post-editing team: x
     Technical set-up
       - System set-up and integration: x
       - Preparation of XML converters: x
     Maintenance
       - Lexicon maintenance training: x
     Duration: Phase 1, 2 mo; Phase 2, 3 mo; Phase 3, 3 mo; Phase 4, 3 mo
  9. [a] Linguistic tuning
     • Language model
     • Corpus
     • Translation quality (TQ)
     • Analysis and error-reporting
     • Implementation
     • Accomplished improvement data
  10. Linguistic tuning
      Catalan language model
      • no exclusion
      • compliant with standards
      • innovative in terminology
      • dynamic in syntactical structures
      Corpus
      • ES: 500,000 translation units, 8,300,000 words
      • CA: 250,000 translation units, 3,000,000 words
  11. Linguistic tuning: translation quality
      Pie chart: perfect 74%, minimal post-editing 24%, medium post-editing 2%
      Conclusions
      • No specific domains (except Sports)
      • Culture: proper names
      • Opinion: idioms, plays on words
      • Errors not repetitive
      • % style to be post-edited
  12. Linguistic tuning
      Analysis and error reporting
      • Semi-automatic detection of missing words
      • Terminology lists
      • New and different translations, error reporting
      Implementation
      • Proper names [44.5% of the TUs]
      • Idioms
      • Alternatives
  13. Linguistic tuning: accomplished improvement data
      • Work in figures
        - 40,000 lexicon entries (20,000 for each transl. direction)
        - Around 440 grammar rules
        - Around 7,200 words in the proper names files (each transl. dir.)
      • Non-measurable work
        - Understanding of the MT system
        - Understanding of the newspaper specificities
        - Support in the style guide taking into account MT
      • Improvement
        - ES>CA: 41% diff => 35% better, 4% similar, 2% worse
        - CA>ES: 36% diff => 32% better, 3% similar, 1% worse
  14. [b] Post-editing
  15. Post-editing
      • Metrics on translation volume
      • Metrics on post-editing effort
      • Specificities of the text
      • Post-editors resources
      • Post-editing workspace
      • Error reporting process and tools
      • Post-editing team and profile
  16. Post-editing: metrics
      File: LV_2010-10-27 (= 42,512 words)
      • Total translation units: 2,474
      • Lex/gram post-editing: 464 (18.79%)
      • Style post-editing: 394 (15.96%)
      Conclusions
      • Different sections had different levels of post-editing
      • What style corrections could be avoided?
      • Post-editing speed: 1,000-1,500 words/h
      • Daily volume: 75,000 words
      • New post-editing team: 20 post-editors/12 editors
  17. Post-editing: resources, workspace
      Post-editors should have language proficiency in their skills BUT also
      • Be trained on MT post-ed
      • Have an integrated workspace
      • Have resources at a click
      Post-editing guide
      • Classified frequent MT errors
      • Reference document for training
      Adapt CMS to new workflow
      • New processing status
      • New mark-ups
      Resources on Intranet portal
      • Bilingual style guide
      • Links to all reference dictionaries
      • MT portal for any journalist
  18. Post-editing: resources, workspace
      La Vanguardia’s intranet: linguistic portal
  19. Post-editing: error reporting, team
      Error reporting
      • Crucial for continuous improvement
      • Not automated (yet)
      • Provide better support to error reporting
      Definition of post-editing profile and team
      • Proficient in Catalan
      • Journalist background
  20. [c] System integration
      During phase 1: pre-production
      • Pre-production set-up and installation
      • Hermes XML converter
      • Changes in the LT engine to translate InDesign files
      During phase 3: production
      • Production installation
      • Test (load, performance and stress)
      • Performance 500-1,200 w/sec
      • Definition of the final installation size
  21. System integration
      Architecture diagram (labels: Hermes, InDesign, Web Service, Language portal; environments: Production, Pre-production, Maintenance)
      • Production: balanced high performance (HP) and high availability (HA) configuration
      • System requirements: normal Windows Server -> low HW footprint (e.g. Dual Core/Quad 2.5-3 GHz, 2-4 GB RAM running Win Server 2003/2008)
  22. [4] Operation: production process
      Staff
      • 20 post-editors
      • 12 editors
      Effort
      • 30’ linguistic review
      • 10’ journalistic review
      • 70,000 words/day + suppl.
      Timeline
      • Start 5 p.m.
      • First edition 11.30 p.m.
      • Second edition 2.30 a.m.
  23. Operation: production process
  24. [5] Next goals
      Success! Yes. Thanks to
      • Close work and cooperation
      • Three parties involved
      • Time and effort investment
      • Customisation
      Next!
      • How to reduce post-editing effort
      • How to re-use post-edited text
  25. Thank you for your attention
      • Magí Camps, La Vanguardia, mcamps@lavanguardia.es, www.lavanguardia.es
      • Blanca Vidal, Lucy Software Ibérica, blanca.vidal@lucysoftware.com, www.lucysoftware.com
      • Ignasi Navarro, Incyta, Ignasi_navarro@incyta.com, www.incyta.com
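
The workload figures quoted on slides 16, 20 and 22 can be cross-checked with a little arithmetic. The Python sketch below is only an illustration: it assumes the daily volume is split evenly among the post-editors and that MT time is pure translation time, neither of which the slides state. The function names are hypothetical.

```python
from datetime import datetime

# Figures quoted on the slides.
DAILY_WORDS = 75_000                       # daily volume (slide 16)
POST_EDITORS = 20                          # post-editing team size (slides 16 and 22)
PE_SPEED_WORDS_PER_HOUR = (1_000, 1_500)   # post-editing speed range (slide 16)
MT_SPEED_WORDS_PER_SEC = (500, 1_200)      # MT performance range (slide 20)


def hours_per_post_editor(words, words_per_hour, team_size):
    """Hours each post-editor needs, ASSUMING an even split of the volume."""
    return words / words_per_hour / team_size


def mt_seconds(words, words_per_second):
    """Raw machine-translation time, ignoring file conversion and queueing."""
    return words / words_per_second


pe_hours_slow = hours_per_post_editor(DAILY_WORDS, PE_SPEED_WORDS_PER_HOUR[0], POST_EDITORS)
pe_hours_fast = hours_per_post_editor(DAILY_WORDS, PE_SPEED_WORDS_PER_HOUR[1], POST_EDITORS)
mt_secs_slow = mt_seconds(DAILY_WORDS, MT_SPEED_WORDS_PER_SEC[0])
mt_secs_fast = mt_seconds(DAILY_WORDS, MT_SPEED_WORDS_PER_SEC[1])

# Window between the 5 p.m. start and the 11.30 p.m. first edition (slide 22).
# The calendar date is a placeholder; only the clock times come from the slides.
start = datetime(2012, 1, 1, 17, 0)
first_edition = datetime(2012, 1, 1, 23, 30)
window_hours = (first_edition - start).total_seconds() / 3600

print(f"Post-editing per person: {pe_hours_fast} to {pe_hours_slow} h")
print(f"Raw MT time: {mt_secs_fast} to {mt_secs_slow} s")
print(f"Window before first edition: {window_hours} h")
```

Under these assumptions each post-editor needs between 2.5 and 3.75 hours, which fits inside the 6.5-hour window before the first edition, and the raw MT time of roughly one to two and a half minutes is negligible by comparison.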
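
Slide 12 mentions semi-automatic detection of missing words but does not describe the method. One plausible approach, sketched here in Python and not to be read as the Lucy Software implementation, is to tokenise the corpus, drop every word the lexicon already covers, and rank the remainder by frequency so a lexicographer can review the most common gaps first. The function name, the tokenising rule and the sample data are all hypothetical.

```python
import re
from collections import Counter

# Runs of letters, optionally joined by a hyphen or the Catalan middle dot
# (as in "col·legi"). Apostrophes split tokens, so "l'home" gives "l" and "home".
TOKEN = re.compile(r"[^\W\d_]+(?:[-·][^\W\d_]+)*")


def find_missing_words(texts, lexicon, min_count=1):
    """Return (word, count) pairs for corpus words absent from the lexicon.

    Results are ordered by descending frequency; the review itself stays
    manual, which is what makes the process only semi-automatic.
    """
    known = {word.lower() for word in lexicon}
    counts = Counter(
        token.lower()
        for text in texts
        for token in TOKEN.findall(text)
        if token.lower() not in known
    )
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


# Toy corpus and lexicon, invented for illustration only.
sample_texts = ["El col·legi obre avui.", "El col·legi tanca demà."]
sample_lexicon = {"el", "obre", "avui", "tanca"}
missing = find_missing_words(sample_texts, sample_lexicon)
print(missing)
```

A real system would compare inflected forms against lemmas through the MT engine's own morphology rather than by exact string match; the exact match here only keeps the sketch short.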