Catalan daily goes Catalan

    Presentation Transcript

    • Catalan daily goes Catalan. LocWorld 2012, A4. Magí Camps (La Vanguardia), Blanca Vidal (Lucy Software)
    • [1] Introduction, background. Newspapers in Catalan: net circulation [chart; values shown: 79,239, 45,309, 31,762, 15,662 and 6,779; newspaper names not preserved in the transcript]. Source: Estudi General de Mitjans (EGM), 2012
    • Introduction, background: Results. Increase: +4% of copies, +7% of readers. Distribution: 57% Spanish, 43% Catalan
    • Introduction, background: Why a Catalan version? Celebration of LV’s 130th anniversary; normalization of the use of Catalan; investment to face the crisis; opportunity to consolidate LV’s hegemony
    • [2] Customer goals: To publish two language editions of the same newspaper daily (supplements incl.). Journalists should be able to write in either of the two languages. Neither quality nor distribution timeframes should be affected.
    • Customer requirements (MT): tailor-made system; complying with LV’s style guide; seamless integration into the journalists’ workflow; translation of Hermes XML and InDesign formats; reliability, high availability; high performance
    • [3] Ramp-up phase: Project set-up
      Work areas: MT linguistic improvement/tuning; post-editing preparation; MT system set-up and integration; MT lexicon training
      Duration: 8 months (+ 3 months)
      Staff: LV: 10-12 in-house journalists. Lucy: 3 computational linguists / lexicographers, 1 software developer. Incyta: 2 professional post-editors
      Important! On-site support
    • Subphases
      Duration: Phase 1, 2 months; Phase 2, 3 months; Phase 3, 3 months; Phase 4, 3 months
      Tasks (number of phases marked "x", out of 4; the transcript does not preserve which columns were marked)
      Linguistic improvement/tuning
      - Language-type definition: 1 phase
      - Creation of a corpus of real texts: all 4 phases
      - Analysis of the translation quality: all 4 phases
      - Error reporting (lexicon and grammar errors): all 4 phases
      - Linguistic implementation (lexicon and grammar): all 4 phases
      - Pre- and post-editing filters: all 4 phases
      Post-editing preparation
      - Gathering of MT post-editing guidelines: 1 phase
      - Evaluation of post-editing effort: 2 phases
      - Creation and training of the post-editing team: 1 phase
      Technical set-up
      - System set-up and integration: 1 phase
      - Preparation of XML converters: 1 phase
      Maintenance
      - Lexicon maintenance training: 1 phase
    • [a] Linguistic tuning: Language model; Corpus; Translation quality (TQ); Analysis and error reporting; Implementation; Accomplished improvement data
    • Linguistic tuning
      Catalan language model: no exclusion; compliant with standards; innovative in terminology; dynamic in syntactical structures
      Corpus: ES: 500,000 transl. units, 8,300,000 words. CA: 250,000 transl. units, 3,000,000 words
    • Linguistic tuning
      Translation quality: Perfect 74%; minimal post-editing 24%; medium post-editing 2%
      Conclusions: no specific domains (except Sports); Culture: proper names; Opinion: idioms, plays on words; errors not repetitive; % style to be post-edited
    • Linguistic tuning
      Analysis and error reporting: semi-automatic detection of missing words; terminology lists; new and different translations, error reporting
      Implementation: proper names [44.5% of the TUs]; idioms; alternatives
    • Linguistic tuning: Accomplished improvement data
      Work in figures: 40,000 lexicon entries (20,000 for each transl. direction); around 440 grammar rules; around 7,200 words in the proper names files (each transl. direction)
      Non-measurable work: understanding of the MT system; understanding of the newspaper specificities; support in the style guide taking into account MT
      Improvement: ES>CA 41% diff => 35% better, 4% similar, 2% worse. CA>ES 36% diff => 32% better, 3% similar, 1% worse
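
The improvement figures on this slide add up as stated: the "diff" share splits into better, similar and worse. A minimal Python sketch of that arithmetic follows. It assumes the percentages are shares of all translation units compared before and after tuning, which the slide does not state explicitly, and the function name `improvement_breakdown` is illustrative only.

```python
def improvement_breakdown(better, similar, worse):
    """Summarise before/after tuning figures given in percentage points.

    Returns (changed, share_better, net_gain):
      changed      - total share of translation units whose output differs
      share_better - fraction of the changed units judged better
      net_gain     - better minus worse, in percentage points
    """
    changed = better + similar + worse
    share_better = better / changed
    net_gain = better - worse
    return changed, share_better, net_gain


# Figures taken from the slide (better, similar, worse).
figures = {"ES>CA": (35, 4, 2), "CA>ES": (32, 3, 1)}

for direction, values in figures.items():
    changed, share_better, net_gain = improvement_breakdown(*values)
    print(f"{direction}: {changed}% changed, "
          f"{share_better:.0%} of the changes better, net +{net_gain} points")
```

Read this way, most of the changed output moved in the right direction in both translation directions, and the regressions are small compared with the gains.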
    • [b] Post-editing
    • Post-editing: metrics on translation volume; metrics on post-editing effort; specificities of the text; post-editors' workspace; post-editing resources; error reporting process and tools; post-editing team and profile
    • Post-editing: metrics
      File LV_2010-10-27 (= 42,512 words): total translation units 2,474; lex/gram post-edition 464 (18.79%); style post-edition 394 (15.96%)
      Conclusions: different sections had different levels of post-editing; what style corrections could be avoided?; post-editing speed: 1,000-1,500 words/h; daily volume: 75,000 words; new post-editing team: 20 post-editors / 12 editors
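
The speed, volume and team-size figures on this slide can be combined into a rough workload estimate. The sketch below is only that: it assumes the whole daily volume is post-edited at the quoted speed and split evenly across the 20 post-editors, neither of which the slide claims, and `post_editing_hours` is a hypothetical helper name.

```python
def post_editing_hours(daily_words, words_per_hour, post_editors):
    """Return (total_hours, hours_per_post_editor) for one day's volume."""
    total_hours = daily_words / words_per_hour
    return total_hours, total_hours / post_editors


DAILY_WORDS = 75_000   # daily volume from the slide
POST_EDITORS = 20      # size of the new post-editing team

# Post-editing speed range from the slide: 1,000-1,500 words/h.
for speed in (1_000, 1_500):
    total, each = post_editing_hours(DAILY_WORDS, speed, POST_EDITORS)
    print(f"{speed} words/h: {total:.1f} h in total, {each:.2f} h per post-editor")
```

Under these assumptions the daily load is between 50 and 75 post-editor hours, or roughly 2.5 to 3.75 hours per person.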
    • Post-editing: resources, workspace
      - Post-editors should have proficiency in their skills BUT also: be trained on MT post-editing; have an integrated workspace; have resources at a click
      - Post-editing guide: classified frequent MT errors; reference document for training
      - Adapt CMS to new workflow: new processing status; new mark-ups
      - Resources on intranet language portal: bilingual style guide; links to all reference dictionaries; MT portal for any journalist
    • Post-editing: resources, workspace La Vanguardia’s intranet: linguistic portal
    • Post-editing: error reporting, team
      Error reporting: crucial for continuous improvement; not automated (yet); provide better support to error reporting
      Definition of post-editing profile and team: proficient in Catalan; journalist background
    • [c] System integration
      During phase 1, pre-production: pre-production set-up and installation; Hermes XML converter; changes in the LT engine to translate InDesign files
      During phase 3, production: production installation; test (load, performance and stress); performance 500-1,200 words/sec; definition of the final installation size
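
To put the measured throughput in context, the sketch below estimates how long raw machine translation of one day's text would take. It assumes the 75,000-word daily volume quoted on the post-editing metrics slide is translated as a single batch at a constant rate, which is a simplification for illustration and not a description of the production set-up; `translation_seconds` is a hypothetical helper.

```python
def translation_seconds(words, words_per_second):
    """Time in seconds to translate `words` at a constant throughput."""
    return words / words_per_second


DAILY_WORDS = 75_000  # daily volume from the post-editing metrics slide

# Throughput range from the slide: 500-1,200 words/sec.
for rate in (500, 1_200):
    seconds = translation_seconds(DAILY_WORDS, rate)
    print(f"{rate} words/sec: {seconds:.1f} s ({seconds / 60:.1f} min)")
```

Even at the lower rate, the machine step takes minutes, so under these assumptions the production timeline is dominated by human post-editing and review rather than by the MT engine.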
    • System integration [architecture diagram; labels: Language portal, Hermes, InDesign, Web Service, Production, Pre-production, Maintenance]
      Production: balanced high performance (HP) and high availability (HA) configuration
      System requirements: normal Windows Server -> low HW footprint (e.g. Dual Core/Quad 2.5-3 GHz, 2-4 GB RAM running Win Server 2003/2008)
    • [4] Operation: production process
      Staff: 20 post-editors; 12 editors; 70,000 words/day + suppl.
      Effort: 30’ linguistic review; 10’ journalistic review
      Timeline: start 5 p.m.; first edition 11.30 p.m.; second edition 2.30 a.m.
    • Operation: production process
    • [5] Next goals
      Success! Yes. Thanks to: close work and cooperation; three parties involved; time and effort investment; customisation
      Next! How to reduce post-editing effort; how to re-use post-edited text
    • Thank you for your attention. Magí Camps, Blanca Vidal, Ignasi Navarro. La Vanguardia, Lucy Software Ibérica