2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Alain Désilets's talk at the Translation CrowdSourcing workshop organized by University of Maryland in June 2010

  • In the next 45 minutes, I will tell you what I know about translators' attitudes and work practices with respect to collaborative and crowdsourced translation. This is anchored in two personal perspectives: (a) observing professional translators for 5-6 years and (b) “preaching” the wiki gospel to professional translators for 4 years.
  • Joint work with the Language Studies department of the Université du Québec en Outaouais. Multidisciplinary team (technology and translation researchers).


  • 1. Translation and Crowd Sourcing: Opportunity or Heresy? Professional translators’ attitudes towards massive collaboration Alain Désilets Conseil national de recherches du Canada
  • 2. “The most reliable way to forecast the future is to try to understand the present.” “Trends, like horses, are easier to ride in the direction they are going.” -- John Naisbitt “You have to talk to [customers], watch them; this is the only way to understand their interests, their motives, their needs.” -- Donald Norman
  • 3. Observing Translators
    • Multi-disciplinary project that includes technology researchers from NRC and translation studies researchers from Université du Québec en Outaouais.
    • Contextual Inquiry: a well-known, tried-and-tested technique in Human-Computer Interaction for learning about end users.
    • A mix of observation and interviewing.
    • Observe potential end users while they work.
    • Ask them to think aloud.
    • Interrupt with lots of questions.
    • Use qualitative and quantitative data analysis to make sense of what you witnessed.
  • 4. 25 subjects
    • Organization type: large (250+ employees) LSPs (13), medium (<30 employees) LSPs (6), freelance (2), academic (2), amateur (2)
    • Type of work: “conventional” translation (15), MT post-editing (8), revision (2)
    • Language pairs: English-French (13), English-Spanish (6), English-Japanese (2), Portuguese-Spanish (1), Chinese-English (1), English+Italian-Estonian (1), English-Inuktitut (1)
    • Years of experience: ranged from < 2 years up to 20+ years
    • Source text domain: Aboriginal affairs, municipal affairs, public administration, education, legal, health, software manual, politics, job offers
    • Source text length: min ~20, max 7000
    • Country: Canada (15), Europe (5), US (3), Japan (2)
    • Professional translators: all but 2
  • 5. Preaching the Wiki Word to Translators
    • Involved in world of wikis since 2002
    • Chaired WikiSym conference in 2007 (Montreal)
    • Have been telling professional translators about wikis since 2006
    • Keynote at Translating and the Computer 2007: “Translation Wikified”.
    • Organized a workshop and panel on those topics.
    • Co-implemented wiki-based tools to support translation work
    • Cross-lingual wiki engine: translate in a wiki context where the preconditions of traditional translation workflows (ex: a master language) do not apply.
    • Tiki-CMT: TikiWiki module to support Collaborative Multilingual Terminology work.
  • 6. Talk Outline
    • Are professional translators technology averse?
    • Professional translators' attitudes and work practices with respect to:
      • collaboratively built linguistic resources
      • collaborative translation and crowdsourcing
    • Please interrupt with questions at any point!
  • 7. Are professional translators technology averse?
  • 8. Translation Problems
    • Translators use a lot of technology when trying to resolve translation problems.
    • Translation problem =
      • any source-language word or expression that presents a difficulty for a human translator (not a machine) during the process of translation.
      • Term, idiomatic expression, named entity, etc.
  • 9. Tools, Tools, more Tools! An average of 10 resources in our subjects’ toolboxes!
    • private (to the individual) lexicons built using simple office suites (ex: Excel spreadsheets, MS-Word documents)
    • 1 large, public general purpose bi-text (TransSearch)
    • private (to the individual) or institutional Translation Memories built with 3 different products (Trados, Multitrans, LogiTerm)
    • 2 private (to the individual) or institutional, unaligned archives of previous translations, either stored in a database or the file system
    • 9 unilingual general purpose dictionaries (Multidictionnaire, Petit Robert, Merriam-Webster, Dictionnaire des cooccurrences, dictionary.references.com, Canadian Oxford, Trésor de la langue française, www.dictionary.com, urban dictionary)
    • 2 unilingual thesauri (Dictionnaire analogique, Dictionnaire des synonymes de l'Université de Caen)
    • 2 unilingual specialized dictionaries and lexicons (Dictionnaire de droit québécois, Lexique des noms géographiques)
    • 3 bilingual dictionaries (LexibasePro, René Meertens, Robert & Collins)
    • the source text being translated, as well as its partial translation
    • bilingual documents related to the source text (ex: minutes of meetings being discussed in the source text)
    • 2 instances of the client's Web sites
    • 2 large, bilingual Web sites not directly related to the domain of the source text (gc.ca domain, Canadian Broadcasting Corporation)
    • 3 large, bilingual Web sites directly related to the domain of the source text (CanLII, Canadian Federal Court, University of Ottawa)
    • the whole Web in the source or target language (mined using Google search engine)
    • 2 manuals of style (Guide du rédacteur, Le Ramat de la typographie)
    • 2 spell and grammar checkers (MS-Word, Antidote)
    • 1 database of newspaper articles in the target language
  • 10. Tools, Tools, more Tools! (2)
    • Translators use a wide range of tools when resolving translation problems.
    • The vast majority of those are electronic.
    p < 0.001
  • 11. Adoption of Corpus-Based Tools
    • Both types are used equally.
    • Corpus-based tools have made it into the mainstream.
    • But they have not displaced termino-lexicographic tools.
    • Termino-lexicographic =
      • Dictionary
      • Terminology database
      • Lexicon, etc.
    • Corpus-based =
      • Translation memory
      • Bilingual web site, etc.
    p > 0.05
  • 12. Advanced Google Use
    • Translators are among the world’s most advanced Google users.
    • They know the advanced query syntax, and expect it in most search tools they use.
    • They use Google in various ways to mine the web as a corpus.
      • Ex: searching bilingual sites for solutions, or assessing how widely a candidate solution is used in the target language.
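As a minimal sketch of the bilingual-site strategy, the Python below builds Google queries with two standard operators: double quotes for an exact phrase and `site:` to restrict results to one domain. The helper names and the example phrase are hypothetical; only the operators and the `https://www.google.com/search?q=` URL form are assumed from Google's public search syntax.

```python
from urllib.parse import urlencode


def site_restricted_query(phrase: str, site: str) -> str:
    """Exact-phrase query limited to one domain (hypothetical helper).

    Double quotes request an exact-phrase match; ``site:`` restricts
    results to the given domain, e.g. a bilingual government site.
    """
    return f'"{phrase}" site:{site}'


def google_search_url(query: str) -> str:
    """Encode a query as a Google search URL using the ``q`` parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical usage: find gc.ca pages using an English phrase, then open
# the French version of a matching page to see how the phrase was rendered.
query = site_restricted_query("fair compensation", "gc.ca")
print(query)                     # "fair compensation" site:gc.ca
print(google_search_url(query))
```

Dropping the `site:` part yields a whole-web exact-phrase query, which is one simple way to approach the second use mentioned above: checking how often a candidate solution appears in the target language.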
  • 13. Searching Bilingual Sites
  • 14. Searching Bilingual Sites (2)
  • 15. Hot Buttons
    • That said, translators strongly resist technology that either:
      • disrupts the fair compensation equation, or
      • exerts downward pressure on the quality of the end product.
    • Translation crowdsourcing is likely to press on both these buttons.
  • 16. Fair Compensation Equation
    • Translators are paid by the word.
    • Technology that increases productivity exerts strong downward pressure on the per-word rate.
    • This pressure is not always commensurate with the actual productivity gain.
    • Example:
      • A 10-word sentence with an 80% fuzzy match level.
      • Should the translator only get paid for two words?
      • Even though she still has to read the whole sentence...
      • ... and may have to change the rest of the sentence to make it work with the translation of those 2 words?
    • Once a new fair equilibrium has been reached, this initial resistance may go away.
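The arithmetic behind the question above can be written out as a short sketch. It assumes a naive linear discount in which only the unmatched share of a sentence is paid; real vendor pricing grids are usually banded by match range and vary, so `payable_words` and the $0.25 rate are illustrative assumptions, not an industry formula.

```python
def payable_words(sentence_words: int, match_percent: int) -> float:
    """Words billed under a naive linear fuzzy-match discount (assumption).

    Only the unmatched share of the sentence is paid, which is exactly the
    reasoning being questioned: the translator still reads every word.
    """
    if not 0 <= match_percent <= 100:
        raise ValueError("match_percent must be between 0 and 100")
    return sentence_words * (100 - match_percent) / 100


words_read = 10                        # the translator reads every word
words_paid = payable_words(words_read, 80)
rate = 0.25                            # illustrative per-word rate in USD
print(f"read {words_read} words, paid for {words_paid:g} (${words_paid * rate:.2f})")
# prints: read 10 words, paid for 2 ($0.50)
```

The gap between words read (10) and words paid (2) is the mismatch described above between the pressure on the rate and the actual productivity gain.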
  • 17. Lowering Quality?
    • Translators
      • Craftspeople who take pride in the quality of their end product.
      • Quality = the original sense is rendered, AND the translation reads as though it were an original text written by a native speaker.
    • Customers
      • Translation = cost center, not part of their core business.
      • Can’t always recognize quality when they see it, nor measure a clear link between translation quality and the bottom line.
      • Liable to introduce cost-reducing technologies without realizing their impact on quality.
  • 18. Attitudes and work practices with respect to collaboratively built linguistic resources
  • 19. Is Wikipedia Useful for Translators?
    • We witnessed very little use of Wikipedia in our translator observation.
    • On a few occasions, subjects consulted Wikipedia to get background information on a particular concept, but never to get a solution to a terminology difficulty.
    • An analysis conducted in June 2007 indicates that coverage of typical terminology difficulties may be insufficient for the latter task (finding equivalents).
  • 20. Is Wikipedia Useful for Translators (2)?
    • Coverage of 42 observed terminology problems by Wikipedia, Wiktionary and TERMIUM (June 2007):
      • Has English entry: Wikipedia 71.4%, Wiktionary 47.6%, TERMIUM 80.1%
      • Has English entry in right sense: Wikipedia 57.1%, Wiktionary 45.2%, TERMIUM 76.2%
      • Has French equivalent: Wikipedia 33.3%, Wiktionary 35.7%, TERMIUM 76.2%
      • Has French equivalent in correct sense: Wikipedia 26.1%, Wiktionary 33.3%, TERMIUM 76.2%
    Note: TERMIUM = Terminology database of the Government of Canada.
  • 21. Is Wikipedia Useful for Translators (3)?
    • Evolution of translators' attitudes towards Wikipedia and wikis:
    • 4 yrs ago: “Wikipedia, what’s that?”
    • 3 yrs ago: “I know about Wikipedia and I think it’s crap because any clown can write to it.”
    • 2 yrs ago: “ You know, Wikipedia is surprisingly good and I use it all the time in my work now.”
    • 1 yr ago: “This collaborative, wiki stuff is bound to be important for translation, but I am not sure how best to leverage it.”
  • 22. What Makes a Good Source?
    • Professional Translators are trained to:
    • only use trusted sources.
    • focus on sources that are specialised for their domain or client.
    • never use content that may have been translated, or written by non-native speakers.
    • In theory, that would rule out most collaborative sources.
    • In practice, translators are pragmatic and will consult sources that do not meet those criteria when necessary.
  • 23. Use of Public Sources
    • Our subjects used significantly more public resources than private ones.
    • Many of them are available for free (ex: the customer’s web site).
    • Caveat: the situation is different for highly repetitive, technical translation.
    p < 0.001
    • Public =
      • Anyone can access, possibly for a fee.
    • Private =
      • Only accessible to certain translators (ex: those working for a particular employer).
  • 24. Use of General Sources
    • Our subjects used significantly more multidomain resources than single-domain ones.
    • They seemed to prefer casting a wide net and then sifting the results.
    • Caveat: here again, the situation is different for highly repetitive, technical translation.
    p < 0.05
    • Multidomain =
      • Covers multiple domains, and the subject searched it without restricting the domain.
    • Single domain =
      • Covers a single domain, or covers many but the subject restricted the search by domain.
  • 25. Use of Translated Material
    • Our subjects frequently searched bilingual Canadian sites for French equivalents.
    • An estimated 75% of the French content on those sites was translated.
    • Thus, in 75% of cases, this strategy ended up yielding solutions taken from translated material.
    • This is frowned upon in terminology and, to a lesser extent, in translation.
    • But our subjects did it anyway. Here’s why…
  • 26. Translator Judgment
    • Subjects exercised a lot of critical judgment with respect to resources.
    • They did not blindly trust any source, even highly reputed ones like TERMIUM (Terminology database of the Government of Canada).
    • In 35% of the cases, they searched a second resource after finding some relevant information.
    • Subjects were adept at rapidly scanning lists of suggestions and separating the wheat from the chaff.
    • Problem coverage (i.e. the probability that at least one relevant solution is found in the top 10) seemed more important than precision (i.e. the probability that a proposed solution is relevant).
    • Recall (i.e. the percentage of all relevant solutions that are actually proposed by the resource) also seemed important, but to a much lesser degree.
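To make the three measures above concrete, here is a minimal Python sketch for a single resource. It assumes each translation problem comes with the resource's ranked suggestions and a hand-judged set of relevant solutions, and it evaluates precision and recall on the same top-10 list used for coverage; that cut-off choice, the function names and all data below are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple


def covers_problem(suggestions: Sequence[str], relevant: Set[str], k: int = 10) -> bool:
    """At least one relevant solution appears in the top-k suggestions."""
    return any(s in relevant for s in suggestions[:k])


def precision(suggestions: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Share of the top-k proposed solutions that are relevant."""
    shown = suggestions[:k]
    if not shown:
        return 0.0
    return sum(s in relevant for s in shown) / len(shown)


def recall(suggestions: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Share of all relevant solutions that the top-k suggestions contain."""
    if not relevant:
        return 0.0
    return len(set(suggestions[:k]) & relevant) / len(relevant)


def coverage_rate(problems: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 10) -> float:
    """Problem coverage: fraction of problems with a relevant top-k hit."""
    hits = [covers_problem(s, r, k) for s, r in problems]
    return sum(hits) / len(hits) if hits else 0.0


# Hypothetical data: two problems with placeholder solution strings.
problems = [
    (["A1", "A2", "A3", "A4"], {"A2", "A5"}),  # covered: A2 is relevant
    (["B1", "B2"], {"B9"}),                     # not covered
]
print(coverage_rate(problems))                         # 0.5
print(precision(*problems[0]), recall(*problems[0]))   # 0.25 0.5
```

Coverage is a yes/no outcome per problem, which fits the observation that translators scan a list quickly and only need one usable hit, whereas precision and recall penalize noise and omissions that a skilled reader tends to tolerate.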
  • 27. Resources Quality Control
    • Our subjects preferred more tightly controlled resources.
    • But they still made non-negligible use of moderately controlled ones (38% of all consultations).
    • Almost no use of completely open resources.
    • Tight =
      • Carefully crafted and revised (linguists, terminologists, revisers). Ex: TERMIUM.
    • Moderate =
      • Comes from reputed organizations, but may not be as carefully crafted and revised. Ex: Gov of Canada web sites.
    • Open =
      • Could have been produced by anyone. Ex: the whole web.
    p < 0.05 p < 0.001
  • 28. Write Access
    • Our subjects were predominantly consumers of resources, as opposed to contributors.
    • Many commented on their lack of time to contribute.
    • But most collaborative resources only need a small percentage of their users to contribute.
    • Read-only =
      • Subject cannot write, or can only do so through an intermediary. Ex: TERMIUM
    • Read-Write =
      • Subject can write directly without an intermediary. Ex: subject’s own lexicon.
    p < 0.001
  • 29. Use of collaborative sources is not that common yet, but it is growing.
    • Collaborative resources go against the grain of some translator attitudes, but nothing that can’t be surmounted.
    • We need to address the perception of quality and trustworthiness.
    • We cannot expect the majority of translators to contribute.
  • 30. Attitudes and work practices with respect to collaborative translation
  • 31. Flavours of Collaborative Translation
    • In increasing order of controversy:
    • Translation teamware
      • Allow multidisciplinary teams of translators, terminologists, customers, domain experts to collaborate efficiently on a translation project.
    • Online market place for translators
      • eBay-like platforms for connecting customers and translators with minimal intervention by a middleman.
    • Translation crowdsourcing
      • Mechanical Turk-style platform for distributing translation projects across large crowds of mostly amateur translators.
  • 32. Translation Teamware
      • Allow multidisciplinary teams of translators, terminologists, customers, domain experts to collaborate efficiently on a translation project.
    • Relatively uncontroversial.
    • Many commercial translation workflow products are along those lines, but follow a somewhat assembly-line model.
    • More resistance to wiki-like platforms that break down barriers and open up horizontal communication channels
      • Ex: Customer seeing early drafts of translations, and commenting on them.
      • Translators like to (need to?) stay in their own bubble.
      • Fear of undue interference by non-qualified staff.
      • But we are starting to see more and more case studies of this (ex: using Basecamp or wikis to coordinate translation teams).
  • 33. Online Market Place for Translators
      • eBay-like platforms for connecting customers and translators with minimal intervention by a middleman.
      • Ex: ProZ, Translated.net
    • Usually includes
      • automatic reputation management.
      • free, open resources for translators (ex: Kudoz, MyMemory).
    • Somewhat controversial:
      • Some freelancers perceive it as empowering (cutting out the middleman).
      • Others perceive it as an impersonal “Walmart of translation”, i.e. something that encourages low-quality translation.
  • 34. Translation Crowdsourcing
    • Mechanical Turk-style platform for distributing translation projects across large crowds of mostly amateur translators.
      • This is REALLY controversial.
        • So far, I have only heard one professional translator say that this is a good thing.
      • It disrupts the fair compensation equation AND exerts downward pressure on quality.
        • One crowdsourcing vendor quotes an average of $0.0008/word (vs. $0.25-0.30/word for the average professional translator).
        • Translating out of context is known to be error-prone.
        • Amateur translators tend to produce texts that read like translations.
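The size of the price gap quoted above is easy to check with back-of-the-envelope arithmetic. The sketch below uses only the two rates from the slide ($0.0008/word vs. $0.25-0.30/word); the 10,000-word project is a hypothetical size chosen for illustration.

```python
CROWD_RATE = 0.0008        # USD per word, as quoted by one crowdsourcing vendor
PRO_RATES = (0.25, 0.30)   # USD per word, typical professional range from the slide


def project_cost(words: int, rate_per_word: float) -> float:
    """Cost of translating `words` words at a flat per-word rate."""
    return words * rate_per_word


words = 10_000  # hypothetical project size
crowd = project_cost(words, CROWD_RATE)
pro_low, pro_high = (project_cost(words, r) for r in PRO_RATES)
print(f"crowd: ${crowd:,.2f}  professional: ${pro_low:,.2f} to ${pro_high:,.2f}")
print(f"professional rate is {PRO_RATES[0] / CROWD_RATE:.1f}x to "
      f"{PRO_RATES[1] / CROWD_RATE:.1f}x the crowd rate")
```

For this project size that is roughly $8 versus $2,500 to $3,000, i.e. the professional rate is about 310 to 375 times the quoted crowd rate.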
  • 35. Translation Crowdsourcing (2)
    • One hopes that crowdsourcing technology will be used wisely and in a way that continues to leverage professional translators' skills, for example:
        • Crowdsourcing used mostly for low-stake or user-generated content that is currently not being translated at all.
        • Professionals continue to play a pivotal role, by revising translations produced by the crowd and paying special attention to amateurs’ main weakness: native-sounding translation.
      • But we, researchers and developers, cannot guarantee that this is how things will unfold.
      • We need to be sensitive to those issues while we build the future of translation crowdsourcing.
  • 36. Conclusions
  • 37. Conclusions
    • Professional translators are NOT technology averse, but they will resist technologies that disrupt the fair compensation equation or exert downward pressure on quality.
    • Crowdsourcing of large linguistic resources is compatible with the views of professional translators, although it is not yet part of their mainstream work practices.
    • Also uncontroversial is the use of online collaboration to facilitate team coordination, or to create “fair” marketplaces for freelance translators.
    • Crowdsourcing of translation, on the other hand, is very controversial in translator circles, and we need to be sensitive to that issue when building and designing translation crowdsourcing environments.
  • 38. Questions?
  • 39. Thank you for your attention. For more details… Alain Désilets National Research Council of Canada [email_address]