Translation Studies:
Simplification and Explicitation Universals

Claudiu Mihăilă
Faculty of Computer Science, "Al.I. Cuza" University of Iaşi,
16, General Berthelot Street, 700483 Iaşi, Romania

Abstract. The characteristics exhibited by translated texts compared to non-translated texts have always been of great interest in Translation Studies. Two universals, namely simplification and explicitation, are reviewed in this report, presenting some of the studies that have been undertaken for their confirmation or, on the contrary, their disconfirmation. We describe the corpora, the methods, and the results, and analyse the conclusions of several important research papers.

Key words: translationese, translation studies, translation universal, corpus linguistics

1 Introduction

The idea that translation studies should search for regularities and general laws is not new; Gideon Toury is the best-known advocate of general laws of translation [1]. He proposed this as a fundamental task of descriptive translation studies, since translated language is believed to manifest certain universal features as a consequence of the translation process. Translations exhibit their own specific lexico-grammatical and syntactic characteristics [2–4]. These "fingerprints" that the translation process leaves behind were first described by Gellerstam and named generically translationese [5].

More recently, it has been stated that there are common characteristics which all translations share, regardless of the source and the target languages [6]. Mona Baker defines several such universal laws, although mostly intuitively. Additionally, she observes the power that resides in electronic corpora and automatic natural language processing systems, in comparison to the manual contrastive studies undertaken by previous scholars on small-scale collections of texts.
She includes in her list of universals, amongst others, simplification, explicitation, normalisation, and convergence. However, the issue of the existence of translation universals remains highly controversial. While some scientists report that they have found sufficient proof that such translation laws exist [7], others consider that it is not possible to even
hypothesise on universals since we are not able to capture all translations from all languages and from all times [8].

Translation universals have thus been a real subject of debate in translation studies over the last fifteen years, bringing together different perspectives on the language of translation. Perhaps the main reason to investigate these hypotheses is to raise awareness among translators about the conscious or unconscious effects on translated texts, and the relationship between language and culture [7]. Bringing unconscious tendencies to light will emphasise translators' decisions and strategies, and hence should pave the way to more accurate translations, with "more desired effects and fewer unwanted ones" [9].

The fundamental aim of this line of research is to model a language-independent learning system, able to distinguish between translated and non-translated texts. Such a system would be widely applicable to other languages, thus enhancing the possibilities for studying these universals. Furthermore, it becomes feasible to determine which characteristics most influence translated language.

From a practical perspective, a system that automatically identifies translationese (improved or not by the inclusion of specific features of the considered universals) may be of great help in the self-assessment of professional translators, or in the assessment of their training process. Moreover, an automatic translationese identifier may significantly improve other NLP applications. For instance, such a system may be integrated in a statistical machine translation framework in order to identify translation direction [10]. Another possible application is its use in multilingual plagiarism detection, a topic that has been tackled more intensively in recent years.

The report is structured as follows: section 2 contains brief descriptions of translation universals, whilst in section 3 we review the related work in this domain, focussing on simplification and explicitation. Finally, conclusions are drawn in section 4.

2 Translation universals

The universals have attracted considerable attention from translation experts, but their formulation and initial explanation have been based on intuition and introspection, with subsequent corpus research limited to comparatively small corpora, literary or newswire texts, and semi-manual analysis. Moreover, previous research has not provided sufficient guidance as to which features must be accounted for before these universals can be regarded as valid [11].

Various so-called translation universals, described as universal tendencies of the translation process, laws of translation, and norms of translation, have been suggested in the literature [6, 7, 12, 13]. Toury proposed two laws of translation: the law of standardisation and the law of interference [13]. Baker defined four possible translation universals [6, 14].

The four universals, namely simplification, explicitation, convergence, and normalisation, are the most intensively studied universals
in recent years. The simplification universal is described as the tendency of translators to produce simpler and easier-to-follow texts, whilst explicitation refers to introducing overt information into the translation that is implicit in the source language [6]. Convergence states that translations become more similar to one another than non-translated texts are, and normalisation represents the conscious or unconscious rendering of idiosyncratic text features in order to make them conform to the typical textual characteristics of the target language.

Laviosa continued this line of research by proposing features for simplification in a corpus-based study [7]. Despite some evidence of the existence of such a phenomenon, defining the features which characterise the simplification universal remains a considerable challenge.

3 Related work

A number of papers report experiments on the universals, but without any clear-cut conclusions. Nevertheless, it seems that these problematic claims require an investigation strategy divided into two consecutive stages: first, the investigation of the proposed translation tendencies, and afterwards the investigation of the universality factor.

On the one hand, the claims themselves, without considering the universality aspect, require adequate practical support in order to be validated as true or false. On the other hand, the universality characteristic is a matter of discussion, as the coverage implied by this term is too wide given the limited evidence available for different languages. For the universality aspect to be widely accepted, a claim needs to be validated for all languages, or at least for all language families.

In what follows, we review the current status of two of the hypothesised universals, simplification and explicitation, going through some of the most prominent research undertaken in the field.

3.1 Simplification

Recently, a corpus-based approach which tests the statistical significance of features proposed to investigate the simplification universal has been applied to Spanish [11, 15].

In [11], Corpas Pastor tries to verify the validity of the simplification universal on a Spanish comparable corpus of medical and technical, translated and non-translated texts produced by both professional and semi-professional translators. Simplification seems to be validated for the lexical richness feature. Despite this, it is contradicted in terms of complex sentences, sentence length, depth of syntactic trees, information load, and ambiguity.

Nonetheless, in [15], the authors use the same corpora as in [11] and perform a deeper analysis, exploiting other features as well. The experiments revealed that the translated texts contain a lower level of lexical richness and density, a lower
number of discourse markers, and less simple and significantly shorter sentences. However, the simplification traits are clearly visible only in the technical texts, and to a lesser degree in the professionally translated medical texts.

Furthermore, Ilisei et al. develop a supervised learning system for Spanish that is able to distinguish between translated and non-translated texts, in some cases with very high accuracy [16, 17]. They use three comparable corpora, of which two are related to the medical domain and one contains technical texts, and extract 21 language-independent features for their learning system to exploit. Table 1 includes the accuracies of various trained classifiers tested in [17]. The BayesNet, Simple Logistic, SVM, and Meta-classifier reach an accuracy of 97.62% on technical texts, with the SVM result statistically significantly better than without using simplification features.

Table 1. Classification accuracy results on medical and technical test datasets with regard to simplification features (SF) [17].

                      Including SF          Excluding SF
Classifier            Medical   Technical   Medical   Technical
Baseline (ZeroR)      64.71%    66.67%      64.71%    66.67%
Naive Bayes           71.57%    95.24%      71.57%    80.95%
BayesNet              73.53%    97.62%      71.57%    92.86%
Jrip                  79.42%    95.24%      72.55%    92.86%
Decision Tree         77.45%    92.86%      75.49%    95.24%
Simple Logistic       77.45%    97.62%      79.41%    83.33%
SVM                   75.49%    97.62%      74.51%    69.05%
Meta-classifier       82.35%    97.62%      78.43%    92.86%

Aiming to determine which are the most salient features that lead to these results, Ilisei et al. analyse the outputs of the various classifiers, such as Decision Tree and Jrip, and use attribute evaluators, such as Chi-Square and Information Gain. They conclude that lexical richness influences the classification the most, closely followed by sentence length and the proportions of pronouns, conjunctions, grammatical words, and lexical words; other features also influence the classification, but to a lesser extent.
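
As an illustration of the two most influential features reported above, the following minimal sketch computes lexical richness and mean sentence length for a text. It is not the feature extractor used in [16, 17]: it assumes that lexical richness is measured as a type/token ratio, which the cited studies may define differently, and the function name simplification_features, the regular-expression tokeniser, and the sample text are all hypothetical.

```python
import re


def simplification_features(text):
    """Return two illustrative simplification features for a text.

    Assumptions (not taken from the cited studies): lexical richness is the
    type/token ratio, sentences end at '.', '!' or '?' followed by whitespace,
    and tokens are lower-cased runs of letters.
    """
    # Naive sentence split: break after sentence-final punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Naive, Unicode-aware word tokenisation (letters only).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens or not sentences:
        return {"lexical_richness": 0.0, "mean_sentence_length": 0.0}
    return {
        # Distinct word forms divided by the total number of word tokens.
        "lexical_richness": len(set(tokens)) / len(tokens),
        # Average number of word tokens per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
    }


sample = "The cat sat. The cat slept on the mat."
features = simplification_features(sample)
print(features)
```

Under the simplification hypothesis, one would expect translated texts to score lower on both values than comparable non-translated texts; a real study would also need to control for text length, since the type/token ratio depends on it.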

Both lexical richness and sentence length are features considered to be indicative of the simplification hypothesis, which has been widely discussed and studied in the past decade. Sentence length is a characteristic whose interpretation posed a certain difficulty in the study undertaken in [15]. The most influential features identified with these evaluators concur with the first-level attributes from the intuitive output of the Decision Tree and Jrip classifiers [18].

A different perspective on this research topic is taken by Baroni and Bernardini, who report a machine learning approach for the task of classifying Italian texts as translated or original [19]. Several features have been employed
in the feature vector, including unigrams, bigrams, trigrams, word forms, lemmas, and part-of-speech tags. They are thus able to show that shallow data representations can be sufficient to automatically distinguish professional translations from non-translated texts with an accuracy above the chance level, and hypothesise that this representation captures the distinguishing features of translationese. Additionally, the system's classification quality seems to be much higher than that of human judges when faced with the same task. However, it should be noted that in this study the feature vector is highly dependent on the language the system works on.

The simplification universal is known to be a controversial claim, with different studies bringing evidence both for and against it. In particular, it has been contested by studies on collocations [20], lexical use [21], and syntax [22].

For instance, Jantunen does not manage to establish clear and consistent evidence of a universal untypical lexical-grammatical patterning when operating on a subset of the Corpus of Translated Finnish (CTF) [22]. He tests the hypothesis on three near-synonymous degree modifiers, hyvin, kovin, and oikein, all roughly meaning very, using both a quantitative and a qualitative analysis to provide a comprehensive description. He applies the Three Phase Comparative Analysis (TPCA) to three corpora: one of original Finnish (CNF), one of texts translated from various Indo-European and Finno-Ugric languages (MuCTF), and one of texts translated from English (MoCTF). As described in Table 2, the author shows that the modifiers are almost twice as frequent in the translations (after normalisation per 100 000 tokens), and that this depends on the source language: the difference is not statistically significant for English, but it is for the MuCTF at the critical value of χ² for the 0.05 level of significance.

Table 2.
Frequencies of hyvin, kovin, and oikein in the CNF, MuCTF and MoCTF corpora [22].

Modifier    CNF    MuCTF    MoCTF
hyvin        36       66       70
kovin        18       39       38
oikein       12       15       20
Total        66      120      128

Jantunen then extracts the top-ranked collocations for each of the three modifiers from each of the three corpora. In the case of hyvin, the collocations match only to an extremely small degree. However, in the case of the other two modifiers, the collocations overlap to a high degree, making it rather difficult to draw conclusions. Furthermore, the colligation analysis for hyvin shows no difference between original Finnish texts and texts translated into Finnish. The conclusion that Jantunen reports is that translations tend to exhibit untypical lexical combinations, due to the source language, and that grammatical combinations tend to be similar in translations and original texts, although the influence of the source languages cannot be excluded.

3.2 Explicitation

Even though the surge of interest in translation universals happened in the last two decades, pointers towards the law of explicitation have existed since the middle of the twentieth century. Vinay performed a comparative study of French and English in 1958, and defined explicitation as:

    "the process of introducing information into the target language which is present only implicitly in the source language, but which can be derived from the context or the situation" [23]

Furthermore, Blum-Kulka notices the tendency of translations to be more explicit than their source texts, regardless of the language-specific explicitness [12]. Later, Baker defines the explicitation universal as the tendency to "spell things out rather than leave them implicit" [14].

Two categories of explicitation are described by Pym: the obligatory one, forced by the specificity of the language, and the voluntary one, in which the translator adds optional information to the text to avoid misinterpretations [24]. Vanderauwera proposes the following list of explicitation repertoires: expansion of condensed passages; addition of modifiers, qualifiers and conjunctions to achieve greater transparency; and addition of extra information and insertion of explanations, amongst many others [25].

Another study, which exploits the Translational English Corpus (TEC), indicates a significantly higher use of the optional that with the verbs say and tell in translated texts compared to a comparable sub-corpus of the British National Corpus (BNC) [26]. Tables 3 and 4 contain the results of the analysis, given both as absolute and percentage values. It is immediately clear that the that-connective is far more frequent in the TEC than in the BNC. By contrast, the zero-connective is more frequent for all forms of both verbs in the BNC.
These differences have been proven to be statistically significant. Furthermore, the results of the say and tell study were consistent with the findings of Burnett, who reviewed the use of the verbs suggest, admit, claim, think, believe, hope and know in both the TEC and the BNC [27].

Table 3. Distribution of say + that/zero in the BNC and TEC [26].

Connective    BNC              TEC
that           712 (23.72%)    775 (50.22%)
zero          2289 (76.28%)    768 (49.78%)
Total         3001            1543

Table 4. Distribution of tell + that/zero in the BNC and TEC [26].

Connective    BNC              TEC
that           997 (41.45%)    719 (62.74%)
zero          1408 (58.55%)    427 (37.26%)
Total         2405            1146

A similar study investigating the verb promise found the same pattern between translated and non-translated English [28]. Table 5 shows that although the number of occurrences of promise followed by the that or zero connective is very close in the two corpora (131 in the TEC and 135 in the BNC), the distributions are almost directly inverse.

Table 5. Distribution of promise + that/zero in the BNC and TEC [28].

Connective    BNC            TEC
that           46 (34.1%)    89 (67.9%)
zero           89 (65.9%)    42 (32.1%)
Total         135           131

The explicitation universal has also been investigated in simultaneous interpreting, and Gumul concludes that, to a certain extent, explicitation appears to be dependent on the direction of interpreting [29].

In contrast to simplification, the explicitation universal is perhaps the least controversial hypothesis according to the conclusions of several studies. However, the study of English into Korean translation described by Cheong contradicts this claim [30].

Cheong clearly distinguishes between two reverse operations, explicitation and implicitation, and notes that implicitation has been neglected in the study of translation universals. Therefore, by using an English-Korean corpus, the author tries to determine which of the two phenomena is the dominant one, to test whether the direction of the translation has any effect on them, and to identify the factors that influence the phenomena. After applying four different measurement units and a set of newly devised variables, the author concludes that both explicitation and implicitation are present in the target text, and that the direction of the translation influences the behaviour of texts regarding the two phenomena, even in cases where the identical language pair is involved [30].

Although no studies have yet been performed for the Romanian language, it is possible for explicitation and implicitation to manifest themselves in Romanian translations too. For instance, the flexible Romanian grammar allows zero anaphora to occur with a relatively high frequency, of 0.32 zero pronominal anaphors per sentence [31]. Therefore, when translating into Romanian from a language with a very low degree of zero pronouns, such as English, French, or German, the explicit information in the source text may become encoded implicitly in some other word in the target text, without this being demanded by grammar rules. Thus, the identification of zero pronouns in Romanian [32] might prove to be a valuable indicator of implicitation. On the other hand, when translating between language pairs both of which have a high degree of zero anaphora (e.g., Spanish, Portuguese, Korean, or Chinese), both explicitation and implicitation might occur, in order to avoid ambiguities or to create a more natural text.
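
The significance claims for the that/zero distributions reported in Tables 3 to 5 can be illustrated with a Pearson chi-square test on each 2x2 contingency table. The sketch below is only an assumption about how such a check might be done: the cited studies [26, 28] may have used a different test or a continuity correction, and the function name chi_square_2x2 is hypothetical. The counts are those of Tables 3 to 5, and 3.841 is the standard critical value of χ² for one degree of freedom at the 0.05 level.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    `table` is ((a, b), (c, d)): rows are connectives (that, zero),
    columns are corpora (BNC, TEC).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of connective and corpus.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Critical value of chi-square for 1 degree of freedom at the 0.05 level.
CRITICAL_0_05_DF1 = 3.841

# Counts copied from Tables 3, 4 and 5: ((that BNC, that TEC), (zero BNC, zero TEC)).
tables = {
    "say (Table 3)": ((712, 775), (2289, 768)),
    "tell (Table 4)": ((997, 719), (1408, 427)),
    "promise (Table 5)": ((46, 89), (89, 42)),
}

results = {name: chi_square_2x2(t) for name, t in tables.items()}
for name, stat in results.items():
    print(f"{name}: chi2 = {stat:.2f}, significant = {stat > CRITICAL_0_05_DF1}")
```

For all three verbs the statistic exceeds the critical value, which is consistent with the significance reported in the cited studies. The test treats every occurrence as an independent observation, an assumption that corpus data only approximately satisfy.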

4 Conclusions

This report briefly presents some of the results that have been obtained in the field of translation studies, more specifically on the simplification and explicitation universals. We have described various methodologies of study, and presented the conclusions of the authors regarding the validity of the two universals.

Although intensely studied in the last two decades, simplification is not yet completely and clearly confirmed as a universal. Although many studies, undertaken in different ways, support it, there are also studies which contradict it. It is still a difficult task to extract the characteristics of this phenomenon. Nonetheless, efforts are continuously being made on different language pairs, and promising results have started to appear in the past few years.

In the case of explicitation, things seem to be clearer than with simplification. It occurs quite often in many translations, mostly in order to avoid misinterpretations in the target text. However, there are cases in which explicitation appears combined with its reverse operation, implicitation, making it rather complicated to analyse the data and draw conclusions. Nevertheless, most studies confirm this hypothesis, making it one of the most plausible universals.

A successful validation of translation universals could be of great help in many other NLP tasks which rely on translations. For instance, statistical machine translation could be improved by automatically determining the direction of translation, and multilingual plagiarism detection may benefit too. Moreover, human translators would become more conscious of the way they translate, and such universals could help them to self-assess their work.

However, given the number of disconfirming experiments, the name translation universal may not be the most felicitous one; a term such as translation trend might be more appropriate.

References

1. Toury, G.: In search of a theory of translation. The Porter Institute for Poetics and Semiotics, Tel Aviv (1980)
2. Borin, L., Prütz, K.: Through a glass darkly: Part-of-speech distribution in original and translated text. In Daelemans, W., Sima'an, K., Veenstra, J., Zavrel, J., eds.: Computational Linguistics in the Netherlands 2000. (2001) pp. 30–44
3. Hansen, S.: The Nature of Translated Text - An Interdisciplinary Methodology for the Investigation of the Specific Properties of Translations. Saarland University, Saarbrücken (2003)
4. Teich, E.: Cross-Linguistic Variation in System and Text. Mouton de Gruyter, Berlin (2003)
5. Gellerstam, M.: Translationese in Swedish novels translated from English. In Wollin, L., Lindquist, H., eds.: Translation studies in Scandinavia. CWK Gleerup (1986) pp. 88–95
6. Baker, M.: Corpus linguistics and translation studies: Implications and applications. In Baker, M., Francis, G., Tognini-Bonelli, E., eds.: Text and Technology: In Honour of John Sinclair. John Benjamins, Amsterdam - Philadelphia (1993)
7. Laviosa, S.: Corpus-based Translation Studies. Theory, Findings, Applications. Rodopi, Amsterdam - New York (2002)
8. Tymoczko, M.: Computerised corpora and translation studies. Meta 43(4) (1998) pp. 652–659
9. Chesterman, A.: A causal model for translation studies.
   In Olohan, M., ed.: Intercultural Faultlines. Research Models in Translation Studies I: Textual and Cognitive Aspects. St. Jerome, Manchester (2000)
10. Goutte, C., Kurokawa, D., Isabelle, P.: Improving SMT by learning translation direction. In: EAMT 2009 workshop "Statistical Multilingual Analysis for Retrieval and Translation". (2009)
11. Corpas Pastor, G.: Investigar con corpus en traducción: los retos de un nuevo paradigma. Peter Lang, Berlin & New York (2008)
12. Blum-Kulka, S.: Shifts of cohesion and coherence in translation. In House, J., Blum-Kulka, S., eds.: Interlingual and Intercultural Communication. Discourse and Cognition in Translation and Second Language Acquisition. Narr (1986) pp. 17–35
13. Toury, G.: Descriptive Translation Studies and Beyond. John Benjamins, Amsterdam (1995)
14. Baker, M.: Corpus-based translation studies: The challenges that lie ahead. In Somers, H., ed.: Terminology, LSP and Translation: Studies in Language
    Engineering in Honour of Juan C. Sager. John Benjamins, Amsterdam - Philadelphia (1996)
15. Corpas Pastor, G., Mitkov, R., Afzal, N., Pekar, V.: Translation universals: Do they exist? A corpus-based NLP study of convergence and simplification. In: Proceedings of the AMTA. (2008)
16. Ilisei, I., Inkpen, D., Corpas Pastor, G., Mitkov, R.: Towards simplification: A supervised learning approach. In: Proceedings of Machine Translation 25 Years On. (November 2009)
17. Ilisei, I., Inkpen, D., Corpas Pastor, G., Mitkov, R.: Identification of translationese: A machine learning approach. In Gelbukh, A., ed.: Proceedings of the 11th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing). (2010) pp. 503–511
18. Quinlan, J.R.: Induction of decision trees. Machine Learning 1(1) (1986) pp. 81–106
19. Baroni, M., Bernardini, S.: A New Approach to the Study of Translationese: Machine-learning the Difference between Original and Translated Text. Literary and Linguistic Computing 21(3) (2006) pp. 259–274
20. Mauranen, A.: Strange strings in translated language: A study on corpora. In Olohan, M., ed.: Intercultural Faultlines. Research Models in Translation Studies I: Textual and Cognitive Aspects. St. Jerome, Manchester (2000) pp. 119–141
21. Jantunen, J.H.: Synonymity and lexical simplification in translations: A corpus based approach. Across Languages and Cultures 2(1) (2001) pp. 97–112
22. Jantunen, J.H.: Untypical patterns in translations: Issues on corpus methodology and synonymity. In Mauranen, A., Kujamäki, P., eds.: Translation Universals: Do They Exist? Volume 48. John Benjamins (2004) pp. 101–126
23. Vinay, J.P., Darbelnet, J.: Stylistique Comparée du Français et de l'Anglais. Didier (1958)
24. Pym, A.: Explaining explicitation. In Karoly, K., Fóris, A., eds.: New Trends in Translation Studies. In Honour of Kinga Klaudy. Akadémiai Kiadó, Budapest (2005) pp. 29–34
25. Vanderauwera, R.: Dutch novels translated into English: the transformation of a "Minority" literature. Rodopi, Amsterdam (1985)
26. Olohan, M., Baker, M.: Reporting 'that' in translated English: Evidence for subconscious processes of explicitation? Across Languages and Cultures 1(2) (2000) pp. 141–158
27. Burnett, S.: A corpus-based study of translational English. Master's thesis, University of Manchester (1999)
28. Olohan, M.: Spelling out the optionals in translation: A corpus study. In: UCREL Technical Papers. Volume 13. (2001) pp. 423–432
29. Gumul, E.: Explicitation in simultaneous interpreting: A strategy or a byproduct of language mediation? Across Languages and Cultures 7(2) (2006) pp. 171–190
30. Cheong, H.J.: Target text contraction in English-into-Korean translations: A contradiction of presumed translation universals? Meta 51(2) (2006) pp. 343–367
31. Mihăilă, C., Ilisei, I., Inkpen, D.: Romanian Zero Pronoun Distribution: A Comparative Study. In: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC). (2010)
32. Mihăilă, C., Ilisei, I., Inkpen, D.: To Be or Not to Be a Zero Pronoun: A Machine Learning Approach for Romanian. In: Proceedings of the Processing ROmanian in Multilingual, Interoperational and Scalable Environments Workshop (PROMISE). (2010)