Simplification and Explicitation Universals
Faculty of Computer Science,
"Al.I. Cuza" University of Iași,
16, General Berthelot Street,
700483 Iași, Romania
Abstract. The characteristics exhibited by translated texts compared
to non-translated texts have always been of great interest in Translation
Studies. This report reviews two universals, namely simplification and
explicitation, and presents some of the studies that have been undertaken
to confirm or, on the contrary, disconfirm them. We describe the corpora,
the methods, and the results, and analyse the conclusions of several
important research papers.
Key words: translationese, translation studies, translation universal,
1 Introduction
The idea that translation studies should search for regularities and general
laws is not new; Gideon Toury is the best-known advocate of general laws of
translation. He proposed this as a fundamental task of descriptive translation
studies, because translated language is believed to manifest certain universal
features as a consequence of the translation process. Translations exhibit
their own specific lexico-grammatical and syntactic characteristics [2–4].
These "fingerprints" that the translation process leaves behind were first
described by Gellerstam and generically named translationese.
More recently, it has been stated that there are common characteristics which
all translations share, regardless of the source and the target languages.
Mona Baker defines several such universal laws, although mostly on intuitive
grounds. Additionally, she observes the power that resides in electronic
corpora and automatic natural language processing systems, in comparison to
the manual contrastive studies undertaken by previous scholars on small-scale
collections of texts. She includes in her list of universals, amongst others,
simplification, explicitation, normalisation, and convergence.
However, the existence of translation universals remains highly
controversial. While some researchers report that they have found sufficient
proof that such translation laws exist, others consider that it is not
possible even to hypothesise about universals, since we are not able to
capture all translations from all languages and from all times.
The field of translation universals has thus been a real subject of debate in
translation studies over the last fifteen years, bringing together different
perspectives on the language of translation. Perhaps the main reason to
investigate these hypotheses is to raise awareness among translators of the
conscious or unconscious effects on translated texts, and of the relationship
between language and culture. Bringing unconscious tendencies to light will
highlight translators' decisions and strategies, and hence should pave the way
to more accurate translations, with "more desired effects and fewer unwanted
ones".
The fundamental aim of this line of research is to model a
language-independent learning system able to distinguish between translated
and non-translated texts. Language independence makes such a system widely
applicable to other languages, thus enhancing the possibilities of studying
these universals. Furthermore, it becomes feasible to determine which
characteristics influence translated language the most.
From a practical perspective, a system that automatically identifies
translationese (whether or not it is improved by including specific features
of the universals considered) may be of great help in the self-assessment of
professional translators, or in the assessment of their training process.
Moreover, an automatic translationese identifier may significantly improve
other NLP applications. For instance, such a system may be integrated in a
statistical machine translation framework in order to identify translation
direction. Another possible application is multilingual plagiarism detection,
a topic that has received increasing attention recently.
The report is structured as follows: section 2 contains brief descriptions of
translation universals, whilst in section 3 we review the related work in this
domain, focussing on simplification and explicitation. Finally, conclusions
are drawn in section 4.
2 Translation universals
The universals have attracted considerable attention from translation experts,
but their formulation and initial explanation were based on intuition and
introspection, with subsequent corpus research limited to comparatively small
corpora, literary or newswire texts, and semi-manual analysis. Moreover,
previous research has not provided sufficient guidance as to which features
account for these universals and would allow them to be regarded as valid.
Various so-called translation universals, framed as universal tendencies of
the translation process, laws of translation, or norms of translation, have
been suggested in the literature [12, 13, 6, 7].
Toury proposed two laws of translation: the law of standardisation and the
law of interference. Baker defined four possible translation universals [6,
14]. These four universals, namely simplification, explicitation, convergence,
and normalisation, have been the most intensively studied in recent years.
The simplification universal is described as the tendency of translators to
produce simpler and easier-to-follow texts, whilst explicitation refers to
introducing overt information into the translation that is only implicit in
the source language. Convergence states that translations are more similar to
one another than non-translated texts are, and normalisation represents the
conscious or unconscious rendering of idiosyncratic text features in order to
make them conform to the typical textual characteristics of the target
language.
Laviosa continued this line of research by proposing features for
simplification in a corpus-based study. Despite some evidence that such a
phenomenon exists, defining the features which characterise the
simplification universal remains a considerable challenge.
3 Related work
A number of papers report experiments investigating the universals, but
without reaching clear-cut conclusions. These problematic claims seem to
require an investigation strategy divided into two consecutive stages: first,
the investigation of the proposed translation tendencies, and afterwards the
investigation of the universality factor.
On the one hand, the claims themselves, without considering the universality
aspect, require adequate empirical support in order to be confirmed or
refuted. On the other hand, the universality characteristic is a matter of
discussion, as the coverage implied by this term is too wide for the limited
evidence provided so far for different languages. For the universality aspect
to be widely accepted, it must be validated for all languages, or at least for
all language
In what follows, we review the current status of two of the hypothesised
universals, simplification and explicitation, going through some of the most
prominent research undertaken in the field.
Recently, a corpus-based approach which tests the statistical significance of
features proposed to investigate the simplification universal has been applied
to Spanish [11, 15].
In the first study, Corpas Pastor tries to verify the validity of the
simplification universal on a Spanish comparable corpus of medical and
technical, translated and non-translated texts produced by both professional
and semi-professional translators. Simplification seems to be validated for
the lexical richness feature. Despite this, it is contradicted in terms of
complex sentences, sentence length, depth of syntactic trees, information
load, and ambiguity.
Nonetheless, in the second study, the authors use the same corpora as in the
first and perform a deeper analysis, exploiting other features as well. The
experiments revealed that the translated texts exhibit lower lexical richness
and density, fewer discourse markers, and less simple and significantly
shorter sentences. However, the simplification traits are clearly visible
only in the technical texts, and to a lesser degree in the professionally
translated medical texts.
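To make two of the features mentioned above more concrete, the following is a minimal Python sketch of how lexical richness and sentence length might be computed. It assumes that lexical richness is operationalised as a type/token ratio and that sentences and tokens can be found with naive regular expressions; neither assumption is taken from the reviewed studies, whose exact feature definitions are not given here, and the function name is hypothetical.

```python
import re


def simplification_features(text):
    """Return two illustrative simplification features for a text."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumption: tokens are lowercased runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens or not sentences:
        return {"lexical_richness": 0.0, "mean_sentence_length": 0.0}
    return {
        # Type/token ratio: distinct word forms divided by all word tokens.
        "lexical_richness": len(set(tokens)) / len(tokens),
        # Average number of tokens per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
    }


sample = "The cat sat. The cat slept on the mat."
features = simplification_features(sample)
print(features)
```

Under the simplification hypothesis, translated texts would be expected to show lower values for both features than comparable non-translated texts, although, as noted above, the reported evidence is mixed.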
Furthermore, Ilisei et al. develop a supervised learning system, also for
Spanish, that is able to distinguish between translated and non-translated
texts, in some cases with very high accuracy [16, 17]. They use three
comparable corpora, two from the medical domain and one containing technical
texts, and extract 21 language-independent features for their learning system
to exploit.
Table 1 lists the accuracies of the various trained classifiers that were
tested. BayesNet, Simple Logistic, SVM, and the Meta-classifier reach a
remarkable 97.62% on technical texts, with the SVM result being statistically
significantly better than the result obtained without simplification
features.
Table 1. Classification accuracy results on medical and technical test datasets with
regard to simplification features (SF).
                    Including SF          Excluding SF
Classifier          Medical   Technical   Medical   Technical
Baseline (ZeroR)    64.71%    66.67%      64.71%    66.67%
Naive Bayes         71.57%    95.24%      71.57%    80.95%
BayesNet            73.53%    97.62%      71.57%    92.86%
Jrip                79.42%    95.24%      72.55%    92.86%
Decision Tree       77.45%    92.86%      75.49%    95.24%
Simple Logistic     77.45%    97.62%      79.41%    83.33%
SVM                 75.49%    97.62%      74.51%    69.05%
Meta-classifier     82.35%    97.62%      78.43%    92.86%
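The effect of the simplification features can be read off Table 1 by subtracting the two accuracy columns for each classifier. The short Python sketch below does this arithmetic on the values copied from the table; the dictionary layout and the helper name sf_gain are illustrative choices, not part of the reviewed work.

```python
# Accuracies (%) copied from Table 1 as (medical, technical) pairs.
including_sf = {
    "Naive Bayes": (71.57, 95.24),
    "BayesNet": (73.53, 97.62),
    "Jrip": (79.42, 95.24),
    "Decision Tree": (77.45, 92.86),
    "Simple Logistic": (77.45, 97.62),
    "SVM": (75.49, 97.62),
    "Meta-classifier": (82.35, 97.62),
}
excluding_sf = {
    "Naive Bayes": (71.57, 80.95),
    "BayesNet": (71.57, 92.86),
    "Jrip": (72.55, 92.86),
    "Decision Tree": (75.49, 95.24),
    "Simple Logistic": (79.41, 83.33),
    "SVM": (74.51, 69.05),
    "Meta-classifier": (78.43, 92.86),
}


def sf_gain(name):
    """Percentage-point change (medical, technical) when SF are included."""
    with_sf, without_sf = including_sf[name], excluding_sf[name]
    return tuple(round(a - b, 2) for a, b in zip(with_sf, without_sf))


gains = {name: sf_gain(name) for name in including_sf}
for name, (medical, technical) in gains.items():
    print(f"{name:16s} medical {medical:+6.2f}  technical {technical:+6.2f}")
```

The differences show that the gain is largest for the SVM on technical texts, but also that including the simplification features does not help uniformly: the Decision Tree on technical texts and Simple Logistic on medical texts score slightly lower with them.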
Aiming to determine the most salient features that lead to these results,
Ilisei et al. analyse the outputs of various classifiers, such as Decision
Tree and Jrip, and use attribute evaluators, such as Chi-Square and
Information Gain. They conclude that lexical richness influences the
classification the most, closely followed by sentence length and the
proportions of pronouns, conjunctions, grammatical words, and lexical words;
other features also influence the classification, but to a lesser extent.
Both lexical richness and sentence length are features considered to be
indicative of the simplification hypothesis, which has been widely discussed
and studied in the past decade. Sentence length is a characteristic whose
interpretation posed a certain difficulty in a previously reported study.
The most influential features identified with these evaluators concur with
the first-level attributes from the intuitive output of the Decision Tree and
Jrip classifiers.
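As an illustration of what an Information Gain attribute evaluator measures, the sketch below computes the reduction in class entropy obtained by splitting a numeric feature at a threshold. This is a simplified, assumed formulation: the text does not describe how the evaluators used by Ilisei et al. handle numeric features, and the toy values and labels are invented purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels, threshold):
    """Entropy reduction from splitting a numeric feature at `threshold`."""
    low = [y for x, y in zip(feature_values, labels) if x <= threshold]
    high = [y for x, y in zip(feature_values, labels) if x > threshold]
    total = len(labels)
    # Weighted entropy of the two partitions (empty partitions are skipped).
    remainder = sum(len(part) / total * entropy(part) for part in (low, high) if part)
    return entropy(labels) - remainder


# Invented toy data: lexical richness per text, T = translated, N = non-translated.
richness = [0.40, 0.42, 0.45, 0.60, 0.62, 0.65]
labels = ["T", "T", "T", "N", "N", "N"]
gain = information_gain(richness, labels, threshold=0.5)
print(gain)
```

A feature that separates the two classes perfectly, as in the toy data, attains the maximum gain for a balanced binary problem, whereas a feature unrelated to the class yields a gain close to zero. Ranking features by this score is one way to arrive at statements such as "lexical richness influences the classification the most".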
A different perspective on this research topic is taken by Baroni and
Bernardini, who report a machine learning approach to the task of classifying
Italian texts as translations or originals. Several features are employed in
the feature vector, including unigrams, bigrams, and trigrams over word forms,
lemmas, and part-of-speech tags. With these features, they show that shallow
data representations can be sufficient to automatically distinguish
professional translations from non-translated texts with an accuracy above
chance level, and hypothesise that this representation captures the
distinguishing features of translationese. Additionally, the system's
classification quality seems to be much higher than that of human judges
faced with the same task. It should be noted, however, that in this study the
feature vector is highly dependent on the language the system works on.
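The kind of shallow representation described above can be sketched as a bag of n-gram counts. The Python fragment below is a minimal illustration under our own assumptions: the function names are hypothetical, the example sentence is invented, and the feature weighting and classifier used by Baroni and Bernardini are not reproduced.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams in a sequence of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def shallow_features(tokens, max_n=3):
    """Merge unigram, bigram, and trigram counts into one feature bag."""
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngram_counts(tokens, n))
    return features


# Invented example; the same functions apply to lemmas or part-of-speech tags.
tokens = "il gatto dorme e il gatto mangia".split()
features = shallow_features(tokens)
print(features.most_common(3))
```

Because the features are literal word forms (or lemmas and tags of one language), a model trained on them cannot be transferred to another language, which is the language dependence noted above.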
The simplification universal is known to be a controversial claim, with
different studies bringing evidence both for and against it. In particular,
it has been contested by studies on collocations, lexical use, and syntax.
For instance, Jantunen does not manage to establish clear and consistent
evidence of universal untypical lexico-grammatical patterning when operating
on a subset of the Corpus of Translated Finnish (CTF). He tests the
hypothesis on three near-synonymous degree modifiers, hyvin, kovin, and
oikein, all roughly meaning very, combining quantitative and qualitative
analysis to provide a comprehensive description. He applies the Three Phase
Comparative Analysis (TPCA) to three corpora: one of original Finnish (CNF),
one of texts translated from various Indo-European and Finno-Ugric languages
(MuCTF), and one of texts translated from English (MoCTF). As shown in
Table 2, the modifiers are almost twice as frequent in the translations
(after normalisation per 100 000 tokens), and this depends on the source
language: at the 0.05 level of significance of the χ2 test, the difference is
not statistically significant for English, but it is for the MuCTF.
Table 2. Frequencies of hyvin, kovin, and oikein in the CNF, MuCTF and MoCTF
Modifier   CNF   MuCTF   MoCTF
hyvin       36      66      70
kovin       18      39      38
oikein      12      15      20
Total       66     120     128
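The "almost twice as frequent" observation can be checked directly against Table 2. The following Python sketch assumes that the figures in the table are already the frequencies normalised per 100 000 tokens (the corpus sizes are not given here); the normalise helper is included only to show the arithmetic of such a normalisation and is hypothetical.

```python
# Frequencies copied from Table 2 (assumed to be per 100,000 tokens).
table2 = {
    "hyvin": {"CNF": 36, "MuCTF": 66, "MoCTF": 70},
    "kovin": {"CNF": 18, "MuCTF": 39, "MoCTF": 38},
    "oikein": {"CNF": 12, "MuCTF": 15, "MoCTF": 20},
}


def normalise(raw_count, corpus_tokens, per=100_000):
    """Convert a raw count into a frequency per `per` tokens."""
    return raw_count / corpus_tokens * per


corpora = ("CNF", "MuCTF", "MoCTF")
totals = {c: sum(row[c] for row in table2.values()) for c in corpora}
# Ratio of each translated corpus to the original Finnish corpus.
ratios = {c: round(totals[c] / totals["CNF"], 2) for c in ("MuCTF", "MoCTF")}
print(totals, ratios)
```

The column sums reproduce the Total row of the table, and both ratios to the CNF fall a little below two. The significance test itself is not reproduced here, since the text does not state exactly which counts were compared.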
Jantunen then extracts the top-ranked collocations for each of the three
modifiers from each of the three corpora. In the case of hyvin, the
collocations match only to an extremely small degree. In the case of the
other two modifiers, however, the collocations overlap to a high degree,
which makes it rather difficult to draw conclusions. Furthermore, the
colligation analysis for hyvin shows no difference between original Finnish
texts and texts translated into Finnish. Jantunen concludes that translations
tend to exhibit untypical lexical combinations, due to the source language,
and that grammatical combinations tend to be similar in translations and
original texts, although the influence of the source languages cannot be
excluded.
Even though the surge of interest in translation universals happened in the
last two decades, pointers towards the law of explicitation have existed
since the middle of the twentieth century. Vinay and Darbelnet carried out a
comparative study of French and English in 1958, and define explicitation
as:
"the process of introducing information into the target language which is
present only implicitly in the source language, but which can be derived
from the context or the situation"
Furthermore, Blum-Kulka notices the tendency of translations to be more
explicit than their source texts, regardless of language-specific
explicitness. Later, Baker defines the explicitation universal as the
tendency to "spell things out rather than leave them implicit".
Pym describes two categories of explicitation: the obligatory one, forced by
language specificity, and the voluntary one, in which the translator adds
optional information to the text to avoid misinterpretations. Vanderauwera
proposes the following list of explicitation repertoires: expansion of
condensed passages; addition of modifiers, qualifiers, and conjunctions to
achieve greater transparency; and addition of extra information and insertion
of explanations, amongst many others.
Another study, which exploits the Translational English Corpus (TEC),
indicates a significantly higher use of the optional that with the verbs say
and tell in translated texts compared to a comparable sub-corpus of the
British National Corpus (BNC). Tables 3 and 4 contain the results of the
analysis, given both as absolute and as percentage values. It is immediately
clear that the that-connective is far more frequent in the TEC than in the
BNC. By contrast, the zero-connective is more frequent for all forms of both
verbs in the BNC. These differences have been shown to be statistically
significant. Furthermore, the results of the say and tell study are
consistent with the findings of Burnett, who reviewed the use of the verbs
suggest, admit, claim, think, believe, hope, and know in both the TEC and the
BNC.
A similar study investigating the verb promise found the same pattern between
translated and non-translated English. Table 5 shows that although the number
of occurrences of 'promise' followed by that or the zero connective is very
close in the two corpora (131 in the TEC and 135 in the BNC), the
distributions are almost directly inverse.
The explicitation universal has also been investigated in simultaneous
interpreting, where Gumul concludes that, to a certain extent, explicitation
appears to depend on the direction of interpreting.
Table 3. Distribution of say + that/zero in the BNC and TEC.
Connective   BNC    TEC
Total        3001   1543
Table 4. Distribution of tell + that/zero in the BNC and TEC.
Connective   BNC    TEC
Total        2405   1146
Table 5. Distribution of promise + that/zero in the BNC and TEC.
Connective   BNC    TEC
Total        135    131
In contrast to simplification, the explicitation universal is perhaps the
least controversial hypothesis, according to the conclusions of several
studies. However, the study of English into Korean translation described by
Cheong contradicts this claim.
Cheong clearly distinguishes between two opposite operations, explicitation
and implicitation, and notes that implicitation has been neglected in the
study of translation universals. Therefore, using an English-Korean corpus,
the study tries to determine which of the two phenomena is dominant, to test
whether the direction of translation has any effect on them, and to identify
the factors that influence them. After applying four different measurement
units and a set of newly devised variables, the author concludes that both
explicitation and implicitation are present in the target text, and that the
direction of translation influences how texts behave with respect to the two
phenomena, even when the same language pair is involved.
Although no studies have yet been performed for Romanian, it is possible for
explicitation and implicitation to manifest themselves in Romanian
translations too. For instance, the flexible Romanian grammar allows zero
anaphora to occur with a relatively high frequency, of 0.32 zero pronominal
anaphors per sentence. Therefore, when translating into Romanian from a
language with a very low degree of zero pronouns, such as English, French, or
German, information that is explicit in the source text may become encoded
implicitly in some other word in the target text, without this being required
by the rules of grammar. Thus, the identification of zero pronouns in
Romanian might prove to be a valuable indicator of implicitation. On the
other hand, when translating between languages which both have a high degree
of zero anaphora (e.g., Spanish, Portuguese, Korean, or Chinese), both
explicitation and implicitation might occur, in order to avoid ambiguities or
to create a more natural text.
4 Conclusions
This report has briefly presented some of the results obtained in the field
of translation studies, more specifically on the simplification and
explicitation universals. We have described various methodologies of study,
and presented the authors' conclusions regarding the validity of the two
universals.
Although intensely studied in the last two decades, simplification has not
yet been completely and clearly confirmed as a universal. While many studies,
using different approaches, support it, there are also studies which
contradict it. Extracting the characteristics of this phenomenon remains a
difficult task. Nonetheless, efforts are continuously being made on different
language pairs, and promising results have started to appear in the past few
years.
In the case of explicitation, things seem to be clearer than with
simplification. It occurs quite often in many translations, mostly in order
to avoid misinterpretations of the target text. However, there are cases
where explicitation appears combined with its reverse, implicitation, which
makes it rather complicated to analyse the data and draw conclusions.
Nevertheless, most studies confirm this hypothesis, making it one of the most
plausible universals.
A successful validation of translation universals could be of great help in
many other NLP tasks which rely on translations. For instance, statistical
machine translation could be improved by automatically determining the
direction of translation, and multilingual plagiarism detection may benefit
too. Moreover, human translators would become more conscious of the way they
translate, and such universals could help them to self-assess their work.
However, given the number of disconfirming experiments, the name translation
universal may not be the most felicitous one; one could rename it to, for
example,
References
1. Toury, G.: In search of a theory of translation. The Porter Institute for Poetics
and Semiotics, Tel Aviv (1980)
2. Borin, L., Prütz, K.: Through a glass darkly: Part-of-speech distribution in original
and translated text. In Daelemans, W., Sima’an, K., Veenstra, J., Zavrel, J., eds.:
Computational Linguistics in the Netherlands 2000. (2001) pp. 30–44
3. Hansen, S.: The Nature of Translated Text - An Interdisciplinary Methodology for
the Investigation of the Speciﬁc Properties of Translations. Saarland University,
4. Teich, E.: Cross-Linguistic Variation in System and Text. Mouton de Gruyter,
5. Gellerstam, M.: Translationese in Swedish novels translated from English. In
Wollin, L., Lindquist, H., eds.: Translation studies in Scandinavia. CWK Gleerup
(1986) pp. 88–95
6. Baker, M.: Corpus linguistics and translation studies: Implications and
applications. In Baker, M., Francis, G., Tognini-Bonelli, E., eds.: Text and
Technology: In Honour of John Sinclair. John Benjamins, Amsterdam -
7. Laviosa, S.: Corpus-based Translation Studies. Theory, Findings, Applications.
Rodopi, Amsterdam - New York (2002)
8. Tymoczko, M.: Computerised corpora and translation studies. Meta 43(4) (1998)
9. Chesterman, A.: A causal model for translation studies. In Olohan, M., ed.:
Intercultural Faultlines. Research Models in Translation Studies I: Textual and
Cognitive Aspects. St. Jerome, Manchester (2000)
10. Goutte, C., Kurokawa, D., Isabelle, P.: Improving SMT by learning translation
direction. In: EAMT 2009 workshop "Statistical Multilingual Analysis for Retrieval
and Translation". (2009)
11. Corpas Pastor, G.: Investigar con corpus en traducción: los retos de un nuevo
paradigma. Peter Lang, Berlin & New York (2008)
12. Blum-Kulka, S.: Shifts of cohesion and coherence in translation. In House, J.,
Blum-Kulka, S., eds.: Interlingual and Intercultural Communication. Discourse and
Cognition in Translation and Second Language Acquisition. Narr (1986) pp. 17–35
13. Toury, G.: Descriptive Translation Studies and Beyond. John Benjamins,
14. Baker, M.: Corpus-based translation studies: The challenges that lie ahead.
In Somers, H., ed.: Terminology, LSP and Translation: Studies in Language
Engineering in Honour of Juan C. Sager. John Benjamins, Amsterdam -
15. Corpas Pastor, G., Mitkov, R., Afzal, N., Pekar, V.: Translation universals: Do
they exist? A corpus-based NLP study of convergence and simpliﬁcation. In:
Proceedings of the AMTA. (2008)
16. Ilisei, I., Inkpen, D., Corpas Pastor, G., Mitkov, R.: Towards simpliﬁcation: A
supervised learning approach. In: Proceedings of Machine Translation 25 Years
On. (November 2009)
17. Ilisei, I., Inkpen, D., Corpas Pastor, G., Mitkov, R.: Identiﬁcation of translationese:
A machine learning approach. In Gelbukh, A., ed.: Proceedings of the 11th Inter-
national Conference on Computational Linguistics and Intelligent Text Processing
(CICLing). (2010) pp. 503–511
18. Quinlan, J.R.: Induction of decision trees. Machine Learning 1(1) (1986) pp.
19. Baroni, M., Bernardini, S.: A New Approach to the Study of Translationese:
Machine-learning the Diﬀerence between Original and Translated Text. Lit
Linguist Computing 21(3) (2006) pp. 259–274
20. Mauranen, A.: Strange strings in translated language: A study on corpora. In
Olohan, M., ed.: Intercultural Faultlines. Research Models in Translation Studies
I: Textual and Cognitive Aspects. St. Jerome, Manchester (2000) pp. 119–141
21. Jantunen, J.H.: Synonymity and lexical simpliﬁcation in translations: A corpus
based approach. Across Languages and Cultures 2(1) (2001) pp. 97–112
22. Jantunen, J.H.: Untypical patterns in translations: Issues on corpus methodology
and synonymity. In Mauranen, A., Kujamäki, P., eds.: Translation Universals: Do
They Exist? Volume 48. John Benjamins (2004) pp. 101–126
23. Vinay, J.P., Darbelnet, J.: Stylistique comparée du français et de l'anglais. Didier (1958)
24. Pym, A.: Explaining explicitation. In Károly, K., Fóris, Á., eds.: New Trends
in Translation Studies. In Honour of Kinga Klaudy. Akadémiai Kiadó, Budapest
(2005) pp. 29–34
25. Vanderauwera, R.: Dutch novels translated into English: the transformation of a
"Minority" literature. Rodopi, Amsterdam (1985)
26. Olohan, M., Baker, M.: Reporting 'that' in translated English: Evidence for
subconscious processes of explicitation? Across Languages and Cultures 1(2)
(2000) pp. 141–158
27. Burnett, S.: A corpus-based study of translational English. Master’s thesis,
University of Manchester (1999)
28. Olohan, M.: Spelling out the optionals in translation: A corpus study. In: UCREL
Technical Papers. Volume 13. (2001) pp. 423–432
29. Gumul, E.: Explicitation in simultaneous interpreting: A strategy or a byproduct
of language mediation? Across Languages and Cultures 7(2) (2006) pp. 171–190
30. Cheong, H.J.: Target text contraction in English-into-Korean translations: A
contradiction of presumed translation universals? Meta 51(2) (2006) pp. 343–367
31. Mihăilă, C., Ilisei, I., Inkpen, D.: Romanian Zero Pronoun Distribution: A
Comparative Study. In: Proceedings of the 7th International Conference on
Language Resources and Evaluation (LREC). (2010)
32. Mihăilă, C., Ilisei, I., Inkpen, D.: To Be or Not to Be a Zero Pronoun: A Machine
Learning Approach for Romanian. In: Proceedings of the Processing ROmanian in
Multilingual, Interoperational and Scalable Environments Workshop (PROMISE).