Spotting Translationese: An Empirical Approach

This research aims to provide empirical evidence of translationese, which has been defined as the dialect, sub-language or code of translated language. Translationese has been demonstrated empirically through isolated phenomena in particular language pairs, but there has been no systematic study involving more than two languages. Nor have we found any previous study of translationese in Catalan.
We intend to prove the translationese hypothesis: first in a corpus of original and translated Catalan, and then in other languages such as Spanish, French, English and German, reusing the same methodology. We will thus try to demonstrate that translationese is empirically observable and automatically detectable. The goal is to determine which patterns of translation are universal across languages and which depend on the source or target language.
The data collected and the resources created for identifying lexical, morphological and syntactic patterns of translations can be of great help to Translation Studies teachers, scholars and students: teachers will have tools to help students avoid reproducing translationese patterns. The resources developed will help detect non-genuine words and inadequate structures in the target language, which should improve the stylistic quality of translations. Machine Translation companies can also use our resources to improve their translation quality.


Transcript

  • 1. Spotting Translationese: An Empirical Approach. Pau Giménez Flores. Supervisors: Carme Colominas and Toni Badia, Universitat Pompeu Fabra
  • 2. Content 1. Translationese 2. Goals 3. Translation Universals 4. Empirical Methods in Translation Studies 5. Theoretical Framework 6. Hypotheses 7. Methodology 8. Working Plan 9. Commented Bibliography
  • 3. Translationese • A product of the incompetence of the translator (translation errors): – “unusual distribution of features is clearly a result of the translator’s inexperience or lack of competence in the target language” (Baker, 1998: 248) • Translation-specific language or dialect, without any negative connotations (translation universals): – Third code “which arises out of the bilateral consideration of the matrix and target codes: it is, in a sense, a sub-code of each of the codes involved” (Frawley, 1984: 168). – Translationese: set of linguistic features of translated texts which are different both from the source language and the target language (Gellerstam, 1986).
  • 4. Goals • Main goal: validating the hypothesis of translationese empirically. – Capturing the linguistic properties of translationese in observable and refutable facts. – Automatically detecting and classifying translated vs. non-translated texts based on their syntactic and lexical properties.
  • 5. Translation Universals (1) “Features which typically occur in translated text rather than original utterances and which are not the result of interference from specific linguistic systems” (Baker, 1993: 243)
  • 6. Translation Universals (2) • Explicitation or explicitness: translations tend to be more explicit than source texts. – Repetition of redundant grammatical items (e.g. prepositions). – Optional that-connective is more frequent in reported speech in translated English (Olohan and Baker, 2000).
  • 7. Translation Universals (3) • Simplification: the language of translations is assumed to be lexically and syntactically simpler than that of non-translated target language texts. – Narrower range of vocabulary: lower type-token ratio. – Lower level of information load: lower lexical density.
  • 8. Translation Universals (4) • Normalization: exaggeration of typical features of the target language. Translations tend to be more unmarked and conventional, less creative, more conservative. – Conventionalization of metaphors and idioms. – Dialectal and colloquial expressions less frequent. – Lexical choice of ‘standard translation’ (Gellerstam, 1986).
  • 9. Translation Universals (5) • Interference from the source text and language (Toury, 1995; Mauranen, 2000). It can occur at the morphological, lexical and syntactic levels, etc. • Unique items hypothesis (Tirkkonen-Condit, 2002): translated texts “manifest lower frequencies of linguistic elements that lack linguistic counterparts in the source languages such that these could also be used as translation equivalents” (Simplification, Normalization?)
  • 10. Translation Universals (6) However, “The as yet relatively small amount of research into potential translation universals has produced contradictory results, which seems to suggest that a search for real, ‘unrestricted’ universals in the field of translation might turn out to be unsuccessful.” Puurtinen (2003: 403)
  • 11. Empirical Methods in TS (1) • Laviosa-Braithwaite (1996): study of the linguistic nature of English translated text in a subsection of the English Comparable Corpus (ECC). • Øverås (1998): investigation of explicitation in translational English and translational Norwegian. • Olohan and Baker (2000): testing of the explicitation hypothesis based on the omission and inclusion of the reporting that in translational and original English.
  • 12. Empirical Methods in TS (2) • Borin and Prütz (2001): comparison of original newspaper articles in British and American English with articles translated from Swedish into English, using POS n-grams. • Puurtinen (2003): research into potential features of translationese in a corpus of Finnish translations of children’s books.
  • 13. Empirical Methods in TS (3) • Baroni and Bernardini (2006): application of supervised machine learning techniques (SVMs) to detect translationese on two monolingual corpora of translated and original Italian texts.
  • 14. Empirical Methods in TS (4) • Rayson et al. (2008): a descriptive study of translationese comparing the frequencies of keywords, key word classes (POS) and key semantic tags in original Chinese, translated English and edited translated English corpora. • Tirkkonen-Condit (2002): Translationese – a myth or an empirical fact? Human translators could not reliably identify whether a text was translated or not.
  • 15. Theoretical Framework: at the crossroads of Corpus Linguistics, Translation Studies and Computational Linguistics • It is empirical research in which corpora are the main source of data and of hypotheses (Laviosa-Braithwaite, 1996; Olohan and Baker, 2000, etc.) • It tries to validate the existence of translationese and to define the linguistic properties of translated language as a product (Gellerstam, 1986; Baker, 1993, etc.) • Use of Computational Linguistics techniques such as information extraction and machine learning algorithms (Kindermann et al., 2003; Baroni and Bernardini, 2006)
  • 16. Hypotheses 1. Translationese exists and is observable across languages. 2. This fact can be demonstrated with empirical methods applied to corpora in different languages.
  • 17. Methodology (1) Preliminary Study: two monolingual comparable corpora of original and translated Catalan texts on art and architecture, 300,000 tokens each. • Corpus Building – Corpus compilation – Tokenization, tagging and parsing with CatCG (Alsina, Badia et al. 2002) • Corpus Exploitation – Exploitation with Wordsmith Tools (wordlists, frequency lists, type-token ratio, lexical density, concordance lists) – Implementation of scripts to extract collocations and POS n-grams with Python and NLTK • Implementation of a Machine Learning System – Machine Learning techniques (SVMs) in order to automatically classify texts as translated or non-translated. – Training on one part of the corpus and testing (Weka software).
  • 18. Methodology (2) Main Experiment • Corpus Building – Corpus compilation (Spanish, French, English, German) – Tokenization, tagging and parsing • Corpus Exploitation – Exploitation with Wordsmith Tools (wordlists, frequency lists, type-token ratio, lexical density, concordance lists) – Implementation of scripts to extract collocations and POS n-grams with Python and NLTK • Implementation of a Machine Learning System – Machine Learning techniques (SVMs) in order to automatically classify texts as translated or non-translated. – Training on one part of the corpus and testing (Weka software).
  • 19. Working Plan
  • 20. Commented Bibliography (1) • Baker, M. (1995). Corpora in Translation Studies: An Overview and Some Suggestions for Future Research. Target 7 (2): 223-243. – Definition of a new type of corpora: monolingual comparable corpora in order to “effect a shift away from comparing either ST with TT or language A with language B to comparing text production per se with translation.” – Type-token ratio, lexical density measures. • Borin, L. and Prütz, K. (2001). Through a Glass Darkly: Part-of-speech Distribution in Original and Translated Text, in Computational Linguistics in the Netherlands 2000, 30-44. – Comparison of POS n-grams in order to determine whether there are significant syntactic differences between original and translated language. – Overuse in translated English of preposition-initial sentences and sentence-initial adverbs.
  • 21. Commented Bibliography (2) • Kindermann et al. (2003). Authorship attribution with support vector machines. Applied Intelligence 19, 109-123. – Different statistical techniques for authorship attribution are described: the log-likelihood ratio statistic, naïve Bayesian probabilistic classifiers, multi-layer perceptrons, k-nearest neighbour classification (kNN), Support Vector Machines (SVMs), etc. – SVMs achieve better results than other classifiers in author attribution: they are fast and allow a great number of features as input.
  • 22. Commented Bibliography (3) • Baroni, M. and Bernardini, S. (2006). A New Approach to the Study of Translationese: Machine-Learning the Difference between Original and Translated Text. Literary and Linguistic Computing 21 (3): 259-274. – A new explicit criterion to prove the existence of translationese: learnability by a machine. – SVMs allow the use of a large number of features. – The application of SVMs achieves better results than professional human translators. – Their results show that translations are recognizable on purely grammatical/syntactic grounds (function word distribution and shallow syntactic patterns).
  • 23. Commented Bibliography (4) • Tirkkonen-Condit, S. (2002). Translationese – a Myth or an Empirical Fact? Target 14 (2): 207–20. – The hypothesis of translationese is, at least, controversial, whereas the unique items hypothesis can better describe the translated or non-translated nature of a text. – Translated texts “manifest lower frequencies of linguistic elements that lack linguistic counterparts in the source languages such that these could also be used as translation equivalents”.
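
The measures named in the simplification and methodology slides (type-token ratio, lexical density, POS n-grams) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the slides rely on Wordsmith Tools, CatCG and NLTK, whereas this sketch uses only the standard library. The function names, the regex tokenizer, the tiny English function-word list and the POS tag sequence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (assumption for this sketch);
# a real study would use a full list or a POS tagger to separate content words.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "that", "is", "was", "it"}


def tokenize(text):
    """Lowercase the text and split it into word tokens with a simple regex."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens, function_words=FUNCTION_WORDS):
    """Share of tokens that are content words, i.e. not in the function-word list."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in function_words]
    return len(content) / len(tokens)


def pos_ngrams(tags, n):
    """Count every contiguous sequence of n POS tags in a tagged text."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat")
    print(type_token_ratio(tokens))
    print(lexical_density(tokens))
    # Tags are written by hand here; in practice they would come from a tagger.
    print(pos_ngrams(["DET", "NOUN", "VERB", "DET", "NOUN"], 2))
```

Under the simplification hypothesis, a translated corpus would be expected to show lower values for the first two measures than a comparable original corpus. The POS n-gram counts could then serve as input features for a classifier such as an SVM.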