2. Parallel corpus
• A parallel corpus consists of two monolingual corpora, one of which is
the translation of the other, for example a novel and its translation.
The two texts need to be aligned, i.e. corresponding segments,
usually sentences or paragraphs, need to be matched. The user can
then search for all examples of a word or phrase in one language, and
the results are displayed together with the corresponding
sentences in the other language. The user can thus observe how the
search word or phrase is translated.
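The search described above can be sketched in a few lines of Python. This is only an illustration: the three sentence pairs are invented, and `parallel_concordance` is a hypothetical helper name, not part of any corpus tool.

```python
import re

# Hypothetical sentence-aligned corpus: (English, German) pairs invented for illustration.
ALIGNED_PAIRS = [
    ("The book is on the table.", "Das Buch ist auf dem Tisch."),
    ("She is reading a book.", "Sie liest ein Buch."),
    ("He is smoking.", "Er raucht."),
]

def parallel_concordance(pairs, query):
    """Return every aligned pair whose source sentence contains `query` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in pairs if pattern.search(src)]

# Show each hit next to its aligned translation.
hits = parallel_concordance(ALIGNED_PAIRS, "book")
for src, tgt in hits:
    print(f"{src}  |  {tgt}")
```

A real concordancer would work on a much larger, pre-aligned corpus and would usually allow searching in either language; the principle of returning the aligned segment with each hit is the same.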
3. • A parallel corpus is a corpus that contains a collection of original
texts in a source language (L1) and their translations into one or more
target languages (L2). In most cases, parallel corpora contain data from
only two languages.
4. Types of parallel corpora
• Parallel corpora can be bilingual or multilingual, i.e. they consist of texts in two or
more languages. They can be unidirectional (e.g. English texts translated
into German), bidirectional (e.g. English texts translated into German and German
texts translated into English), or multidirectional (e.g. an English text such as an
EU regulation translated into German, Spanish, French, etc.).
• Compilation of parallel corpora
The texts of a corpus are chosen according to specific criteria which depend on
the purpose for which it is created. In particular, compilers have to decide whether
to include a static or dynamic collection of texts, and entire texts or text samples.
Questions of authorship, size, topic, genre, medium and style have to be
considered as well.
In any case, a corpus is intended to comply with the following requirements: (i) it
should contain authentic (naturally occurring) language data; (ii) it should be
representative, i.e. it should contain data from different types of discourse.
5. Parallel corpora can be used for various
practical purposes.
• Contrastive linguistics
• Parallel corpora are used to compare linguistic features and their
frequencies in two languages subject to a contrastive analysis. They are
also used to investigate similarities and differences between the
source and the target language, making systematic, text-based
contrastive studies at different levels of analysis possible. In this way,
parallel corpora can provide new insights into the languages
compared concerning language-specific, typological and cultural
differences and similarities, and allow for quantitative methods of
analysis.
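As a rough illustration of the quantitative side, the sketch below compares the relative frequency of one word form in an English text and in its German counterpart. The two toy texts and the helper name `per_thousand` are assumptions made for this example; a real study would use full corpora and proper tokenisation.

```python
import re

def per_thousand(text, word):
    """Relative frequency of `word` per 1,000 tokens (simple regex tokenisation)."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return 1000 * tokens.count(word.lower()) / len(tokens)

# Toy texts, invented for illustration only.
english = "The book is on the table. The book is old."
german = "Das Buch ist auf dem Tisch. Das Buch ist alt."

freq_en = per_thousand(english, "the")
freq_de = per_thousand(german, "das")
print(f"the: {freq_en:.1f} per 1,000 tokens, das: {freq_de:.1f} per 1,000 tokens")
```

Normalising to a common base (here 1,000 tokens) is what makes counts comparable when the two texts differ in length.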
6. Translation studies
Parallel corpora may help translators to find translational
equivalents between the source and the target language. They provide
information on the frequency of words, specific uses of lexical items as
well as syntactic patterns. This information may help translators to
develop systematic translation strategies for words or phrases which
have no direct equivalent in the target language.
• Lexicology
• Parallel corpora are increasingly used to compile corpus-based
(bilingual) dictionaries.
7. Examples of parallel corpora
• English-German Translation Corpus
• English-Norwegian Parallel Corpus (ENPC)
• English-Swedish Parallel Corpus (ESPC)
8. Comparable Corpora
• Two (or more) corpora in different languages (e.g. English and
Spanish) or in different varieties of a language (e.g. Indian English and
Canadian English).
• They are designed along the same lines, i.e. they contain the same
proportions of newspaper texts, novels, casual conversation, etc.
• Comparable corpora of varieties of the same language can be used to
compare those varieties.
• Comparable corpora of different languages can be used by
translators to identify differences and equivalences in each language.
9. Comparable Corpora
• A Comparable Corpus is a collection of "similar" texts in different
languages or in different varieties of a language.
• The aim of this type of corpus is to compare the languages or
varieties as they are used in similar circumstances of communication,
without the distortion which appears in the translated texts of parallel
corpora.
• Examples of comparable corpora are those modelled on the Brown
Corpus of Standard American English, for example, the LOB Corpus
(British English), and the Kolhapur Corpus (Indian English).
10. • Example: the components of the International Corpus of English (ICE) are
comparable corpora of 1 million words each, representing different varieties of English.
11. Aligned corpus
• An aligned corpus is a kind of bilingual or multilingual corpus in which text
samples in one language and their translations into another language are
aligned paragraph by paragraph, sentence by sentence, phrase by
phrase, word by word or, where possible, even character by character.
12. • For the corpus to be useful it is necessary to identify
which sentences in the sub-corpora are translations of each other,
and which words are translations of each other. A corpus which
shows these identifications is known as an aligned corpus as it makes
an explicit link between the elements which are mutual translations
of each other. For example, in a corpus the sentences "Das Buch ist
auf dem Tisch" and "The book is on the table" might be aligned to
one another. At a further level, specific words might be aligned, e.g.
"Das" with "The". This is not always a simple process, however, as
one word in one language often corresponds to two or more words in
another language, e.g. the German word "raucht" would be
equivalent to "is smoking" in English.
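One possible way to store such links is sketched below, assuming each link pairs a group of source-token positions with a group of target-token positions, so that one-to-many cases like "raucht" and "is smoking" fit the same structure as one-to-one cases. The sentence "Er raucht" and the helper `aligned_phrases` are invented for this illustration; real alignment tools use their own formats.

```python
# Each link pairs a tuple of source token indices with a tuple of target token indices.

# One-to-one word alignment for the sentence pair from the text.
german_1 = ["Das", "Buch", "ist", "auf", "dem", "Tisch"]
english_1 = ["The", "book", "is", "on", "the", "table"]
links_1 = [((i,), (i,)) for i in range(len(german_1))]

# One-to-many alignment: "raucht" corresponds to "is smoking".
# The sentence "Er raucht" is an assumed example built around the word from the text.
german_2 = ["Er", "raucht"]
english_2 = ["He", "is", "smoking"]
links_2 = [((0,), (0,)), ((1,), (1, 2))]

def aligned_phrases(src_tokens, tgt_tokens, links):
    """Turn index-based links into readable (source phrase, target phrase) pairs."""
    return [
        (" ".join(src_tokens[i] for i in src_idx),
         " ".join(tgt_tokens[j] for j in tgt_idx))
        for src_idx, tgt_idx in links
    ]

pairs_1 = aligned_phrases(german_1, english_1, links_1)
pairs_2 = aligned_phrases(german_2, english_2, links_2)
print(pairs_1)
print(pairs_2)
```

Storing index groups rather than single indices keeps the representation uniform, so a word aligned to two words needs no special case.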