Parallel corpora, comparable
corpora, aligned corpora
Presented
By
Ataulghafer & Shoiba sabir
Department of Applied Linguistics
GCUF
Parallel corpus
• A parallel corpus consists of two monolingual corpora. One corpus is
the translation of the other, for example a novel and its translation.
The two texts need to be aligned, i.e. corresponding segments,
usually sentences or paragraphs, need to be matched. The user can
then search for all examples of a word or phrase in one language and
the results will be displayed together with the corresponding
sentences in the other language. The user can then observe how the
search word or phrase is translated.
• Put differently, a parallel corpus is a collection of original
texts in language L1 and their translations into a set of languages L2.
In most cases, parallel corpora contain data from only two languages.
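The search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sentence pairs are invented toy data, and `parallel_search` is a hypothetical helper, not the interface of any particular corpus tool.

```python
import re

# Hypothetical toy data: each pair is (English sentence, German translation),
# already aligned sentence by sentence.
aligned_pairs = [
    ("The book is on the table.", "Das Buch ist auf dem Tisch."),
    ("She is reading a book.", "Sie liest ein Buch."),
    ("He is smoking.", "Er raucht."),
]

def parallel_search(pairs, query):
    """Return every aligned pair whose source sentence contains the query as a whole word."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in pairs if pattern.search(src)]

# Show each hit together with the corresponding sentence in the other language.
for src, tgt in parallel_search(aligned_pairs, "book"):
    print(src, "|", tgt)
```

Because the pairs are stored together, every hit in one language automatically brings its translation with it, which is what lets the user observe how the search word is rendered.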
Types of parallel corpora
• Parallel corpora can be bilingual or multilingual, i.e. they consist of texts of two or
more languages. They can be either unidirectional (e.g. an English text translated
into German), bidirectional (e.g. an English text translated into German and vice
versa), or multidirectional (e.g. an English text such as an EU regulation translated
into German, Spanish, French, etc.).
Compilation of parallel corpora
The texts of a corpus are chosen according to specific criteria which depend on
the purpose for which it is created. In particular, compilers have to decide whether
to include a static or dynamic collection of texts, and entire texts or text samples.
Questions of authorship, size, topic, genre, medium and style have to be
considered as well.
In any case, a corpus is intended to comply with the following requirements: (i) it
should contain authentic (naturally occurring) language data; (ii) it should be
representative, i.e. it should contain data from different types of discourse.
Parallel corpora can be used for various
practical purposes.
Contrastive linguistics
• Parallel corpora are used to compare linguistic features and their
frequencies in two languages subject to a contrastive analysis. They are
also used to investigate similarities and differences between the
source and the target language, making systematic, text-based
contrastive studies at different levels of analysis possible. In this way,
parallel corpora can provide new insights into the languages
compared concerning language-specific, typological and cultural
differences and similarities, and allow for quantitative methods of
analysis.
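As a minimal sketch of such a quantitative comparison, the snippet below computes how often a word occurs per 1,000 tokens in each half of a parallel text. The two short texts and the function name `per_thousand` are assumptions made for illustration; a real study would use full corpora and proper tokenisation.

```python
import re
from collections import Counter

def per_thousand(text, word):
    """Occurrences of `word` per 1,000 tokens, using a simple regex tokeniser."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[word.lower()] * 1000 / len(tokens)

# Hypothetical toy texts standing in for the two sides of a parallel corpus.
english_text = "The book is on the table. The book is old."
german_text = "Das Buch ist auf dem Tisch. Das Buch ist alt."

print(per_thousand(english_text, "the"))
print(per_thousand(german_text, "das"))
```

Normalising by text length matters because the source and the target text rarely contain the same number of tokens, so raw counts are not directly comparable.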
Translation studies
Parallel corpora may help translators to find translational
equivalents between the source and the target language. They provide
information on the frequency of words, specific uses of lexical items as
well as syntactic patterns. This information may help translators to
develop systematic translation strategies for words or phrases which
have no direct equivalent in the target language.
Lexicology
• Parallel corpora are increasingly used to design corpus-based
(bilingual) dictionaries.
Examples of parallel corpora
• English-German Translation Corpus
• English-Norwegian Parallel Corpus (ENPC)
• English-Swedish Parallel Corpus (ESPC)
Comparable Corpora
• Two (or more) corpora in different languages (e.g. English and
Spanish) or in different varieties of a language (e.g. Indian English and
Canadian English).
• They are designed along the same lines, i.e. they contain the same
proportions of newspaper texts, novels, casual conversation, etc.
• Comparable corpora of varieties of the same language can be used to
compare those varieties.
• Comparable corpora of different languages can be used by
translators to identify differences and equivalences between the languages.
Comparable Corpora
• A Comparable Corpus is a collection of "similar" texts in different
languages or in different varieties of a language.
• The aim of this type of corpus is to compare the languages or
varieties as used in similar circumstances of communication,
without the distortion which appears in the translated texts of parallel
corpora.
• Examples of comparable corpora are those modelled on the Brown
Corpus of Standard American English, for example the LOB Corpus
(British English) and the Kolhapur Corpus (Indian English).
• Another example is the International Corpus of English (ICE), whose
components are comparable corpora of 1 million words each,
representing different varieties of English.
Aligned corpus
• An aligned corpus is a kind of bilingual or multilingual corpus in which text
samples from one language and their translations into other languages are
aligned paragraph by paragraph, sentence by sentence, phrase by
phrase, word by word and, where possible, even character by character.
• For the corpus to be useful it is necessary to identify
which sentences in the sub-corpora are translations of each other,
and which words are translations of each other. A corpus which
shows these identifications is known as an aligned corpus as it makes
an explicit link between the elements which are mutual translations
of each other. For example, in a corpus the sentences "Das Buch ist
auf dem Tisch" and "The book is on the table" might be aligned to
one another. At a further level, specific words might be aligned, e.g.
"Das" with "The". This is not always a simple process, however, as
often one word in one language corresponds to two words in
another language, e.g. the German word "raucht" would be
equivalent to "is smoking" in English.
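One possible way to store such word-level links, including the one-to-two case, is sketched below. The representation (tuples of token positions) and the sentence pair "Er raucht" / "He is smoking" are assumptions chosen for illustration, not a standard alignment format.

```python
# Hypothetical sentence pair, tokenised into words.
source = ["Er", "raucht"]
target = ["He", "is", "smoking"]

# Each link maps a tuple of source token positions to a tuple of target
# token positions, so one source word can be linked to several target words.
links = [
    ((0,), (0,)),    # "Er"     <-> "He"
    ((1,), (1, 2)),  # "raucht" <-> "is smoking"
]

def aligned_phrases(source, target, links):
    """Turn position-based links into readable (source phrase, target phrase) pairs."""
    return [
        (" ".join(source[i] for i in src_pos), " ".join(target[j] for j in tgt_pos))
        for src_pos, tgt_pos in links
    ]

for src_phrase, tgt_phrase in aligned_phrases(source, target, links):
    print(src_phrase, "<->", tgt_phrase)
```

Linking groups of positions rather than single words is what allows "raucht" to be tied to both "is" and "smoking" without forcing a one-to-one match.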
Thank you
