Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

P99 1067


Published on

  • Be the first to comment

  • Be the first to like this

P99 1067

  1. 1. Automatic Identification of Word Translationsfrom Unrelated English and German CorporaReinhard RappUniversity of Mainz, FASKD-76711 Germersheim, Germanyrapp @usun2.fask.uni-mainz.deAbstractAlgorithms for the alignment of words intranslated texts are well established. How-ever, only recently new approaches havebeen proposed to identify word translationsfrom non-parallel or even unrelated texts.This task is more difficult, because moststatistical clues useful in the processing ofparallel texts cannot be applied to non-par-allel texts. Whereas for parallel texts insome studies up to 99% of the word align-ments have been shown to be correct, theaccuracy for non-parallel texts has beenaround 30% up to now. The current study,which is based on the assumption that thereis a correlation between the patterns of wordco-occurrences in corpora of different lan-guages, makes a significant improvement toabout 72% of word translations identifiedcorrectly.1 IntroductionStarting with the well-known paper of Brown etal. (1990) on statistical machine translation,there has been much scientific interest in thealignment of sentences and words in translatedtexts. Many studies show that for nicely parallelcorpora high accuracy rates of up to 99% can beachieved for both sentence and word alignment(Gale & Church, 1993; Kay & R/Sscheisen,1993). Of course, in practice - due to omissions,transpositions, insertions, and replacements inthe process of translation - with real texts theremay be all kinds of problems, and therefore ro-bustness is still an issue (Langlais et al., 1998).Nevertheless, the results achieved with thesealgorithms have been found useful for the corn-pilation of dictionaries, for checking the con-sistency of terminological usage in translations,for assisting the terminological work of trans-lators and interpreters, and for example-basedmachine translation. By now, some alignmentprograms are offered commercially: Translationmemory tools for translators, such as IBMsTranslation Manager or Trados TranslatorsWorkbench, are bundled or can be upgradedwith programs for sentence alignment.Most of the proposed algorithms first con-duct an alignment of sentences, that is, they lo-cate those pairs of sentences that are translationsof each other. In a second step a word alignmentis performed by analyzing the correspondencesof words in each pair of sentences. The algo-rithms are usually based on one or several of thefollowing statistical clues:1. correspondence of word and sentence order2. correlation between word frequencies3. cognates: similar spelling of words in relatedlanguagesAll these clues usually work well for paralleltexts. However, despite serious efforts in thecompilation of parallel corpora (Armstrong etal., 1998), the availability of a large-enough par-allel corpus in a specific domain and for a givenpair of languages is still an exception. Since theacquisition of monolingual corpora is mucheasier, it would be desirable to have a programthat can determine the translations of wordsfrom comparable (same domain) or possiblyunrelated monolingnal texts of two languages.This is what translators and interpreters usuallydo when preparing terminology in a specificfield: They read texts corresponding to this fieldin both languages and draw their conclusions onword correspondences from the usage of the519
  2. 2. terms. Of course, the translators and interpreterscan understand the texts, whereas our programsare only considering a few statistical clues.For non-parallel texts the first clue, which isusually by far the strongest of the three men-tioned above, is not applicable at all. The secondclue is generally less powerful than the first,since most words are ambiguous in natural lan-guages, and many ambiguities are differentacross languages. Nevertheless, this clue is ap-plicable in the case of comparable texts, al-though with a lower reliability than for paralleltexts. However, in the case of unrelated texts, itsusefulness may be near zero. The third clue isgenerally limited to the identification of wordpairs with similar spelling. For all other pairs, itis usually used in combination with the firstclue. Since the first clue does not work withnon-parallel texts, the third clue is useless forthe identification of the majority of pairs. Forunrelated languages, it is not applicable anyway.In this situation, Rapp (1995) proposed usinga clue different from the three mentioned above:His co-occurrence clue is based on the as-sumption that there is a correlation between co-occurrence patterns in different languages. Forexample, if the words teacher and school co-occur more often than expected by chance in acorpus of English, then the German translationsof teacher and school, Lehrer and Schule,should also co-occur more often than expectedin a corpus of German. In a feasibility study heshowed that this assumption actually holds forthe language pair English/German even in thecase of unrelated texts. When comparing anEnglish and a German co-occurrence matrix ofcorresponding words, he found a high corre-lation between the co-occurrence patterns of thetwo matrices when the rows and columns ofboth matrices were in corresponding word order,and a low correlation when the rows and col-umns were in random order.The validity of the co-occurrence clue is ob-vious for parallel corpora, but - as empiricallyshown by Rapp - it also holds for non-parallelcorpora. It can be expected that this clue willwork best with parallel corpora, second-bestwith comparable corpora, and somewhat worsewith unrelated corpora. In all three cases, theproblem of robustness - as observed whenapplying the word-order clue to parallel corpo-ra- is not severe. Transpositions of text seg-ments have virtually no negative effect, andomissions or insertions are not critical. How-ever, the co-occurrence clue when applied tocomparable corpora is much weaker than theword-order clue when applied to parallel cor-pora, so larger corpora and well-chosen sta-tistical methods are required.After an attempt with a context heterogeneitymeasure (Fung, 1995) for identifying wordtranslations, Fung based her later work also onthe co-occurrence assumption (Fung & Yee,1998; Fung & McKeown, 1997). By presup-posing a lexicon of seed words, she avoids theprohibitively expensive computational effort en-countered by Rapp (1995). The method des-cribed here - although developed independentlyof Fungs work- goes in the same direction.Conceptually, it is a trivial case of Rappsmatrix permutation method. By simply assumingan initial lexicon the large number of permu-tations to be considered is reduced to a muchsmaller number of vector comparisons. Themain contribution of this paper is to describe apractical implementation based on the co-occur-rence clue that yields good results.2 ApproachAs mentioned above, it is assumed that acrosslanguages there is a correlation between the co-occurrences of words that are translations ofeach other. If - for example - in a text of onelanguage two words A and B co-occur more of-ten than expected by chance, then in a text ofanother language those words that are transla-tions of A and B should also co-occur more fre-quently than expected. This is the only statisti-cal clue used throughout this paper.It is further assumed that there is a smalldictionary available at the beginning, and thatour aim is to expand this base lexicon. Using acorpus of the target language, we first compute aco-occurrence matrix whose rows are all wordtypes occurring in the corpus and whose col-unms are all target words appearing in the baselexicon. We now select a word of the sourcelanguage whose translation is to be determined.Using our source-language corpus, we compute520
  3. 3. a co-occurrence vector for this word. We trans-late all known words in this vector to the targetlanguage. Since our base lexicon is small, onlysome of the translations are known. All un-known words are discarded from the vector andthe vector positions are sorted in order to matchthe vectors of the target-language matrix. Withthe resulting vector, we now perform a similar-ity computation to all vectors in the co-occur-rence matrix of the target language. The vectorwith the highest similarity is considered to bethe translation of our source-language word.3 Simulation3.1 Language ResourcesTo conduct the simulation, a number of resour-ces were required. These are1. a German corpus2. an English corpus3. a number of German test words with knownEnglish translations4. a small base lexicon, German to EnglishAs the German corpus, we used 135 millionwords of the newspaper Frankfurter AllgemeineZeitung (1993 to 1996), and as the Englishcorpus 163 million words of the Guardian (1990to 1994). Since the orientation of the twonewspapers is quite different, and since the timespans covered are only in part overlapping, thetwo corpora can be considered as more or lessunrelated.For testing our results, we started with a listof 100 German test words as proposed by Rus-sell (1970), which he used for an associationexperiment with German subjects. By lookingup the translations for each of these 100 words,we obtained a test set for evaluation.Our German/English base lexicon is derivedfrom the Collins Gem German Dictionary withabout 22,300 entries. From this we eliminatedall multi-word entries, so 16,380 entries re-mained. Because we had decided on our testword list beforehand, and since it would notmake much sense to apply our method to wordsthat are already in the base lexicon, we also re-moved all entries belonging to the 100 testwords.3.2 Pre-processingSince our corpora are very large, to save diskspace and processing time we decided to removeall function words from the texts. This was doneon the basis of a list of approximately 600German and another list of about 200 Englishfunction words. These lists were compiled bylooking at the closed class words (mainly ar-ticles, pronouns, and particles) in an English anda German morphological lexicon (for details seeLezius, Rapp, & Wettler, 1998) and at wordfrequency lists derived from our corpora. 1 Byeliminating function words, we assumed wewould lose little information: Function wordsare often highly ambiguous and their co-occur-rences are mostly based on syntactic instead ofsemantic patterns. Since semantic patterns aremore reliable than syntactic patterns acrosslanguage families, we hoped that eliminating thefunction words would give our method moregenerality.We also decided to lemmatize our corpora.Since we were interested in the translations ofbase forms only, it was clear that lemmatizationwould be useful. It not only reduces the sparse-data problem but also takes into account thatGerman is a highly inflectional language,whereas English is not. For both languages weconducted a partial lemmatization procedurethat was based only on a morphological lexiconand did not take the context of a word form intoaccount. This means that we could not lem-matize those ambiguous word forms that can bederived from more than one base form. How-ever, this is a relatively rare case. (According toLezius, Rapp, & Wettler, 1998, 93% of the to-kens of a German text had only one lemma.) Al-though we had a context-sensitive lemmatizerfor German available (Lezius, Rapp, & Wettler,1998), this was not the case for English, so forreasons of symmetry we decided not to use thecontext feature.I In cases in which an ambiguous word can be both acontent and a function word (e.g., can), preferencewas given to those interpretations that appeared tooccur more frequently.521
  4. 4. 3.3 Co-occurrence CountingFor counting word co-occurrences, in most otherstudies a fixed window size is chosen and it isdetermined how often each pair of words occurswithin a text window of this size. However, thisapproach does not take word order within awindow into account. Since it has been empiri-cally observed that word order of content wordsis often similar between languages (even be-tween unrelated languages such as English andChinese), and since this may be a useful statisti-cal clue, we decided to modify the common ap-proach in the way proposed by Rapp (1996, p.162). Instead of computing a single co-occur-rence vector for a word A, we compute several,one for each position within the window. Forexample, if we have chosen the window size 2,we would compute a first co-occurrence vectorfor the case that word A is two words ahead ofanother word B, a second vector for the case thatword A is one word ahead of word B, a thirdvector for A directly following B, and a fourthvector for A following two words after B. If weadded up these four vectors, the result would bethe co-occurrence vector as obtained when nottaking word order into account. However, this isnot what we do. Instead, we combine the fourvectors of length n into a single vector of length4n.Since preliminary experiments showed that awindow size of 3 with consideration of wordorder seemed to give somewhat better resultsthan other window types, the results reportedhere are based on vectors of this kind. However,the computational methods described below arein the same way applicable to window sizes ofany length with or without consideration ofword order.3.4 Association FormulaOur method is based on the assumption thatthere is a correlation between the patterns ofword co-occurrences in texts of different lan-guages. However, as Rapp (1995) proposed, thiscorrelation may be strengthened by not using theco-occurrence counts directly, but associationstrengths between words instead. The idea is toeliminate word-frequency effects and to empha-size significant word pairs by comparing theirobserved co-occurrence counts with their ex-pected co-occurrence counts. In the past, for thispurpose a number of measures have been pro-posed. They were based on mutual information(Church & Hanks, 1989), conditional probabili-ties (Rapp, 1996), or on some standard statisti-cal tests, such as the chi-square test or the log-likelihood ratio (Dunning, 1993). For the pur-pose of this paper, we decided to use the log-likelihood ratio, which is theoretically welljustified and more appropriate for sparse datathan chi-square. In preliminary experiments italso led to slightly better results than the con-ditional probability measure. Results based onmutual information or co-occurrence countswere significantly worse. For efficient compu-tation of the log-likelihood ratio we used the fol-lowing formula: 2kiiN- 2 log ~ = ~ ki~ log c~Rji,j~{l,2}kilN -- kl2N= kll log c-~-+kl2 log c,R2• k21N -- k22N+ k21 log ~ + g22 log c2R2whereC1=kll +k12 C2 =k21 +k22Rl = kit + k2t Rz = ki2 + k22N=kll+k12+k21+k22with parameters kij expressed in terms of corpusfrequencies:kl~ = frequency of common occurrence ofword A and word Bkl2 = corpus frequency of word A - kllk21 = corpus frequency of word B - kllk22 = size of corpus (no. of tokens) - corpusfrequency of A - corpus frequency of BAll co-occurrence vectors were transformed us-ing this formula. Thereafter, they were nor-malized in such a way that for each vector thesum of its entries adds up to one. In the rest ofthe paper, we refer to the transformed and nor-malized vectors as association vectors.2This formulation of the log-likelihood ratio was pro-posed by Ted Dunning during a discussion on thecorpora mailing list (e-mail of July 22, 1997). It isfaster and more mnemonic than the one in Dunning(1993).522
  5. 5. 3.5 Vector SimilarityTo determine the English translation of an un-known German word, the association vector ofthe German word is computed and compared toall association vectors in the English associationmatrix. For comparison, the correspondencesbetween the vector positions and the columns ofthe matrix are determined by using the baselexicon. Thus, for each vector in the Englishmatrix a similarity value is computed and theEnglish words are ranked according to thesevalues. It is expected that the correct translationis ranked first in the sorted list.For vector comparison, different similaritymeasures can be considered. Salton & McGill(1983) proposed a number of measures, such asthe Cosine coefficient, the Jaccard coefficient,and the Dice coefficient (see also Jones & Fur-nas, 1987). For the computation of related termsand synonyms, Ruge (1995), Landauer andDumais (1997), and Fung and McKeown (1997)used the cosine measure, whereas Grefenstette(1994, p. 48) used a weighted Jaccard measure.We propose here the city-block metric, whichcomputes the similarity between two vectors Xand Y as the sum of the absolute differences ofcorresponding vector positions:S:Z[Xi -Yi[i=lIn a number of experiments we compared it toother similarity measures, such as the cosinemeasure, the Jaccard measure (standard and bi-nary), the Euclidean distance, and the scalarproduct, and found that the city-block metricyielded the best results. This may seem sur-prising, since the formula is very simple and thecomputational effort smaller than with the othermeasures. It must be noted, however, that theother authors applied their similarity measuresdirectly to the (log of the) co-occurrence vec-tors, whereas we applied the measures to the as-sociation vectors based on the log-likelihoodratio. According to our observations, estimatesbased on the log-likelihood ratio are generallymore reliable across different corpora and lan-guages.3.6 Simulation ProcedureThe results reported in the next section wereobtained using the following procedure:1. Based on the word co-occurrences in theGerman corpus, for each of the 100 Germantest words its association vector was com-puted. In these vectors, all entries belongingto words not found in the English part of thebase lexicon were deleted.2. Based on the word co-occurrences in theEnglish corpus, an association matrix wascomputed whose rows were all word types ofthe corpus with a frequency of 100 or higher3and whose columns were all English wordsoccurring as first translations of the Germanwords in the base lexicon.43. Using the similarity function, each of theGerman vectors was compared to all vectorsof the English matrix. The mapping betweenvector positions was based on the first trans-lations given in the base lexicon. For each ofthe German source words, the English vo-cabulary was ranked according to the re-suiting similarity value.3 The limitation to words with frequencies above 99was introduced for computational reasons to reducethe number of vector comparisons and thus speed upthe program. (The English corpus contains 657,787word types after lemmatization, which leads toextremely large matrices.) The purpose of thislimitation was not to limit the number of translationcandidates considered. Experiments with lowerthresholds showed that this choice has little effect onthe results to our set of test words.4 This means that alternative translations of a wordwere not considered. Another approach, as conductedby Fung & Yee (1998), would be to consider allpossible translations listed in the lexicon and to givethem equal (or possibly descending) weight. Ourdecision was motivated by the observation that manywords have a salient first translation and that thistranslation is listed first in the Collins Gem Dictio-nary German-English. We did not explore this issuefurther since in a small pocket dictionary only fewambiguities are listed.523
  6. 6. 4 Results and EvaluationTable 1 shows the results for 20 of the 100 Ger-man test words. For each of these test words, thetop five translations as automatically generatedare listed. In addition, for each word its ex-pected English translation from the test set isgiven together with its position in the rankedlists of computed translations. The positions inthe ranked lists are a measure for the quality ofthe predictions, with a 1 meaning that the pre-diction is correct and a high value meaning thatthe program was far from predicting the correctword.If we look at the table, we see that in manycases the program predicts the expected word,with other possible translations immediatelyfollowing. For example, for the German wordHiiuschen, the correct translations bungalow,cottage, house, and hut are listed. In other cases,typical associates follow the correct translation.For example, the correct translation of Miid-chen, girl, is followed by boy, man, brother, andlady. This behavior can be expected from ourassociationist approach. Unfortunately, in somecases the correct translation and one of itsstrong associates are mixed up, as for examplewith Frau, where its correct translation, woman,is listed only second after its strong associateman. Another example of this typical kind oferror is pfeifen, where the correct translationwhistle is listed third after linesman and referee.Let us now look at some cases where the pro-gram did particularly badly. For Kohl we hadexpected its dictionary translation cabbage,but- given that a substantial part of our news-paper corpora consists of political texts - we donot need to further explain why our programlists Major, Kohl, Thatcher, Gorbachev, andBush, state leaders who were in office duringthe time period the texts were written. In othercases, such as Krankheit and Whisky, the simu-lation program simply preferred the British us-age of the Guardian over the American usage inour test set: Instead of sickness, the programpredicted disease and illness, and instead ofwhiskey it predicted whisky.A much more severe problem is that our cur-rent approach cannot properly handle ambigui-ties: For the German word weifl it does not pre-dict white, but instead know. The reason is thatweifl can also be third person singular of theGerman verb wissen (to know), which in news-paper texts is more frequent than the colorwhite. Since our lemmatizer is not context-sen-sitive, this word was left unlemmatized, whichexplains the result.To be able to compare our results with otherwork, we also did a quantitative evaluation. Forall test words we checked whether the predictedtranslation (first word in the ranked list) wasidentical to our expected translation. This wastrue for 65 of the 100 test words. However, insome cases the choice of the expected transla-tion in the test set had been somewhat arbitrary.For example, for the German word Strafle wehad expected street, but the system predictedroad, which is a translation quite as good.Therefore, as a better measure for the accuracyof our system we counted the number of timeswhere an acceptable translation of the sourceword is ranked first. This was true for 72 of the100 test words, which gives us an accuracy of72%. In another test, we checked whether an ac-ceptable translation appeared among the top 10of the ranked lists. This was true in 89 cases, sFor comparison, Fung & McKeown (1997)report an accuracy of about 30% when only thetop candidate is counted. However, it must beemphasized that their result has been achievedunder very different circumstances. On the onehand, their task was more difficult because theyworked on a pair of unrelated languages (Eng-lish/Japanese) using smaller corpora and a ran-dom selection of test words, many of whichwere multi-word terms. Also, they predeter-mined a single translation as being correct. Onthe other hand, when conducting their evalua-tion, Fung & McKeown limited the vocabularythey considered as translation candidates to afew hundred terms, which obviously facilitatesthe task.5 We did not check for the completeness of thetranslations found (recall), since this measure dependsvery much on the size of the dictionary used as thestandard.524
  7. 7. German testwordBabyBrotFraugelbH~iuschenKindKohlKrankheitM~idchenMusikOfenpfeifenReligionSchafSoldatStraBesiiBTabakweiBWhiskyexpected trans-lation and rankbaby 1bread 1woman 2yellow 1cottage 2child 1cabbage 17074sickness 86babybreadmanyellowbungalowchildMajordiseasetop five translations as automatically generatedchild mother daughter fathercheese meat food butterwoman boy friend wifeblue red pink greencottage house hut villagedaughter son father motherKohl Thatcher Gorbachev Bushillness Aids patient doctorgirl 1 girlmusic 1 music dancestove 3 heat oven stove housewhistle 3 linesman referee whistle blow offsidereligion 1sheep 1soldier 1street 2boy man brother ladytheatre musical songburnreligion culture faith religious beliefsheep cattle cow pig goatsoldier army troop force civilianroad street city town walksweet smell delicious taste lovesweet 1tobacco 1white 46whiskey 11tobacco cigarette consumption nicotine drinkknow say thought see thinkwhisky beer Scotch bottle wineTable 1: Results for 20 of the 100 test words (for full list see Discussion and ConclusionThe method described can be seen as a simplecase of the gradient descent method proposed byRapp (1995), which does not need an initiallexicon but is computationally prohibitively ex-pensive. It can also be considered as an exten-sion from the monolingual to the bilingual caseof the well-established methods for semantic orsyntactic word clustering as proposed bySchtitze (1993), Grefenstette (1994), Ruge(1995), Rapp (1996), Lin (1998), and others.Some of these authors perform a shallow or fullsyntactical analysis before constructing the co-occurrence vectors. Others reduce the size of theco-occurrence matrices by performing a singularvalue decomposition. However, in yet un-published work we found that at least for thecomputation of synonyms and related wordsneither syntactical analysis nor singular valuedecomposition lead to significantly better resultsthan the approach described here when appliedto the monolingual case (see also Grefenstette,1993), so we did not try to include these me-thods in our system. Nevertheless, both methodsare of technical value since they lead to a re-duction in the size of the co-occurrence matri-ces.Future work has to approach the difficultproblem of ambiguity resolution, which has notbeen dealt with here. One possibility would beto semantically disambiguate the words in thecorpora beforehand, another to look at co-oc-currences between significant word sequencesinstead of co-occurrences between single words.To conclude with, let us add some specula-tion by mentioning that the ability to identifyword translations from non-parallel texts can beseen as an indicator in favor of the associationistview of human language acquisition (see alsoLandauer & Dumais, 1997, and Wettler & Rapp,1993). It gives us an idea of how it is possible toderive the meaning of unknown words fromtexts by only presupposing a limited number ofknown words and then iteratively expanding thisknowledge base. One possibility to get the pro-525
  8. 8. cess going would be to learn vocabulary lists asin school, another to simply acquire the namesof items in the physical world.AcknowledgementsI thank Manfred Wettler, Gisela Zunker-Rapp,Wolfgang Lezius, and Anita Todd for their sup-port of this work.ReferencesArmstrong, S.; Kempen, M.; Petitpierre, D.; Rapp,R.; Thompson, H. (1998). Multilingual Corpora forCooperation. Proceedings of the 1st InternationalConference on Linguistic Resources and Evalua-tion (LREC), Granada, Vol. 2, 975-980.Brown, P.; Cocke, J.; Della Pietra, S. A.; Della Pietra,V. J.; Jelinek, F.; Lafferty, J. D.; Mercer, R. L.;Rossin, P. S. (1990). A statistical approach to ma-chine translation. ComputationalLinguistics, 16(2),79-85.Church, K. W.; Hanks, P. (1989). Word associationnorms, mutual information, and lexicography. In:Proceedings of the 27th Annual Meeting of the As-sociation for Computational Linguistics. Vancou-ver, British Columbia, 76-83.Dunning, T. (1993). Accurate methods for the sta-tistics of surprise and coincidence. ComputationalLinguistics, 19(1), 61-74.Fung, P. (1995). Compiling bilingual lexicon entriesfrom a non-parallel English-Chinese corpus. Pro-ceedings of the 3rd Annual Workshop on VeryLarge Corpora, Boston, Massachusetts, 173-183.Fung, P.; McKeown, K. (1997). Finding terminologytranslations from non-parallel corpora. Proceedingsof the 5th Annual Workshop on Very Large Cor-pora, Hong Kong, 192-202.Fung, P.; Yee, L. Y. (1998). An IR approach fortranslating new words from nonparallel, compa-rable texts. In: Proceedingsof COLING-ACL1998,Montreal, Vol. 1,414-420.Gale, W. A.; Church, K. W. (1993). A program foraligning sentences in bilingual corpora. Computa-tional Linguistics, 19(3), 75-102.Grefenstette, G. (1993). Evaluation techniques forautomatic semantic extraction: comparing syntacticand window based approaches. In: Proceedings ofthe Workshopon Acquisition of Lexical Knowledgefrom Text,Columbus, Ohio.Grefenstette, G. (1994). Explorations in AutomaticThesaurus Discovery. Dordrecht: Kluwer.Jones, W. P.; Furnas, G. W. (1987). Pictures of rele-vance: a geometric analysis of similarity measures.Journal of the American Society for InformationScience, 38(6), 420-442.Kay, M.; Rfscheisen, M. (1993). Text-TranslationAlignment. Computational Linguistics, 19(1), 121-142.Landauer, T. K.; Dumais, S. T. (1997). A solution toPlatos problem: the latent semantic analysis theoryof acquisition, induction, and representation ofknowledge. Psychological Review, 104(2), 211-240.Langlais, P.; Simard, M.; V6ronis, J. (1998). Methodsand practical issues in evaluating alignment tech-niques. In: Proceedings of COLING-ACL 1998,Montreal, Vol. l, 711-717.Lezius, W.; Rapp, R.; Wettler, M. (1998). A freelyavailable morphology system, part-of-speech tag-ger, and context-sensitive lemmatizer for German.In: Proceedings of COLING-ACL 1998, Montreal,Vol. 2, 743-748.Lin, D. (1998). Automatic Retrieval and Clustering ofSimilar Words. In: Proceedings of COLING-ACL1998, Montreal, Vol. 2, 768-773.Rapp, R. (1995). Identifying word translations in non-parallel texts. In: Proceedings of the 33rd Meetingof the Association for Computational Linguistics.Cambridge, Massachusetts, 320-322.Rapp, R. (1996). Die Berechnung von Assoziationen.Hildesheim: Olms.Ruge, G. (1995). Human memory models and termassociation. Proceedings of the ACM SIGIR Con-ference, Seattle, 219-227.Russell, W. A. (1970). The complete German lan-guage norms for responses to 100 words from theKent-Rosanoff word association test. In: L. Post-man, G. Keppel (eds.): Norms of WordAssociation.New York: Academic Press, 53-94.Salton, G.; McGill, M. (1983). Introduction to Mod-em Information Retrieval. New York: McGraw-Hill.Schiitze, H. (1993). Part-of-speech induction fromscratch. In: Proceedings of the 31st Annual Meet-ing of the Association for Computational Lingu-istics, Columbus, Ohio, 251-258.Wettler, M.; Rapp, R. (1993). Computation of wordassociations based on the co-occurrences of wordsin large corpora. In: Proceedings of the 1st Work-shop on VeryLarge Corpora: Columbus, Ohio, 84-93.526