SlideShare a Scribd company logo
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 
_______________________________________________________________________________________ 
Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 46 
CROSS LANGUAGE INFORMATION RETRIEVAL: IN INDIAN LANGUAGE PERSPECTIVE Pratibha Bajpai1, Dr. Parul Verma2 1Research Scholar, Department of Information Technology, Amity University, Lucknow 2Assistant Professor, Department of Computer Science, Amity University, Lucknow Abstract Multilingual information is overflowing on internet these days. This increasing diversity of web pages in almost every popular language in the world should enable the user to access information in any language of his choice. But sometimes it is difficult for a user to write her request in a language which she could easily read and understand. This makes cross-language information retrieval (CLIR) and multilingual information retrieval (MLIR) for Web applications a valuable need of the day. It increases the accessibility of web users to retrieve information in any language while post their queries in their native language. The paper critically analyzes the various researchers work in the area of Indian language CLIR. In this paper we also present our prospective prototype for English to Hindi language CLIR. It will also discuss the issues related to the English to Hindi language translation. We had tested 30 queries manually using suggested prototype and found that the precision level is quite good. Keywords: Cross lingual Information Retrieval, Query Translation, Sense Disambiguation, English to Hindi Translation. 
--------------------------------------------------------------------***------------------------------------------------------------------ 1. INTRODUCTION A classic IR system accepts the user information need in a form of query and gives back the documents that are relevant to the user need. With the explosion of knowledge on the web, it became necessary to break the language barriers for the monolingual IR systems. This may allow the users of IR systems to give query in one language and retrieve documents in different languages. IR system, with different source and target language is called CLIR system. Cross-Lingual Information Retrieval (CLIR) translates the user query (given in source language) into the target language, and uses translated query to retrieve the target language documents. The drive for evaluation of monolingual and cross-lingual retrieval systems started with Cross-Language Evaluation Forum (CLEF) in European languages and NTCIR in Chinese-Japanese-Korean languages. It is only in the recent past that the Indian languages have gained importance in evaluation. From 2008, a specific campaign focusing on Indian languages started with the Forum for Information Retrieval Evaluation (FIRE). This resulted in the development of large document collection in some Indian languages like Bangla, Hindi, Marathi and Tamil. Through our paper we like to provide a brief review of the work done by various researchers in the field of Indian languages for CLIR system. The paper is organized as follows: section 2 illustrates different techniques used for query translation. Comparative analysis of CLIR approaches in Indian languages perspective is discussed in section 3. Section 4 describes our prototype for query translation and sense disambiguation while section 5 draws the conclusion. 
2. DIFFERENT TECHNIQUES FOR CLIR 
Based on different translation resources, three different techniques have been identified in CLIR: Dictionary based CLIR, Corpora based CLIR and Machine translator based CLIR. 
2.1 Machine Translation Machine translation, in simple terms, is a technique that makes use of software that translates text from one language to another language. But machine translation is not all about substitution of words from one language to another only; rather it also involves finding phrases and its counterparts in target language to produce good quality translations. Machine translation is of three types: 2.1.1 Rule Based Machine Translation Rule based MT uses linguistic information about source and target language. M. Nimaiti and Y. Izumi (2012) developed Japanese Uighur machine translation system using rule based approach. They propose a word-for-word translation system using subject verb agreement in Uighur. The results aren‟t positive and there are still some rooms for improvement. In case of Indian languages, R.Rajan et. al.(2009) propose a rule based system for translating English sentences to Malayalam by utilizing dependencies from parser, POS tagger and transfer link rules for reordering and rules for morphology. 2.1.2 Statistical Machine Translation 
Statistical machine translation generates translations using statistical methods based on bilingual text corpora. Dan Wu
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 
_______________________________________________________________________________________ 
Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 47 
& Daqing He (2010) conducted a series of CLIR experiments using Google Translate for translating queries. Their results show that with the help of relevance feedback, MT can achieve significant improvement over the monolingual baseline, no matter whether the query length are short or long. Kraaij & Simard(2003) experimentally claim that web can be used for automatic construction of parallel corpus which can then be used to train statistical translation models automatically. 2.1.3 Example Based Machine Translation Example based MT reads similar examples in the form of source text and its translation from the set of examples, adapting the examples to translate a new input. Sato and Nagao (1990) investigated the problem of example selection by approximate matching of input sentences and example sentences, using a similarity measure based on the syntactic similarity of dependency tree structures of a sentence pair in question and on the word distance of corresponding words, which were predefined in a thesaurus. Sumita et al. (1990) looked into example-based translation of Japanese noun phrases of the pattern [N1 no N2] into English as [N2 prep N1] or [N1 N2], based on a distance measure for the input phrase and example phrase, calculated as a linear weighted sum of the distances of the three sub-parts, each of which is predefined in a thesaurus. 2.2 Dictionary Based CLIR The most natural approach to cross-lingual IR is to replace each query term with most appropriate translations extracted automatically from Machine Readable Dictionaries (MRD). The translation using bilingual dictionaries is simple but Ballesteros and Croft (1996) and Hull & Grefenstette(1996) claim that it leads to a 40-60% loss in effectiveness as compared to monolingual retrieval. A.Pirkola (2001) asserts that the loss can be due to factors as untranslatable search keys due to limitations in dictionaries, processing of derived or inflected word forms, phrase and compound translation and lexical ambiguity in source and target languages. To handle these problems, researchers have made use of domain specific dictionaries for the dictionary coverage problem( Pirkola, 1998, 1999), Stemming and morphological analysis to handle inflected words(Hull, 1996, Krovetz, 1993; Porter, 1990), POS tagging for phrase translation(Ballesteros & Croft, 1997), corpus based query expansion (Ballesteros & Croft, 1998; Nie et al. , 1999; Sheridan et. al., 1997) and query structuring for the ambiguity problem(Pirkola, 1998, 1999; Sperer & Oard, 2000). 
2.3. Corpus Based Cross Lingual Information Retrieval Corpus based CLIR methods use multilingual terminology derived from parallel or comparable corpora for query translation and expansion. There are two types of corpus: 2.3.1 Parallel Corpus A parallel corpus is a collection where texts in one language are aligned with their translations in another language. Several systems have been developed to mine large parallel corpora from the web. Wang and Lin (2010) give a method which first identifies a set of seed URLs and crawl candidate bilingual websites. The obtained pages are cleaned and bilingual texts collected to construct comparable corpora. Wang et. al. (2004) exploit the bilingual search result pages obtained from a real search engine as a corpus for automatic translation of unknown query terms not included in the dictionary. They propose a PAT-tree based local maxima method for effective extraction of translation candidates. The approach gives excellent results. 2.3.2 Comparable Corpus Comparable corpus, on the other hand, consist of texts that are not translations, but share similar topics. They can be, e.g., newspaper collections written in the same time period in different countries. Sadat Fatiha (2011) exploit the idea of using multilingual based encyclopedias such as Wikipedia to extract terms and their translations to construct a bilingual ontology or enhance the coverage of existing ontologies. The method show promising results for any pair of languages. Qian & Meng (2008) expanded Chinese OOV phrase with its partial English translation and submitted to the search engine. The translation of OOV words is mined by preprocessing the snippets obtained to extract the main text from the web page. The strings obtained are sorted by weighted frequency to output the top n translation of OOV phrase. The method proves to obtain the translation with high time efficiency and high precision. 3. COMPARATIVE ANALYSIS OF CLIR APPROACHES FOR INDIAN LANGUAGES Cross-language retrieval is a budding field in India and the works are still in its primitive state. Table 1 analyzes the performance of various approaches used by the researchers for Indian languages. In many approaches the cross-lingual results are comparable to that of mono-lingual approaches. 
Table 1: Critical Analysis of CLIR for Indian Languages 
Languages 
Translation 
Size of test data/ Performance 
Specific Features 
English to Hindi A.Seetha , S.Das & M. Kumar (2007) 
Select first equivalent/ preferred –n/ random nth equivalent/ all equivalents from 
6219 hindi document test collection/ performance of strategy 1,2,3,4 are 64.80%, 57.90%, 11.83% and 57.13% of monolingual retrieval 
The four strategies are used to test the system performance on the number of equivalents in the query translation by selecting n equivalents from the list of the dictionary.
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 
_______________________________________________________________________________________ 
Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 48 
Bilingual dictionary 
Tamil to English S. Saraswathi & A. Siddhiqaa (2010) 
Machine translation and Ontological tree 
200 documents from the domain “festival”/ relevance improves by 40% for English and 60% for Tamil 
A generic platform is built for bilingual IR which can be extended to any foreign or Indian language working with the same efficiency. 
English to Hindi 
A. Seetha, S. Das, J. Rana & M. Kumar 
(2010) 
Translation by Shabdanjali dictionary & query expansion by Hindi Wordnet. 
Fire 2010 Hindi test collection/ method is not very effective 
Query expansion reformulates the initial query by adding some new related words so that query provides a wider coverage than the original query. 
English to Malayalam P.L. Nikesh, S.M. Idicula, David Peter (2008) 
Bilingual dictionary developed in house. 
System proves to be efficient for CLIR 
A basic system can be constructed quickly once the linguistic tools become available. 
English to Hindi Larkey & Connell (2003) 
Probabilistic dictionary derived from parallel corpus 
41697 Hindi news articles/ method contributes to effective Hindi retrieval 
It combines the ranked lists from the Inquery search and the Language Modeling search to obtain the final ranking of retrieved documents. 
English to Hindi & Hindi to English S. Sethuramalingam & V. Varma (2008) 
Bilingual Dictionary 
English corpus consisted of 125,638 news articles from the Telegraph, Calcutta edition while Hindi corpus consisted of 95215 news articles published in Jagran/ English-Hindi CLIR performance is 58% while Hindi-English CLIR is 25% of the monolingual performance 
Disjunctive query formulation using weighted keywords give an overall better performance in both CLIR and Multi Lingual scenario. 
Tamil to English 
Bilingual dictionary 
Web/ The approach used improves the significance of the content retrieved and the overall efficiency of the process 
Using summarization techniques and snippet clustering the result closet to user‟s query is displayed. 
Bengali & Hindi to English D. Mandal & P. Banerjee (2007) 
Machine Translation using Bilingual dictionary 
English news corpus of LA Times 2002 containing 135153 documents/ Map for Bengali-English queries is 7.26 & for Hindi-English queries is 4.77 
Queries with named entities provided better results as compared to the queries without named entities implying the importance of a very good bilingual lexicon and transliteration tool in CLIR for Indian languages. 
Tamil to English D.Thenmozhi & C. Aravindan (2010) 
Machine Translation 
Agricultural ontology/ Retrieves pages with MAP of 95% 
The system exhibits a dynamic learning approach wherein any new word that is encountered in the translation process could be updated to the bilingual dictionary. 
Hindi to English R. Udupa & J. Jagarlamudi (2008) 
Probabilistic translation lexicon produced by Statistical Machine Learning 
Parallel corpus consisting of 100K sentence pairs from the news domain/ Retrieval performance is about 81% of that of monolingual system 
Transliteration mining of OOV words from the document performance whereas date restriction hurts the retrieval performance. 
Hindi to telugu to English P.Pingali & V.Verma (2006) 
Bilingual Dictionary 
English news corpus of LA Times 1995 containing 113005 documents & 56472 documents from Glasgow Herald of 1995/ The system is much robust 
Simple techniques such as dictionary lookup with minimal lemmatization such as suffix removal is not sufficient for Indian Languages CLIR. 
English to Bangla 
A.Imam & S. 
SMT using parallel corpus 
English to Bangla corpus of approximately 29000 sentences/ 
Improving corpus quality is about 3 times effectual than
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 
_______________________________________________________________________________________ 
Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 49 
Chowdhury (2011) 
NIST & BLUE scores (scoring system for evaluating the performance of a Machine Translation System.)are 4.6 and 0.39 which is below the standard 
increasing the corpus size for English-Bengali SMT. 
Tamil to English Pattabhi R.K Rao and Sobha. L (2010) 
Bilingual dictionary and ontology 
125638 documents from English news magazine “The Telegraph” / Results are encouraging 
The system performs well for queries for which the world knowledge has been imparted. 
3.1 Observation Cross lingual information retrieval for foreign languages like English, French, Chinese etc. has been an appealing area for researchers from long time. But Indian languages have grabbed attention only a decade back. The work done by researchers show mixed results in terms of improvement over monolingual retrieval in Indian language perspective. Anurag Seetha & S. Das (2010) performed translation on Fire 2010 Hindi test collection using Shabdanjali dictionary & query expansion by Hindi Wordnet. The method proved to be ineffective. It is because general dictionaries have low coverage problem. To remove this inefficiency Larkey and Connell (2003) used probabilistic dictionary derived from parallel corpus for English to Hindi translation and achieved effective cross lingual retrieval. Pattabhi R.K Rao and Sobha. L. (2010) found encouraging results by incorporating Bilingual dictionary and ontology. Other researchers have made use of machine translation for cross lingual retrieval. D.Thenmozhi & C. Aravindan (2010) used MT on agricultural domain and retrieved pages with MAP of 95%. MT systems produce high quality translations only in limited domains and are very expensive too. It involves the cost of creating bilingual dictionary, parallel corpora and the construction and evaluation of MT system. R. Udupa & J. Jagarlamudi (2008) used Probabilistic translation lexicon produced by Statistical Machine Learning while A.Imam & S. Chowdhury (2011) used SMT using parallel corpus for English to Bangla translation. Parallel or comparable corpora are yet other useful resources for CLIR. Parallel corpora are preferred in CLIR because they provide more accurate translation knowledge but due to their scarcity, comparable corpora are often used in CLIR. The above observation concludes that there is a wide scope of research to improve existing algorithms or developing new one to improve the performance level of CLIR system. 4. PROTOTYPE APPROACH In this section we propose an approach for cross-lingual information retrieval on the web and briefly discuss the components of the proposed design. The major components of the design are: Preprocessing, Query translation, Word sense disambiguation and Information Retrieval. Before we start discussing the major components of the system, we need to know the grammatical complexities of the two languages. 
4.1 Grammatical Complexities of English to Hindi Translation 
Hindi and English are morphologically different languages. Translating from poor (e.g. English) to rich (e.g. Hindi) morphology is a tough job and requires deeper linguistic investigation during translation. The major differences are: (i) The basic word order in Hindi is Subject-Object-Verb (SOV) as against SVO word order in English. But in Hindi, the constituents of a sentence can be freely moved around in the sentence without affecting the core meaning. E.g. the following sentence pair conveys the same meaning with different word order: राम ने सीता को देखा Ram ne Sita ko dekha सीता को राम ने देखा Sita ko Ram ne dekhaa The identity of Ram as the subject and Sita as the object in both sentences comes from the case markers ने (ne – nominative) and को (ko –accusative) (ii) Unlike English, vowel length and Vowel nasalization are meaningful in Hindi e.g. (Kam) means „less‟ and (Kaam) means „work‟ (Puuch) means „ask‟ and (puunch) means „tail‟ (iii) In English, prepositions precede the words to which they relate. In Hindi, such words are called postpositions because they follow the words they govern. (iv) Hindi is morphologically richer than English. This can be illustrated from following example: The plural-marker in the word “boys” in English is translated as ए (e – plural direct) or ओं (on – plural oblique): The boys went to school ऱड़के पाठशाऱा गये The boys ate apples. ऱड़को ने सेब खाये
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 
_______________________________________________________________________________________ 
Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 50 
Future tense in Hindi is marked on the verb. In the following example, “will go” is translated as जायेंगे (jaaenge), with एंगे (enge) as the future tense marker: The boys will go to school. ऱड़के पाठशाऱा जायेंगे (v) There are no articles in Hindi. Definiteness of a noun is indicated through pronoun, context or word order. (vi) All nouns in Hindi are either masculine or feminine. This means an arbitrary gender is assigned to the nouns that have a neutral gender in English e.g. „chair‟ is a feminine noun and „door‟ is a masculine noun in Hindi. 4.2 Preprocessing The first step in any CLIR system is preprocessing of query terms to speed up the translation process without affecting the retrieval quality. This preprocessing is done using tokenization, stemming and stop word removal. 4.2.1 Tokenization Tokenization is defined as an attempt to recognize the boundaries between words and isolate those parts of a query which should be translated in the source query. 4.2.2 Stop Word Removal Stop Words are words which do not contain important significance in Search Queries and hence can be removed from the query to increase search performance. Removing stop words can be done using a list that contains all stop words. 4.2.3 Stemming It maps all the different inflected forms of a word to the same stem. For languages like English which have weaker inflections, simple stemming algorithms can be used. Such algorithms only remove plural endings. In languages with stronger inflections, suffices are joined to the stem end to end. The advanced stemming algorithm can recognize such multiple endings and remove them in an iterative fashion. Porter stemmer, Snowball stemmer etc. are well known advanced stemming algorithm. 4.3 Query Translation 
In Query Translation, the given query is converted from Source language to Target language and the obtained query searches the database to get the documents in Target language. Query Translation often suffers from the problem of translation ambiguity and this problem is amplified due to the limited amount of context in short queries. Query translation can be done using any one technique including machine translation, dictionary based or corpus based method. The techniques have already been discussed in section 2. The query translation is quiet complex while translating English to Hindi query as the two languages are morphologically different from each other. Out of vocabulary (OOV) words are not translated even after morphological analysis. This type of words can be transliterated using the target language alphabet and be added to final queries. Not much work has been done for the translation of these two languages by Indian researchers till date. 
4.4 Ambiguity Removal in Translated Query Ambiguity is a common problem with all natural languages i.e. there exist a large number of words in these languages carrying more than one meaning. For instance, the English noun plant can mean green plant or factory or the word bank means financial institution or pool of a river. The correct sense of an ambiguous word can be selected based on the context where it occurs. This task of automatically assigning the most appropriate meaning to a polysemous word within a given context is called word sense disambiguation. Disambiguation algorithms use a variety of resources and follow different techniques. On the basis of resource utilization and their processing techniques, the disambiguation techniques can be classified as Knowledge Based Methods (resources used are Machine Readable Dictionaries, Thesaurus, Lexicons ), Supervised Learning Methods (Naïve Bayesian Classifier, Exemplar Based Classifier, Lazy Boosting Algorithm), Minimally Supervised Methods and Unsupervised Methods. 4.5 Information Retrieval after Query Translation and Ambiguity Removal The retrieval system presents the user a set of documents that match his query. The retrieval model is of three types: The Boolean, Vector Space and Probabilistic model. In Boolean model, queries are represented as Boolean expressions and only those documents that logically match the query is presented to the user leaving behind those documents that do not match at all. The major drawback with this model is that it only judges documents completely matching or not and does not determines the degree of matching. The other two methods present the ranked list of documents depending on the degree of matching. Vector Space method calculates the degree of matching by calculating the angle between the query vector and each document vector. The Probabilistic model estimates the probability that a document is relevant for the query on the basis of the assumption that the probability depends on the query and the document representation only. Step by Step Evaluation of CLIR Based on Prototype Approach The steps of the proposed approach can be explained by considering the following queries: Query 1: Hunger Strikes Tokenization- Using whitespace between words the tokens obtained from the query are „Hunger‟ and „Strikes‟
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 
_______________________________________________________________________________________ 
Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 51 
Stop Word Removal- No stop words exist in the above query. Stemming- Next using Porter stemmer, the inflected tokens are reduced to their base form. After stemming, query becomes Hunger strike. Query translation- The required translation of the query is „भूख हड़ताऱ’. Ambiguity Removal- Since the translated query is unambiguous, so no disambiguation is required. Precision- The precision of the query is .83, where the number of relevant documents is 10 out of top 12 retrieved documents appeared on first page. Query 2: Alcohol Consumption in India Tokenization- Tokens of the above query are „Alcohol‟, „consumption‟, „in‟ and „India‟. Stop word removal- Next stop word „in‟ is removed using stop word list given by MIT. The query now becomes Alcohol consumption India Stemming- Stemming using Porter stemmer returns the query as Alcohol consumpt India Query Translation- Hindi translation of the query is „भारत में शराब की खपत’. Ambiguity removal- The Hindi translation “भारत” is ambiguous i.e. it has multiple senses. It refers to country India as well as the son of Pandu, a Mahabharat character. The correct sense of a word can be identified based on the context of the query in which it appears using disambiguation algorithm. Precision- The precision of the query is 1.0, where the number of relevant documents is 10 out of 10 retrieved documents appeared on first page. The queries have been preprocessed and translated manually using tools like Potter stemmer, Stop Word list by MIT etc. and received positive results. Based on suggested approach we will formulize an algorithm for English to Hindi language query translation for CLIR. 5. CONCLUSIONS The respective work with regard to Indian languages has gained impetus in last decade and there is much to be explored in this field. It is quite obvious from the observations that there is still a scope of improvement in the performance level of CLIR. We presume that the proposed prototype system will prove to be competent with other existing systems. 
REFERENCES 
[1]. Ballesteros, L, and Bruce W Croft, “Phrasal Translation and Query Expansion Techniques for Cross Language Information Retrieval”. In: Proceedings of 20th International ACM SIGIR Conference in Research and Development in IR 1997. [2]. Ballesteros, L., and Croft, W.B. 1998. “Resolving ambiguity for cross-language retrieval.” In Proceedings of SIGIR Conference, pages 64-71, 1998. [3]. Chawre, S. M., Srikantha Rao. Domain Specific Information Retrieval in Multilingual Environment, International Journal of Recent Trends in Engineering, 2, 4, 179-181, 2009. [4]. Chinnakotla Kumar Manoj, Ranadive Sagar, Bhattacharyya Pushpak and Damani P. Om “Hindi and Marathi to English Cross Language Information Retrieval at CLEF 2007”, in the working notes of CLEF 2007. [5]. David A. Hull and Gregory Grefenstette. Querying across languages: A dictionary-based approach to multilingual information retrieval. In Proceedings of the 19th International Conference on Research and Development in Information Retrieval, pages 49–57, 1996. [6]. Dr. Saraswathi, S., Asma Siddhiqaa, M., Kalaimagal, K., and Kalaiyarasi M. BiLingual Information Retrieval System for English and Tamil, Journal Of Computing, 2,4, 85-89, April 2010. [7]. Grefenstette, G. (1998b). The problem of cross-language information retrieval. In Grefenstette (1998a), pages 1-9. [8]. Hsu Hung Ming, Tsai Feng Ming, and Hsin-Hsi Chen Query Expansion with ConceptNet and WordNet: An Intrinsic Comparison. In : AIRS 2006, LNCS 4182, (2006) 1-13. [9]. Hang Cui, Ji-Rong Wen, Jian-Yun Nie, and Wei-Ying Ma. Query Expansion by Mining User Logs IEEE. Transactions on Knowledge and Data Engineering, Vol. 15(4) 2003. [10]. Hiemstra, D. And De Jong, F. 1999. “Disambiguation strategies for cross-language information retrieval”. In Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries. 274-293. [11]. Jagadeesh Jagarlamudi and Kumaran, A. Cross- Lingual Information Retrieval System for Indian Languages, Proceedings of CLEF 2007, 2007. [12]. Kishida, K. (2005). Technical issues of cross-language information retrieval: a review. Inf. Process. Manage., 41(3):433- 455. [13]. Nakazawa, S. Ochiai, T. Satoh K., and Okumura A. Cross language Information Retrieval based on Comparable Corpora. In: Proceedings of the first NTCIR Workshop on Research in Japanese Text Retrieval and Term Recognition (NTCIR1) 1999. [14]. Pirkola, A., Toivonen, J., Keskustalo, H., Visala, K., and Järvelin, K. (2003). Fuzzy translation of cross-lingual spelling variants. In SIGIR ‟03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 345-352, New York, NY, USA. ACM. [15]. Pattabhi R. K. Rao., and Sobha, L. Cross Lingual Information Retrieval Track: Tamil – English, Working notes from FIRE 2010, Feb 2010.
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 
_______________________________________________________________________________________ 
Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 52 
[16]. Prasad Pingali and Vasudeva Varma, “IIIT Hyderabad at CLEF 2007 - Adhoc Indian Language CLIR task”. In: CLEF- 2007, Cross Language Evaluation Forum 2007 Workshop at Budapest Hungary. [17]. Pingali, P., Varma, V., “Hindi and Telugu to English Cross Language Information Retrieval”, Cross Language Extraction Forum(CLEF), 2006. [18]. Sperer, R. and Oard, D. 2000. “Structured query translation for cross-language information retrieval.” In Proceedings of the ACM SIGIR Conference. ACM, New York, 2000. 
[19]. Seetha Anurag, Das Sujoy, Kumar M., Evaluation of the English-Hindi Cross Language Information Retrieval System Based on Dictionary Based Query Translation Method. In: Proceedings of 10th International Conference on Information Technology (ICIT2007). Available at http://doi.ieeecomputersociety.org/10.1109/ICIT.2007.40. [20]. Thenmozhi, D., and Aravindan, C. Tamil-English Cross Lingual Information Retrieval System for Agriculture Society, International Forum for Information Technology in Tamil Conference, October 2009.

More Related Content

What's hot

Improving performance of english hindi cross language information retrieval u...
Improving performance of english hindi cross language information retrieval u...Improving performance of english hindi cross language information retrieval u...
Improving performance of english hindi cross language information retrieval u...
ijnlc
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
aciijournal
 
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachPunjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
IJERA Editor
 
Myanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov ModelMyanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov Model
ijtsrd
 
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
kevig
 
A New Approach to Parts of Speech Tagging in Malayalam
A New Approach to Parts of Speech Tagging in MalayalamA New Approach to Parts of Speech Tagging in Malayalam
A New Approach to Parts of Speech Tagging in Malayalam
ijcsit
 
Named Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid ApproachNamed Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid Approach
Waqas Tariq
 
Hybrid approaches for automatic vowelization of arabic texts
Hybrid approaches for automatic vowelization of arabic textsHybrid approaches for automatic vowelization of arabic texts
Hybrid approaches for automatic vowelization of arabic texts
ijnlc
 
Keywords- Based on Arabic Information Retrieval Using Light Stemmer
Keywords- Based on Arabic Information Retrieval Using Light Stemmer Keywords- Based on Arabic Information Retrieval Using Light Stemmer
Keywords- Based on Arabic Information Retrieval Using Light Stemmer
IJCSIS Research Publications
 
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHHANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
ijnlc
 
IRJET- Querying Database using Natural Language Interface
IRJET-  	  Querying Database using Natural Language InterfaceIRJET-  	  Querying Database using Natural Language Interface
IRJET- Querying Database using Natural Language Interface
IRJET Journal
 
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
S URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELSS URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELS
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
ijnlc
 
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
Syeful Islam
 
ATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliterationATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliteration
IJECEIAES
 
Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...
baskaran_md
 
Cl35491494
Cl35491494Cl35491494
Cl35491494
IJERA Editor
 

What's hot (20)

Improving performance of english hindi cross language information retrieval u...
Improving performance of english hindi cross language information retrieval u...Improving performance of english hindi cross language information retrieval u...
Improving performance of english hindi cross language information retrieval u...
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
 
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachPunjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
 
C8 akumaran
C8 akumaranC8 akumaran
C8 akumaran
 
E1 geetha2 karthikeyan
E1 geetha2 karthikeyanE1 geetha2 karthikeyan
E1 geetha2 karthikeyan
 
I1 geetha3 revathi
I1 geetha3 revathiI1 geetha3 revathi
I1 geetha3 revathi
 
Myanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov ModelMyanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov Model
 
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
 
A New Approach to Parts of Speech Tagging in Malayalam
A New Approach to Parts of Speech Tagging in MalayalamA New Approach to Parts of Speech Tagging in Malayalam
A New Approach to Parts of Speech Tagging in Malayalam
 
Named Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid ApproachNamed Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid Approach
 
Hybrid approaches for automatic vowelization of arabic texts
Hybrid approaches for automatic vowelization of arabic textsHybrid approaches for automatic vowelization of arabic texts
Hybrid approaches for automatic vowelization of arabic texts
 
AICOL2015_paper_16
AICOL2015_paper_16AICOL2015_paper_16
AICOL2015_paper_16
 
Keywords- Based on Arabic Information Retrieval Using Light Stemmer
Keywords- Based on Arabic Information Retrieval Using Light Stemmer Keywords- Based on Arabic Information Retrieval Using Light Stemmer
Keywords- Based on Arabic Information Retrieval Using Light Stemmer
 
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHHANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
 
IRJET- Querying Database using Natural Language Interface
IRJET-  	  Querying Database using Natural Language InterfaceIRJET-  	  Querying Database using Natural Language Interface
IRJET- Querying Database using Natural Language Interface
 
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
S URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELSS URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELS
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
 
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
 
ATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliterationATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliteration
 
Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...
 
Cl35491494
Cl35491494Cl35491494
Cl35491494
 

Viewers also liked

Clustering of medline documents using semi supervised spectral clustering
Clustering of medline documents using semi supervised spectral clusteringClustering of medline documents using semi supervised spectral clustering
Clustering of medline documents using semi supervised spectral clustering
eSAT Publishing House
 
Correlation measure for intuitionistic fuzzy multi sets
Correlation measure for intuitionistic fuzzy multi setsCorrelation measure for intuitionistic fuzzy multi sets
Correlation measure for intuitionistic fuzzy multi sets
eSAT Publishing House
 
Orchestration of web services based on t qo s using user and web services agent
Orchestration of web services based on t qo s using user and web services agentOrchestration of web services based on t qo s using user and web services agent
Orchestration of web services based on t qo s using user and web services agent
eSAT Publishing House
 
The pattern and realization of zigbee wi-fi
The pattern and realization of zigbee  wi-fiThe pattern and realization of zigbee  wi-fi
The pattern and realization of zigbee wi-fi
eSAT Publishing House
 
Gps enabled android application for bus
Gps enabled android application for busGps enabled android application for bus
Gps enabled android application for bus
eSAT Publishing House
 
Effect of the use of crumb rubber in conventional
Effect of the use of crumb rubber in conventionalEffect of the use of crumb rubber in conventional
Effect of the use of crumb rubber in conventional
eSAT Publishing House
 
A novel way of verifiable redistribution of the secret in a multiuser environ...
A novel way of verifiable redistribution of the secret in a multiuser environ...A novel way of verifiable redistribution of the secret in a multiuser environ...
A novel way of verifiable redistribution of the secret in a multiuser environ...
eSAT Publishing House
 
Internet enebled data acquisition and device control
Internet enebled data acquisition and device controlInternet enebled data acquisition and device control
Internet enebled data acquisition and device control
eSAT Publishing House
 
Experimental study of the forces above and under the vibration insulators of ...
Experimental study of the forces above and under the vibration insulators of ...Experimental study of the forces above and under the vibration insulators of ...
Experimental study of the forces above and under the vibration insulators of ...
eSAT Publishing House
 
Design of vco using current mode logic with low supply
Design of vco using current mode logic with low supplyDesign of vco using current mode logic with low supply
Design of vco using current mode logic with low supply
eSAT Publishing House
 
Secure data storage and retrieval in the cloud
Secure data storage and retrieval in the cloudSecure data storage and retrieval in the cloud
Secure data storage and retrieval in the cloud
eSAT Publishing House
 
Studies on seasonal variation of ground water quality
Studies on seasonal variation of ground water qualityStudies on seasonal variation of ground water quality
Studies on seasonal variation of ground water quality
eSAT Publishing House
 
Cfd analaysis of flow through a conical exhaust
Cfd analaysis of flow through a conical exhaustCfd analaysis of flow through a conical exhaust
Cfd analaysis of flow through a conical exhaust
eSAT Publishing House
 
Analysis of rc bridge decks for selected national and internationalstandard l...
Analysis of rc bridge decks for selected national and internationalstandard l...Analysis of rc bridge decks for selected national and internationalstandard l...
Analysis of rc bridge decks for selected national and internationalstandard l...
eSAT Publishing House
 
An application specific reconfigurable architecture
An application specific reconfigurable architectureAn application specific reconfigurable architecture
An application specific reconfigurable architecture
eSAT Publishing House
 
Optimization of ultrasonicated membrane anaerobic
Optimization of ultrasonicated membrane anaerobicOptimization of ultrasonicated membrane anaerobic
Optimization of ultrasonicated membrane anaerobic
eSAT Publishing House
 
Arsenic removal from water using activated carbon
Arsenic removal from water using activated carbonArsenic removal from water using activated carbon
Arsenic removal from water using activated carbon
eSAT Publishing House
 
Monitoring and control of grain storage using plc
Monitoring and control of grain storage using plcMonitoring and control of grain storage using plc
Monitoring and control of grain storage using plc
eSAT Publishing House
 
Low cost 50 watts inverter for fluorescent lamp using
Low cost 50 watts inverter for fluorescent lamp usingLow cost 50 watts inverter for fluorescent lamp using
Low cost 50 watts inverter for fluorescent lamp using
eSAT Publishing House
 
Effect of zn s as an impurity on the physical properties
Effect of zn s as an impurity on the physical propertiesEffect of zn s as an impurity on the physical properties
Effect of zn s as an impurity on the physical properties
eSAT Publishing House
 

Viewers also liked (20)

Clustering of medline documents using semi supervised spectral clustering
Clustering of medline documents using semi supervised spectral clusteringClustering of medline documents using semi supervised spectral clustering
Clustering of medline documents using semi supervised spectral clustering
 
Correlation measure for intuitionistic fuzzy multi sets
Correlation measure for intuitionistic fuzzy multi setsCorrelation measure for intuitionistic fuzzy multi sets
Correlation measure for intuitionistic fuzzy multi sets
 
Orchestration of web services based on t qo s using user and web services agent
Orchestration of web services based on t qo s using user and web services agentOrchestration of web services based on t qo s using user and web services agent
Orchestration of web services based on t qo s using user and web services agent
 
The pattern and realization of zigbee wi-fi
The pattern and realization of zigbee  wi-fiThe pattern and realization of zigbee  wi-fi
The pattern and realization of zigbee wi-fi
 
Gps enabled android application for bus
Gps enabled android application for busGps enabled android application for bus
Gps enabled android application for bus
 
Effect of the use of crumb rubber in conventional
Effect of the use of crumb rubber in conventionalEffect of the use of crumb rubber in conventional
Effect of the use of crumb rubber in conventional
 
A novel way of verifiable redistribution of the secret in a multiuser environ...
A novel way of verifiable redistribution of the secret in a multiuser environ...A novel way of verifiable redistribution of the secret in a multiuser environ...
A novel way of verifiable redistribution of the secret in a multiuser environ...
 
Internet enebled data acquisition and device control
Internet enebled data acquisition and device controlInternet enebled data acquisition and device control
Internet enebled data acquisition and device control
 
Experimental study of the forces above and under the vibration insulators of ...
Experimental study of the forces above and under the vibration insulators of ...Experimental study of the forces above and under the vibration insulators of ...
Experimental study of the forces above and under the vibration insulators of ...
 
Design of vco using current mode logic with low supply
Design of vco using current mode logic with low supplyDesign of vco using current mode logic with low supply
Design of vco using current mode logic with low supply
 
Secure data storage and retrieval in the cloud
Secure data storage and retrieval in the cloudSecure data storage and retrieval in the cloud
Secure data storage and retrieval in the cloud
 
Studies on seasonal variation of ground water quality
Studies on seasonal variation of ground water qualityStudies on seasonal variation of ground water quality
Studies on seasonal variation of ground water quality
 
Cfd analaysis of flow through a conical exhaust
Cfd analaysis of flow through a conical exhaustCfd analaysis of flow through a conical exhaust
Cfd analaysis of flow through a conical exhaust
 
Analysis of rc bridge decks for selected national and internationalstandard l...
Analysis of rc bridge decks for selected national and internationalstandard l...Analysis of rc bridge decks for selected national and internationalstandard l...
Analysis of rc bridge decks for selected national and internationalstandard l...
 
An application specific reconfigurable architecture
An application specific reconfigurable architectureAn application specific reconfigurable architecture
An application specific reconfigurable architecture
 
Optimization of ultrasonicated membrane anaerobic
Optimization of ultrasonicated membrane anaerobicOptimization of ultrasonicated membrane anaerobic
Optimization of ultrasonicated membrane anaerobic
 
Arsenic removal from water using activated carbon
Arsenic removal from water using activated carbonArsenic removal from water using activated carbon
Arsenic removal from water using activated carbon
 
Monitoring and control of grain storage using plc
Monitoring and control of grain storage using plcMonitoring and control of grain storage using plc
Monitoring and control of grain storage using plc
 
Low cost 50 watts inverter for fluorescent lamp using
Low cost 50 watts inverter for fluorescent lamp usingLow cost 50 watts inverter for fluorescent lamp using
Low cost 50 watts inverter for fluorescent lamp using
 
Effect of zn s as an impurity on the physical properties
Effect of zn s as an impurity on the physical propertiesEffect of zn s as an impurity on the physical properties
Effect of zn s as an impurity on the physical properties
 

Similar to Cross language information retrieval in indian

HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMHINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
ijnlc
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
aciijournal
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
aciijournal
 
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
International Journal of Science and Research (IJSR)
 
Attentive_fine-tuning_of_Transformers_for_Translat.pdf
Attentive_fine-tuning_of_Transformers_for_Translat.pdfAttentive_fine-tuning_of_Transformers_for_Translat.pdf
Attentive_fine-tuning_of_Transformers_for_Translat.pdf
SatishBhalshankar
 
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
ijnlc
 
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language ModelsIRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
IRJET Journal
 
QUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageQUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu Language
IJERA Editor
 
Fq2510361043
Fq2510361043Fq2510361043
Fq2510361043
IJERA Editor
 
Developing an architecture for translation engine using ontology
Developing an architecture for translation engine using ontologyDeveloping an architecture for translation engine using ontology
Developing an architecture for translation engine using ontology
Alexander Decker
 
Parafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdfParafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdf
Universidad Nacional de San Martin
 
Ac04507168175
Ac04507168175Ac04507168175
Ac04507168175
IJERA Editor
 
Arabic MT Project
Arabic MT ProjectArabic MT Project
Arabic MT Project
Hind Abdulkhaleq
 
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
cscpconf
 
A decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri languageA decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri language
acijjournal
 
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVALDICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
csandit
 
Designing Cross-Language Information Retrieval System using various Technique...
Designing Cross-Language Information Retrieval System using various Technique...Designing Cross-Language Information Retrieval System using various Technique...
Designing Cross-Language Information Retrieval System using various Technique...
IRJET Journal
 
Identification of monolingual and code-switch information from English-Kannad...
Identification of monolingual and code-switch information from English-Kannad...Identification of monolingual and code-switch information from English-Kannad...
Identification of monolingual and code-switch information from English-Kannad...
IJECEIAES
 
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
cseij
 
A Novel approach for Document Clustering using Concept Extraction
A Novel approach for Document Clustering using Concept ExtractionA Novel approach for Document Clustering using Concept Extraction
A Novel approach for Document Clustering using Concept Extraction
AM Publications
 

Similar to Cross language information retrieval in indian (20)

HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMHINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
 
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
 
Attentive_fine-tuning_of_Transformers_for_Translat.pdf
Attentive_fine-tuning_of_Transformers_for_Translat.pdfAttentive_fine-tuning_of_Transformers_for_Translat.pdf
Attentive_fine-tuning_of_Transformers_for_Translat.pdf
 
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
 
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language ModelsIRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
 
QUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageQUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu Language
 
Fq2510361043
Fq2510361043Fq2510361043
Fq2510361043
 
Developing an architecture for translation engine using ontology
Developing an architecture for translation engine using ontologyDeveloping an architecture for translation engine using ontology
Developing an architecture for translation engine using ontology
 
Parafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdfParafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdf
 
Ac04507168175
Ac04507168175Ac04507168175
Ac04507168175
 
Arabic MT Project
Arabic MT ProjectArabic MT Project
Arabic MT Project
 
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
 
A decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri languageA decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri language
 
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVALDICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
 
Designing Cross-Language Information Retrieval System using various Technique...
Designing Cross-Language Information Retrieval System using various Technique...Designing Cross-Language Information Retrieval System using various Technique...
Designing Cross-Language Information Retrieval System using various Technique...
 
Identification of monolingual and code-switch information from English-Kannad...
Identification of monolingual and code-switch information from English-Kannad...Identification of monolingual and code-switch information from English-Kannad...
Identification of monolingual and code-switch information from English-Kannad...
 
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
 
A Novel approach for Document Clustering using Concept Extraction
A Novel approach for Document Clustering using Concept ExtractionA Novel approach for Document Clustering using Concept Extraction
A Novel approach for Document Clustering using Concept Extraction
 

More from eSAT Publishing House

Likely impacts of hudhud on the environment of visakhapatnam
Likely impacts of hudhud on the environment of visakhapatnamLikely impacts of hudhud on the environment of visakhapatnam
Likely impacts of hudhud on the environment of visakhapatnam
eSAT Publishing House
 
Impact of flood disaster in a drought prone area – case study of alampur vill...
Impact of flood disaster in a drought prone area – case study of alampur vill...Impact of flood disaster in a drought prone area – case study of alampur vill...
Impact of flood disaster in a drought prone area – case study of alampur vill...
eSAT Publishing House
 
Hudhud cyclone – a severe disaster in visakhapatnam
Hudhud cyclone – a severe disaster in visakhapatnamHudhud cyclone – a severe disaster in visakhapatnam
Hudhud cyclone – a severe disaster in visakhapatnam
eSAT Publishing House
 
Groundwater investigation using geophysical methods a case study of pydibhim...
Groundwater investigation using geophysical methods  a case study of pydibhim...Groundwater investigation using geophysical methods  a case study of pydibhim...
Groundwater investigation using geophysical methods a case study of pydibhim...
eSAT Publishing House
 
Flood related disasters concerned to urban flooding in bangalore, india
Flood related disasters concerned to urban flooding in bangalore, indiaFlood related disasters concerned to urban flooding in bangalore, india
Flood related disasters concerned to urban flooding in bangalore, india
eSAT Publishing House
 
Enhancing post disaster recovery by optimal infrastructure capacity building
Enhancing post disaster recovery by optimal infrastructure capacity buildingEnhancing post disaster recovery by optimal infrastructure capacity building
Enhancing post disaster recovery by optimal infrastructure capacity building
eSAT Publishing House
 
Effect of lintel and lintel band on the global performance of reinforced conc...
Effect of lintel and lintel band on the global performance of reinforced conc...Effect of lintel and lintel band on the global performance of reinforced conc...
Effect of lintel and lintel band on the global performance of reinforced conc...
eSAT Publishing House
 
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
eSAT Publishing House
 
Wind damage to buildings, infrastrucuture and landscape elements along the be...
Wind damage to buildings, infrastrucuture and landscape elements along the be...Wind damage to buildings, infrastrucuture and landscape elements along the be...
Wind damage to buildings, infrastrucuture and landscape elements along the be...
eSAT Publishing House
 
Shear strength of rc deep beam panels – a review
Shear strength of rc deep beam panels – a reviewShear strength of rc deep beam panels – a review
Shear strength of rc deep beam panels – a review
eSAT Publishing House
 
Role of voluntary teams of professional engineers in dissater management – ex...
Role of voluntary teams of professional engineers in dissater management – ex...Role of voluntary teams of professional engineers in dissater management – ex...
Role of voluntary teams of professional engineers in dissater management – ex...
eSAT Publishing House
 
Risk analysis and environmental hazard management
Risk analysis and environmental hazard managementRisk analysis and environmental hazard management
Risk analysis and environmental hazard management
eSAT Publishing House
 
Review study on performance of seismically tested repaired shear walls
Review study on performance of seismically tested repaired shear wallsReview study on performance of seismically tested repaired shear walls
Review study on performance of seismically tested repaired shear walls
eSAT Publishing House
 
Monitoring and assessment of air quality with reference to dust particles (pm...
Monitoring and assessment of air quality with reference to dust particles (pm...Monitoring and assessment of air quality with reference to dust particles (pm...
Monitoring and assessment of air quality with reference to dust particles (pm...
eSAT Publishing House
 
Low cost wireless sensor networks and smartphone applications for disaster ma...
Low cost wireless sensor networks and smartphone applications for disaster ma...Low cost wireless sensor networks and smartphone applications for disaster ma...
Low cost wireless sensor networks and smartphone applications for disaster ma...
eSAT Publishing House
 
Coastal zones – seismic vulnerability an analysis from east coast of india
Coastal zones – seismic vulnerability an analysis from east coast of indiaCoastal zones – seismic vulnerability an analysis from east coast of india
Coastal zones – seismic vulnerability an analysis from east coast of india
eSAT Publishing House
 
Can fracture mechanics predict damage due disaster of structures
Can fracture mechanics predict damage due disaster of structuresCan fracture mechanics predict damage due disaster of structures
Can fracture mechanics predict damage due disaster of structures
eSAT Publishing House
 
Assessment of seismic susceptibility of rc buildings
Assessment of seismic susceptibility of rc buildingsAssessment of seismic susceptibility of rc buildings
Assessment of seismic susceptibility of rc buildings
eSAT Publishing House
 
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
eSAT Publishing House
 
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
eSAT Publishing House
 

More from eSAT Publishing House (20)

Likely impacts of hudhud on the environment of visakhapatnam
Likely impacts of hudhud on the environment of visakhapatnamLikely impacts of hudhud on the environment of visakhapatnam
Likely impacts of hudhud on the environment of visakhapatnam
 
Impact of flood disaster in a drought prone area – case study of alampur vill...
Impact of flood disaster in a drought prone area – case study of alampur vill...Impact of flood disaster in a drought prone area – case study of alampur vill...
Impact of flood disaster in a drought prone area – case study of alampur vill...
 
Hudhud cyclone – a severe disaster in visakhapatnam
Hudhud cyclone – a severe disaster in visakhapatnamHudhud cyclone – a severe disaster in visakhapatnam
Hudhud cyclone – a severe disaster in visakhapatnam
 
Groundwater investigation using geophysical methods a case study of pydibhim...
Groundwater investigation using geophysical methods  a case study of pydibhim...Groundwater investigation using geophysical methods  a case study of pydibhim...
Groundwater investigation using geophysical methods a case study of pydibhim...
 
Flood related disasters concerned to urban flooding in bangalore, india
Flood related disasters concerned to urban flooding in bangalore, indiaFlood related disasters concerned to urban flooding in bangalore, india
Flood related disasters concerned to urban flooding in bangalore, india
 
Enhancing post disaster recovery by optimal infrastructure capacity building
Enhancing post disaster recovery by optimal infrastructure capacity buildingEnhancing post disaster recovery by optimal infrastructure capacity building
Enhancing post disaster recovery by optimal infrastructure capacity building
 
Effect of lintel and lintel band on the global performance of reinforced conc...
Effect of lintel and lintel band on the global performance of reinforced conc...Effect of lintel and lintel band on the global performance of reinforced conc...
Effect of lintel and lintel band on the global performance of reinforced conc...
 
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
 
Wind damage to buildings, infrastrucuture and landscape elements along the be...
Wind damage to buildings, infrastrucuture and landscape elements along the be...Wind damage to buildings, infrastrucuture and landscape elements along the be...
Wind damage to buildings, infrastrucuture and landscape elements along the be...
 
Shear strength of rc deep beam panels – a review
Shear strength of rc deep beam panels – a reviewShear strength of rc deep beam panels – a review
Shear strength of rc deep beam panels – a review
 
Role of voluntary teams of professional engineers in dissater management – ex...
Role of voluntary teams of professional engineers in dissater management – ex...Role of voluntary teams of professional engineers in dissater management – ex...
Role of voluntary teams of professional engineers in dissater management – ex...
 
Risk analysis and environmental hazard management
Risk analysis and environmental hazard managementRisk analysis and environmental hazard management
Risk analysis and environmental hazard management
 
Review study on performance of seismically tested repaired shear walls
Review study on performance of seismically tested repaired shear wallsReview study on performance of seismically tested repaired shear walls
Review study on performance of seismically tested repaired shear walls
 
Monitoring and assessment of air quality with reference to dust particles (pm...
Monitoring and assessment of air quality with reference to dust particles (pm...Monitoring and assessment of air quality with reference to dust particles (pm...
Monitoring and assessment of air quality with reference to dust particles (pm...
 
Low cost wireless sensor networks and smartphone applications for disaster ma...
Low cost wireless sensor networks and smartphone applications for disaster ma...Low cost wireless sensor networks and smartphone applications for disaster ma...
Low cost wireless sensor networks and smartphone applications for disaster ma...
 
Coastal zones – seismic vulnerability an analysis from east coast of india
Coastal zones – seismic vulnerability an analysis from east coast of indiaCoastal zones – seismic vulnerability an analysis from east coast of india
Coastal zones – seismic vulnerability an analysis from east coast of india
 
Can fracture mechanics predict damage due disaster of structures
Can fracture mechanics predict damage due disaster of structuresCan fracture mechanics predict damage due disaster of structures
Can fracture mechanics predict damage due disaster of structures
 
Assessment of seismic susceptibility of rc buildings
Assessment of seismic susceptibility of rc buildingsAssessment of seismic susceptibility of rc buildings
Assessment of seismic susceptibility of rc buildings
 
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
 
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
 

Recently uploaded

Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTSHeap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Soumen Santra
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
symbo111
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERSCW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
veerababupersonal22
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
aqil azizi
 

Recently uploaded (20)

Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTSHeap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERSCW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
 

Cross language information retrieval in indian

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 46 CROSS LANGUAGE INFORMATION RETRIEVAL: IN INDIAN LANGUAGE PERSPECTIVE Pratibha Bajpai1, Dr. Parul Verma2 1Research Scholar, Department of Information Technology, Amity University, Lucknow 2Assistant Professor, Department of Computer Science, Amity University, Lucknow Abstract Multilingual information is overflowing on internet these days. This increasing diversity of web pages in almost every popular language in the world should enable the user to access information in any language of his choice. But sometimes it is difficult for a user to write her request in a language which she could easily read and understand. This makes cross-language information retrieval (CLIR) and multilingual information retrieval (MLIR) for Web applications a valuable need of the day. It increases the accessibility of web users to retrieve information in any language while post their queries in their native language. The paper critically analyzes the various researchers work in the area of Indian language CLIR. In this paper we also present our prospective prototype for English to Hindi language CLIR. It will also discuss the issues related to the English to Hindi language translation. We had tested 30 queries manually using suggested prototype and found that the precision level is quite good. Keywords: Cross lingual Information Retrieval, Query Translation, Sense Disambiguation, English to Hindi Translation. --------------------------------------------------------------------***------------------------------------------------------------------ 1. INTRODUCTION A classic IR system accepts the user information need in a form of query and gives back the documents that are relevant to the user need. With the explosion of knowledge on the web, it became necessary to break the language barriers for the monolingual IR systems. This may allow the users of IR systems to give query in one language and retrieve documents in different languages. IR system, with different source and target language is called CLIR system. Cross-Lingual Information Retrieval (CLIR) translates the user query (given in source language) into the target language, and uses translated query to retrieve the target language documents. The drive for evaluation of monolingual and cross-lingual retrieval systems started with Cross-Language Evaluation Forum (CLEF) in European languages and NTCIR in Chinese-Japanese-Korean languages. It is only in the recent past that the Indian languages have gained importance in evaluation. From 2008, a specific campaign focusing on Indian languages started with the Forum for Information Retrieval Evaluation (FIRE). This resulted in the development of large document collection in some Indian languages like Bangla, Hindi, Marathi and Tamil. Through our paper we like to provide a brief review of the work done by various researchers in the field of Indian languages for CLIR system. The paper is organized as follows: section 2 illustrates different techniques used for query translation. Comparative analysis of CLIR approaches in Indian languages perspective is discussed in section 3. Section 4 describes our prototype for query translation and sense disambiguation while section 5 draws the conclusion. 2. DIFFERENT TECHNIQUES FOR CLIR Based on different translation resources, three different techniques have been identified in CLIR: Dictionary based CLIR, Corpora based CLIR and Machine translator based CLIR. 2.1 Machine Translation Machine translation, in simple terms, is a technique that makes use of software that translates text from one language to another language. But machine translation is not all about substitution of words from one language to another only; rather it also involves finding phrases and its counterparts in target language to produce good quality translations. Machine translation is of three types: 2.1.1 Rule Based Machine Translation Rule based MT uses linguistic information about source and target language. M. Nimaiti and Y. Izumi (2012) developed Japanese Uighur machine translation system using rule based approach. They propose a word-for-word translation system using subject verb agreement in Uighur. The results aren‟t positive and there are still some rooms for improvement. In case of Indian languages, R.Rajan et. al.(2009) propose a rule based system for translating English sentences to Malayalam by utilizing dependencies from parser, POS tagger and transfer link rules for reordering and rules for morphology. 2.1.2 Statistical Machine Translation Statistical machine translation generates translations using statistical methods based on bilingual text corpora. Dan Wu
  • 2. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 47 & Daqing He (2010) conducted a series of CLIR experiments using Google Translate for translating queries. Their results show that with the help of relevance feedback, MT can achieve significant improvement over the monolingual baseline, no matter whether the query length are short or long. Kraaij & Simard(2003) experimentally claim that web can be used for automatic construction of parallel corpus which can then be used to train statistical translation models automatically. 2.1.3 Example Based Machine Translation Example based MT reads similar examples in the form of source text and its translation from the set of examples, adapting the examples to translate a new input. Sato and Nagao (1990) investigated the problem of example selection by approximate matching of input sentences and example sentences, using a similarity measure based on the syntactic similarity of dependency tree structures of a sentence pair in question and on the word distance of corresponding words, which were predefined in a thesaurus. Sumita et al. (1990) looked into example-based translation of Japanese noun phrases of the pattern [N1 no N2] into English as [N2 prep N1] or [N1 N2], based on a distance measure for the input phrase and example phrase, calculated as a linear weighted sum of the distances of the three sub-parts, each of which is predefined in a thesaurus. 2.2 Dictionary Based CLIR The most natural approach to cross-lingual IR is to replace each query term with most appropriate translations extracted automatically from Machine Readable Dictionaries (MRD). The translation using bilingual dictionaries is simple but Ballesteros and Croft (1996) and Hull & Grefenstette(1996) claim that it leads to a 40-60% loss in effectiveness as compared to monolingual retrieval. A.Pirkola (2001) asserts that the loss can be due to factors as untranslatable search keys due to limitations in dictionaries, processing of derived or inflected word forms, phrase and compound translation and lexical ambiguity in source and target languages. To handle these problems, researchers have made use of domain specific dictionaries for the dictionary coverage problem( Pirkola, 1998, 1999), Stemming and morphological analysis to handle inflected words(Hull, 1996, Krovetz, 1993; Porter, 1990), POS tagging for phrase translation(Ballesteros & Croft, 1997), corpus based query expansion (Ballesteros & Croft, 1998; Nie et al. , 1999; Sheridan et. al., 1997) and query structuring for the ambiguity problem(Pirkola, 1998, 1999; Sperer & Oard, 2000). 2.3. Corpus Based Cross Lingual Information Retrieval Corpus based CLIR methods use multilingual terminology derived from parallel or comparable corpora for query translation and expansion. There are two types of corpus: 2.3.1 Parallel Corpus A parallel corpus is a collection where texts in one language are aligned with their translations in another language. Several systems have been developed to mine large parallel corpora from the web. Wang and Lin (2010) give a method which first identifies a set of seed URLs and crawl candidate bilingual websites. The obtained pages are cleaned and bilingual texts collected to construct comparable corpora. Wang et. al. (2004) exploit the bilingual search result pages obtained from a real search engine as a corpus for automatic translation of unknown query terms not included in the dictionary. They propose a PAT-tree based local maxima method for effective extraction of translation candidates. The approach gives excellent results. 2.3.2 Comparable Corpus Comparable corpus, on the other hand, consist of texts that are not translations, but share similar topics. They can be, e.g., newspaper collections written in the same time period in different countries. Sadat Fatiha (2011) exploit the idea of using multilingual based encyclopedias such as Wikipedia to extract terms and their translations to construct a bilingual ontology or enhance the coverage of existing ontologies. The method show promising results for any pair of languages. Qian & Meng (2008) expanded Chinese OOV phrase with its partial English translation and submitted to the search engine. The translation of OOV words is mined by preprocessing the snippets obtained to extract the main text from the web page. The strings obtained are sorted by weighted frequency to output the top n translation of OOV phrase. The method proves to obtain the translation with high time efficiency and high precision. 3. COMPARATIVE ANALYSIS OF CLIR APPROACHES FOR INDIAN LANGUAGES Cross-language retrieval is a budding field in India and the works are still in its primitive state. Table 1 analyzes the performance of various approaches used by the researchers for Indian languages. In many approaches the cross-lingual results are comparable to that of mono-lingual approaches. Table 1: Critical Analysis of CLIR for Indian Languages Languages Translation Size of test data/ Performance Specific Features English to Hindi A.Seetha , S.Das & M. Kumar (2007) Select first equivalent/ preferred –n/ random nth equivalent/ all equivalents from 6219 hindi document test collection/ performance of strategy 1,2,3,4 are 64.80%, 57.90%, 11.83% and 57.13% of monolingual retrieval The four strategies are used to test the system performance on the number of equivalents in the query translation by selecting n equivalents from the list of the dictionary.
  • 3. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 48 Bilingual dictionary Tamil to English S. Saraswathi & A. Siddhiqaa (2010) Machine translation and Ontological tree 200 documents from the domain “festival”/ relevance improves by 40% for English and 60% for Tamil A generic platform is built for bilingual IR which can be extended to any foreign or Indian language working with the same efficiency. English to Hindi A. Seetha, S. Das, J. Rana & M. Kumar (2010) Translation by Shabdanjali dictionary & query expansion by Hindi Wordnet. Fire 2010 Hindi test collection/ method is not very effective Query expansion reformulates the initial query by adding some new related words so that query provides a wider coverage than the original query. English to Malayalam P.L. Nikesh, S.M. Idicula, David Peter (2008) Bilingual dictionary developed in house. System proves to be efficient for CLIR A basic system can be constructed quickly once the linguistic tools become available. English to Hindi Larkey & Connell (2003) Probabilistic dictionary derived from parallel corpus 41697 Hindi news articles/ method contributes to effective Hindi retrieval It combines the ranked lists from the Inquery search and the Language Modeling search to obtain the final ranking of retrieved documents. English to Hindi & Hindi to English S. Sethuramalingam & V. Varma (2008) Bilingual Dictionary English corpus consisted of 125,638 news articles from the Telegraph, Calcutta edition while Hindi corpus consisted of 95215 news articles published in Jagran/ English-Hindi CLIR performance is 58% while Hindi-English CLIR is 25% of the monolingual performance Disjunctive query formulation using weighted keywords give an overall better performance in both CLIR and Multi Lingual scenario. Tamil to English Bilingual dictionary Web/ The approach used improves the significance of the content retrieved and the overall efficiency of the process Using summarization techniques and snippet clustering the result closet to user‟s query is displayed. Bengali & Hindi to English D. Mandal & P. Banerjee (2007) Machine Translation using Bilingual dictionary English news corpus of LA Times 2002 containing 135153 documents/ Map for Bengali-English queries is 7.26 & for Hindi-English queries is 4.77 Queries with named entities provided better results as compared to the queries without named entities implying the importance of a very good bilingual lexicon and transliteration tool in CLIR for Indian languages. Tamil to English D.Thenmozhi & C. Aravindan (2010) Machine Translation Agricultural ontology/ Retrieves pages with MAP of 95% The system exhibits a dynamic learning approach wherein any new word that is encountered in the translation process could be updated to the bilingual dictionary. Hindi to English R. Udupa & J. Jagarlamudi (2008) Probabilistic translation lexicon produced by Statistical Machine Learning Parallel corpus consisting of 100K sentence pairs from the news domain/ Retrieval performance is about 81% of that of monolingual system Transliteration mining of OOV words from the document performance whereas date restriction hurts the retrieval performance. Hindi to telugu to English P.Pingali & V.Verma (2006) Bilingual Dictionary English news corpus of LA Times 1995 containing 113005 documents & 56472 documents from Glasgow Herald of 1995/ The system is much robust Simple techniques such as dictionary lookup with minimal lemmatization such as suffix removal is not sufficient for Indian Languages CLIR. English to Bangla A.Imam & S. SMT using parallel corpus English to Bangla corpus of approximately 29000 sentences/ Improving corpus quality is about 3 times effectual than
  • 4. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 49 Chowdhury (2011) NIST & BLUE scores (scoring system for evaluating the performance of a Machine Translation System.)are 4.6 and 0.39 which is below the standard increasing the corpus size for English-Bengali SMT. Tamil to English Pattabhi R.K Rao and Sobha. L (2010) Bilingual dictionary and ontology 125638 documents from English news magazine “The Telegraph” / Results are encouraging The system performs well for queries for which the world knowledge has been imparted. 3.1 Observation Cross lingual information retrieval for foreign languages like English, French, Chinese etc. has been an appealing area for researchers from long time. But Indian languages have grabbed attention only a decade back. The work done by researchers show mixed results in terms of improvement over monolingual retrieval in Indian language perspective. Anurag Seetha & S. Das (2010) performed translation on Fire 2010 Hindi test collection using Shabdanjali dictionary & query expansion by Hindi Wordnet. The method proved to be ineffective. It is because general dictionaries have low coverage problem. To remove this inefficiency Larkey and Connell (2003) used probabilistic dictionary derived from parallel corpus for English to Hindi translation and achieved effective cross lingual retrieval. Pattabhi R.K Rao and Sobha. L. (2010) found encouraging results by incorporating Bilingual dictionary and ontology. Other researchers have made use of machine translation for cross lingual retrieval. D.Thenmozhi & C. Aravindan (2010) used MT on agricultural domain and retrieved pages with MAP of 95%. MT systems produce high quality translations only in limited domains and are very expensive too. It involves the cost of creating bilingual dictionary, parallel corpora and the construction and evaluation of MT system. R. Udupa & J. Jagarlamudi (2008) used Probabilistic translation lexicon produced by Statistical Machine Learning while A.Imam & S. Chowdhury (2011) used SMT using parallel corpus for English to Bangla translation. Parallel or comparable corpora are yet other useful resources for CLIR. Parallel corpora are preferred in CLIR because they provide more accurate translation knowledge but due to their scarcity, comparable corpora are often used in CLIR. The above observation concludes that there is a wide scope of research to improve existing algorithms or developing new one to improve the performance level of CLIR system. 4. PROTOTYPE APPROACH In this section we propose an approach for cross-lingual information retrieval on the web and briefly discuss the components of the proposed design. The major components of the design are: Preprocessing, Query translation, Word sense disambiguation and Information Retrieval. Before we start discussing the major components of the system, we need to know the grammatical complexities of the two languages. 4.1 Grammatical Complexities of English to Hindi Translation Hindi and English are morphologically different languages. Translating from poor (e.g. English) to rich (e.g. Hindi) morphology is a tough job and requires deeper linguistic investigation during translation. The major differences are: (i) The basic word order in Hindi is Subject-Object-Verb (SOV) as against SVO word order in English. But in Hindi, the constituents of a sentence can be freely moved around in the sentence without affecting the core meaning. E.g. the following sentence pair conveys the same meaning with different word order: राम ने सीता को देखा Ram ne Sita ko dekha सीता को राम ने देखा Sita ko Ram ne dekhaa The identity of Ram as the subject and Sita as the object in both sentences comes from the case markers ने (ne – nominative) and को (ko –accusative) (ii) Unlike English, vowel length and Vowel nasalization are meaningful in Hindi e.g. (Kam) means „less‟ and (Kaam) means „work‟ (Puuch) means „ask‟ and (puunch) means „tail‟ (iii) In English, prepositions precede the words to which they relate. In Hindi, such words are called postpositions because they follow the words they govern. (iv) Hindi is morphologically richer than English. This can be illustrated from following example: The plural-marker in the word “boys” in English is translated as ए (e – plural direct) or ओं (on – plural oblique): The boys went to school ऱड़के पाठशाऱा गये The boys ate apples. ऱड़को ने सेब खाये
  • 5. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 50 Future tense in Hindi is marked on the verb. In the following example, “will go” is translated as जायेंगे (jaaenge), with एंगे (enge) as the future tense marker: The boys will go to school. ऱड़के पाठशाऱा जायेंगे (v) There are no articles in Hindi. Definiteness of a noun is indicated through pronoun, context or word order. (vi) All nouns in Hindi are either masculine or feminine. This means an arbitrary gender is assigned to the nouns that have a neutral gender in English e.g. „chair‟ is a feminine noun and „door‟ is a masculine noun in Hindi. 4.2 Preprocessing The first step in any CLIR system is preprocessing of query terms to speed up the translation process without affecting the retrieval quality. This preprocessing is done using tokenization, stemming and stop word removal. 4.2.1 Tokenization Tokenization is defined as an attempt to recognize the boundaries between words and isolate those parts of a query which should be translated in the source query. 4.2.2 Stop Word Removal Stop Words are words which do not contain important significance in Search Queries and hence can be removed from the query to increase search performance. Removing stop words can be done using a list that contains all stop words. 4.2.3 Stemming It maps all the different inflected forms of a word to the same stem. For languages like English which have weaker inflections, simple stemming algorithms can be used. Such algorithms only remove plural endings. In languages with stronger inflections, suffices are joined to the stem end to end. The advanced stemming algorithm can recognize such multiple endings and remove them in an iterative fashion. Porter stemmer, Snowball stemmer etc. are well known advanced stemming algorithm. 4.3 Query Translation In Query Translation, the given query is converted from Source language to Target language and the obtained query searches the database to get the documents in Target language. Query Translation often suffers from the problem of translation ambiguity and this problem is amplified due to the limited amount of context in short queries. Query translation can be done using any one technique including machine translation, dictionary based or corpus based method. The techniques have already been discussed in section 2. The query translation is quiet complex while translating English to Hindi query as the two languages are morphologically different from each other. Out of vocabulary (OOV) words are not translated even after morphological analysis. This type of words can be transliterated using the target language alphabet and be added to final queries. Not much work has been done for the translation of these two languages by Indian researchers till date. 4.4 Ambiguity Removal in Translated Query Ambiguity is a common problem with all natural languages i.e. there exist a large number of words in these languages carrying more than one meaning. For instance, the English noun plant can mean green plant or factory or the word bank means financial institution or pool of a river. The correct sense of an ambiguous word can be selected based on the context where it occurs. This task of automatically assigning the most appropriate meaning to a polysemous word within a given context is called word sense disambiguation. Disambiguation algorithms use a variety of resources and follow different techniques. On the basis of resource utilization and their processing techniques, the disambiguation techniques can be classified as Knowledge Based Methods (resources used are Machine Readable Dictionaries, Thesaurus, Lexicons ), Supervised Learning Methods (Naïve Bayesian Classifier, Exemplar Based Classifier, Lazy Boosting Algorithm), Minimally Supervised Methods and Unsupervised Methods. 4.5 Information Retrieval after Query Translation and Ambiguity Removal The retrieval system presents the user a set of documents that match his query. The retrieval model is of three types: The Boolean, Vector Space and Probabilistic model. In Boolean model, queries are represented as Boolean expressions and only those documents that logically match the query is presented to the user leaving behind those documents that do not match at all. The major drawback with this model is that it only judges documents completely matching or not and does not determines the degree of matching. The other two methods present the ranked list of documents depending on the degree of matching. Vector Space method calculates the degree of matching by calculating the angle between the query vector and each document vector. The Probabilistic model estimates the probability that a document is relevant for the query on the basis of the assumption that the probability depends on the query and the document representation only. Step by Step Evaluation of CLIR Based on Prototype Approach The steps of the proposed approach can be explained by considering the following queries: Query 1: Hunger Strikes Tokenization- Using whitespace between words the tokens obtained from the query are „Hunger‟ and „Strikes‟
  • 6. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 51 Stop Word Removal- No stop words exist in the above query. Stemming- Next using Porter stemmer, the inflected tokens are reduced to their base form. After stemming, query becomes Hunger strike. Query translation- The required translation of the query is „भूख हड़ताऱ’. Ambiguity Removal- Since the translated query is unambiguous, so no disambiguation is required. Precision- The precision of the query is .83, where the number of relevant documents is 10 out of top 12 retrieved documents appeared on first page. Query 2: Alcohol Consumption in India Tokenization- Tokens of the above query are „Alcohol‟, „consumption‟, „in‟ and „India‟. Stop word removal- Next stop word „in‟ is removed using stop word list given by MIT. The query now becomes Alcohol consumption India Stemming- Stemming using Porter stemmer returns the query as Alcohol consumpt India Query Translation- Hindi translation of the query is „भारत में शराब की खपत’. Ambiguity removal- The Hindi translation “भारत” is ambiguous i.e. it has multiple senses. It refers to country India as well as the son of Pandu, a Mahabharat character. The correct sense of a word can be identified based on the context of the query in which it appears using disambiguation algorithm. Precision- The precision of the query is 1.0, where the number of relevant documents is 10 out of 10 retrieved documents appeared on first page. The queries have been preprocessed and translated manually using tools like Potter stemmer, Stop Word list by MIT etc. and received positive results. Based on suggested approach we will formulize an algorithm for English to Hindi language query translation for CLIR. 5. CONCLUSIONS The respective work with regard to Indian languages has gained impetus in last decade and there is much to be explored in this field. It is quite obvious from the observations that there is still a scope of improvement in the performance level of CLIR. We presume that the proposed prototype system will prove to be competent with other existing systems. REFERENCES [1]. Ballesteros, L, and Bruce W Croft, “Phrasal Translation and Query Expansion Techniques for Cross Language Information Retrieval”. In: Proceedings of 20th International ACM SIGIR Conference in Research and Development in IR 1997. [2]. Ballesteros, L., and Croft, W.B. 1998. “Resolving ambiguity for cross-language retrieval.” In Proceedings of SIGIR Conference, pages 64-71, 1998. [3]. Chawre, S. M., Srikantha Rao. Domain Specific Information Retrieval in Multilingual Environment, International Journal of Recent Trends in Engineering, 2, 4, 179-181, 2009. [4]. Chinnakotla Kumar Manoj, Ranadive Sagar, Bhattacharyya Pushpak and Damani P. Om “Hindi and Marathi to English Cross Language Information Retrieval at CLEF 2007”, in the working notes of CLEF 2007. [5]. David A. Hull and Gregory Grefenstette. Querying across languages: A dictionary-based approach to multilingual information retrieval. In Proceedings of the 19th International Conference on Research and Development in Information Retrieval, pages 49–57, 1996. [6]. Dr. Saraswathi, S., Asma Siddhiqaa, M., Kalaimagal, K., and Kalaiyarasi M. BiLingual Information Retrieval System for English and Tamil, Journal Of Computing, 2,4, 85-89, April 2010. [7]. Grefenstette, G. (1998b). The problem of cross-language information retrieval. In Grefenstette (1998a), pages 1-9. [8]. Hsu Hung Ming, Tsai Feng Ming, and Hsin-Hsi Chen Query Expansion with ConceptNet and WordNet: An Intrinsic Comparison. In : AIRS 2006, LNCS 4182, (2006) 1-13. [9]. Hang Cui, Ji-Rong Wen, Jian-Yun Nie, and Wei-Ying Ma. Query Expansion by Mining User Logs IEEE. Transactions on Knowledge and Data Engineering, Vol. 15(4) 2003. [10]. Hiemstra, D. And De Jong, F. 1999. “Disambiguation strategies for cross-language information retrieval”. In Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries. 274-293. [11]. Jagadeesh Jagarlamudi and Kumaran, A. Cross- Lingual Information Retrieval System for Indian Languages, Proceedings of CLEF 2007, 2007. [12]. Kishida, K. (2005). Technical issues of cross-language information retrieval: a review. Inf. Process. Manage., 41(3):433- 455. [13]. Nakazawa, S. Ochiai, T. Satoh K., and Okumura A. Cross language Information Retrieval based on Comparable Corpora. In: Proceedings of the first NTCIR Workshop on Research in Japanese Text Retrieval and Term Recognition (NTCIR1) 1999. [14]. Pirkola, A., Toivonen, J., Keskustalo, H., Visala, K., and Järvelin, K. (2003). Fuzzy translation of cross-lingual spelling variants. In SIGIR ‟03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 345-352, New York, NY, USA. ACM. [15]. Pattabhi R. K. Rao., and Sobha, L. Cross Lingual Information Retrieval Track: Tamil – English, Working notes from FIRE 2010, Feb 2010.
  • 7. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 03 Special Issue: 10 | NCCOTII 2014 | Jun-2014, Available @ http://www.ijret.org 52 [16]. Prasad Pingali and Vasudeva Varma, “IIIT Hyderabad at CLEF 2007 - Adhoc Indian Language CLIR task”. In: CLEF- 2007, Cross Language Evaluation Forum 2007 Workshop at Budapest Hungary. [17]. Pingali, P., Varma, V., “Hindi and Telugu to English Cross Language Information Retrieval”, Cross Language Extraction Forum(CLEF), 2006. [18]. Sperer, R. and Oard, D. 2000. “Structured query translation for cross-language information retrieval.” In Proceedings of the ACM SIGIR Conference. ACM, New York, 2000. [19]. Seetha Anurag, Das Sujoy, Kumar M., Evaluation of the English-Hindi Cross Language Information Retrieval System Based on Dictionary Based Query Translation Method. In: Proceedings of 10th International Conference on Information Technology (ICIT2007). Available at http://doi.ieeecomputersociety.org/10.1109/ICIT.2007.40. [20]. Thenmozhi, D., and Aravindan, C. Tamil-English Cross Lingual Information Retrieval System for Agriculture Society, International Forum for Information Technology in Tamil Conference, October 2009.