SlideShare a Scribd company logo
1 of 7
Download to read offline
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013

IMPROVING PERFORMANCE OF ENGLISH-HINDI
CROSS LANGUAGE INFORMATION RETRIEVAL
USING TRANSLITERATION OF QUERY TERMS
Saurabh Varshney1 and Jyoti Bajpai2
1
2

Department of Computer Engineering, GLA University, Mathura, India
Department of Computer Engineering, GLA University, Mathura, India

ABSTRACT
The main issue in Cross Language Information Retrieval (CLIR) is the poor performance of retrieval in
terms of average precision when compared to monolingual retrieval performance. The main reasons
behind poor performance of CLIR are mismatching of query terms, lexical ambiguity and un-translated
query terms. The existing problems of CLIR are needed to be addressed in order to increase the
performance of the CLIR system. In this paper, we are putting our effort to solve the given problem by
proposed an algorithm for improving the performance of English-Hindi CLIR system. We used all possible
combination of Hindi translated query using transliteration of English query terms and choosing the best
query among them for retrieval of documents. The experiment is performed on FIRE 2010 (Forum of
Information Retrieval Evaluation) datasets. The experimental result show that the proposed approach gives
better performance of English-Hindi CLIR system and also helps in overcoming existing problems and
outperforms the existing English-Hindi CLIR system in terms of average precision.

KEYWORDS
Cross Language Information Retrieval; Transliteration of query terms; Lexical ambiguity; English-Hindi
query translation; ‘Shabdanjali’ multi-lingual dictionary; FIRE data collection; ‘Title’ field of initial
query; Mean Average Precision.

1. INTRODUCTION
We are rapidly constructing the broad network architecture for transferring information across
national barriers, but much remains to be done before linguistic boundaries can be better as
effectively as geographic ones [1]. Now a days, peoples have more likely to interest on global
things like education, economy, business, marketing, research etc. because of that peoples are
interested to collect information and data of other regions of the world. The one and only medium
for doing this, is the Internet. But we also knows users are more likely to retrieve information in a
language in which a user is more comfortable or we can say that user wants information in his/her
native language to understand the language of documents more easily. Accessing information in a
host language is clearly important for many users. In India, about 70% of peoples know Hindi as
a primary language while based on human development survey in 2012; there are only 10.35 %
peoples in India who are the English speakers. India is third country that has largest number of
internet users but when we talk about penetration means total population, in India only 12.6% of
people are the internet user which decrease the rank of India on to 164th position based on survey.
And we also know that entering query in another language to retrieve documents is very difficult
to the user. So, the conclusion is that, there should be require a tool that takes query in English
language and provides relevant information in our native language.
DOI : 10.5121/ijnlc.2013.2604

53
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013

The Internet environment gives the benefits for this issue by providing Cross Language
Information Retrieval (CLIR) technology. Because of big bang of on-line non-English webpage’s,
CLIR systems have become progressively more important in recent years [2]. CLIR filling the
gap of linguistic barrier by allow a user to search in one language and retrieve documents in
another language.
CLIR is important because of various reasons that are as follows:
•

•

Sometime, we are not able to find an appropriate query to find top relevant document.
Like if I want to download Ramcharitramaanas in Hindi language. If I enters query in
Hindi language (like रामचǐरऽमानस) than it gives more promising result as compared to
the English query (like Ramcharitramaanas) because sometimes documents are
completely in a single language(like Hindi) because of that user query based IR system
cannot retrieve such documents.
CLIR increases the percentage of users in internet because it provides the information in
their native language.

But we also know in India, there are many words that are known because of their English
meaning like computer, cricket, bank and many more, people’s do not knows their Hindi meaning
and even sometime peoples prefer English words to makes sentences. We also know very well
that information retrieval models works on similarity between query and documents. After the
query translation in English-Hindi CLIR, if we get a Hindi meaning of such types of words then
definitely the performance of CLIR system will decrease because of mismatching between query
terms and documents.

2. RELATED WORK
Considerable amount of work is already done in English-Hindi CLIR. The different-different
approaches for retrieving information from CLIR system have some advantages and
disadvantages. Lisa, et al. [3] in 1998 proposed a method for resolving ambiguity in query
translation and phrasal translation by using statistics co-occurrence analysis from unlinked corpus
and combines this technique with other techniques for resolving ambiguity and achieve more than
90% of CLIR performance while compared to the monolingual performance and also author
compared their method with machine translation and parallel corpus techniques and they proved
that good performance of retrieval can be achieved without the use of complex resources. KyungSoon et al. [4] in 2002 proposed a method to implicitly resolve ambiguities in Korean-English
CLIR system using dynamic incremental clustering approach means the clusters are incrementally
created for the top ranked documents for a particular query and next time when the same query
will fired than the weight of each retrieved document is recalculated by using these clusters. Dong
Zhou et al. [5] in 2008 developed a disambiguation strategy for determining the correct
translation for a given query by using novel graph based analysis of co-occurrence information
and also developed a new approach to translate OOV (Out Of Vocabulary) terms means the
words that are commonly not found in dictionary like, proper names, location, address etc. Sujoy
Das et al. [6] in 2010 investigated the influence of query expansion using WordNet in EnglishHindi CLIR system. Author used shabdanjali dictionary for English-Hindi query translation and
expands Hindi queries by using Hindi WordNet and used nine different strategies for query
expansion. Based on the results, author observed that query expansion using Hindi WordNet is
not more effective and not gives a better performance while compared to monolingual
performance. S.M. Chaware et al. [7] in 2011 proposed an approach to build ontology from
relational database with the help of some additional rules that can also be used for cross lingual
information retrieval. The ontology approach is based on user requirements that give overall
knowledge of domain to the user.
54
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013

3. PROPOSED METHODOLOGY
The system which takes user query in one language and retrieves relevant documents in other
language is known as cross language information retrieval system. Studies say that the
performance of CLIR is still poor as compared to Mono-lingual performance and also the
problem of ambiguity in query translation down the performance of CLIR in term of recall and
precision. Several methods have been already proposed in order to solve the given problem of
CLIR like query expansion, co-occurrence statistics, Clustering etc, but still the performance of
English-Hindi CLIR is not as good as compared to monolingual IR performance. The most
common reasons behind the poor performance of CLIR are as follows:
•
•
•
•

Lack of availability of resources like Bilingual dictionary, mismatching of out of
vocabulary (OOV) terms, stemmer, part of speech (POS) tagger etc.
Multiple representations of query words (Lexical ambiguity).
Problem in encoding the text (UTF-8)
Poor matching and translation techniques.

These problems are due to the limitations in the existing approaches. Therefore, the limitations of
the existing approaches need to be further inquired towards achieving the increase in the
performance English-Hindi CLIR. The main aim for inquiring the limitations of existing
approaches and to develop a new approach to find out all the relevant information from CLIR
with higher and higher recall and with no or very less amount of irrelevant information retrieved
according to the query given by the user. So, in our approach, we used transliteration of each
query terms to make all possible combination of query.
The proposed algorithm for English-Hindi CLIR is given below that shows the step by step
process of English-Hindi CLIR.
1. User enters the query in English language ܳா .
2. Finds all terms from ܳா and translate those terms into Hindi language using EnglishHindi dictionary and naming them as {‫ݐ‬ଵ , ‫ݐ‬ଶ , ‫ݐ‬ଷ ……… ‫ݐ‬௄ }.
3. Finds all terms from ܳா and transliteration those terms into Hindi language using Itrans
tool and naming them as {‫ݐ‬Ԣଵ , ‫ݐ‬Ԣଶ , ‫ݐ‬Ԣଷ …… ‫ݐ‬Ԣ௄ }
4. Mapping terms {‫ݐ‬ଵ , ‫ݐ‬ଶ , ‫ݐ‬ଷ ……… ‫ݐ‬௄ }= {‫ݐ‬Ԣଵ , ‫ݐ‬Ԣଶ , ‫ݐ‬Ԣଷ ……… ‫ݐ‬Ԣ௄ }
5. Translate English query ܳா into Hindi query ܳு .
6. Making all the possible combination of Hindi Query ܳு using {‫ݐ‬Ԣଵ , ‫ݐ‬Ԣଶ , ‫ݐ‬Ԣଷ ……… ‫ݐ‬Ԣ௄ }
without replacement of term position up to 2௞ times, where k is the number of terms in
ܳா .
7. Calculate the mean average precision (MAP) of all possible queries that is generated from
step 6 and from them choose the best query and named that query as ܳԢு .
8. All the relevant documents generate by the query ܳԢு gives to the user.
To understand the algorithm, we consider an example that is shown below:
1. User enters the query ܳா = {Democracy in India}.
2. Finds all the terms from ܳா i.e. {Democracy, India} and translate them into Hindi
language as {लोकतंऽ, भारत} and naming them as {‫ݐ‬ଵ , ‫ݐ‬ଶ ሽ.
3. Finds all the terms from ܳா and transliteration those terms into Hindi language using
Itrans tool as {डे मोबेसी, इं Ǒडया} and naming them as {‫ݐ‬Ԣଵ , ‫ݐ‬Ԣଶ }.
4. Mapping terms { ‫ݐ‬ଵ , ‫ݐ‬ଶ , ‫ݐ‬ଷ ……… ‫ݐ‬௄ }= { ‫ݐ‬Ԣଵ , ‫ݐ‬Ԣଶ , ‫ݐ‬Ԣଷ ……… ‫ݐ‬Ԣ௄ } means
55
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013

‫ݐ‬ଵ = ‫ݐ‬Ԣଵ (लोकतंऽ = डे मोबेसी) and ‫ݐ‬ଶ = ‫ݐ‬Ԣଶ (भारत = इं Ǒडया)
5. Translate English query ܳா into Hindi queryܳு .
ܳு = {भारत मɅ लोकतंऽ}
6. Making all the possible combination of Hindi Query ܳு using { ‫ݐ‬Ԣଵ , ‫ݐ‬Ԣଶ } without
replacement of term position up to 2௞ times means 2ଶ times
ܳு = ሺभारत मɅ लोकतंऽ | इं Ǒडया मɅ लोकतंऽ | भारत मɅ डे मोबेसी | इं Ǒडया मɅ डे मोबेसी ሻ
7. Calculate the mean average precision (MAP) of all possible queries that is generated from
step 6 and from them choose the best query and named that query as ܳԢு .
8. All the relevant documents generate by the query ܳԢு gives to the user.

3.1. Query Translation
Translation of query from one language to other language is known as query translation. Query
translation is a crucial step in CLIR system because all problems come from this step like
mismatching of query terms, ambiguities, poor retrieval performance etc. There are various
approaches are used to translate user query like Bilingual dictionary, parallel corpus, online
translator etc. In this paper, we have used ‘Shabdanjali’ Multi-lingual Readable Dictionary as a
lexicon resource for translating English to Hindi query. The dictionary was developed in IIIT
Hyderabad. This dictionary is available in ISCII conversion. So, a conversion from ISCII to UTF8 encoding code is required. The other inbuilt tools/resources that help to translate English query
to Hindi query is shown in table 1.
TABLE 1 Tools used for query translation

Resources
Morphological Analysis
POS tagger
Transliteration
STOP word
Stemmer

Tool Used
ittoolbox
Stanfort POS tagger
I-Trans
List of 480 Stop words
Porter Stemmer

4. EXPERIMENTS
Experiment is performed on FIRE (Forum of Information Retrieval Evaluation) 2010 datasets.
FIRE 2010 datasets consists of set of user queries in terms of ‘Title’ field, ‘Description’ field and
‘narrative’ field, set of documents and qrel files which gives a list of relevant documents for a
queries. The experiment performed for English-Hindi CLIR to retrieve Hindi documents using
English queries. In this paper we used only ‘Title’ field of queries. Test data collection describe in
table 2 as follows:
TABLE 2 Statistics of FIRE 2010 data collection

Metrics

CLIR

Query Language

English

Document Language

Hindi

No. of Queries
No. of Documents
Size of Documents
Avg. no of rel. documents per

50
1,49,481
1.36 GB
18
56
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013

query
We obtain results of two approaches, first EHT approach which means English-Hindi CLIR using
title field of query and second is EHRT approach which means English-Hindi CLIR using refined
title field of query using our approach. The results of both approaches are shown in term of
interpolation recall-precision average in table 3.
Table 3 Recall-Precision Average

Metric
No. of query
No. of retrieve
documents
No. of rel. documents
No. of retrieve relevant
At 0.00
At 0.10
At 0.20
At 0.30
At 0.40
At 0.50
At 0.60
At 0.70
At 0.80
At 0.90
At 1.00
MAP
At 5 docs
At 10 docs
At 15 docs
At 20 docs
At 30 docs
At 100 docs
At 200 docs
At 500 docs
At 1000 docs
R-precision

EHT
50
50,000

EHRT
50
50,000

915
859
0.6707
0.5852
0.5088
0.4395
0.3827
0.3334
0.2935
0.2478
0.1929
0.1364
0.1074
0.3324
0.4240
0.3800
0.3373
0.3000
0.2460
0.1174
0.0672
0.0325
0.0171
0.3260

915
880
0.7221
0.6233
0.5421
0.4741
0.4202
0.3697
0.3260
0.2737
0.2150
0.1517
0.1153
0.3609
0.4360
0.4060
0.3667
0.3260
0.2607
0.1236
0.0708
0.0334
0.0176
0.3459

4.1. Performance graph
From the table 2, it is clear that our English-Hindi CLIR gives better performance as compared to
original title field queries of English-Hindi CLIR in terms of MAP (mean average precision).
Figure 1 shows the interpolation recall-precision average of both systems.

57
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013

0.9
EHT
0.7
0.5
EHRT
0.3
0.1
0

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

1

Figure 1 Interpolation Recall-Precision Averages

5. CONCLUSION
The hurdle problem in CLIR is poor performance when compared to monolingual IR performance
because of query term mismatching, un-translated query words, multiple representations of query
terms etc. There are considerable amount of work is already done in English-Hindi CLIR. The
different-different approaches for retrieving information from CLIR system are discussed in
related work part. Those earlier proposed approaches have some advantages and disadvantages. In
order to make these approaches to be more efficient and effective practically the limitations of
those approaches need to be further inquired towards achieving the increase in the performance of
the English-Hindi CLIR system.
In this paper, we used transliteration of each query terms of English query to make all possible
combination of Hindi query without replacement of query term position in order to improving of
English Hindi CLIR. The average mean precision of EHT and EHRT strategies are .3324 and
.3721. The experimental results show that the proposed approach for refined ‘title’ field of queries
gives more relevant information as compared to original ‘title’ field of FIRE 2010 queries.
Therefore proposed approach can helps to improving the performance of English-Hindi CLIR and
retrieving rich and high quality of information with increase in average precision. This approach
is language independent means this approach works on any combination of languages.

REFERENCES
[1] B. Ashwin, (2012) "Profound Survey on Cross Language Information Retrieval Methods", Published
in Advanced Computing & Communication Technologies (ACCT), 2012 Second International
Conference on IEEE, pp. 65-70.
[2] J. M. Ponte & W. B. Croft, (1998) “A language modelling approach to information retrieval”,
In Proceedings of the 21st annual international ACM SIGIR conference on Research and development
in information retrieval, ACM, pp. 275-281.
[3] L. Ballesteros & W. Bruce Croft, (1998) "Resolving ambiguity for cross-language retrieval”,
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in
information retrieval, pp. 61-71.
[4] L. Kyung-Soon, K. Kageura & K. Choi, (2002) "Implicit ambiguity resolution using incremental
clustering in Korean-to-English cross-language information retrieval," Proceedings of the 19th
international conference on Computational linguistics, Vol. 1, pp. 1-7, 2002.
[5] D. Zhou, M. Truran & T. Brailsford, (2008) "Disambiguation and unknown term translation in cross
language information retrieval", Published in Advances in Multilingual and Multimodal Information
Retrieval, Springer Berlin Heidelberg, pp. 64-71.
58
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013
[6] S. Das, A. Seetha, M. Kumar & J. L. Rana, (2010) "Post Translation Query Expansion using Hindi
Word-Net for English-Hindi CLIR System", Published in Forum of Information Retrieval Evaluation,
FIRE, pp. 53-41.
[7] S. M Chaware & S. Rao, (2011) "Ontology Approach for Cross-Language Information
Retrieval", Published in International Journal of Computer Technology and Application, vol. 2, pp.
379-384.
Author
Saurabh Varshney obtained his bachelor’s degree from UPTU University, Lucknow,
India. Now he is currently an M.Tech student under the supervision of Asstt. Prof. Jyoti
Bajpai. His research is centred on performance of English-Hindi Cross Language
Information Retrieval System.

59

More Related Content

What's hot

ATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliterationATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliterationIJECEIAES
 
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTSSENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTSijnlc
 
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVALDICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVALcsandit
 
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...ijnlc
 
AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...
AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...
AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...ijaia
 
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...Syeful Islam
 
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...ijnlc
 
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES ijnlc
 
Developemnt and evaluation of a web based question answering system for arabi...
Developemnt and evaluation of a web based question answering system for arabi...Developemnt and evaluation of a web based question answering system for arabi...
Developemnt and evaluation of a web based question answering system for arabi...ijnlc
 
ISSUES AND CHALLENGES IN MARATHI NAMED ENTITY RECOGNITION
ISSUES AND CHALLENGES IN MARATHI NAMED ENTITY RECOGNITIONISSUES AND CHALLENGES IN MARATHI NAMED ENTITY RECOGNITION
ISSUES AND CHALLENGES IN MARATHI NAMED ENTITY RECOGNITIONijnlc
 
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...CSCJournals
 
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...Syeful Islam
 
Language Identification from a Tri-lingual Printed Document: A Simple Approach
Language Identification from a Tri-lingual Printed Document: A Simple ApproachLanguage Identification from a Tri-lingual Printed Document: A Simple Approach
Language Identification from a Tri-lingual Printed Document: A Simple ApproachIJERA Editor
 
Language translation english to hindi
Language translation english to hindiLanguage translation english to hindi
Language translation english to hindiRAJENDRA VERMA
 
Hybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textHybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textijnlc
 
A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...
A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...
A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...Eswar Publications
 
Robust extended tokenization framework for romanian by semantic parallel text...
Robust extended tokenization framework for romanian by semantic parallel text...Robust extended tokenization framework for romanian by semantic parallel text...
Robust extended tokenization framework for romanian by semantic parallel text...ijnlc
 

What's hot (20)

ATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliterationATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliteration
 
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTSSENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
 
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVALDICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
 
Ac04507168175
Ac04507168175Ac04507168175
Ac04507168175
 
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
 
AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...
AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...
AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...
 
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
 
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
 
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
 
Developemnt and evaluation of a web based question answering system for arabi...
Developemnt and evaluation of a web based question answering system for arabi...Developemnt and evaluation of a web based question answering system for arabi...
Developemnt and evaluation of a web based question answering system for arabi...
 
ISSUES AND CHALLENGES IN MARATHI NAMED ENTITY RECOGNITION
ISSUES AND CHALLENGES IN MARATHI NAMED ENTITY RECOGNITIONISSUES AND CHALLENGES IN MARATHI NAMED ENTITY RECOGNITION
ISSUES AND CHALLENGES IN MARATHI NAMED ENTITY RECOGNITION
 
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...
 
Ny3424442448
Ny3424442448Ny3424442448
Ny3424442448
 
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
 
Language Identification from a Tri-lingual Printed Document: A Simple Approach
Language Identification from a Tri-lingual Printed Document: A Simple ApproachLanguage Identification from a Tri-lingual Printed Document: A Simple Approach
Language Identification from a Tri-lingual Printed Document: A Simple Approach
 
C.s & c.m
C.s & c.mC.s & c.m
C.s & c.m
 
Language translation english to hindi
Language translation english to hindiLanguage translation english to hindi
Language translation english to hindi
 
Hybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textHybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic text
 
A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...
A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...
A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...
 
Robust extended tokenization framework for romanian by semantic parallel text...
Robust extended tokenization framework for romanian by semantic parallel text...Robust extended tokenization framework for romanian by semantic parallel text...
Robust extended tokenization framework for romanian by semantic parallel text...
 

Viewers also liked

Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to HindiRajat Jain
 
The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...Brian Solis
 
Open Source Creativity
Open Source CreativityOpen Source Creativity
Open Source CreativitySara Cannon
 
Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)maditabalnco
 
The Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post FormatsThe Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post FormatsBarry Feldman
 
The Outcome Economy
The Outcome EconomyThe Outcome Economy
The Outcome EconomyHelge Tennø
 

Viewers also liked (7)

Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to Hindi
 
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job? Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
 
The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...
 
Open Source Creativity
Open Source CreativityOpen Source Creativity
Open Source Creativity
 
Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)
 
The Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post FormatsThe Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post Formats
 
The Outcome Economy
The Outcome EconomyThe Outcome Economy
The Outcome Economy
 

Similar to Improving performance of english hindi cross language information retrieval using transliteration of query terms

Survey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageSurvey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageEditor IJCATR
 
Designing Cross-Language Information Retrieval System using various Technique...
Designing Cross-Language Information Retrieval System using various Technique...Designing Cross-Language Information Retrieval System using various Technique...
Designing Cross-Language Information Retrieval System using various Technique...IRJET Journal
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Textskevig
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Textskevig
 
Named Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid ApproachNamed Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid ApproachWaqas Tariq
 
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURESMULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURESmlaij
 
QUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageQUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageIJERA Editor
 
Car-Following Parameters by Means of Cellular Automata in the Case of Evacuation
Car-Following Parameters by Means of Cellular Automata in the Case of EvacuationCar-Following Parameters by Means of Cellular Automata in the Case of Evacuation
Car-Following Parameters by Means of Cellular Automata in the Case of EvacuationCSCJournals
 
Diacritic Oriented Arabic Information Retrieval System
Diacritic Oriented Arabic Information Retrieval SystemDiacritic Oriented Arabic Information Retrieval System
Diacritic Oriented Arabic Information Retrieval SystemCSCJournals
 
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...kevig
 
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGES
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGESSTUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGES
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGESijistjournal
 
An unsupervised approach to develop ir system the case of urdu
An unsupervised approach to develop ir system  the case of urduAn unsupervised approach to develop ir system  the case of urdu
An unsupervised approach to develop ir system the case of urduijaia
 
An exploratory research on grammar checking of Bangla sentences using statist...
An exploratory research on grammar checking of Bangla sentences using statist...An exploratory research on grammar checking of Bangla sentences using statist...
An exploratory research on grammar checking of Bangla sentences using statist...IJECEIAES
 
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIRA NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIRcscpconf
 
MACHINE TRANSLATION WITH SPECIAL REFERENCE TO MALAYALAM LANGUAGE
MACHINE TRANSLATION WITH SPECIAL REFERENCE TO MALAYALAM LANGUAGEMACHINE TRANSLATION WITH SPECIAL REFERENCE TO MALAYALAM LANGUAGE
MACHINE TRANSLATION WITH SPECIAL REFERENCE TO MALAYALAM LANGUAGEJomy Jose
 
Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Waqas Tariq
 
Ijarcet vol-2-issue-4-1339-1341
Ijarcet vol-2-issue-4-1339-1341Ijarcet vol-2-issue-4-1339-1341
Ijarcet vol-2-issue-4-1339-1341Editor IJARCET
 
Mobile query reformulations
Mobile query reformulationsMobile query reformulations
Mobile query reformulations祺傑 林
 

Similar to Improving performance of english hindi cross language information retrieval using transliteration of query terms (20)

Survey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageSurvey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi Language
 
Designing Cross-Language Information Retrieval System using various Technique...
Designing Cross-Language Information Retrieval System using various Technique...Designing Cross-Language Information Retrieval System using various Technique...
Designing Cross-Language Information Retrieval System using various Technique...
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
 
Named Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid ApproachNamed Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid Approach
 
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURESMULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
 
QUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageQUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu Language
 
Car-Following Parameters by Means of Cellular Automata in the Case of Evacuation
Car-Following Parameters by Means of Cellular Automata in the Case of EvacuationCar-Following Parameters by Means of Cellular Automata in the Case of Evacuation
Car-Following Parameters by Means of Cellular Automata in the Case of Evacuation
 
Diacritic Oriented Arabic Information Retrieval System
Diacritic Oriented Arabic Information Retrieval SystemDiacritic Oriented Arabic Information Retrieval System
Diacritic Oriented Arabic Information Retrieval System
 
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
 
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGES
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGESSTUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGES
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGES
 
An unsupervised approach to develop ir system the case of urdu
An unsupervised approach to develop ir system  the case of urduAn unsupervised approach to develop ir system  the case of urdu
An unsupervised approach to develop ir system the case of urdu
 
Viva
VivaViva
Viva
 
An exploratory research on grammar checking of Bangla sentences using statist...
An exploratory research on grammar checking of Bangla sentences using statist...An exploratory research on grammar checking of Bangla sentences using statist...
An exploratory research on grammar checking of Bangla sentences using statist...
 
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIRA NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
 
MACHINE TRANSLATION WITH SPECIAL REFERENCE TO MALAYALAM LANGUAGE
MACHINE TRANSLATION WITH SPECIAL REFERENCE TO MALAYALAM LANGUAGEMACHINE TRANSLATION WITH SPECIAL REFERENCE TO MALAYALAM LANGUAGE
MACHINE TRANSLATION WITH SPECIAL REFERENCE TO MALAYALAM LANGUAGE
 
C8 akumaran
C8 akumaranC8 akumaran
C8 akumaran
 
Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...
 
Ijarcet vol-2-issue-4-1339-1341
Ijarcet vol-2-issue-4-1339-1341Ijarcet vol-2-issue-4-1339-1341
Ijarcet vol-2-issue-4-1339-1341
 
Mobile query reformulations
Mobile query reformulationsMobile query reformulations
Mobile query reformulations
 

Recently uploaded

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Improving performance of english hindi cross language information retrieval using transliteration of query terms

  • 1. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013 IMPROVING PERFORMANCE OF ENGLISH-HINDI CROSS LANGUAGE INFORMATION RETRIEVAL USING TRANSLITERATION OF QUERY TERMS Saurabh Varshney1 and Jyoti Bajpai2 1 2 Department of Computer Engineering, GLA University, Mathura, India Department of Computer Engineering, GLA University, Mathura, India ABSTRACT The main issue in Cross Language Information Retrieval (CLIR) is the poor performance of retrieval in terms of average precision when compared to monolingual retrieval performance. The main reasons behind poor performance of CLIR are mismatching of query terms, lexical ambiguity and un-translated query terms. The existing problems of CLIR are needed to be addressed in order to increase the performance of the CLIR system. In this paper, we are putting our effort to solve the given problem by proposed an algorithm for improving the performance of English-Hindi CLIR system. We used all possible combination of Hindi translated query using transliteration of English query terms and choosing the best query among them for retrieval of documents. The experiment is performed on FIRE 2010 (Forum of Information Retrieval Evaluation) datasets. The experimental result show that the proposed approach gives better performance of English-Hindi CLIR system and also helps in overcoming existing problems and outperforms the existing English-Hindi CLIR system in terms of average precision. KEYWORDS Cross Language Information Retrieval; Transliteration of query terms; Lexical ambiguity; English-Hindi query translation; ‘Shabdanjali’ multi-lingual dictionary; FIRE data collection; ‘Title’ field of initial query; Mean Average Precision. 1. INTRODUCTION We are rapidly constructing the broad network architecture for transferring information across national barriers, but much remains to be done before linguistic boundaries can be better as effectively as geographic ones [1]. Now a days, peoples have more likely to interest on global things like education, economy, business, marketing, research etc. because of that peoples are interested to collect information and data of other regions of the world. The one and only medium for doing this, is the Internet. But we also knows users are more likely to retrieve information in a language in which a user is more comfortable or we can say that user wants information in his/her native language to understand the language of documents more easily. Accessing information in a host language is clearly important for many users. In India, about 70% of peoples know Hindi as a primary language while based on human development survey in 2012; there are only 10.35 % peoples in India who are the English speakers. India is third country that has largest number of internet users but when we talk about penetration means total population, in India only 12.6% of people are the internet user which decrease the rank of India on to 164th position based on survey. And we also know that entering query in another language to retrieve documents is very difficult to the user. So, the conclusion is that, there should be require a tool that takes query in English language and provides relevant information in our native language. DOI : 10.5121/ijnlc.2013.2604 53
  • 2. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013 The Internet environment gives the benefits for this issue by providing Cross Language Information Retrieval (CLIR) technology. Because of big bang of on-line non-English webpage’s, CLIR systems have become progressively more important in recent years [2]. CLIR filling the gap of linguistic barrier by allow a user to search in one language and retrieve documents in another language. CLIR is important because of various reasons that are as follows: • • Sometime, we are not able to find an appropriate query to find top relevant document. Like if I want to download Ramcharitramaanas in Hindi language. If I enters query in Hindi language (like रामचǐरऽमानस) than it gives more promising result as compared to the English query (like Ramcharitramaanas) because sometimes documents are completely in a single language(like Hindi) because of that user query based IR system cannot retrieve such documents. CLIR increases the percentage of users in internet because it provides the information in their native language. But we also know in India, there are many words that are known because of their English meaning like computer, cricket, bank and many more, people’s do not knows their Hindi meaning and even sometime peoples prefer English words to makes sentences. We also know very well that information retrieval models works on similarity between query and documents. After the query translation in English-Hindi CLIR, if we get a Hindi meaning of such types of words then definitely the performance of CLIR system will decrease because of mismatching between query terms and documents. 2. RELATED WORK Considerable amount of work is already done in English-Hindi CLIR. The different-different approaches for retrieving information from CLIR system have some advantages and disadvantages. Lisa, et al. [3] in 1998 proposed a method for resolving ambiguity in query translation and phrasal translation by using statistics co-occurrence analysis from unlinked corpus and combines this technique with other techniques for resolving ambiguity and achieve more than 90% of CLIR performance while compared to the monolingual performance and also author compared their method with machine translation and parallel corpus techniques and they proved that good performance of retrieval can be achieved without the use of complex resources. KyungSoon et al. [4] in 2002 proposed a method to implicitly resolve ambiguities in Korean-English CLIR system using dynamic incremental clustering approach means the clusters are incrementally created for the top ranked documents for a particular query and next time when the same query will fired than the weight of each retrieved document is recalculated by using these clusters. Dong Zhou et al. [5] in 2008 developed a disambiguation strategy for determining the correct translation for a given query by using novel graph based analysis of co-occurrence information and also developed a new approach to translate OOV (Out Of Vocabulary) terms means the words that are commonly not found in dictionary like, proper names, location, address etc. Sujoy Das et al. [6] in 2010 investigated the influence of query expansion using WordNet in EnglishHindi CLIR system. Author used shabdanjali dictionary for English-Hindi query translation and expands Hindi queries by using Hindi WordNet and used nine different strategies for query expansion. Based on the results, author observed that query expansion using Hindi WordNet is not more effective and not gives a better performance while compared to monolingual performance. S.M. Chaware et al. [7] in 2011 proposed an approach to build ontology from relational database with the help of some additional rules that can also be used for cross lingual information retrieval. The ontology approach is based on user requirements that give overall knowledge of domain to the user. 54
  • 3. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013 3. PROPOSED METHODOLOGY The system which takes user query in one language and retrieves relevant documents in other language is known as cross language information retrieval system. Studies say that the performance of CLIR is still poor as compared to Mono-lingual performance and also the problem of ambiguity in query translation down the performance of CLIR in term of recall and precision. Several methods have been already proposed in order to solve the given problem of CLIR like query expansion, co-occurrence statistics, Clustering etc, but still the performance of English-Hindi CLIR is not as good as compared to monolingual IR performance. The most common reasons behind the poor performance of CLIR are as follows: • • • • Lack of availability of resources like Bilingual dictionary, mismatching of out of vocabulary (OOV) terms, stemmer, part of speech (POS) tagger etc. Multiple representations of query words (Lexical ambiguity). Problem in encoding the text (UTF-8) Poor matching and translation techniques. These problems are due to the limitations in the existing approaches. Therefore, the limitations of the existing approaches need to be further inquired towards achieving the increase in the performance English-Hindi CLIR. The main aim for inquiring the limitations of existing approaches and to develop a new approach to find out all the relevant information from CLIR with higher and higher recall and with no or very less amount of irrelevant information retrieved according to the query given by the user. So, in our approach, we used transliteration of each query terms to make all possible combination of query. The proposed algorithm for English-Hindi CLIR is given below that shows the step by step process of English-Hindi CLIR. 1. User enters the query in English language ܳா . 2. Finds all terms from ܳா and translate those terms into Hindi language using EnglishHindi dictionary and naming them as {‫ݐ‬ଵ , ‫ݐ‬ଶ , ‫ݐ‬ଷ ……… ‫ݐ‬௄ }. 3. Finds all terms from ܳா and transliteration those terms into Hindi language using Itrans tool and naming them as {‫ݐ‬Ԣଵ , ‫ݐ‬Ԣଶ , ‫ݐ‬Ԣଷ …… ‫ݐ‬Ԣ௄ } 4. Mapping terms {‫ݐ‬ଵ , ‫ݐ‬ଶ , ‫ݐ‬ଷ ……… ‫ݐ‬௄ }= {‫ݐ‬Ԣଵ , ‫ݐ‬Ԣଶ , ‫ݐ‬Ԣଷ ……… ‫ݐ‬Ԣ௄ } 5. Translate English query ܳா into Hindi query ܳு . 6. Making all the possible combination of Hindi Query ܳு using {‫ݐ‬Ԣଵ , ‫ݐ‬Ԣଶ , ‫ݐ‬Ԣଷ ……… ‫ݐ‬Ԣ௄ } without replacement of term position up to 2௞ times, where k is the number of terms in ܳா . 7. Calculate the mean average precision (MAP) of all possible queries that is generated from step 6 and from them choose the best query and named that query as ܳԢு . 8. All the relevant documents generate by the query ܳԢு gives to the user. To understand the algorithm, we consider an example that is shown below: 1. User enters the query ܳா = {Democracy in India}. 2. Finds all the terms from ܳா i.e. {Democracy, India} and translate them into Hindi language as {लोकतंऽ, भारत} and naming them as {‫ݐ‬ଵ , ‫ݐ‬ଶ ሽ. 3. Finds all the terms from ܳா and transliteration those terms into Hindi language using Itrans tool as {डे मोबेसी, इं Ǒडया} and naming them as {‫ݐ‬Ԣଵ , ‫ݐ‬Ԣଶ }. 4. Mapping terms { ‫ݐ‬ଵ , ‫ݐ‬ଶ , ‫ݐ‬ଷ ……… ‫ݐ‬௄ }= { ‫ݐ‬Ԣଵ , ‫ݐ‬Ԣଶ , ‫ݐ‬Ԣଷ ……… ‫ݐ‬Ԣ௄ } means 55
  • 4. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013 ‫ݐ‬ଵ = ‫ݐ‬Ԣଵ (लोकतंऽ = डे मोबेसी) and ‫ݐ‬ଶ = ‫ݐ‬Ԣଶ (भारत = इं Ǒडया) 5. Translate English query ܳா into Hindi queryܳு . ܳு = {भारत मɅ लोकतंऽ} 6. Making all the possible combination of Hindi Query ܳு using { ‫ݐ‬Ԣଵ , ‫ݐ‬Ԣଶ } without replacement of term position up to 2௞ times means 2ଶ times ܳு = ሺभारत मɅ लोकतंऽ | इं Ǒडया मɅ लोकतंऽ | भारत मɅ डे मोबेसी | इं Ǒडया मɅ डे मोबेसी ሻ 7. Calculate the mean average precision (MAP) of all possible queries that is generated from step 6 and from them choose the best query and named that query as ܳԢு . 8. All the relevant documents generate by the query ܳԢு gives to the user. 3.1. Query Translation Translation of query from one language to other language is known as query translation. Query translation is a crucial step in CLIR system because all problems come from this step like mismatching of query terms, ambiguities, poor retrieval performance etc. There are various approaches are used to translate user query like Bilingual dictionary, parallel corpus, online translator etc. In this paper, we have used ‘Shabdanjali’ Multi-lingual Readable Dictionary as a lexicon resource for translating English to Hindi query. The dictionary was developed in IIIT Hyderabad. This dictionary is available in ISCII conversion. So, a conversion from ISCII to UTF8 encoding code is required. The other inbuilt tools/resources that help to translate English query to Hindi query is shown in table 1. TABLE 1 Tools used for query translation Resources Morphological Analysis POS tagger Transliteration STOP word Stemmer Tool Used ittoolbox Stanfort POS tagger I-Trans List of 480 Stop words Porter Stemmer 4. EXPERIMENTS Experiment is performed on FIRE (Forum of Information Retrieval Evaluation) 2010 datasets. FIRE 2010 datasets consists of set of user queries in terms of ‘Title’ field, ‘Description’ field and ‘narrative’ field, set of documents and qrel files which gives a list of relevant documents for a queries. The experiment performed for English-Hindi CLIR to retrieve Hindi documents using English queries. In this paper we used only ‘Title’ field of queries. Test data collection describe in table 2 as follows: TABLE 2 Statistics of FIRE 2010 data collection Metrics CLIR Query Language English Document Language Hindi No. of Queries No. of Documents Size of Documents Avg. no of rel. documents per 50 1,49,481 1.36 GB 18 56
  • 5. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013 query We obtain results of two approaches, first EHT approach which means English-Hindi CLIR using title field of query and second is EHRT approach which means English-Hindi CLIR using refined title field of query using our approach. The results of both approaches are shown in term of interpolation recall-precision average in table 3. Table 3 Recall-Precision Average Metric No. of query No. of retrieve documents No. of rel. documents No. of retrieve relevant At 0.00 At 0.10 At 0.20 At 0.30 At 0.40 At 0.50 At 0.60 At 0.70 At 0.80 At 0.90 At 1.00 MAP At 5 docs At 10 docs At 15 docs At 20 docs At 30 docs At 100 docs At 200 docs At 500 docs At 1000 docs R-precision EHT 50 50,000 EHRT 50 50,000 915 859 0.6707 0.5852 0.5088 0.4395 0.3827 0.3334 0.2935 0.2478 0.1929 0.1364 0.1074 0.3324 0.4240 0.3800 0.3373 0.3000 0.2460 0.1174 0.0672 0.0325 0.0171 0.3260 915 880 0.7221 0.6233 0.5421 0.4741 0.4202 0.3697 0.3260 0.2737 0.2150 0.1517 0.1153 0.3609 0.4360 0.4060 0.3667 0.3260 0.2607 0.1236 0.0708 0.0334 0.0176 0.3459 4.1. Performance graph From the table 2, it is clear that our English-Hindi CLIR gives better performance as compared to original title field queries of English-Hindi CLIR in terms of MAP (mean average precision). Figure 1 shows the interpolation recall-precision average of both systems. 57
  • 6. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013 0.9 EHT 0.7 0.5 EHRT 0.3 0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Figure 1 Interpolation Recall-Precision Averages 5. CONCLUSION The hurdle problem in CLIR is poor performance when compared to monolingual IR performance because of query term mismatching, un-translated query words, multiple representations of query terms etc. There are considerable amount of work is already done in English-Hindi CLIR. The different-different approaches for retrieving information from CLIR system are discussed in related work part. Those earlier proposed approaches have some advantages and disadvantages. In order to make these approaches to be more efficient and effective practically the limitations of those approaches need to be further inquired towards achieving the increase in the performance of the English-Hindi CLIR system. In this paper, we used transliteration of each query terms of English query to make all possible combination of Hindi query without replacement of query term position in order to improving of English Hindi CLIR. The average mean precision of EHT and EHRT strategies are .3324 and .3721. The experimental results show that the proposed approach for refined ‘title’ field of queries gives more relevant information as compared to original ‘title’ field of FIRE 2010 queries. Therefore proposed approach can helps to improving the performance of English-Hindi CLIR and retrieving rich and high quality of information with increase in average precision. This approach is language independent means this approach works on any combination of languages. REFERENCES [1] B. Ashwin, (2012) "Profound Survey on Cross Language Information Retrieval Methods", Published in Advanced Computing & Communication Technologies (ACCT), 2012 Second International Conference on IEEE, pp. 65-70. [2] J. M. Ponte & W. B. Croft, (1998) “A language modelling approach to information retrieval”, In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, ACM, pp. 275-281. [3] L. Ballesteros & W. Bruce Croft, (1998) "Resolving ambiguity for cross-language retrieval”, Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 61-71. [4] L. Kyung-Soon, K. Kageura & K. Choi, (2002) "Implicit ambiguity resolution using incremental clustering in Korean-to-English cross-language information retrieval," Proceedings of the 19th international conference on Computational linguistics, Vol. 1, pp. 1-7, 2002. [5] D. Zhou, M. Truran & T. Brailsford, (2008) "Disambiguation and unknown term translation in cross language information retrieval", Published in Advances in Multilingual and Multimodal Information Retrieval, Springer Berlin Heidelberg, pp. 64-71. 58
  • 7. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.6, December 2013 [6] S. Das, A. Seetha, M. Kumar & J. L. Rana, (2010) "Post Translation Query Expansion using Hindi Word-Net for English-Hindi CLIR System", Published in Forum of Information Retrieval Evaluation, FIRE, pp. 53-41. [7] S. M Chaware & S. Rao, (2011) "Ontology Approach for Cross-Language Information Retrieval", Published in International Journal of Computer Technology and Application, vol. 2, pp. 379-384. Author Saurabh Varshney obtained his bachelor’s degree from UPTU University, Lucknow, India. Now he is currently an M.Tech student under the supervision of Asstt. Prof. Jyoti Bajpai. His research is centred on performance of English-Hindi Cross Language Information Retrieval System. 59