Running Head: MULTILINGUALISM IN INFORMATION RETRIEVAL SYSTEMS 1
Multilingualism in Information Retrieval Systems
Ariel Hess
University of North Texas
INFO 5206
May 5, 2017
Summary/Author’s Note:
Multilingualism in information retrieval systems is a topic that researchers have examined
extensively. Creating a system that lets the user enter a query containing multiple languages and
returns results in multiple languages remains an open challenge that will continue to be
studied. Information retrieval systems can be extended with features that translate documents
and queries. This paper examines strategies used for text translation, projects that have been
implemented, and the challenges they faced.
Introduction
Most search engines provide only a monolingual search interface for documents written mostly
in English (Chen, Lee & Yang, 2009, p. 4). Users often translate their query into English
before using a search engine. The goal of a Multilingual Retrieval System is to allow
users to search for information in multiple languages and retrieve information in multiple
languages. This is done by deploying Cross Language Retrieval, which allows the user to ask
a question in one language and retrieve the information in another.
A survey of academic users was conducted to understand why users want access to
documents in different languages and whether users of a Digital Library would want access to a
multilingual retrieval system. Most users wanted this access for educational purposes: they
would use a Multilingual Information Retrieval System to complete assignments that require
searching for documents in a language other than English. The study also showed that some users
felt it would be too difficult to search for documents that contain more than one language
(He, Luo & Wu, 2012, p. 188). The overall takeaway from the survey is that user needs must be
understood to determine whether such a system fits the preexisting Information Retrieval System
and its users. Developers want to dismantle the barrier between the user query and multilingual
documents. This can be done by extending the Information Retrieval System with translation
tools and other techniques.
Generally, a Multilingual Retrieval System works by first retrieving documents from a
separate collection for each language. A monolingual list of results is retrieved from each
collection, and these lists are then merged into a single multilingual list. Each system can be
adapted to the needs of the organization, and different tools are employed to ensure
compatibility. A Multilingual Retrieval System generally focuses on one or all of the following:
document, query, and translation.
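The merging step described above can be illustrated with a minimal sketch. It assumes that each monolingual collection returns a list of (document id, score) pairs and that scores from different collections become comparable after min-max normalization. The function names, document ids, and scores are hypothetical and are not taken from any of the systems reviewed in this paper.

```python
from typing import Dict, List, Tuple

# Hypothetical per-language result list: (document id, retrieval score).
Results = List[Tuple[str, float]]


def normalize(results: Results) -> Results:
    """Min-max normalize scores so lists from different collections are comparable."""
    if not results:
        return []
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        # All scores are equal: give every document the same normalized score.
        return [(doc_id, 1.0) for doc_id, _ in results]
    return [(doc_id, (score - low) / (high - low)) for doc_id, score in results]


def merge_results(per_language: Dict[str, Results]) -> List[Tuple[str, str, float]]:
    """Merge monolingual result lists into one multilingual list, best first."""
    merged = []
    for language, results in per_language.items():
        for doc_id, score in normalize(results):
            merged.append((language, doc_id, score))
    # Sort by normalized score, highest first; ties keep their original order.
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged


# Assumed example data: raw scores use a different scale in each collection.
per_language = {
    "en": [("en-1", 12.0), ("en-2", 6.0), ("en-3", 3.0)],
    "es": [("es-1", 0.9), ("es-2", 0.3)],
}
merged = merge_results(per_language)
for language, doc_id, score in merged:
    print(language, doc_id, round(score, 3))
```

Score normalization is only one possible merging strategy; it is used here because it needs no training data.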
Approach/ Methods
A Multilingual Retrieval System is built from a variety of tools and features. The system
has three levels of concern: query, translation, and document. These areas are addressed
through different techniques, such as building a dictionary-based model. Each Multilingual
Retrieval System has its own features and deploys different methods for retrieval. These
methods are adaptable and tailored to the audience the system is intended for.
Text mining is the process of deriving quality information from unstructured text
(Chen, Lee, & Yang, 2009, p. 4). “Text mining in a multilingual setting [is also
incorporated as] an automated process that is design to discover the relationship between
languages” (Hsiao, Lee & Yang, 2009, p. 648). Three techniques are often employed to
deal with the problem of creating a multilingual-friendly system: using a machine translation
system, using a bilingual dictionary or terminology base, and using a statistical/probabilistic
model based on parallel texts.
Query translation is a strategy in which the user's query is translated into each language
represented in the multilingual collection, producing one monolingual information retrieval
process per language (Cumbreras, Lopez & Santiago, 2011, p. 414). The most common form of query
search depends on concepts of natural language. Dictionary-based tools use a bilingual word
list to translate query terms into other languages. Machine translation tools translate every
document in the corpus into multiple languages. Corpus-based retrieval tools use knowledge
acquisition techniques to discover cross-lingual relationships and apply them in Multilingual
Retrieval Systems. This method uses word alignment to generate aligned bilingual corpora, which
establish relationships between words in different languages. These relationships are in turn
used to create a translation table for query translation. It is recommended that the corpus be
virtual to save storage and time. These three methods are grouped together because they depend
on each other. Query translation is made possible by dictionary-based tools. Once the query is
translated, the information is obtained from a corpus, whose documents may be clustered. The
documents in the corpora are commonly indexed by a single keyword or a group of keywords that
can be easily found during searching. A multilingual comparable corpus is another tool: a
collection of documents in different languages that share the same topics. Many text mining
approaches are based on this method (Hsiao, Lee & Yang, 2009, p. 650).
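As a minimal sketch of dictionary-based query translation, the example below translates a query once per target language so that each monolingual collection can be searched separately. The dictionary entries, language codes, and function names are hypothetical illustrations. A real system would load a full bilingual dictionary or a translation table derived from word-aligned corpora.

```python
from typing import Dict, List

# Hypothetical bilingual dictionaries: English term -> candidate translations.
BILINGUAL_DICTIONARIES: Dict[str, Dict[str, List[str]]] = {
    "es": {
        "library": ["biblioteca"],
        "digital": ["digital"],
        "search": ["búsqueda", "buscar"],
    },
    "fr": {
        "library": ["bibliothèque"],
        "digital": ["numérique"],
        "search": ["recherche"],
    },
}


def translate_query(query: str, target_language: str) -> List[str]:
    """Translate a query term by term, keeping every candidate translation."""
    dictionary = BILINGUAL_DICTIONARIES[target_language]
    translated: List[str] = []
    for term in query.lower().split():
        # Terms missing from the dictionary (for example proper nouns) pass through unchanged.
        translated.extend(dictionary.get(term, [term]))
    return translated


def translate_for_all_collections(query: str) -> Dict[str, List[str]]:
    """Produce one translated query per language in the multilingual collection."""
    return {
        language: translate_query(query, language)
        for language in BILINGUAL_DICTIONARIES
    }


queries = translate_for_all_collections("digital library search")
print(queries)
```

Keeping every candidate translation, as done for "search" in the Spanish dictionary, trades precision for recall. Choosing a single best translation would require additional disambiguation that this sketch does not attempt.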
Thesaurus-based multilingual retrieval takes commonly used related terms in a document
and indexes them. In Multilingual Information Retrieval, this is done by mapping between
thesauri of different languages (Chen, Lee, & Yang, 2009, p. 6).
The methods addressed above can be applied to any system that is
implementing a multilingual extension. The purpose of tools such as corpora is to
ensure that a repository is available for accessing the intended information. The benefit of
clustering corpora is that it provides a narrower grouping of comparable documents and text.
Applications
The following sections provide examples of existing systems that have either added a multilingual
feature to an existing Information Retrieval System or been created as new systems.
Multilingualism is designed to be incorporated into an already existing system, and the systems
below show how multilingualism was implemented in each case.
SveMed
SveMed uses terms from the Medical Subject Headings (MeSH) thesaurus, which contains a
controlled vocabulary, and translates these terms into different languages. The terms are
arranged in a hierarchical tree, and when deciding which terms to index the
indexer tries to select the most specific term possible. The terms are then indexed and can be
retrieved by performing a truncation search, which helps ensure that user-submitted queries
return results. The interface uses a thesaurus-based database to translate the medical terms
into three languages and to distinguish between the document terms (Gavel & Andersson, 2014,
p. 272). It uses the Solr search engine and relies solely on query expansion. “The search
interface allows the user to search terms in English, Swedish, or Norwegian, and browse for MeSH
terms” (Gavel & Andersson, 2014, p. 274). A great advantage of this search interface is that it
allows the user to select the language in which to search for information.
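The combination of thesaurus-based query expansion and truncation search can be sketched as follows. This is an illustration only and does not reproduce SveMed's Solr configuration. The concept ids, the two sample concepts, and the function names are assumptions, and the labels are illustrative rather than copied from MeSH.

```python
from typing import Dict, List, Set

# Hypothetical thesaurus: one concept id with a label in each supported language.
THESAURUS: Dict[str, Dict[str, str]] = {
    "C1": {"en": "myocardial infarction", "sv": "hjärtinfarkt", "no": "hjerteinfarkt"},
    "C2": {"en": "heart failure", "sv": "hjärtsvikt", "no": "hjertesvikt"},
}


def find_concepts(query: str) -> List[str]:
    """Return ids of concepts with a matching label in any language.

    A trailing '*' requests a truncation (prefix) search; otherwise the
    label must match the query exactly, ignoring case.
    """
    needle = query.lower()
    truncated = needle.endswith("*")
    if truncated:
        needle = needle[:-1]
    matches: List[str] = []
    for concept_id, labels in THESAURUS.items():
        for label in labels.values():
            if (truncated and label.startswith(needle)) or label == needle:
                matches.append(concept_id)
                break  # One matching label is enough for this concept.
    return matches


def expand_query(query: str) -> Set[str]:
    """Expand a query into the labels of every matching concept, in all languages."""
    expanded: Set[str] = set()
    for concept_id in find_concepts(query):
        expanded.update(THESAURUS[concept_id].values())
    return expanded


print(find_concepts("hjärt*"))
print(expand_query("heart failure"))
```

Because expansion goes through a shared concept id, a query entered in one language retrieves documents indexed under the equivalent term in the other languages.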
GHSOM
“Growing hierarchical self-organizing map (GHSOM) constructs hierarchical structure of
expandable maps. Algorithms are developed after the relationships between other languages
based on the hierarchical map has been determined” (Chen, Lee & Yang, 2009, p. 7). A
part-of-speech tagger is used to select nouns from the text to serve as keywords. The queries
are preprocessed and converted into vectors that represent the overall meaning of the document.
Once the keywords have been selected, they are reduced to their root forms. Training aids the
encoding of bilingual documents to ensure users can access the information in these documents.
The expandable maps allow for better results.
Merge Model
The system starts with the user query, which is processed by the Cross Lingual
Information Retrieval system. The query is sent to three different collections, and three sets
of results are returned. The merge model is designed to combine the three monolingual lists into
one multilingual list. In this model, sixty-two features are extracted from the three levels of
Multilingual Retrieval Systems: query, document, and translation (Chen, Tsai, & Wang, 2011,
p. 638). A learning-based ranking algorithm called FRank is employed to rank items by
relevance. This learning-based merge model still has room for improvement.
ICE-TEA
Interactive Cross-Language Search Engine with Translation Enhancement (ICE-TEA) performs
query translation as part of an interactive Multilingual Information Access system. The language
resource used is a bilingual dictionary for translating English to Chinese. “Translation
enhancement is a feature of this system that provides users the original returned documents and
their translations. [The] system implements post-translation query expansions” (He, Wu & Xu,
2012, p. 527). The system is designed to allow users to delete any returned translations that
are not needed. It lets users interact with various stages of the Multilingual
Information Access system (He, Wu & Xu, 2012, p. 536). The system needs further development
to retrieve relevant documents more reliably. With its help, users can become more involved in
the information retrieval process.
BRUJA
BRUJA is a question answering system for the management of multilingual collections. The
system uses Cross Lingual Information Retrieval to retrieve documents from a multilingual
collection, a common practice in multilingual systems. The system produces
more correct answers in Spanish than in other languages. It uses a machine translation
resource, which requires a word-level alignment algorithm for the translations (Cumbreras, Lopez
& Santiago, 2011, p. 420).
What these systems have in common is the use of some form of query translation to bridge
the gap between the query and the documents. Each system’s goal is to enable the user to search
for information in multiple languages. Several systems involve a Cross Lingual
Retrieval System within the Multilingual Retrieval System; the two systems work together to
connect the user to the requested information. The user submits a query, and a tool
translates the query into the language of each collection. A monolingual list of results is
then returned for each collection, and these lists are merged using the merge model
explained above. The model is only a template and can be adjusted to suit any other system. The
process of organizing the multilingual documents differs depending on how the system is used.
Documents can be translated and then divided into comparable clusters or comparable corpora.
Keywords are often extracted from documents and translated into various languages
before being searched in the system. The sample systems and methods explained above
support the user from the input of the query to the receipt of the information.
ML News Clustering
Multilingual Document Clustering involves dividing a set of documents written in two
languages into clusters in such a way that similar documents are in the same cluster. News
clustering is popular because of the vast amount of news available to users.
This study uses a language-independent representation of news documents, focusing on
clustering the documents according to their content. The authors started with comparable
multilingual news articles (Fresno, Martinez & Montalvo, 2015, p. 522). Named entities play a
role in natural language processing tasks such as machine translation, clustering,
summarization, and extraction (cite). The comparable corpora used were in Spanish and English.
Expected Density is a measure that can be used in a multilingual setting to determine the
quality of the clusters (Fresno, Martinez & Montalvo, 2015, p. 528).
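To make the idea of a language-independent representation concrete, the sketch below groups news documents by the named entities they share. It is not the method used in the study cited above. It assumes that named entities have already been extracted, that proper names are written the same way in both languages, and that a Jaccard similarity threshold of 0.5 is reasonable. The document ids, entities, and function names are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of entities two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_entities(
    docs: Dict[str, Set[str]], threshold: float = 0.5
) -> List[List[str]]:
    """Greedy single-pass clustering on named-entity overlap.

    Each document joins the first cluster whose seed (first) document is
    similar enough; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for doc_id, entities in docs.items():
        for cluster in clusters:
            seed_entities = docs[cluster[0]]
            if jaccard(entities, seed_entities) >= threshold:
                cluster.append(doc_id)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([doc_id])
    return clusters


# Assumed example: entity sets for two English and two Spanish news articles.
docs = {
    "en-01": {"Madrid", "UNESCO", "Prado"},
    "es-01": {"Madrid", "UNESCO", "Prado", "Goya"},
    "en-02": {"NASA", "Mars"},
    "es-02": {"NASA", "Mars", "Houston"},
}
clusters = cluster_by_entities(docs)
print(clusters)
```

A greedy single pass is sensitive to the order of the documents. It is used here only because it keeps the example short.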
Challenges/ Limitations
Each article reviewed explains the challenges of creating a multilingual retrieval system.
Many words have multiple meanings in different languages. This poses a
problem when indexed terms are translated into a term that is represented in the system.
Multilingualism in Information Retrieval Systems is a challenge because of the limitations of
existing programs. The resources available are limited to core functions such as
query translation. Many developers want to steer away from translators because some
translations are inaccurate. When words are translated into another language, the developer
runs the risk of a word being translated incorrectly, either because a meaning is missed or
because translation tools are inadequate for languages from a specific region. For example,
Spanish is spoken in many regions, which means a viable translation system must be equipped to
translate regional variants of Spanish words. This has not yet been developed.
Some translation systems aren’t equipped to handle the translation of proper nouns. A machine
translation system is deemed impractical because of the large amount of text being translated
(Dhavachelvan & Sujatha, 2011, p. 116). The larger the text, the slower the retrieval time.
When choosing keywords, it is important to select comprehensive ones to improve the chance of
retrieving relevant documents (Peters, et al., 2011, p. 5). In some languages there is no way to
change a verb to a noun, which is why some systems require the keyword to contain a noun
(Peters, et al., 2011, p. 11). These challenges are common in an information setting where the
user is looking for information in either their native or nonnative language.
Future Research
Future research should include the creation of large bilingual text corpora, large-scale text
databases for testing, and a database of lexical semantic relations (Fluhr, n.d., para. 24).
Systems need to be tested in various languages. The Cross-Language Evaluation Forum (CLEF) spent
2000 to 2005 evaluating implemented systems that have multilingual features for
digital media. CLEF noticed that most systems examined pre-processed the document collection and
adopted linguistic processors and language resources such as POS taggers (Peters, 2011,
p. 677).
Future testing should include a wide range of users in the test group. A group of
test users from one specific region does not allow for accurate results; the test group
needs to be diverse. Questions specific to multilingualism should be asked to determine how
users would use the system and whether it is necessary to implement.
User knowledge needs to be improved. Implementing a new system that
involves more than one language can frustrate both native and nonnative English
speakers. A study showed “the language choices made by the students while searching for
information on the Internet seemed to indicate that the students used their native languages just
as much as they used English. This is a reflection of the rising multilingualism and
multiculturalism in the online environment and the fact that English is not as dominant as it was
some years ago” (Ajiferuke, et al., 2016, p. 498). Adequate time needs to be set aside to
train users to search with and use such a system. Organizations need to decide whether
implementing a Multilingual Retrieval System will benefit their users.
Discussion
Multilingualism in information retrieval systems is a concept that is still in its early
stages. It is a challenge to take a document written in multiple languages and translate it
into the language of the search query. “Multilingualism plays a role in the quality and
effectiveness of communication services offered [to users]” (Menard, 2011, p. 15).
Multilingualism is needed not only in library systems; a museum also felt the need to offer this
service to its users. The feature allowed users to search images that had
been indexed in multiple languages.
A Multilingual Information Retrieval System provides document retrieval techniques that
enable a user to enter a query, including a natural language query, in a desired one of a plurality
of supported languages, and retrieve documents from a database that includes documents in at
least one other language of the plurality of supported languages (Libby, et al., 1999, p. 8).
A variety of articles were examined, each discussing different but related aspects of Multilingual
Retrieval Systems. Significant improvements can be made to existing retrieval
systems that are adopting multilingual features. Multilingualism is designed to be incorporated
into an already existing Information Retrieval System. Many tools are currently available, and
others still need to be developed. Currently these systems are limited to dictionary-based tools,
corpora, clustering, indexing, and thesaurus-based tools. These tools have been beneficial to the
development of such systems but need to be enhanced because of the errors that can arise.
References
García-Cumbreras, M. Á, Martínez-Santiago, F., & Ureña-López, L. A. (2011, 10). Architecture and
evaluation of BRUJA, a multilingual question answering system. Information Retrieval, 15(5),
413-432. doi:10.1007/s10791-011-9177-5
Fluhr, Christian (n.d). Multilingual Information Retrieval. Retrieved from
http://www.cslu.ogi.edu/HLTsurvey/ch8node7.html
Gavel, Y., & Andersson, P. (2014, 06). Multilingual query expansion in the SveMed bibliographic
database: A case study. Journal of Information Science, 40(3), 269-280.
doi:10.1177/0165551514524685
Libby, E. D., Palk, W., Yu, E. S., & Li, M. (1999). U.S. Patent No. 6006221. Washington, DC: U.S.
Patent and Trademark Office.
Montalvo, S., Martínez, R., & Fresno, V. (2015, 08). Quality prediction of multilingual news
clustering: An experimental study. Journal of Information Science, 41(4), 518-530.
doi:10.1177/0165551515586671
Ménard, E. (2011, 07). Search Behaviours of Image Users: A Pilot Study on Museum Objects.
Partnership: The Canadian Journal of Library and Information Practice and Research, 6(1).
doi:10.21083/partnership.v6i1.1433
Nzomo, P., Ajiferuke, I., Vaughan, L., & Mckenzie, P. (2016, 09). Multilingual Information Retrieval
& Use: Perceptions and Practices Amongst Bi/Multilingual Academic Users. The Journal of
Academic Librarianship, 42(5), 495-502. doi:10.1016/j.acalib.2016.06.012
Peters, C., Braschler, M., & Clough, P. (2011, 09). Evaluation for Multilingual Information Retrieval
Systems. Multilingual Information Retrieval, 129-169. doi:10.1007/978-3-642-23008-0_5
P., & D. (2011, 10). A Review on the Cross and Multilingual Information Retrieval. International
Journal of Web & Semantic Technology, 2(4), 115-124. doi:10.5121/ijwest.2011.2409
Tsai, M., Chen, H., & Wang, Y. (2011, 09). Learning a merge model for multilingual information
retrieval. Information Processing & Management, 47(5), 635-646.
doi:10.1016/j.ipm.2009.12.002
Wu, D., He, D., & Luo, B. (2012, 04). Multilingual needs and expectations in digital libraries. The
Electronic Library, 30(2), 182-197. doi:10.1108/02640471211221322
Wu, D., He, D., & Xu, X. (2012, 08). A study of relevance feedback techniques in interactive
multilingual information access. Library Hi Tech, 30(3), 523-544.
doi:10.1108/07378831211266645
Yang, H., Hsiao, H., & Lee, C. (2011, 09). Multilingual document mining and navigation using self-
organizing maps. Information Processing & Management, 47(5), 647-666.
doi:10.1016/j.ipm.2009.12.003
Yang, H., Lee, C., & Chen, D. (2009, 02). A method for multilingual text mining and retrieval using
growing hierarchical self-organizing maps. Journal of Information Science, 35(1), 3-23.
doi:10.1177/0165551508088968
Zhang, X., Liu, J. N., & Atwell, E. (n.d.) Multilingual Information Retrieval in World Wide Web.
Retrieved from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.165.90&rep=rep1&type=pdf

A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...
A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...
 
Ontology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemOntology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval System
 
Survey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse DictionarySurvey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse Dictionary
 
Car-Following Parameters by Means of Cellular Automata in the Case of Evacuation
Car-Following Parameters by Means of Cellular Automata in the Case of EvacuationCar-Following Parameters by Means of Cellular Automata in the Case of Evacuation
Car-Following Parameters by Means of Cellular Automata in the Case of Evacuation
 

Recently uploaded

办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Roomdivyansh0kumar0
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdf
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdfThe Intriguing World of CDR Analysis by Police: What You Need to Know.pdf
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdfMilind Agarwal
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Roomdivyansh0kumar0
 
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一3sw2qly1
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsThierry TROUIN ☁
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Roomishabajaj13
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一Fs
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
Denver Web Design brochure for public viewing
Denver Web Design brochure for public viewingDenver Web Design brochure for public viewing
Denver Web Design brochure for public viewingbigorange77
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Deliverybabeytanya
 
Complet Documnetation for Smart Assistant Application for Disabled Person
Complet Documnetation   for Smart Assistant Application for Disabled PersonComplet Documnetation   for Smart Assistant Application for Disabled Person
Complet Documnetation for Smart Assistant Application for Disabled Personfurqan222004
 
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 

Recently uploaded (20)

办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdf
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdfThe Intriguing World of CDR Analysis by Police: What You Need to Know.pdf
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdf
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
 
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with Flows
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
Denver Web Design brochure for public viewing
Denver Web Design brochure for public viewingDenver Web Design brochure for public viewing
Denver Web Design brochure for public viewing
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
 
Complet Documnetation for Smart Assistant Application for Disabled Person
Complet Documnetation   for Smart Assistant Application for Disabled PersonComplet Documnetation   for Smart Assistant Application for Disabled Person
Complet Documnetation for Smart Assistant Application for Disabled Person
 
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
 

Multilingual Systems Improve Information Access

  • 1. Running Head: MULTILINGUALISM IN INFORMATION RETRIEVAL SYSTEMS 1 Multilingualism in Information Retrieval Systems Ariel Hess University of North Texas INFO 5206 May 5, 2017 Summary/Author’s Note: Multilingualism in information retrieval systems is a topic that researchers have examined extensively. The challenge of creating a system that accepts a query containing multiple languages and returns results in multiple languages will continue to be studied. Information retrieval systems can be extended with features designed to translate documents and queries. This paper examines different strategies used for text translation, projects that have been implemented, and the challenges they faced.
  • 2. Introduction: Most search engines provide only a monolingual search interface for documents written mostly in English (Yang, Lee, & Chen, 2009, p. 4). Users often translate their query into English before using a search engine. The goal of a Multilingual Retrieval System is to allow users to search for information in multiple languages and retrieve information in multiple languages. This is done by deploying Cross Language Retrieval, which allows the user to ask a question in one language and retrieve the information in another. A survey of academic users was conducted to better understand why users want access to documents in different languages, and whether users of a digital library would want access to a multilingual retrieval system. Most users wanted this access for educational purposes: they would use a Multilingual Information Retrieval System to complete assignments that require searching for documents in a language other than English. The study showed that some users felt it would be too difficult to search for documents that contain more than one language (Wu, He, & Luo, 2012, p. 188). The overall takeaway from the survey is that understanding user needs helps determine whether such a system fits both the preexisting Information Retrieval System and its users. Developers want to dismantle the barrier between the user query and multilingual documents. This can be done by adjusting the Information Retrieval System to incorporate multilingualism through translation tools and various other techniques. Generally, a Multilingual Retrieval System works by first retrieving documents from a separate collection for each language. The monolingual result list retrieved from each collection is then merged with the others to create a single multilingual list. Each system can be adapted to the needs of the organization. Different tools are employed to ensure
  • 3. compatibility. The Multilingual Retrieval System generally focuses on one or all of the following: document, query, and translation. Approach/Methods: A Multilingual Retrieval System is built from a variety of tools and features. The system has three levels of concern: query, translation, and document. These areas are addressed through different techniques, such as creating a dictionary-based model. Each Multilingual Retrieval System has its own features and deploys different methods for retrieval. These methods are adaptable and tailored to the audience the system is intended for. Text mining is the process of deriving quality information from unstructured text (Yang, Lee, & Chen, 2009, p. 4). “Text mining in a multilingual setting [is also incorporated as] an automated process that is design[ed] to discover the relationship between languages” (Yang, Hsiao, & Lee, 2011, p. 648). Three techniques are often employed to create a multilingual-friendly system: using a machine translation system, using a bilingual dictionary or terminology base, and using a statistical/probabilistic model based on parallel texts. Query translation is a strategy in which the user’s query is translated into each language present in the multilingual collection, producing one monolingual information retrieval process per language (García-Cumbreras, Martínez-Santiago, & Ureña-López, 2011, p. 414). The most common query search depends on concepts of natural language. A dictionary-based tool uses a bilingual word list to translate query terms into different languages. A machine translation approach translates every document in the corpus into multiple languages. Corpus-based retrieval tools use knowledge-based acquisition techniques to discover cross-lingual relationships and use them in Multilingual Retrieval Systems. This
  • 4. method uses word alignment to generate bilingual corpora, which establish relationships between words in different languages. This in turn is used to create a translation table for query translation. It is recommended that the corpus be virtual to save storage and time. These three methods are grouped together because they build on one another. Query translation is made possible by dictionary-based tools. Once the query is translated, the information is obtained from a corpus, which may have its documents clustered. The documents in the corpora are commonly indexed by a single keyword or a group of keywords that can be easily found during searching. A multilingual comparable corpus is another tool, consisting of documents in different languages that share the same topics. Many text mining approaches are based on this method (Yang, Hsiao, & Lee, 2011, p. 650). Thesaurus-based multilingual retrieval takes related terms that are commonly used in a document and indexes them. This method can be applied in Multilingual Information Retrieval by mapping between thesauri of different languages (Yang, Lee, & Chen, 2009, p. 6). The methods described above can be used with any system that is implementing a multilingual extension. The purpose of tools such as corpora is to ensure that a repository is available for accessing the intended information. The benefit of clustering corpora is that it provides a narrower grouping of comparable documents and text. Applications: The following sections provide examples of existing systems that have either added a multilingual feature to an existing Information Retrieval System or been created as a new system. Multilingualism is designed to be incorporated into an already existing system. The following systems examine
  • 5. their implementation of multilingualism into their pre-existing system. SveMed: SveMed uses terms from the Medical Subject Headings (MeSH) thesaurus, which contains a controlled vocabulary, and translates these terms into different languages. The terms are arranged in a hierarchical tree, and when deciding which terms to index, the indexer tries to select the most specific term possible. These terms are then indexed and can be retrieved by performing a truncation search, which helps ensure that user-submitted queries return results. The interface uses a thesaurus-based database to translate the medical terms into three different languages and to distinguish information between the document terms (Gavel & Andersson, 2014, p. 272). It uses the Solr search engine and relies solely on query expansion. “The search interface allows the user to search terms in English, Swedish, or Norwegian, and browse for MeSH terms” (Gavel & Andersson, 2014, p. 274). A great advantage of this search interface is that it allows the user to select the language in which to search for information. GHSOM: “Growing hierarchical self-organizing map (GHSOM) constructs hierarchical structure of expandable maps. Algorithms are developed after the relationships between other languages based on the hierarchical map has been determined” (Yang, Lee, & Chen, 2009, p. 7). A part-of-speech tagger is used to select nouns from the text to serve as keywords. The queries are reprocessed and converted into vectors that capture the overall meaning of the document. Once the keywords have been selected, they are reduced to their roots. The training aids in encoding bilingual documents so that users can access the information in them. The expandable maps allow for better results. Merge Model:
  • 6. The process starts with the user query, which is handled by the Cross Lingual Information Retrieval system. The query is sent to three different collections, and three sets of results are returned. The merge model is designed to combine the three monolingual lists into one multilingual list. In this model, sixty-two features are extracted from the three levels of a Multilingual Retrieval System: query, document, and translation (Tsai, Chen, & Wang, 2011, p. 638). A learning-based ranking algorithm called FRank is employed to rank items by relevance. This learning-based merge model still has room for improvement. ICE-TEA: Interactive Cross-Language search Engine with Translation Enhancement (ICE-TEA) performs query translation within an interactive Multilingual Information Access system. The language resource used is a bilingual dictionary for translating English to Chinese. “Translation enhancement is a feature of this system that provides users the original returned documents and their translations. [The] system implements post-translation query expansions” (Wu, He, & Xu, 2012, p. 527). The system is designed to let users delete any returned translations that are not needed. It allows users to interact with various stages of the Multilingual Information Access system (Wu, He, & Xu, 2012, p. 536). The system will need further development to improve the retrieval of relevant documents. With its help, users can become more involved in the information retrieval process. BRUJA: BRUJA is a question answering system for the management of multilingual collections. It uses Cross Lingual Information Retrieval to retrieve documents from a multilingual system, which is a common practice in multilingual systems. The system produces more correct answers in Spanish than in other languages. This system uses a machine translation
  • 7. resource, which requires a word-level alignment algorithm for the translations (García-Cumbreras, Martínez-Santiago, & Ureña-López, 2011, p. 420). What these systems have in common is the use of some form of query translation to bridge the gap between the query and the documents. Each system’s goal is to enable the user to search for information in multiple languages. The systems mention the involvement of a Cross Lingual Retrieval System in the Multilingual Retrieval System. These two systems work together to connect the user to the requested information. The user submits a query, and a tool translates the query into the language of each collection. A monolingual result list is then returned for each collection. These lists are merged using the merge model explained above. That model is only one option and can be adjusted to suit any other system. The process of organizing the multilingual documents differs depending on how the system is used. Documents can be translated and then divided into comparable clusters or comparable corpora. Keywords are often taken from documents and then translated into various languages before being searched in the system. The sample systems and methods explained above help the user from the input of the query to the receipt of the information. ML News Clustering: Multilingual Document Clustering involves dividing a set of documents written in two languages into clusters, in such a way that similar documents are in the same cluster. News clustering is popular because of the vast amount of news available to users. This study uses a language-independent representation of news documents, focusing on clustering the news documents according to their content. The authors started with comparable multilingual news articles (Montalvo, Martínez, & Fresno, 2015, p. 522). Named entities played a role in natural language processing tasks such as machine translation, clustering, summarizing and
  • 8. extraction. The comparable corpora used were in Spanish and English. Expected Density is a measure that can be used in a multilingual setting to determine the quality of the clusters (Montalvo, Martínez, & Fresno, 2015, p. 528). Challenges/Limitations: Each article reviewed explains the challenges of creating a multilingual retrieval system. A large amount of text has multiple meanings in different languages. This poses a problem when indexed terms are translated into a term that is represented in the system. Multilingualism in Information Retrieval Systems is a challenge because of the limitations of the programs currently available. The available resources are limited mainly to items such as query translation. Many developers want to steer away from translators because some translations are inaccurate. When words are translated into another language, the developer runs the risk of a word being translated incorrectly because its meaning is missed or because the translation tools are inadequate for languages derived from a specific region. For example, Spanish has many regional varieties, which means a viable translation system must be equipped to translate different versions of Spanish words. Such a system has not yet been developed. Some translation systems are not equipped to handle the translation of proper nouns. Machine translation of documents is deemed impractical because of the large amount of text being translated (Sujatha & Dhavachelvan, 2011, p. 116). The larger the text, the slower the retrieval time. When choosing keywords, it is important to choose comprehensive ones to improve the chance of retrieving relevant documents (Peters, Braschler, & Clough, 2011, p. 5). In some languages there is no way to change a verb to a noun, which is why some systems require the keyword to contain a noun (Peters, Braschler, & Clough, 2011, p. 11). These challenges are common in an information setting where the user is looking for information in either their native or nonnative language.
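The ambiguity and proper-noun problems described in the challenges above can be illustrated with a minimal Python sketch of dictionary-based query translation. Everything in it is an assumption made for illustration: the toy English-to-Spanish entries, the name `translate_query`, and the choice to pass unknown terms through unchanged are hypothetical and are not taken from any of the systems reviewed.

```python
# Hypothetical toy English-to-Spanish dictionary; entries are illustrative only.
BILINGUAL_DICT = {
    "bank": ["banco", "orilla"],  # financial institution vs. river bank
    "loan": ["préstamo"],
}


def translate_query(query, dictionary):
    """Return the list of candidate translations for each query term.

    Terms that are missing from the dictionary (for example proper nouns)
    are kept unchanged; this fallback is an assumption of the sketch.
    """
    translated = []
    for term in query.lower().split():
        translated.append(dictionary.get(term, [term]))
    return translated


candidates = translate_query("bank loan Madrid", BILINGUAL_DICT)
for term_candidates in candidates:
    print(term_candidates)
```

The term "bank" comes back with two candidates, so a real system would still need some form of disambiguation, while "Madrid" passes through untranslated because the dictionary has no entry for it.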
  • 9. Future Research: Future research should include the creation of large bilingual text corpora, large-scale text databases for testing, and a database of lexical semantic relations (Fluhr, n.d., para. 24). Systems need to be tested in various languages. The Cross-Language Evaluation Forum (CLEF) spent 2000 to 2005 researching implemented systems that have multilingual features for digital media. CLEF noticed that most of the systems examined pre-processed the document collection and adopted linguistic processors and language resources such as POS taggers (Peters, 2011, p. 677). Future testing should include a wide range of users in the test group. A test group drawn from one specific region does not allow for accurate results; the test group needs to be diverse. Questions about multilingualism should be asked to determine how users would use the system and whether it would be necessary to implement. User knowledge also needs to be improved. Implementing a new system that involves more than one language can frustrate both native and nonnative English speakers. A study showed that “the language choices made by the students while searching for information on the Internet seemed to indicate that the students used their native languages just as much as they used English. This is a reflection of the rising multilingualism and multiculturalism in the online environment and the fact that English is not as dominant as it was some years ago” (Nzomo et al., 2016, p. 498). Adequate time needs to be set aside to train users in how to search with and use such a system. Organizations need to decide whether implementing a Multilingual Retrieval System will benefit their user audience.
  • 10. Discussion: Multilingualism in information retrieval systems is a concept that is still in its beginning stages. It is a challenge to take a document that is written in multiple languages and translate it into the language of the search query. “Multilingualism plays a role in the quality and effectiveness of communication services offered [to users]” (Ménard, 2011, p. 15). Multilingualism is needed not only in library systems; a museum also saw the need to offer this service to its users. There, the feature allowed users to search images that had been indexed in multiple languages. A Multilingual Information Retrieval System provides document retrieval techniques that enable a user to enter a query, including a natural language query, in a desired one of a plurality of supported languages, and retrieve documents from a database that includes documents in at least one other language of the plurality of supported languages (Libby et al., 1999, p. 8). A variety of articles were examined, each discussing different but related aspects of Multilingual Retrieval Systems. Significant improvements can be made to existing retrieval systems that are implementing multilingual features. Multilingualism is designed to be incorporated into an already existing Information Retrieval System. Many tools are currently available, and others still need to be developed. Currently, these systems are limited to dictionary-based tools, corpora, clustering, indexing, and thesaurus-based tools. These tools have been beneficial to the development of such systems but need to be enhanced because of the errors that can arise.
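The merging step that recurs throughout the paper, in which one monolingual result list per collection is combined into a single multilingual list, can be sketched in a few lines of Python. This is not the learned merge model reviewed above: it swaps in plain min-max score normalization, a much simpler technique, and the function names, document identifiers, and scores are all hypothetical.

```python
def normalize(results):
    """Min-max normalize the scores of one monolingual result list.

    `results` is a list of (doc_id, score) pairs from a single collection.
    """
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If all scores are equal, treat every document as equally relevant.
    return [(doc, (score - lo) / span if span else 1.0) for doc, score in results]


def merge_lists(lists_by_language):
    """Merge per-language result lists into one list sorted by normalized score."""
    merged = []
    for language, results in lists_by_language.items():
        for doc, score in normalize(results):
            merged.append((doc, language, score))
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged


# Hypothetical raw scores on different scales, one list per collection.
merged = merge_lists({
    "en": [("en1", 12.0), ("en2", 6.0)],
    "es": [("es1", 0.9), ("es2", 0.3), ("es3", 0.6)],
})
for doc, language, score in merged:
    print(doc, language, round(score, 2))
```

Normalizing first matters because each monolingual retrieval run may score documents on its own scale; without it, the collection with the largest raw scores would dominate the merged list.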
  • 11. References: García-Cumbreras, M. Á., Martínez-Santiago, F., & Ureña-López, L. A. (2011). Architecture and evaluation of BRUJA, a multilingual question answering system. Information Retrieval, 15(5), 413-432. doi:10.1007/s10791-011-9177-5 Fluhr, C. (n.d.). Multilingual Information Retrieval. Retrieved from http://www.cslu.ogi.edu/HLTsurvey/ch8node7.html Gavel, Y., & Andersson, P. (2014). Multilingual query expansion in the SveMed bibliographic database: A case study. Journal of Information Science, 40(3), 269-280. doi:10.1177/0165551514524685 Libby, E. D., Palk, W., Yu, E. S., & Li, M. (1999). U.S. Patent No. 6006221. Washington, DC: U.S. Patent and Trademark Office. Montalvo, S., Martínez, R., & Fresno, V. (2015). Quality prediction of multilingual news clustering: An experimental study. Journal of Information Science, 41(4), 518-530. doi:10.1177/0165551515586671 Ménard, E. (2011). Search Behaviours of Image Users: A Pilot Study on Museum Objects. Partnership: The Canadian Journal of Library and Information Practice and Research, 6(1). doi:10.21083/partnership.v6i1.1433 Nzomo, P., Ajiferuke, I., Vaughan, L., & Mckenzie, P. (2016). Multilingual Information Retrieval & Use: Perceptions and Practices Amongst Bi/Multilingual Academic Users. The Journal of Academic Librarianship, 42(5), 495-502. doi:10.1016/j.acalib.2016.06.012 Peters, C., Braschler, M., & Clough, P. (2011). Evaluation for Multilingual Information Retrieval Systems. Multilingual Information Retrieval, 129-169. doi:10.1007/978-3-642-23008-0_5
  • 12. Sujatha, P., & Dhavachelvan, P. (2011). A Review on the Cross and Multilingual Information Retrieval. International Journal of Web & Semantic Technology, 2(4), 115-124. doi:10.5121/ijwest.2011.2409 Tsai, M., Chen, H., & Wang, Y. (2011). Learning a merge model for multilingual information retrieval. Information Processing & Management, 47(5), 635-646. doi:10.1016/j.ipm.2009.12.002 Wu, D., He, D., & Luo, B. (2012). Multilingual needs and expectations in digital libraries. The Electronic Library, 30(2), 182-197. doi:10.1108/02640471211221322 Wu, D., He, D., & Xu, X. (2012). A study of relevance feedback techniques in interactive multilingual information access. Library Hi Tech, 30(3), 523-544. doi:10.1108/07378831211266645 Yang, H., Hsiao, H., & Lee, C. (2011). Multilingual document mining and navigation using self-organizing maps. Information Processing & Management, 47(5), 647-666. doi:10.1016/j.ipm.2009.12.003 Yang, H., Lee, C., & Chen, D. (2009). A method for multilingual text mining and retrieval using growing hierarchical self-organizing maps. Journal of Information Science, 35(1), 3-23. doi:10.1177/0165551508088968 Zhang, X., Liu, J. N., & Atwell, E. (n.d.). Multilingual Information Retrieval in World Wide Web. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.165.90&rep=rep1&type=pdf