This paper describes a transfer-based scheme for translating Malayalam, a Dravidian language, into English. The system takes Malayalam sentences as input and outputs equivalent English sentences. It comprises a preprocessor for splitting compound words, a morphological parser for context disambiguation and chunking, a syntactic structure transfer module, and a bilingual dictionary. All the modules are morpheme-based to reduce dictionary size. The system does not rely on a stochastic approach; it is built on a rule-based architecture together with linguistic knowledge components for both Malayalam and English. It uses two sets of rules: rules for Malayalam morphology and rules for syntactic structure transfer from Malayalam to English. The system is designed using artificial intelligence techniques.
Survey on Indian CLIR and MT systems in Marathi Language Editor IJCATR
Cross Language Information Retrieval (CLIR) deals with retrieving relevant information stored in a language different from
the language of the user’s query. This helps users express their information need in their native languages. The machine translation based (MT-based)
approach to CLIR uses existing machine translation techniques to provide automatic translation of queries. This paper covers the
research work done on CLIR and MT systems for the Marathi language in India.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES ijnlc
Stemming is the process of term conflation: it conflates all the variants of a word to a common form called the stem. It plays a significant role in numerous Natural Language Processing (NLP) applications such as morphological analysis, parsing, document summarization, text classification, part-of-speech tagging, question answering, machine translation, word sense disambiguation, and information retrieval (IR). Each of these tasks requires some pre-processing, and stemming is one of the important building blocks for all of them. This paper presents an overview of various stemming techniques, evaluation criteria for stemmers, and existing stemmers for Indic languages.
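As a rough illustration of term conflation, the following minimal sketch strips the longest matching suffix from a word and groups variants under the resulting stem. The suffix list, the `min_stem_len` threshold, and the function names are assumptions made for this example (English suffixes are used only for readability); they are not taken from any of the stemmers the paper surveys.

```python
# Hypothetical suffix list for illustration only; a stemmer for an Indic
# language would use language-specific suffixes instead.
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s")


def light_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def conflate(words):
    """Group word variants under their shared stem."""
    groups = {}
    for w in words:
        groups.setdefault(light_stem(w.lower()), []).append(w)
    return groups


if __name__ == "__main__":
    print(conflate(["connect", "connected", "connecting", "connects"]))
```

A suffix stripper of this kind is only one family of stemming techniques; it over-stems and under-stems on irregular forms, which is why evaluation criteria for stemmers matter.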
A Novel Approach for Rule Based Translation of English to Marathi aciijournal
This paper presents a design for a rule-based machine translation system for the English to Marathi language pair. The system will take an English sentence as input and parse it with the help of the Stanford parser, which will be used for the main source-side processing. An English to Marathi bilingual dictionary is going to be created. The system will take the parsed output, separate the source text word by word, and search for the corresponding target words in the bilingual dictionary. Hand-coded rules are written for Marathi inflections, along with reordering rules. After the reordering rules are applied, the English sentence will be syntactically reordered to suit the Marathi language.
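The lookup and reordering steps can be sketched roughly as follows. This is a toy example under stated assumptions: the input arrives already labelled with subject (S), verb (V) and object (O) roles instead of coming from the Stanford parser, the dictionary entries are romanized placeholders invented for illustration, and inflection rules are omitted entirely.

```python
# Toy bilingual dictionary with romanized placeholder entries (hypothetical).
TOY_DICTIONARY = {"ram": "raam", "eats": "khaato", "mango": "aambaa"}

# Target order for a subject-object-verb language such as Marathi.
SOV_ORDER = {"S": 0, "O": 1, "V": 2}


def lookup(tagged):
    """Replace each source word with its dictionary entry; unknown words pass through."""
    return [(TOY_DICTIONARY.get(word.lower(), word), role) for word, role in tagged]


def reorder_svo_to_sov(tagged):
    """Reorder (word, role) pairs from subject-verb-object to subject-object-verb."""
    return sorted(tagged, key=lambda pair: SOV_ORDER[pair[1]])


def translate(tagged):
    """Dictionary lookup followed by syntactic reordering."""
    return " ".join(word for word, _ in reorder_svo_to_sov(lookup(tagged)))


if __name__ == "__main__":
    print(translate([("Ram", "S"), ("eats", "V"), ("mango", "O")]))
```

A single reordering rule keyed on three roles is far simpler than what a real system needs, but it shows where the parser output, the dictionary, and the reordering rules meet.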
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
A New Approach to Parts of Speech Tagging in Malayalam ijcsit
Parts-of-speech tagging is the process of labeling each word in a sentence. A tag indicates the word’s
usage in the sentence. Usually, these tags indicate a syntactic classification such as noun or verb, and sometimes
include additional information, with case markers (number, gender etc.) and tense markers. A large number
of current language processing systems use a parts-of-speech tagger for pre-processing.
There are two main approaches to parts-of-speech tagging: the rule-based
approach and the stochastic approach. The rule-based approach uses predefined handwritten rules. It is the
oldest approach and uses a lexicon or dictionary for reference. The stochastic approach uses probabilistic and
statistical information to assign tags to words. It uses a large corpus, so its time and space
complexity are high, whereas the rule-based approach has lower complexity for both. The stochastic
approach is the most widely used one nowadays because of its accuracy.
Malayalam belongs to the Dravidian family of languages and is inflectional, with suffixes attached to root word forms. The
algorithms currently in use are efficient machine learning algorithms, but they were not built for
Malayalam, which affects the accuracy of Malayalam POS tagging.
My proposed approach uses dictionary entries along with adjacent tag information. The algorithm uses
multithreading. Tagging is done using the probability of occurrence of the sentence
structure along with the dictionary entry.
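One way to combine dictionary entries with adjacent tag information is sketched below. It is a greedy tagger written for illustration, not the author's algorithm: the lexicon, the tag-transition probabilities, the tag names, and the NOUN fallback for unknown words are all made-up assumptions, English words stand in for Malayalam, and multithreading is left out.

```python
# Hypothetical lexicon: each word maps to the tags the dictionary allows for it.
LEXICON = {"the": ["DET"], "can": ["NOUN", "VERB"], "rusts": ["VERB"]}

# Hypothetical probabilities of a tag given the previous (adjacent) tag.
TRANSITION = {
    ("<s>", "DET"): 0.6,
    ("DET", "NOUN"): 0.8,
    ("DET", "VERB"): 0.05,
    ("NOUN", "VERB"): 0.6,
}


def tag(words):
    """Pick, for each word, the dictionary tag most likely after the previous tag."""
    previous = "<s>"  # sentence-start marker
    tagged = []
    for word in words:
        # Unknown words fall back to NOUN (an arbitrary choice for this sketch).
        candidates = LEXICON.get(word.lower(), ["NOUN"])
        best = max(candidates, key=lambda t: TRANSITION.get((previous, t), 0.0))
        tagged.append((word, best))
        previous = best
    return tagged


if __name__ == "__main__":
    print(tag(["The", "can", "rusts"]))
```

Here the dictionary narrows "can" to two candidate tags and the preceding determiner decides between them.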
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis IJERA Editor
We describe in detail a Grapheme-to-Phoneme (G2P) converter required for the development of a good quality
Marathi Text-to-Speech (TTS) system. The Festival and Festvox framework is chosen for developing the
Marathi TTS system. Since Festival does not provide complete language processing support specific to various
languages, it needs to be augmented to facilitate the development of TTS systems in certain new languages.
Because of this, a generic G2P converter has been developed. In the customized Marathi G2P converter, we
have handled schwa deletion and compound word extraction. In the experiments carried out to test the Marathi
G2P on a text segment of 2485 words, a word phonetisation accuracy of 91.47% is obtained. This Marathi G2P has
been used for phonetising large text corpora, which in turn are used in designing an inventory of phonetically rich
sentences. The sentences ensure good coverage of the phonetically valid di-phones using only 1.3% of the
complete text corpora.
Quality estimation of machine translation outputs through stemming ijcsa
Machine translation is a challenging problem for Indian languages. New machine
translators are developed every day, but high quality automatic translation is still a very distant dream.
A correctly translated sentence for the Hindi language is rarely found. In this paper, we focus on the
English-Hindi language pair. In order to preserve the correct MT output, we present a ranking system
which employs machine learning techniques and morphological features. The ranking requires no human
intervention. We have also validated our results by comparing them with human ranking.
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor Waqas Tariq
In recent decades speech interactive systems have gained increasing importance. The performance of an ASR system mainly depends on the availability of a large corpus of speech. The conventional method of building a large vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires a large speech corpus with sentence or phoneme level transcription of the speech utterances. The transcriptions must also include different speech order so that the recognizer can build models for all the sounds present. For the Telugu language, however, because of its complex nature, a very large, well annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or even millions of word forms. A significant part of the grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu. Phrases of several words (that is, tokens) in English would be mapped onto a single word in Telugu. The Telugu language is phonetic in nature in addition to being rich in morphology. That is why the speech technology developed for English cannot be applied to Telugu. This paper highlights the work carried out in an attempt to build a voice enabled text editor with automatic term suggestion. The main claim of the paper is the recognition enhancement process we developed for highly inflecting, morphologically rich languages. This method results in increased speech recognition accuracy with a large reduction in corpus size. It also adds Telugu words to the database dynamically, resulting in growth of the corpus.
Named Entity Recognition System for Hindi Language: A Hybrid Approach Waqas Tariq
Named Entity Recognition (NER) is a major early step in Natural Language Processing (NLP) tasks like machine translation, text to speech synthesis, natural language understanding etc. It seeks to classify words which represent names in text into predefined categories like location, person-name, organization, date, time etc. In this paper we have used a combination of machine learning and Rule based approaches to classify named entities. The paper introduces a hybrid approach for NER. We have experimented with Statistical approaches like Conditional Random Fields (CRF) & Maximum Entropy (MaxEnt) and Rule based approach based on the set of linguistic rules. Linguistic approach plays a vital role in overcoming the limitations of statistical models for morphologically rich language like Hindi. Also the system uses voting method to improve the performance of the NER system. Keywords: NER, MaxEnt, CRF, Rule base, Voting, Hybrid approach
An information retrieval (IR) system aims to retrieve
documents relevant to a user query, where the query is a set of
keywords. Cross-language information retrieval (CLIR) is a
retrieval process in which the user issues queries in one language to
retrieve information in another language. The growing
need of Internet users to access information
expressed in languages other than their own has led to Cross
Language Information Retrieval (CLIR) becoming established as
a major topic in IR.
Named Entity Recognition using Hidden Markov Model (HMM) kevig
Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP), which is a branch of artificial intelligence. It has many applications, mainly in machine translation, text to speech synthesis, natural language understanding, information extraction, information retrieval, question answering etc. The aim of NER is to classify words into predefined categories such as location name, person name, organization name, date, time etc. In this paper we describe in detail the Hidden Markov Model (HMM) based machine learning approach to identifying named entities. The main idea behind using an HMM to build the NER system is that it is language independent, so the system can be applied to any language domain. In our NER system the states are not fixed; the system is dynamic in nature, and users can define the states according to their interest. The corpus used by our NER system is also not domain specific.
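To make the HMM idea concrete, here is a minimal Viterbi decoder that assigns one state (entity category) per word. It is a generic textbook sketch, not the system described in the abstract: the state names, all probabilities, the sample sentence, and the `floor` value used for unseen words are invented for illustration. The states are passed in as a parameter, which mirrors the idea that they are not fixed.

```python
def viterbi(obs, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable state sequence for the observed words."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s].get(obs[0], floor), None) for s in states}]
    for t in range(1, len(obs)):
        row = {}
        for s in states:
            row[s] = max(
                (best[t - 1][p][0] * trans_p[p][s] * emit_p[s].get(obs[t], floor), p)
                for p in states
            )
        best.append(row)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return list(reversed(path))


# Hypothetical toy model: person (PER), location (LOC), other (O).
STATES = ["PER", "LOC", "O"]
START_P = {"PER": 0.3, "LOC": 0.1, "O": 0.6}
TRANS_P = {
    "PER": {"PER": 0.2, "LOC": 0.1, "O": 0.7},
    "LOC": {"PER": 0.1, "LOC": 0.2, "O": 0.7},
    "O": {"PER": 0.2, "LOC": 0.3, "O": 0.5},
}
EMIT_P = {
    "PER": {"ravi": 0.5},
    "LOC": {"kochi": 0.5},
    "O": {"lives": 0.3, "in": 0.3},
}

if __name__ == "__main__":
    words = ["ravi", "lives", "in", "kochi"]
    print(list(zip(words, viterbi(words, STATES, START_P, TRANS_P, EMIT_P))))
```

In a trained system the three probability tables would be estimated from an annotated corpus instead of being written by hand.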
MULTI-WORD TERM EXTRACTION BASED ON NEW HYBRID APPROACH FOR ARABIC LANGUAGE csandit
Arabic Multiword Terms (AMWTs) are relevant strings of words in text documents. Once they are
automatically extracted, they can be used to increase the performance of text mining
applications such as Categorisation, Clustering, Information Retrieval Systems, Machine
Translation, and Summarization. This paper introduces our proposed multiword term
extraction system based on contextual information. We propose a new method based on
a hybrid approach for Arabic multiword term extraction. Like other methods based on a hybrid
approach, our method is composed of two main steps: the Linguistic approach and the
Statistical one. In the first step, the Linguistic approach uses a Part Of Speech (POS) Tagger
(Taani’s Tagger) and the Sequence Identifier as patterns in order to extract the candidate
AMWTs. In the second step, which includes our main contribution, the Statistical approach
incorporates the contextual information by using a newly proposed association measure based on
Termhood and Unithood for AMWT extraction. To evaluate the efficiency of our proposed
method for AMWT extraction, the latter has been tested and compared using three different
association measures: the proposed one, named NTC-Value, NC-Value, and C-Value. The
experimental results, using Arabic texts taken from the environment domain, show that our
hybrid method outperforms the others in terms of precision; in addition, it can deal correctly
with tri-gram Arabic multiword terms.
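Of the three association measures named above, C-Value is the baseline, and its standard definition can be sketched as follows: a candidate's frequency is discounted by the average frequency of the longer candidates that contain it, then weighted by the log of its length. The candidate terms and frequencies below are made up, English words stand in for Arabic, and the proposed NTC-Value is not reproduced here because the abstract does not define it.

```python
import math


def is_nested(short, long):
    """True if the word tuple `short` occurs contiguously inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(term, freq):
    """C-Value of `term`; `freq` maps candidate terms (word tuples) to frequencies."""
    containers = [t for t in freq if len(t) > len(term) and is_nested(term, t)]
    weight = math.log2(len(term))  # a single-word term gets weight 0
    if not containers:
        return weight * freq[term]
    nested_penalty = sum(freq[t] for t in containers) / len(containers)
    return weight * (freq[term] - nested_penalty)


if __name__ == "__main__":
    # Hypothetical candidate frequencies.
    candidates = {
        ("machine", "translation"): 10,
        ("statistical", "machine", "translation"): 4,
    }
    for term in candidates:
        print(" ".join(term), round(c_value(term, candidates), 3))
```

The discount keeps a short candidate from ranking highly when most of its occurrences are really part of a longer term.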
A Review on the Cross and Multilingual Information Retrieval dannyijwest
In this paper we explore some of the most important areas of information retrieval, in particular
Cross-lingual Information Retrieval (CLIR) and Multilingual Information Retrieval (MLIR). CLIR deals with
asking questions in one language and retrieving documents in a different language. MLIR deals with asking
questions in one or more languages and retrieving documents in one or more different languages. With an
increasingly globalized economy, the ability to find information in other languages is becoming a necessity.
We also present the evaluation initiatives of the information retrieval domain. Finally, we present an
overall review of the research work in Indian and foreign languages.
Tamil-English Document Translation Using Statistical Machine Translation Appr... baskaran_md
This paper presents a new method for translating a text document from Tamil to English. Our method is based on the Statistical Machine Translation (SMT) approach combined with morphological analysis, because Tamil is a highly inflected language. The paper presents a slight modification to SMT to make the approach more efficient and effective, and the experimental results show the method to be fast and accurate in the translation process.
Quality Translation Enhancement Using Sequence Knowledge and Pruning in Stati... TELKOMNIKA JOURNAL
Machine translation has two important parts: a learning process, which is followed by a translation
process. Unfortunately, most translation processes require complex operations and in-depth
knowledge of the languages in order to give a good quality translation. This study proposes an
approach that does not require in-depth knowledge of the linguistic properties of the languages but still
produces a good quality translation. The study evaluated 28 different parameters in IRSTLM language
modeling, resulting in 270 million experiments, and proposes a sequence evaluation mechanism
based on a maximum evaluation of each parameter in producing a good quality translation based on NIST
and BLEU. A parallel corpus and statistical machine learning for English and Bahasa Indonesia were
used in this study. The pruning process, the user interface, and the personalization of translation play a very
important role in implementing this machine translation. The result is quite promising. It shows that the
pruning process increases the translation process time. The particular sequence knowledge/value
parameter in the translation process performs better than the other method, which uses in-depth linguistic
knowledge approaches. All these processes, including the process of parsing from a stand-alone mode to
an online mode, are also discussed in detail.
Phrase identification is one of the most critical and widely studied tasks in Natural Language Processing (NLP). Verb phrase identification within a sentence is very useful for a variety of NLP applications. One of the core enabling technologies required in NLP applications is morphological analysis. This paper presents the Myanmar Verb Phrase Identification and Translation Algorithm and develops a Markov Model with morphological analysis. The system is based on a rule-based maximum matching approach. In machine translation, a large amount of information is needed to guide the translation process. Myanmar is an inflected language, and there has been very little lexicon creation and research for Myanmar compared to other languages such as English, French and Czech. Therefore, this system proposes a Myanmar verb phrase identification and translation model based on the syntactic structure and morphology of the Myanmar language, using a Myanmar-English bilingual lexicon. A Markov Model is also used to reformulate the translation probability of phrase pairs. Experimental results show that the proposed system can improve translation quality by applying morphological analysis to the Myanmar language.
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH ijnlc
Researchers have been working on machine translation for quite a long time. However, a flawless
machine translator is still a dream, and only a small number of researchers have focused on
translating Marathi text to English. Perfect machine translation systems have not yet been built
because languages differ syntactically as well as morphologically. The majority of researchers
have opted for statistical machine translation, whereas in this paper we address the challenges of
rule-based machine translation. The paper describes the major divergences observed between
Marathi and English, the many challenges encountered while attempting to build a machine translation
system from Marathi to English using a rule-based approach, and rules to handle these challenges. As there
are exceptions to the rules and limits to the feasibility of maintaining a knowledge base, practical machine
translation from Marathi to English is a complex task.
An Improved Approach for Word Ambiguity Removal Waqas Tariq
Word ambiguity removal is a task of removing ambiguity from a word, i.e. correct sense of word is identified from ambiguous sentences. This paper describes a model that uses Part of Speech tagger and three categories for word sense disambiguation (WSD). Human Computer Interaction is very needful to improve interactions between users and computers. For this, the Supervised and Unsupervised methods are combined. The WSD algorithm is used to find the efficient and accurate sense of a word based on domain information. The accuracy of this work is evaluated with the aim of finding best suitable domain of word. Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word sense disambiguation
Duration for Classification and Regression Tree for Marathi Text-to-Speech Syn... IJERA Editor
This research paper reports preliminary results of data-driven modeling of segmental phoneme duration for
Marathi. Classification and Regression Tree based data-driven duration modeling for segmental duration
prediction is presented. A number of features are considered, and their usefulness and relative contribution to
segmental duration prediction are assessed. Objective evaluation of the duration model, by root mean squared
prediction error and correlation between actual and predicted durations, is performed.
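The two objective measures mentioned here can be computed as in the short sketch below. The duration values are invented placeholders (imagined as milliseconds), and the function names are chosen for this example; no results from the paper are reproduced.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted durations."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(actual, predicted):
    """Pearson correlation coefficient between actual and predicted durations."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


if __name__ == "__main__":
    # Hypothetical phoneme durations in milliseconds.
    actual = [80.0, 100.0, 120.0]
    predicted = [90.0, 100.0, 110.0]
    print(f"RMSE = {rmse(actual, predicted):.3f}")
    print(f"correlation = {pearson(actual, predicted):.3f}")
```

The two measures are complementary: in this toy data the predictions correlate perfectly with the actual durations while still carrying a non-zero error, because the predicted range is compressed.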
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG... ijnlc
This article introduces a methodology for analyzing sentiment in Arabic text using a global foreign lexical
source. Our method leverages a resource available in another language, such as SentiWordNet in
English, for a language with limited resources, namely Arabic. The knowledge taken from the external
resource is injected into the feature model while the machine-learning-based classifier is trained. The
first step of our method is to build the bag-of-words (BOW) model of the Arabic text. The second step
calculates the polarity score using a machine translation technique and the English SentiWordNet. The scores
for each text are added to the model in three pairs for objective, positive, and negative. The last step of
our method involves training the ML classifier on that model to predict the sentiment of the Arabic text.
Our method increases performance compared with the BOW baseline model in most cases. In
addition, it seems a viable approach to sentiment analysis of Arabic text, where available resources
are limited.
Punjabi Text Classification using Naïve Bayes, Centroid and Hybrid Approach cscpconf
Punjabi text classification is the process of assigning predefined classes to unlabelled text
documents. Because of the dramatic increase in the amount of content available in digital form, text
classification has become an urgent need in order to manage digital data efficiently and accurately. Until
now, no Punjabi text classifier has been available for Punjabi text documents. Therefore, in this
paper, existing classification algorithms such as Naïve Bayes and Centroid Based techniques are
used for Punjabi text classification. In addition, one new approach is proposed for Punjabi text
documents, which is the combination of Naïve Bayes (to extract the relevant features so as to
reduce the dimensionality) and Ontology Based Classification (which acts as a text classifier that
uses the extracted features). These algorithms are run over 184 Punjabi news articles on
sports and classify the documents into 7 classes: ਕ੍ਰਿਕਟ (krikaṭ), ਹਾਕੀ (hākī), ਕਬੱਡੀ
(kabḍḍī), ਫੁਟਬਾਲ (phuṭbāl), ਟੈਨਿਸ (ṭainis), ਬੈਡਮਿੰਟਨ (baiḍmiṇṭan), ਓਲੰਪਿਕ (ōlmpik).
Language Identifier for Languages of Pakistan Including Arabic and Persian Waqas Tariq
A language recognizer/identifier/guesser is a basic application used by humans to identify the language of a text document. It simply takes a file as input and, after processing its text, decides the language of the text document with precision using LIJ-I, LIJ-II and LIJ-III. LIJ-I alone results in poor accuracy, which is strengthened by the use of LIJ-II and further boosted to a higher level of accuracy by the use of LIJ-III. It also helps in calculating the probability of digrams and the average percentages of accuracy. LIJ-I considers the complete character sets of each language, while LIJ-II considers only the differences. A Java based language recognizer is developed and presented in this paper in detail.
The Provence-Alpes-Côte d’Azur Region and Pôle emploi, key players in employment,
economic development and vocational training, are combining their efforts and
strengthening their action in support of employment, for the benefit of business leaders and job
seekers.
Quality estimation of machine translation outputs through stemmingijcsa
Machine Translation is the challenging problem for Indian languages. Every day we can see some machine
translators being developed , but getting a high quality automatic translation is still a very distant dream .
The correct translated sentence for Hindi language is rarely found. In this paper, we are emphasizing on
English-Hindi language pair, so in order to preserve the correct MT output we present a ranking system,
which employs some machine learning techniques and morphological features. In ranking no human
intervention is required. We have also validated our results by comparing it with human ranking.
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorWaqas Tariq
In recent decades speech interactive systems have gained increasing importance. Performance of an ASR system mainly depends on the availability of large corpus of speech. The conventional method of building a large vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires large speech corpus with sentence or phoneme level transcription of the speech utterances. The transcriptions must also include different speech order so that the recognizer can build models for all the sounds present. But, for Telugu language, because of its complex nature, a very large, well annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands and millions of word forms. A significant part of grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu. Phrases including several words (that is, tokens) in English would be mapped on to a single word in Telugu.Telugu language is phonetic in nature in addition to rich in morphology. That is why the speech technology developed for English cannot be applied to Telugu language. This paper highlights the work carried out in an attempt to build a voice enabled text editor with capability of automatic term suggestion. Main claim of the paper is the recognition enhancement process developed by us for suitability of highly inflecting, rich morphological languages. This method results in increased speech recognition accuracy with very much reduction in corpus size. It also adapts Telugu words to the database dynamically, resulting in growth of the corpus.
Named Entity Recognition System for Hindi Language: A Hybrid ApproachWaqas Tariq
Named Entity Recognition (NER) is a major early step in Natural Language Processing (NLP) tasks like machine translation, text to speech synthesis, natural language understanding etc. It seeks to classify words which represent names in text into predefined categories like location, person-name, organization, date, time etc. In this paper we have used a combination of machine learning and Rule based approaches to classify named entities. The paper introduces a hybrid approach for NER. We have experimented with Statistical approaches like Conditional Random Fields (CRF) & Maximum Entropy (MaxEnt) and Rule based approach based on the set of linguistic rules. Linguistic approach plays a vital role in overcoming the limitations of statistical models for morphologically rich language like Hindi. Also the system uses voting method to improve the performance of the NER system. Keywords: NER, MaxEnt, CRF, Rule base, Voting, Hybrid approach
Information retrieval (IR) system aims to retrieve
relevant documents to a user query where the query is a set of
keywords. Cross-language information retrieval (CLIR) is a
retrieval process in which the user fires queries in one language to
retrieve information from another language. The growing
requirement on the Internet for users to access information
expressed in language other than their own has led to Cross
Language Information Retrieval (CLIR) becoming established as
a major topic in IR.
Named Entity Recognition using Hidden Markov Model (HMM)kevig
Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP), which is a branch of artificial intelligence. It has many applications, mainly in machine translation, text to speech synthesis, natural language understanding, information extraction, information retrieval, question answering etc. The aim of NER is to classify words into predefined categories like location name, person name, organization name, date, time etc. In this paper we describe in detail the Hidden Markov Model (HMM) based machine learning approach to identifying named entities. The main idea behind using an HMM to build an NER system is that it is language independent, so the system can be applied to any language domain. In our NER system the states are not fixed; that is, the system is dynamic in nature and can be used according to the user's interest. The corpus used by our NER system is also not domain specific.
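HMM-based tagging is usually decoded with the Viterbi algorithm, which the following minimal sketch illustrates. The abstract does not give the authors' decoding procedure or model parameters, so the function name, the `unk` fallback probability for unseen words, and the dictionary layout of the probabilities are all assumptions.

```python
import math

def viterbi(tokens, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable state sequence for tokens.

    start_p[s], trans_p[prev][s] and emit_p[s][word] are probabilities;
    words missing from emit_p[s] get the small fallback `unk` (an assumption).
    """
    def lp(p):
        return math.log(p) if p > 0 else float("-inf")

    # Log-probability of the best path ending in each state at position 0.
    scores = [{s: lp(start_p[s]) + lp(emit_p[s].get(tokens[0], unk)) for s in states}]
    back = [{}]
    for t in range(1, len(tokens)):
        scores.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: scores[t - 1][p] + lp(trans_p[p][s]))
            scores[t][s] = (scores[t - 1][prev] + lp(trans_p[prev][s])
                            + lp(emit_p[s].get(tokens[t], unk)))
            back[t][s] = prev
    # Follow the back pointers from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

Because the state set is passed in as an argument, the same function works for any tag inventory, which matches the abstract's point that the states are not fixed.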
MULTI-WORD TERM EXTRACTION BASED ON NEW HYBRID APPROACH FOR ARABIC LANGUAGEcsandit
Arabic Multiword Term are relevant strings of words in text documents. Once they are
automatically extracted, they can be used to increase the performance of any text mining
applications such as Categorisation, Clustering, Information Retrieval System, Machine
Translation, and Summarization, etc. This paper introduces our proposed Multiword term
extraction system based on the contextual information. In fact, we propose a new method based
on a hybrid approach for Arabic Multiword term extraction. Like other methods based on a hybrid
approach, our method is composed of two main steps: the Linguistic approach and the
Statistical one. In the first step, the Linguistic approach uses a Part Of Speech (POS) Tagger
(Taani’s Tagger) and the Sequence Identifier as patterns in order to extract the candidate
AMWTs. In the second step, which includes our main contribution, the Statistical approach
incorporates the contextual information by using a newly proposed association measure based on
Termhood and Unithood for AMWT extraction. To evaluate the efficiency of our proposed
method for AMWT extraction, it has been tested and compared using three different
association measures: the proposed one named NTC-Value, NC-Value, and C-Value. The
experimental results, using Arabic texts taken from the environment domain, show that our
hybrid method outperforms the other ones in terms of precision; in addition, it can deal correctly
with tri-gram Arabic Multiword terms.
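For orientation, the C-Value baseline named above is commonly defined in the term extraction literature as shown below. This is the standard textbook form, not a formula taken from this abstract, and the proposed NTC-Value is not reproduced because the abstract does not define it. Here $a$ is a candidate term, $|a|$ its length in words, $f(\cdot)$ corpus frequency, and $T_a$ the set of longer candidates that contain $a$, with $P(T_a)$ their number.

```latex
\text{C-value}(a) =
\begin{cases}
\log_2 |a| \cdot f(a), & \text{if } a \text{ is not nested in a longer candidate},\\[4pt]
\log_2 |a| \left( f(a) - \dfrac{1}{P(T_a)} \sum_{b \in T_a} f(b) \right), & \text{otherwise}.
\end{cases}
```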
A Review on the Cross and Multilingual Information Retrievaldannyijwest
In this paper we explore some of the most important areas of information retrieval. In particular, Cross-
lingual Information Retrieval (CLIR) and Multilingual Information Retrieval (MLIR). CLIR deals with
asking questions in one language and retrieving documents in different language. MLIR deals with asking
questions in one or more languages and retrieving documents in one or more different languages. With an
increasingly globalized economy, the ability to find information in other languages is becoming a necessity.
We also presented the evaluation initiatives of information retrieval domain. Finally we have presented the
overall review of the research works in Indian and Foreign languages.
Tamil-English Document Translation Using Statistical Machine Translation Appr...baskaran_md
The paper presents a new method for translating a text document from Tamil to English. Our method is based on the Statistical Machine Translation approach, combined with Morphological Analysis, because Tamil is a highly inflected language. This paper presents a slight modification to SMT to make the approach more efficient and effective, and the experimental results have proven the method to be fast and accurate in the translation process.
Quality Translation Enhancement Using Sequence Knowledge and Pruning in Stati...TELKOMNIKA JOURNAL
Machine translation has two important parts: a learning process followed by a translation
process. Unfortunately, most of the translation process requires complex operations and in-depth
knowledge of the languages in order to give a good quality translation. This study proposes a better
approach, which does not require in-depth knowledge of the linguistic properties of the languages, but it
produces a good quality translation. This study evaluated 28 different parameters in IRSTLM language
modeling, resulting in 270 million experiments, and proposes a sequence evaluation mechanism
based on a maximum evaluation of each parameter in producing a good quality translation based on NIST
and BLEU. The parallel corpus and statistical machine learning for English and Bahasa Indonesia were
used in this study. The pruning process, user interface, and the personalization of translation have a very
important role in implementing of this machine translation. The result is quite promising. It shows that
pruning process increases of the translation process time. The particular sequence knowledge/value
parameter in translation process has a better performance than the other method using in-depth linguistic
knowledge approaches. All these processes, including the process of parsing from a stand-alone mode to
an online mode, are also discussed in detail.
Phrase identification is one of the most critical and widely studied tasks in Natural Language Processing (NLP). Verb phrase identification within a sentence is very useful for a variety of NLP applications. One of the core enabling technologies required in NLP applications is morphological analysis. This paper presents the Myanmar Verb Phrase Identification and Translation Algorithm and develops a Markov Model with Morphological Analysis. The system is based on a Rule-Based Maximum Matching Approach. In machine translation, a large amount of information is needed to guide the translation process. Myanmar is an inflected language, and very little lexicon creation and research exists for Myanmar compared to other languages such as English, French and Czech. Therefore, this system proposes a Myanmar verb phrase identification and translation model based on the syntactic structure and morphology of the Myanmar language, using a Myanmar-English bilingual lexicon. A Markov Model is also used to reformulate the translation probability of phrase pairs. Experimental results showed that the proposed system can improve translation quality by applying morphological analysis to the Myanmar language.
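Maximum matching, which the abstract names as the basis of the system, is often implemented as a greedy longest-match scan against a lexicon. The sketch below shows that generic form only; the function name, the single-character fallback for unknown text, and the toy lexicon in the usage note are assumptions, not the authors' rules.

```python
def max_match(text, lexicon):
    """Greedy forward maximum matching against a set of known entries.

    At each position the longest lexicon entry is taken; if nothing
    matches, a single character is emitted so the scan always advances.
    """
    longest = max(map(len, lexicon))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens
```

With a hypothetical lexicon `{"a", "ab", "abc", "cd", "d"}`, the input `"abcd"` is split by preferring `"abc"` over the shorter `"ab"` and `"a"`.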
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHijnlc
Machine translation has been pursued by researchers for quite a long time. However, a flawless
machine translator is still a dream, and only a small number of researchers have focussed on
translating Marathi text to English. Perfect Machine Translation Systems have not yet been fully built
owing to the fact that languages differ syntactically as well as morphologically. The majority of researchers
have opted for Statistical Machine Translation, whereas in this paper we have addressed the challenges of
Rule based Machine Translation. The paper describes the major divergences observed between
Marathi and English, the many challenges encountered while attempting to build a machine translation
system from Marathi to English using a rule based approach, and rules to handle these challenges. As there
are exceptions to the rules and limits to the feasibility of maintaining a knowledge base, practical machine
translation from Marathi to English is a complex task.
An Improved Approach for Word Ambiguity RemovalWaqas Tariq
Word ambiguity removal is a task of removing ambiguity from a word, i.e. correct sense of word is identified from ambiguous sentences. This paper describes a model that uses Part of Speech tagger and three categories for word sense disambiguation (WSD). Human Computer Interaction is very needful to improve interactions between users and computers. For this, the Supervised and Unsupervised methods are combined. The WSD algorithm is used to find the efficient and accurate sense of a word based on domain information. The accuracy of this work is evaluated with the aim of finding best suitable domain of word. Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word sense disambiguation
Duration for Classification and Regression Tree for Marathi Text-to-Speech Syn...IJERA Editor
This research paper reports preliminary results of data-driven modeling of segmental phoneme duration for
Marathi. Classification and Regression Tree based data driven duration modeling for segmental duration
prediction is presented. A number of features are considered and their usefulness and relative contribution for
segmental duration prediction is assessed. Objective evaluation of the duration model, by root mean squared
prediction error and correlation between actual and predicted durations, is performed.
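The two objective measures named above can be computed as in the following minimal sketch. The function names are illustrative assumptions; the abstract reports neither data nor results.

```python
import math

def rmse(actual, predicted):
    """Root mean squared prediction error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def pearson(actual, predicted):
    """Pearson correlation between actual and predicted durations."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)
```

The two measures complement each other: a model can correlate perfectly with the actual durations while still having a non-zero RMSE if its predictions are systematically compressed or shifted.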
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...ijnlc
This article introduces a methodology for analyzing sentiment in Arabic text using a global foreign lexical
source. Our method leverages the available resource in another language such as the SentiWordNet in
English to the limited language resource that is Arabic. The knowledge that is taken from the external
resource will be injected into the feature model while the machine-learning-based classifier is trained. The
first step of our method is to build the bag-of-words (BOW) model of the Arabic text. The second step
calculates the score of polarity using translation machine technique and English SentiWordNet. The scores
for each text will be added to the model in three pairs for objective, positive, and negative. The last step of
our method involves training the ML classifier on that model to predict the sentiment of the Arabic text.
Our method increases the performance compared with the baseline model that is BOW in most cases. In
addition, it seems a viable approach to sentiment analysis in Arabic text where there is limitation of the
available resource.
Punjabi Text Classification using Naïve Bayes, Centroid and Hybrid Approachcscpconf
Punjabi Text Classification is the process of assigning predefined classes to the unlabelled text
documents. Because of dramatic increase in the amount of content available in digital form, text
classification becomes an urgent need to manage the digital data efficiently and accurately. Till
now no Punjabi Text Classifier is available for Punjabi Text Documents. Therefore, in this
paper, existing classification algorithm such as Naïve Bayes, Centroid Based techniques are
used for Punjabi Text Classification. And one new approach is proposed for the Punjabi Text
Documents which is the combination Naïve Bayes (to extract the relevant features so as to
reduce the dimensionality) and Ontology Based Classification (that act as text classifier that
used extracted features). These algorithms are performed over 184 Punjabi News Articles on
Sports that classify the documents into 7 classes such as ਕ੍ਰਿਕਟ (krikaṭ), ਹਾਕੀ (hākī), ਕਬੱਡੀ
(kabḍḍī), ਫੁਟਬਾਲ (phuṭbāl), ਟੈਨਿਸ (ṭainis), ਬੈਡਮਿੰਟਨ (baiḍmiṇṭan), ਓਲੰਪਿਕ (ōlmpik).
Language Identifier for Languages of Pakistan Including Arabic and PersianWaqas Tariq
Language recognizer/identifier/guesser is the basic application used by humans to identify the language of a text document. It takes simply a file as input and after processing its text, decides the language of text document with precision using LIJ-I, LIJ-II and LIJ-III. LIJ-I results in poor accuracy and strengthen with the use of LIJ-II which is further boosted towards a higher level of accuracy with the use of LIJ-III. It also helps in calculating the probability of digrams and the average percentages of accuracy. LIJ-I considers the complete character sets of each language while the LIJ-II considers only the difference. A JAVA based language recognizer is developed and presented in this paper in detail.
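The abstract mentions digram probabilities but does not define LIJ-I, LIJ-II or LIJ-III, so the sketch below does not implement them. It only shows a generic digram-based identifier: each language gets a digram count profile from sample text, and a new text is assigned to the language under which its digrams are most probable. The function names and the add-one smoothing are assumptions.

```python
import math
from collections import Counter

def digrams(text):
    """Return the overlapping character pairs of a lower-cased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]

def train(samples):
    """samples maps a language label to sample text; returns digram counts."""
    return {lang: Counter(digrams(text)) for lang, text in samples.items()}

def identify(text, profiles):
    """Pick the language whose profile gives the text the highest log-probability."""
    vocab = set().union(*profiles.values())
    best_lang, best_score = None, float("-inf")
    for lang, counts in profiles.items():
        total = sum(counts.values())
        # Add-one smoothing keeps unseen digrams from zeroing the score.
        score = sum(
            math.log((counts[d] + 1) / (total + len(vocab) + 1))
            for d in digrams(text)
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang
```

Profiles built from longer and more varied samples separate closely related languages better; for languages that share most of a script, the digrams that differ between profiles carry most of the signal.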
The Provence-Alpes-Côte d’Azur Region and Pôle emploi, key players in employment,
economic development and vocational training, are combining their efforts and
strengthening their action in support of employment, for the benefit of business leaders and job
seekers.
CSCC Resources for Foster and Homeless YouthLisa Dickson
Compiled by Stephanie Starks, in support of the Alumni Support and Assistance Project (ASAP). For more information about this project, please visit: http://www.oyit-asap.net/
A Novel Approach for Rule Based Translation of English to Marathiaciijournal
This paper presents a design for rule-based machine translation system for English to Marathi language pair. The machine translation system will take input script as English sentence and parse with the help of Stanford parser. The Stanford parser will be used for main purposes on the source side processing, in the machine translation system. English to Marathi Bilingual dictionary is going to be created. The system will take the parsed output and separate the source text word by word and searches for their corresponding target words in the bilingual dictionary. The hand coded rules are written for Marathi inflections and also reordering rules are there. After applying the reordering rules, English sentence will be syntactically reordered to suit Marathi language
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachIJERA Editor
The language is an effective medium for the communication that conveys the ideas and expression of the human
mind. There are more than 5000 languages in the world for the communication. To know all these languages is
not a solution for problems due to the language barrier in communication. In this multilingual world with the
huge amount of information exchanged between various regions and in different languages in digitized format,
it has become necessary to find an automated process to convert from one language to another. Natural
Language Processing (NLP) is one of the hot areas of research that explores how computers can be utilizing to
understand and manipulate natural language text or speech. In the Proposed system a Hybrid approach to
transliterate the proper nouns from Punjabi to Hindi is developed. Hybrid approach in the proposed system is a
combination of Direct Mapping, Rule based approach and Statistical Machine Translation approach (SMT).
Proposed system is tested on various proper nouns from different domains and accuracy of the proposed system
is very good.
A Survey of Various Methods for Text SummarizationIJERD Editor
Document summarization means retrieving short and important text from the source document. In this paper, we study various techniques. Plenty of techniques have been developed for summarization in English and other Indian languages, but very little effort has been made for Hindi. Here, we discuss various techniques with respect to features such as time and memory consumption, efficiency, accuracy, ambiguity, and redundancy.
A template based algorithm for automatic summarization and dialogue managemen...eSAT Journals
Abstract This paper describes an automated approach for extracting significant and useful events from unstructured text. The goal of research is to come out with a methodology which helps in extracting important events such as dates, places, and subjects of interest. It would be also convenient if the methodology helps in presenting the users with a shorter version of the text which contain all non-trivial information. We also discuss implementation of algorithms which exactly does this task, developed by us. Key Words: Cosine Similarity, Information, Natural Language, Summarization, Text Mining
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Textskevig
Cyberbullying is currently one of the most important research fields. The majority of researchers have contributed to research on bully text identification in English texts or comments; owing to the scarcity of data, stemming Tamil text is frequently a tedious job. Tamil is a morphologically diverse and agglutinative language. The creation of a Tamil stemmer is not an easy undertaking. After examining the major difficulties encountered, we propose the rule-based iterative preprocessing algorithm (RBIPA). In this attempt, Tamil morphemes and lemmas were extracted using the suffix stripping technique and a supervised machine learning algorithm that classifies words as pronouns and proper nouns. The novelty of the proposed system is a preprocessing algorithm for iterative stemming and lemmatization that recovers exact words from Tamil-language comments. RBIPA achieves 84.96% accuracy on the given test dataset, which has a total of 13000 words.
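Iterative suffix stripping, as named in this abstract, can be sketched as a loop that removes one suffix per pass until none applies. The code below is a generic illustration and not the RBIPA rule set: the romanized suffix list, the minimum stem length and the function name are assumptions chosen only for the example.

```python
# Illustrative romanized suffixes (assumed for this sketch, not the RBIPA rules).
SUFFIXES = ("kal", "il", "ai")

def strip_iteratively(word, suffixes=SUFFIXES, min_stem=3):
    """Strip suffixes one at a time until none applies.

    Returns the remaining stem and the removed suffixes in the order
    they appeared in the word. Longer suffixes are tried first, and a
    suffix is only removed if at least `min_stem` characters remain.
    """
    removed = []
    changed = True
    while changed:
        changed = False
        for suffix in sorted(suffixes, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[:-len(suffix)]
                removed.append(suffix)
                changed = True
                break
    return word, removed[::-1]
```

The minimum stem length is a common guard in suffix strippers: it stops short words that merely end in a suffix-like string from being reduced to nothing.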
Class Diagram Extraction from Textual Requirements Using NLP Techniquesiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...ijnlc
Machine Translation for Indian languages is an emerging research area. Transliteration is one such module that we design while designing a translation system. Transliteration means mapping of source language text into the target language. Simple mapping decreases the efficiency of overall translation system. We propose the use of stemming and part-of-speech tagging for transliteration. The effectiveness of translation can be improved if we use part-of-speech tagging and stemming assisted transliteration. We have shown that much of the content in Gujarati gets transliterated while being processed for translation to Hindi language.
The Use of Java Swing’s Components to Develop a WidgetWaqas Tariq
Widget is a kind of application provides a single service such as a map, news feed, simple clock, battery-life indicators, etc. This kind of interactive software object has been developed to facilitate user interface (UI) design. A user interface (UI) function may be implemented using different widgets with the same function. In this article, we present the widget as a platform that is generally used in various applications, such as in desktop, web browser, and mobile phone. We also describe a visual menu of Java Swing’s components that will be used to establish widget. It will assume that we have successfully compiled and run a program that uses Swing components.
3D Human Hand Posture Reconstruction Using a Single 2D ImageWaqas Tariq
Passive sensing of the 3D geometric posture of the human hand has been studied extensively over the past decade. However, these research efforts have been hampered by the computational complexity caused by inverse kinematics and 3D reconstruction. In this paper, our objective focuses on 3D hand posture estimation based on a single 2D image with aim of robotic applications. We introduce the human hand model with 27 degrees of freedom (DOFs) and analyze some of its constraints to reduce the DOFs without any significant degradation of performance. A novel algorithm to estimate the 3D hand posture from eight 2D projected feature points is proposed. Experimental results using real images confirm that our algorithm gives good estimates of the 3D hand pose. Keywords: 3D hand posture estimation; Model-based approach; Gesture recognition; human- computer interface; machine vision.
Camera as Mouse and Keyboard for Handicap Person with Troubleshooting Ability...Waqas Tariq
Camera mouse has been widely used for handicap person to interact with computer. The utmost important of the use of camera mouse is must be able to replace all roles of typical mouse and keyboard. It must be able to provide all mouse click events and keyboard functions (include all shortcut keys) when it is used by handicap person. Also, the use of camera mouse must allow users troubleshooting by themselves. Moreover, it must be able to eliminate neck fatigue effect when it is used during long period. In this paper, we propose camera mouse system with timer as left click event and blinking as right click event. Also, we modify original screen keyboard layout by add two additional buttons (button “drag/ drop” is used to do drag and drop of mouse events and another button is used to call task manager (for troubleshooting)) and change behavior of CTRL, ALT, SHIFT, and CAPS LOCK keys in order to provide shortcut keys of keyboard. Also, we develop recovery method which allows users go from camera and then come back again in order to eliminate neck fatigue effect. The experiments which involve several users have been done in our laboratory. The results show that the use of our camera mouse able to allow users do typing, left and right click events, drag and drop events, and troubleshooting without hand. By implement this system, handicap person can use computer more comfortable and reduce the dryness of eyes.
A Proposed Web Accessibility Framework for the Arab DisabledWaqas Tariq
The Web is providing unprecedented access to information and interaction for people with disabilities. This paper presents a Web accessibility framework which offers the ease of the Web accessing for the disabled Arab users and facilitates their lifelong learning as well. The proposed framework system provides the disabled Arab user with an easy means of access using their mother language so they don’t have to overcome the barrier of learning the target-spoken language. This framework is based on analyzing the web page meta-language, extracting its content and reformulating it in a suitable format for the disabled users. The basic objective of this framework is supporting the equal rights of the Arab disabled people for their access to the education and training with non disabled people. Key Words : Arabic Moon code, Arabic Sign Language, Deaf, Deaf-blind, E-learning Interactivity, Moon code, Web accessibility , Web framework , Web System, WWW.
Real Time Blinking Detection Based on Gabor FilterWaqas Tariq
A new method of blinking detection is proposed. The most important requirement of a blinking detection method is robustness against different users, noise, and changes of eye shape. In this paper, we propose a blinking detection method that measures the distance between the two arcs of the eye (upper part and lower part). We detect the eye arcs by applying a Gabor filter to the eye image. The Gabor filter is advantageous in image processing applications because it can extract spatially localized spectral features, so lines, arches, and other shapes are more easily detected. After the two eye arcs are detected, we measure the distance between them using a connected labeling method. The eye is marked as open when the distance between the two arcs is greater than a threshold and as closed when the distance is less than the threshold. The experimental results show that our proposed method is robust against different users, noise, and eye shape changes, with perfect accuracy.
Computer Input with Human Eyes-Only Using Two Purkinje Images Which Works in ...Waqas Tariq
A method for computer input with human eyes only, using two Purkinje images, which works on a real time basis without calibration, is proposed. Experimental results show that cornea curvature can be estimated from Purkinje images derived from two light sources, so that no calibration is needed to reduce person-to-person differences in cornea curvature. It is found that the proposed system allows users' movements of 30 degrees in the roll direction and 15 degrees in the pitch direction by utilizing the detected face attitude, which is derived from the face plane consisting of three feature points on the face: two eyes and the nose or mouth. It is also found that the proposed system works on a real time basis.
Toward a More Robust Usability concept with Perceived Enjoyment in the contex...Waqas Tariq
Mobile multimedia service is relatively new but has quickly dominated people's lives, especially among young people. To explain this popularity, this study applies and modifies the Technology Acceptance Model (TAM) to propose a research model and conduct an empirical study. The goal of the study is to examine the role of Perceived Enjoyment (PE) and which determinants contribute to PE in the context of using mobile multimedia service. The result indicates that PE influences Perceived Usefulness (PU) and Perceived Ease of Use (PEOU), and directly influences Behavior Intention (BI). Aesthetics and flow are key determinants that explain Perceived Enjoyment (PE) in mobile multimedia usage.
Collaborative Learning of Organisational KnowledgeWaqas Tariq
This paper presents recent research into methods used in Australian Indigenous Knowledge sharing and looks at how these can support the creation of suitable collaborative environments for timely organisational learning. The protocols and practices as used today and in the past by Indigenous communities are presented and discussed in relation to their relevance to a personalised system of knowledge sharing in modern organisational cultures. This research focuses on user models, knowledge acquisition and integration of data for constructivist learning in a networked repository of organisational knowledge. The data collected in the repository is searched to provide collections of up-to-date and relevant material for training in a work environment. The aim is to improve knowledge collection and sharing in a team environment. This knowledge can then be collated into a story or workflow that represents the present knowledge in the organisation.
Our research aims to propose a global approach for specification, design and verification of context awareness Human Computer Interface (HCI). This is a Model Based Design approach (MBD). This methodology describes the ubiquitous environment by ontologies. OWL is the standard used for this purpose. The specification and modeling of Human-Computer Interaction are based on Petri nets (PN). This raises the question of representation of Petri nets with XML. We use for this purpose, the standard of modeling PNML. In this paper, we propose an extension of this standard for specification, generation and verification of HCI. This extension is a methodological approach for the construction of PNML with Petri nets. The design principle uses the concept of composition of elementary structures of Petri nets as PNML Modular. The objective is to obtain a valid interface through verification of properties of elementary Petri nets represented with PNML.
Development of Sign Signal Translation System Based on Altera’s FPGA DE2 BoardWaqas Tariq
Design and Development of a Malayalam to English Translator- A Transfer Based Approach
1. Latha R Nair, David Peter & Renjith P Ravindran
International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 1
Latha R Nair latha5074@gmail.com
Assistant Professor
School of Engineering
Cochin University of Science and Technology
Kochi, Kerala, 682022, India
David Peter S davidpeter@cusat.ac.in
Professor
School of Engineering
Cochin University of Science and Technology
Kochi, Kerala, 682022, India
Renjith P Ravindran renjithforever@gmail.com
School of Engineering
Cochin University of Science and Technology
Kochi, Kerala, 682022, India
Abstract
This paper describes a transfer based scheme for translating Malayalam, a Dravidian language,
to English. The input to the system is a Malayalam sentence and the output is its equivalent English
sentence. The system comprises a preprocessor for splitting the compound words, a
morphological parser for context disambiguation and chunking, a syntactic structure transfer
module and a bilingual dictionary. All the modules are morpheme based to reduce dictionary size.
The system does not rely on a stochastic approach and it is based on a rule-based architecture
along with various linguistic knowledge components of both Malayalam and English. The system
uses two sets of rules: rules for Malayalam morphology and rules for syntactic structure transfer
from Malayalam to English. The system is designed using artificial intelligence techniques and
can easily be modified to build translation systems for other language pairs.
Keywords: Malayalam Language, Transfer Based Approach, Machine Translation,
Morphological Parser.
1. INTRODUCTION
Work in the area of machine translation in India has been going on for several decades.
Promising translation technology began to emerge by 1970 with the developments in the fields of
artificial intelligence and computational linguistics. Machine translation systems in certain well-
defined domains have been successfully developed. Translation of gazette notifications, office
memorandums and circulars has been done successfully by the Mantra system developed by the Centre
for Development of Advanced Computing (CDAC), Pune. Most of the systems developed are for
Hindi, the official language of India. This paper describes a translator for translating sentences in
Malayalam, a Dravidian language, to English, developed on a rule based architecture combined
with linguistic knowledge components of both Malayalam and English. The system has a
preprocessor for splitting the compound words, a morphological parser for context disambiguation
and chunking, and a bilingual dictionary. A set of rules for Malayalam morphology and rules for
syntactic structure transfer from Malayalam to English have been incorporated in the system.
Some of the organizations which are involved in the development of translation systems are:
Indian Institute of Technology (Kanpur), Center for Development of Advanced Computing (CDAC)
(Mumbai), CDAC (Pune) and Indian Institute of Information Technology (Hyderabad). They are
engaged in the development of MT systems under projects sponsored by the Department of
Electronics, state governments etc. since 1990 [1, 2]. Research on MT systems between Indian
and foreign languages, and also between Indian languages, is going on in these institutions.
The two major goals in any translation system development work are accuracy of translation and
speed. Accuracy-wise, smart tools for handling transfer grammar and translation standards,
including equivalent words, expressions, phrases and styles in the target language, are to be
developed. The grammar should be optimized with a view to obtaining a single correct parse and
hence a single translated output. Speed-wise, innovative use of corpus analysis, an efficient parsing
algorithm, efficient data structures and run-time frequency-based rearrangement of the
grammar, which substantially reduces the parsing and generation time, are required [3]. A fully
automatic machine translation system should have different modules such as a morphological
analyzer, part of speech tagger, chunker, named entity recognizer, word sense disambiguator,
syntactic transfer module and target word generator [3]. The different techniques used for
translation differ in the number of modules used and also in the way these modules are
implemented. Both rule based and statistical approaches have been tried in the implementation of
each of these modules.
The various approaches used in the MT systems for Indian languages are: direct machine
translation systems, rule based systems and corpus based systems. Direct systems do not
use any intermediate representation. Translation is done word by word using a bilingual
dictionary, usually followed by some syntactic rearrangement [4, 5, 6]. Rule based translation
produces an intermediate representation, which may be a parse tree or some abstract
representation. The target language text is generated from the intermediate representation. Of
the two rule based methods, Interlingua and transfer based, transfer based systems are
more flexible and can be extended to language pairs in a multilingual environment. The
Interlingua based systems can be used for multilingual translation [7]. The amount of analysis
needed in the Interlingua approach is more than that in a transfer based approach. The Universal
Networking Language has been proposed as the Interlingua by the United Nations University for
overcoming the language barrier [8]. Corpus based MT is fully automatic and requires less human
labour than rule based approaches. The disadvantage is that it needs sentence aligned parallel
text for each language pair, so this method cannot be employed where such corpora are not
available [9, 10].
2. PREVIOUS WORK
The English to Hindi MT system Mantra, developed by the Applied Artificial Intelligence (AAI) group of
CDAC, Bangalore, in 1999 uses a transfer based approach. The system translates domain specific
documents in the field of personnel administration, specifically gazette notifications, office orders,
office memorandums and circulars. It uses lexicalized tree adjoining grammar (LTAG) to
represent the English and Hindi grammars, which are used to parse source English sentences and for
structural transfer from English to Hindi [2]. This system also works well on other language pairs
such as English-Bengali, English-Telugu, English-Gujarati and Hindi-English, and also between
Indian language pairs such as Hindi-Bengali and Hindi-Punjabi. The Mantra approach is general,
but the lexicon and grammar have been limited to the specific domain of personnel administration.
It uses preprocessing tools like a phrase marker, named entity recognizer, and spell and grammar
checker. It uses an Earley-style bottom up parsing algorithm for parsing. The system provides
online addition of grammar rules. The system produces multiple translation results in the case of
multiple correct parses.
An English to Kannada MT system has been developed at the Resource Centre for Indian Language
Technology Solutions (RC_ILTS), University of Hyderabad, by Dr. K. Narayan Murthy [2]. This
also uses a transfer based approach and it can be applied to the domain of government circulars.
The project is funded by the Karnataka government. This system uses the Universal Clause Structure
Grammar (UCSG) formalism [15]. The technique is applied to English-Telugu translation as well.
Other systems developed using this approach are: Matra, an English to Hindi MTS developed by
CDAC, Pune; Sakti, an English to Marathi, Hindi and Telugu system developed by IISc Bangalore and IIIT
Hyderabad; Anubaad, an English to Bengali system developed by CDAC, Kolkata; and an English to Malayalam
MTS developed by Amrita Institute of Technology.
It is found that translation systems between structurally similar languages like Hindi and Punjabi can be
developed more easily than translation systems between Indian languages and English, which differ in
syntactic structure. The proposed translation system translates Malayalam sentences to
English sentences. Since there is a wide difference between English and Malayalam sentence structure, the
system needs additional modules for parsing and syntactic reordering.
3. DEVELOPMENT AND IMPLEMENTATION OF TRANSFER BASED
MACHINE TRANSLATION SYSTEM
A transfer based MT system has been developed with the following system modules:
1. A preprocessor for splitting the compound words [13].
2. A morphological parser for context disambiguation and chunking.
3. A transfer module which transfers the source language structure representation to a target
language representation.
4. A generation module which generates target language text using the target language structure.
A block diagram of the system is shown in Fig. 1.
The grammar rules for Malayalam and some of the transfer rules for transferring the source parse
tree to the target parse tree are stored in two separate files. Some of the transfer rules are embedded
in the source code. The sentences stored in a source file are read one by one by the input
module and given to the preprocessor module. The final translated output is stored in another file.
FIGURE 1: Block diagram of a transfer based system
3.1 Compound Word Splitter Module
Morphological variations for words occur in Malayalam due to inflections, derivations and word
compounding. Malayalam is an agglutinative language where words of different syntactic
categories are combined to form a single word. Formation of new words by combining a noun and
a noun, a noun and an adjective, a verb and a noun, an adverb and a verb, an adjective and a noun and, in some
cases, all the words of an entire sentence to reflect the semantics of the sentence is very
common. The complexity of compounding in the Malayalam language can be understood from the
following example.
The English version being: Seetha’s cat ate a rat
The constituent words in 1 are to be separated before any further processing. Splitting has been
done at morpheme level to reduce dictionary space. The above sentence gets split as shown in 2.
The morpheme by morpheme translation for the sentence at 2 is:
Seetha ‘s cat a mouse (null) ate
The morpheme sequence will be as in 2 above. The sequence of morphemes is given to the
parser for chunking and word sense disambiguation. The sets of inflectional suffixes for nouns and
verbs and derivational suffixes for adjectives are based on previous works [11, 12]. Due to the
ambiguity in the splitting rules the system generates multiple splits for the same input sentence,
and the split with the least number of constituents is fed to the parser.
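The selection among competing splits can be illustrated with a minimal sketch. It assumes a toy lexicon of romanized strings (hypothetical, not the system's dictionary) and ignores the sound changes that real Malayalam compounding involves; it simply enumerates every segmentation into known morphemes and keeps the one with the fewest constituents.

```python
from functools import lru_cache

# Hypothetical romanized morphemes, for illustration only.
# "el" and "i" are made-up entries added to create an ambiguous split.
LEXICON = {"poocha", "oru", "eli", "ye", "el", "i", "thinnu"}


def all_splits(word, lexicon=LEXICON):
    """Return every segmentation of `word` into entries of `lexicon`."""

    @lru_cache(maxsize=None)
    def helper(start):
        # Reaching the end of the word yields one (empty) segmentation.
        if start == len(word):
            return [[]]
        results = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in lexicon:
                for rest in helper(end):
                    results.append([piece] + rest)
        return results

    return helper(0)


def best_split(word, lexicon=LEXICON):
    """Pick the segmentation with the fewest constituents, or None."""
    splits = all_splits(word, lexicon)
    return min(splits, key=len) if splits else None


print(all_splits("poochaeliye"))
print(best_split("poochaeliye"))
```

Here the compound has two candidate segmentations, and the shorter one is passed on. A real splitter would work from suffix rules rather than plain string matching, so this only shows the selection criterion.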
3.2 Parser Module
The parser takes input from the splitter and performs the following tasks. It groups the input sequence of
morphemes into chunks [14, 15] and performs word sense disambiguation based on morpheme
tags [16]. The chunking process finds the basic units for tree reordering. Word sense
disambiguation is required as a morpheme can have multiple tags. The parser uses a depth first
approach with backtracking [17]. The output of the parser is a parse tree for the next module. The
parser uses syntax rules for the morpheme sequences in Malayalam sentences, written in regular
expression form. A set of syntax rules in regular expression form is shown below:
1. S-> NP*VP
2. NP-> ADJ*NP | N NA
3. VP ->ADV* V VA| V VA
Rule 1 implies that a simple sentence is a sequence of noun chunks followed by a verb chunk.
Based on the second rule, a noun chunk consists of a set of adjectives followed by a noun and
suffixes like case, gender and number for nouns. According to the third rule, a verb chunk consists
of a sequence of adverbs followed by a verb. Only a subset of the rules derived is shown above.
The chunks selected form groups for structural transfer to form the target language structure.
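As a rough illustration of how rules in regular expression form can drive chunking, the sketch below matches a sequence of morpheme tags against a flattened reading of rules 1 to 3. The flattening (a noun chunk as `ADJ* N NA`, a verb chunk as `ADV* V VA`) and the function name `chunk` are assumptions made for this example; the system's own parser is described as depth first with backtracking and is not reproduced here.

```python
import re

# Chunk patterns over a space-terminated tag string.
# This is a simplified, non-recursive reading of rules 1-3.
NP = r"(?:ADJ )*N NA "
VP = r"(?:ADV )*V VA "
SENTENCE = re.compile(rf"(?P<nps>(?:{NP})*)(?P<vp>{VP})$")


def chunk(tags):
    """Group a list of morpheme tags into noun chunks and one verb chunk.

    Returns (noun_chunks, verb_chunk), or None if the tag sequence
    does not fit the pattern S -> NP* VP.
    """
    text = "".join(tag + " " for tag in tags)
    match = SENTENCE.match(text)
    if match is None:
        return None
    noun_chunks = [m.group().split() for m in re.finditer(NP, match.group("nps"))]
    return noun_chunks, match.group("vp").split()


print(chunk(["ADJ", "N", "NA", "N", "NA", "ADV", "V", "VA"]))
print(chunk(["V", "VA", "N", "NA"]))
```

The first call yields two noun chunks and a verb chunk; the second returns `None` because the verb chunk is not final. Working on tags rather than surface morphemes is what lets a small rule set cover many sentences.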
A sample sentence and the parse tree generated for the sentence using the grammar rules are
shown below:
Input sentence:
English version: The police thought that the thieves who stole the chain went into forest in the
night.
Output of the splitter:
English version: chain stole thief ‘s night in forest to went that police thought
Output of the parser:
English version: CS(NC(S(NG(ADJC(S(N(chain) V(stole) RP) NG(N(thief) PL(‘s))) NG(N(night)
NA(in)) NG(N(forest) NA(to)) V(went)) NCA(that)) S(N(police) V(thought)))
The corresponding parse tree generated is shown in Fig. 2.
FIGURE 2: Generated Parse tree
3.3 Syntactic Structure Transfer Module
The transfer module transfers the source language structure representation to a target language
representation. This module needs the sub tree rearrangement rules by which the source
language sentence syntax tree can be transformed into the target language sentence syntax tree.
The system performs most of the commonly needed reordering for Malayalam to English
translation. The tree obtained after reordering the above Malayalam sentence using the identified
transfer rules is shown in Fig. 3(a) and its corresponding English tree is shown in Fig. 3(b). A set
of transfer rules used by the system is shown in Table 1.
FIGURE 3: a. Parse tree after structural transfer b. Corresponding parse tree in English
    Malayalam structure    English structure
1   PP: NG P               PP: P NG
2   VG: ADV V              VG: V ADV
TABLE 1: A set of system transfer rules
According to the first rule, the order of the case suffix and the noun chunk should be interchanged in a
prepositional chunk. The second rule states that in a verbal chunk the adverb and the verb should be
interchanged.
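The effect of such rules on a tree can be sketched as follows. The tree encoding (nested `(label, children)` tuples with `(label, word)` leaves), the rule table `TRANSFER_RULES` and the function `transfer` are assumptions chosen for this illustration, not the system's actual data structures.

```python
# Transfer rules from Table 1:
# (chunk label, source child order) -> target child order.
TRANSFER_RULES = {
    ("PP", ("NG", "P")): ("P", "NG"),
    ("VG", ("ADV", "V")): ("V", "ADV"),
}


def label(node):
    """Return the label of a (label, body) node."""
    return node[0]


def transfer(node):
    """Recursively reorder children wherever a transfer rule matches."""
    name, body = node
    if isinstance(body, str):  # leaf: (tag, word)
        return node
    children = [transfer(child) for child in body]
    source_order = tuple(label(c) for c in children)
    target_order = TRANSFER_RULES.get((name, source_order))
    if target_order is not None:
        remaining = list(children)
        reordered = []
        for wanted in target_order:
            # Take the first remaining child carrying the wanted label.
            idx = next(i for i, c in enumerate(remaining) if label(c) == wanted)
            reordered.append(remaining.pop(idx))
        children = reordered
    return (name, children)


source = ("PP", [("NG", [("N", "forest")]), ("P", "to")])
print(transfer(source))
```

For the sample chunk, the suffix glossed as "to" moves in front of the noun group, giving the English order "to forest". Keeping the rules in a table separate from the traversal mirrors the paper's point that rules can be added without changing the parser.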
3.4 Target Sentence Generator Module
The generation module generates target language text using the target language structure [18]. It
uses inter chunk dependency rules and intra chunk dependency rules. It involves lexical transfer
of verbs, transfer of the auxiliary verb for tense, aspect and mood, and transfer of gender, number and
person information. A depth first traversal of the target parse tree generates the following English
sentence:
Input Malayalam sentence:
Correct English translation: The police thought that the thieves who stole the chain went into the
forest in the night
Sentence generated by the system:
The Police thought that thieves who stole the chain went in night to forest.
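The traversal step alone can be sketched in a few lines. The tree encoding (nested `(label, children)` tuples with `(label, word)` leaves) and the function `generate` are assumptions for illustration; the dependency rules and the tense, aspect and agreement handling described above are not modelled here.

```python
def generate(node):
    """Depth-first, left-to-right traversal that collects the leaf words."""
    name, body = node
    if isinstance(body, str):  # leaf: (tag, word)
        return [body]
    words = []
    for child in body:
        words.extend(generate(child))
    return words


# A small hand-built target tree, not produced by the actual system.
tree = (
    "S",
    [
        ("NG", [("N", "police")]),
        ("VG", [("V", "thought")]),
    ],
)
print(" ".join(generate(tree)))
```

Because the transfer module has already put the chunks in English order, reading the leaves from left to right is enough to linearize the sentence.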
3.5 Cross lingual Dictionary
The dictionary includes most of the commonly occurring verbs, nouns, pronouns, adjectives,
inflectional and derivational suffixes, clause suffixes etc. Each entry in the file has three fields: the
root word (morpheme), the morpheme tag and its translation. The verbs in past tense have their
root words stored along with them. Since the system works with morphemes, the space required
for the dictionary is reduced.
TABLE 2: Lexicon
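A three-field lexicon of this kind could be loaded and queried roughly as shown below. The tab-separated layout, the romanized sample entries and the helper names `load_lexicon` and `translate` are all assumptions for this sketch; the paper does not specify the file format.

```python
from collections import defaultdict

# Hypothetical entries: morpheme <TAB> tag <TAB> translation.
SAMPLE = "poocha\tN\tcat\neli\tN\tmouse\nthinnu\tV\tate\nyude\tPL\t's\n"


def load_lexicon(text):
    """Build a mapping: morpheme -> list of (tag, translation) pairs."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        morpheme, tag, translation = line.split("\t")
        lexicon[morpheme].append((tag, translation))
    return lexicon


def translate(lexicon, morpheme, tag):
    """Return the translation stored for this morpheme under this tag."""
    for entry_tag, translation in lexicon.get(morpheme, []):
        if entry_tag == tag:
            return translation
    return None


lexicon = load_lexicon(SAMPLE)
print(translate(lexicon, "poocha", "N"))
```

Keeping a list of (tag, translation) pairs per morpheme allows one morpheme to carry several tags, which is what the parser's tag-based disambiguation relies on.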
Presently the system works for sentences which contain up to two adverbial or adjectival clauses,
which are commonly found in Malayalam texts. The system can be modified to handle other
sentences by adding appropriate grammar rules and transfer rules to the rule database. As the
parser is a general parser, it can handle sentences of any depth.
3.6. Implementation and Testing of the System
The system was implemented in the Python language and tested with a source file which contains
1000 sentences. The sentences which follow the grammar rules were translated. A sample of the
results is tabulated in Table 3.
4. RESULTS AND DISCUSSION
The system was tested with more than 1000 sentences of different kinds, with and without
subordinate clauses, that follow the identified morpheme sequences. The system returned
correct, meaningful translations in most cases. A group of sample input sentences with their
outputs is shown in Table 3 to give a representative picture of the results obtained. In around
20% of sentences the system returned the exact English version of the input sentence. In the
remaining translations the output sentences were meaningful but had small shortcomings, for
the following reasons:
i) The positioning of articles is not considered.
ii) Many inter chunk and intra chunk dependencies are not considered.
iii) The lexicon stores only the common translation for polysemous words.
The system successfully handles word sense disambiguation based on lexical category.
Compound nouns are not handled by the system, as the shallow parser cannot group
them using the current set of rules. The system output can be improved by adding rules that
address the above shortcomings.
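Disambiguation by lexical category can be pictured as a lookup keyed on the pair (morpheme, tag), as in the Python sketch below. The romanized example word, its tags and its glosses are illustrative assumptions and are not entries from the system's lexicon.

```python
from typing import Dict, Tuple

# Hypothetical entries: the same surface morpheme with two lexical categories.
LEXICON: Dict[Tuple[str, str], str] = {
    ("kara", "N"): "shore",
    ("kara", "V"): "cry",
}


def translate(morpheme: str, tag: str,
              lexicon: Dict[Tuple[str, str], str] = LEXICON) -> str:
    """Pick the translation for the tagged morpheme; pass unknowns through."""
    return lexicon.get((morpheme, tag), morpheme)


print(translate("kara", "N"))
print(translate("kara", "V"))
```

A lookup of this shape resolves ambiguity only across categories. Two senses that share a category would map to the same key, which corresponds to shortcoming (iii): only the common translation of a polysemous word is stored.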
5. CONCLUSION
Various MT groups have used different formalisms best suited to their applications. Of these,
transfer-based systems are more flexible and can be extended to language pairs in a
multilingual environment. A transfer-based MT system has been developed for Malayalam, a
Dravidian language. It comprises a preprocessor for splitting compound words, a
morphological parser for context disambiguation and chunking, a syntactic structure transfer
module and a bilingual dictionary. The system was tested on more than 1000
different types of sentences and returned correct results for sentences that contain
up to two subordinate clauses. Even for sentences with more than two subordinate clauses, the
system returned translations that conveyed a basic understanding of the input
sentences. More rules can be added so that the system gives exact translations of input
sentences in all cases. Additional modules, such as finding and replacing collocations and
named entities, can also be added to the basic translator. The results obtained are
encouraging. The work can be extended to create a full-fledged machine translator from any
Dravidian language to English, since these languages all exhibit structural homogeneity.
TABLE 3: Group of Tabulated results