A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ...Syeful Islam
Hundreds of millions of people, of almost all levels of education and from many different countries, communicate with each other for different purposes using various languages. Machine translation is in high demand owing to the growing use of web-based communication. One of the major problems in Bengali translation is identifying a naming word (proper noun) in a sentence. This is relatively simple in English, because such entities start with a capital letter; Bangla, however, has no concept of lower- and upper-case letters, and a huge number of distinct named entities occur in the language. It is therefore difficult to determine whether a given word is a proper noun. Here we introduce a new approach that identifies proper nouns in a Bengali sentence for UNL without storing a huge number of named entities in the word dictionary. The goal is to make conversion between Bangla sentences and UNL, in both directions, possible with a minimal number of words stored in the dictionary.
Design Analysis Rules to Identify Proper Noun from Bengali Sentence for Univ...Syeful Islam
Abstract—Nowadays hundreds of millions of people, of almost all levels of education and from many different countries, communicate with each other for various purposes and perform their jobs on the internet or other communication media using various languages. Not all people know all languages, so it is very difficult to communicate or work across languages. In this situation computer scientists have introduced various inter-language translation programs (machine translation). UNL is one such inter-language translation system. One of the major problems in UNL is identifying a name in a sentence; this is relatively simple in English, because such entities start with a capital letter, but Bangla has no concept of lower- and upper-case letters. It is therefore difficult to determine whether a word is a proper noun. Here we propose analysis rules to identify proper nouns in a sentence, and we establish a post-converter that translates named entities from Bangla to UNL. The goal is to make conversion between Bangla sentences and UNL, in both directions, possible. Theoretical analysis of the proposed system shows that it is able to identify proper nouns in Bangla sentences and produce the corresponding Universal Words for UNL.
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...Syeful Islam
Hundreds of millions of people, of almost all levels of education and from many different countries, communicate with each other for different purposes using various languages. Machine translation is in high demand owing to the growing use of web-based communication. One of the major problems in Bengali translation is identifying a naming word in a sentence. This is relatively simple in English, because such entities start with a capital letter; Bangla, however, has no concept of lower- and upper-case letters, and a huge number of distinct named entities occur in the language. It is therefore difficult to determine whether a word is a naming word. Here we introduce a new approach that identifies naming words in a Bengali sentence for a machine translation system without storing a huge number of named entities in the word dictionary. The goal is to make Bangla sentence conversion possible with a minimal number of words stored in the dictionary.
Outlining Bangla Word Dictionary for Universal Networking LanguageIOSR Journals
This document discusses outlining a Bangla word dictionary for the Universal Networking Language (UNL) system. UNL is an artificial language that allows computers to process information across languages. The authors have been working to include Bangla in the UNL system. They describe the format of a UNL dictionary entry, which includes the headword (Bangla word), universal word, and grammatical attributes. Simply searching the UNL knowledge base for universal words is not effective, so they propose finding universal words based on existing translations of Bangla texts to English and their UNL expressions. The goal is to develop a Bangla dictionary to facilitate converting Bangla sentences to UNL format.
Dictionary Entries for Bangla Consonant Ended Roots in Universal Networking L...Waqas Tariq
The Universal Networking Language (UNL) deals with communication across nations of different languages and involves many related disciplines such as linguistics, epistemology, and computer science. It helps to overcome the language barrier among people of different nations and to solve problems emerging from current globalization trends and geopolitical interdependence. We are working to include the Bangla language in the UNL system so that Bangla can be converted to UNL expressions. As part of this process we are currently working on Bangla consonant-ended verb roots and developing lexical (dictionary) entries for them. In this paper we present our work by describing Bangla verbs, verb roots, and verbal inflections, and finally show the dictionary entries for the consonant-ended roots.
People across the globe have access to materials such as journals, articles, and adverts via the internet. However, many of these resources come in a diverse range of languages. Although English seems most suitable to most people, some readers believe that working with materials in one's native language is more enjoyable than in other languages. Research has shown that the Arabic language has not been prominent in terms of online materials, and the few existing resources are often ignored owing to the peculiar nature of its characters and constructs. Hence, a proper study of its relationship with English, with a view to bringing people closer to understanding it, becomes necessary. The system scenarios were modeled and implemented using the Unified Modeling Language and Microsoft C# respectively, in such a way that the expected set of characters of the language of interest was formed automatically for a given input. The code was developed and run using a context-free rule-based technique, with the required hardware clearly described in the design. The system's workability was tested with different source texts as inputs, and in each case the resulting outputs were very effective with respect to the translation process. The design is expected to serve as a tool for assisting beginners in these two languages, and it showcases a one-to-one form of correspondence; more rules and functions to ensure a more robust system are expected in future work.
Script to Sentiment : on future of Language TechnologyMysore latestJaganadh Gopinadhan
The document discusses developments in the field of human language technology (HLT) and its future applications. It notes that HLT is no longer confined to academia and is becoming integrated into information and communication technology products and services in daily life. The document provides an overview of developments in text and speech processing, including machine translation systems and spell checkers. It also discusses the role of open source tools and frameworks in advancing research in HLT, particularly for Indian languages.
The document discusses the field of computational linguistics, defining it as the scientific study of language from a computational perspective. It involves providing computational models of linguistic phenomena and using computational techniques and linguistic theories to solve problems in natural language processing. Computational linguistics aims to automatically process and understand natural language by constructing computer programs. The field has its roots in the 1940s-1950s with the development of code breaking machines and computers. Major conferences and journals in the field are associated with the Association for Computational Linguistics.
This document discusses computational linguistics, including its origins, main application areas, and approaches. Computational linguistics originated from efforts in the 1950s to use computers for machine translation between languages. It aims to develop natural language processing applications like machine translation, speech recognition, and grammar checking. Research employs various approaches including developmental, structural, production-focused, and comprehension-focused methods. The field involves both computer science and linguistics.
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONkevig
Phonetic typing using the English alphabet has become widely popular for social media and chat services. As a result, texts containing a mix of English and Bangla words and phrases have become increasingly common. Existing transliteration tools perform poorly on such texts. This paper proposes a robust Three-stage Hybrid Transliteration (THT) framework that can transliterate both English words and phonetically typed Bangla words satisfactorily. This is achieved by adopting a hybrid approach combining dictionary-based and rule-based techniques. Experimental results confirm the superiority of THT: it significantly outperforms the benchmark transliteration tool.
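The dictionary-then-rules fallback at the heart of such a hybrid scheme can be sketched as follows. The tiny lexicon and the character rewrite rules below are illustrative stand-ins, not the THT framework's actual resources or rule set.

```python
# Hybrid transliteration sketch: look each word up in a small lexicon first,
# then fall back to character-level rewrite rules (longest pattern first).
# LEXICON and RULES are illustrative placeholders, not THT's real resources.
LEXICON = {"school": "স্কুল", "ami": "আমি"}

RULES = [  # phonetic-typing rules, ordered longest pattern first
    ("sh", "শ"), ("kh", "খ"), ("aa", "আ"),
    ("k", "ক"), ("a", "া"), ("m", "ম"), ("i", "ি"),
    ("b", "ব"), ("l", "ল"), ("r", "র"),
]

def transliterate(word: str) -> str:
    if word in LEXICON:                      # stage 1: dictionary lookup
        return LEXICON[word]
    out, i = [], 0
    while i < len(word):                     # stage 2: rule-based fallback
        for pat, rep in RULES:
            if word.startswith(pat, i):
                out.append(rep)
                i += len(pat)
                break
        else:                                # unknown character passes through
            out.append(word[i])
            i += 1
    return "".join(out)

print(transliterate("ami"))     # dictionary hit
print(transliterate("khabar"))  # rule-based fallback
```

The ordering of RULES matters: multi-character patterns such as "kh" must be tried before their single-character prefixes, which is why a real system sorts rules by pattern length.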
Students attitude towards teachers code switching code mixingSamar Rukh
This document summarizes a study that examined business students' attitudes toward their teachers' code-switching between English and the local language (Urdu) in class, and its impact on the students' English language learning. Quantitative and qualitative data were collected through questionnaires given to 100 business students and 6 English teachers at universities in Sargodha, Pakistan. The findings from the student questionnaire showed that most students had a positive attitude toward their teachers' code-switching and believed it aided their understanding and strengthened their English. The teacher questionnaire explored the teachers' views; most believed code-switching facilitated clearer communication and instruction. In conclusion, the study found that business students generally viewed teachers' code-switching positively.
ARABIC LANGUAGE CHALLENGES IN TEXT BASED CONVERSATIONAL AGENTS COMPARED TO TH...ijcsit
This document discusses the challenges of building conversational agents (CAs) in Arabic compared to English. It outlines three main approaches to building CAs - natural language processing, sentence similarity measures, and pattern matching - and explores how each approach presents different challenges for Arabic versus English. Some key challenges for Arabic include its complex morphology system involving roots, affixes and patterns; omission of short vowels leading to ambiguity; and diglossia between modern standardized Arabic, classical Arabic, and various dialects. The document argues these features make it harder to understand and analyze user utterances in Arabic CAs compared to English CAs.
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...kevig
This study proposes a method for developing neural morphological analyzers for Japanese Hiragana sentences using the Bi-LSTM CRF model. Morphological analysis is a technique that divides text data into words and assigns information such as parts of speech. In Japanese natural language processing systems, this technique plays an essential role in downstream applications because Japanese does not have delimiters between words. Hiragana is a type of Japanese phonogramic character used in texts for children or for people who cannot read Chinese characters. Morphological analysis of Hiragana sentences is more difficult than that of ordinary Japanese sentences because less information is available for segmentation. For morphological analysis of Hiragana sentences, we demonstrate the effectiveness of fine-tuning a model based on ordinary Japanese text and examine the influence of training data drawn from texts of various genres.
MACHINE TRANSLATION WITH SPECIAL REFERENCE TO MALAYALAM LANGUAGEJomy Jose
This document discusses machine translation with reference to the Malayalam language. It notes that while Google Translate provides services for many languages, it does not currently support Malayalam, which is spoken by 38 million people in Kerala, India. The document reviews literature on machine translation efforts for Indian languages and discusses the need for machine translation between languages in India given the government's use of English and Hindi alongside state languages. It provides background on the Malayalam language and compares the languages supported by Google Translate to their numbers of native speakers.
Hidden markov model based part of speech tagger for sinhala languageijnlc
In this paper we present the fundamental lexical semantics of the Sinhala language and a Hidden Markov Model (HMM) based Part of Speech (POS) tagger for Sinhala. In any natural language processing task, part of speech is a vital topic: it involves analysing the construction, behaviour, and dynamics of the language, and this knowledge can be utilized in computational linguistic analysis and automation applications. Since Sinhala is a morphologically rich and agglutinative language, in which words are inflected with various grammatical features, tagging is essential for further analysis of the language. Our research takes a statistical approach, in which tagging is done by computing the tag-sequence probability and the word-likelihood probability from a given corpus, with the linguistic knowledge extracted automatically from the annotated corpus. The current tagger reaches more than 90% accuracy for known words.
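The computation the abstract describes, combining the tag-sequence probability with the word-likelihood probability, is standard Viterbi decoding over an HMM. A minimal sketch follows; the tags and all probabilities are illustrative toy numbers, not Sinhala corpus estimates.

```python
# Viterbi decoding for an HMM POS tagger: choose the tag sequence that
# maximizes transition (tag-sequence) * emission (word-likelihood) scores.
# All probabilities below are illustrative toy numbers, not corpus estimates.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.6, "VERB": 0.4}                       # P(tag | start)
TRANS = {("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
         ("VERB", "NOUN"): 0.6, ("VERB", "VERB"): 0.4}   # P(tag_i | tag_{i-1})
EMIT = {("NOUN", "dog"): 0.5, ("VERB", "dog"): 0.1,
        ("NOUN", "barks"): 0.1, ("VERB", "barks"): 0.5}  # P(word | tag)

def viterbi(words):
    # best[t] = probability of the best path ending in tag t; back stores pointers
    best = {t: START[t] * EMIT.get((t, words[0]), 1e-6) for t in TAGS}
    back = []
    for w in words[1:]:
        prev, best, ptr = best, {}, {}
        for t in TAGS:
            p, src = max((prev[s] * TRANS[(s, t)] * EMIT.get((t, w), 1e-6), s)
                         for s in TAGS)
            best[t], ptr[t] = p, src
        back.append(ptr)
    # reconstruct the best tag sequence by following back-pointers
    tag = max(best, key=best.get)
    seq = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        seq.append(tag)
    return seq[::-1]

print(viterbi(["dog", "barks"]))  # → ['NOUN', 'VERB']
```

In a real tagger the transition and emission tables are estimated from an annotated corpus, and the products are computed in log space to avoid underflow on long sentences.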
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...kevig
This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime or game characters. Conventional morphological analyzers, such as MeCab, segment words with high performance, but they are unable to segment broken expressions or utterance endings that are not listed in the dictionary, which often appears in lines of anime or game characters. To overcome this challenge, we propose segmenting lines of Japanese anime or game characters using subword units that were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units weighted by TF/IDF according to gender, age, and each anime character and show that they are linguistic speech patterns that are specific for each feature. Additionally, a classification experiment shows that the model with subword units outperformed that with the conventional method.
Computational linguistics originated in the 1950s with a focus on machine translation. It aims to use computers to process and understand human language. There are two main goals - theoretical, to better understand how humans use language; and practical, to build tools like machine translation. It draws from linguistics, computer science, psychology, and logic. Some key applications include spelling/grammar checkers, style checkers, information retrieval systems, and computer-assisted language learning tools. However, natural language processing poses challenges due to issues with phonology, morphology, syntax, semantics, and pragmatics.
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...kevig
In this paper, phoneme sequences are used as language information to perform code-switched language
identification (LID). With the one-pass recognition system, the spoken sounds are converted into
phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple
languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity
among our target languages, we reported two methods of phoneme mapping. Statistical phoneme-based
bigram language models (LM) are integrated into speech decoding to eliminate possible phone
mismatches. The supervised support vector machine (SVM) is used to learn to recognize the phonetic
information of mixed-language speech based on recognized phone sequences. As the back-end decision is
taken by an SVM, the likelihood scores of segments with monolingual phone occurrence are used to
classify language identity. The speech corpus was tested on Sepedi and English languages that are often
mixed. Our system is evaluated by measuring both the ASR performance and the LID performance
separately. The systems have obtained a promising ASR accuracy with data-driven phone merging
approach modelled using 16 Gaussian mixtures per state. In code-switched speech and monolingual
speech segments respectively, the proposed systems achieved an acceptable ASR and LID accuracy.
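The phone-bigram language-model component of such an LID system can be sketched as follows: train one bigram model per language on phone sequences, then classify a recognized segment by which model assigns it the higher log-likelihood. The training sequences below are an illustrative toy set, not the actual Sepedi/English models from the paper.

```python
import math
from collections import defaultdict

# Per-language phone-bigram LMs with add-alpha smoothing. A recognized phone
# sequence is classified by the language whose model scores it higher.
# The training sequences are an illustrative toy set, not real corpus data.
def train_bigram(seqs, alpha=0.1):
    counts, totals = defaultdict(float), defaultdict(float)
    vocab = {p for s in seqs for p in s}
    for s in seqs:
        for a, b in zip(["<s>"] + s, s + ["</s>"]):
            counts[(a, b)] += 1
            totals[a] += 1

    def logprob(seq):
        lp = 0.0
        for a, b in zip(["<s>"] + seq, seq + ["</s>"]):
            # add-alpha smoothing over the phone vocabulary (+1 for </s>)
            lp += math.log((counts[(a, b)] + alpha) /
                           (totals[a] + alpha * (len(vocab) + 1)))
        return lp

    return logprob

english = train_bigram([["th", "e"], ["th", "i", "s"]])
sepedi = train_bigram([["ts", "a"], ["ts", "e", "la"]])

def identify(phones):
    return "english" if english(phones) > sepedi(phones) else "sepedi"

print(identify(["th", "e"]))  # scored higher by the English bigram model
print(identify(["ts", "a"]))  # scored higher by the Sepedi bigram model
```

In the paper's setup this likelihood score feeds an SVM back-end rather than a hard argmax, which lets monolingual segment scores be combined over a whole code-switched utterance.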
This document provides an introduction to machine translation and different approaches to machine translation. It discusses the history of machine translation, beginning in the 1950s. It then describes four main approaches to machine translation: direct machine translation, rule-based machine translation, corpus-based machine translation, and knowledge-based machine translation. For each approach, it provides a brief overview and example. It focuses in more depth on direct machine translation and rule-based machine translation, explaining their process and limitations.
Lecture 1: Semantic Analysis in Language TechnologyMarina Santini
This document provides an introduction to a course on semantic analysis in language technology taught at Uppsala University in Sweden. It outlines the course website, contact information for the instructor, intended learning outcomes, required readings, assignments and examination. The course focuses on applying semantic analysis methods in natural language processing tasks like sentiment analysis, information extraction, word sense disambiguation and predicate-argument extraction. It will introduce students to representing and modeling meaning in language through formal logics and semantic frameworks.
The main-principles-of-text-to-speech-synthesis-systemCemal Ardil
This document discusses text-to-speech synthesis systems. It provides background on the history and development of such systems over three generations from 1962 to the present. It describes some of the main challenges in developing speech synthesis for different languages. The document then focuses on specifics of the Azerbaijani language and outlines the approach used in the text-to-speech synthesis system developed by the authors, which combines concatenative synthesis and formant synthesis methods.
This document defines and describes various terms and concepts related to corpus linguistics and natural language processing. It defines acronyms for various corpora and projects. It also defines key concepts like alignment, annotation, ambiguity, balanced corpora, concordancing, part-of-speech tagging, and probabilistic tagging using n-grams.
The document discusses natural language processing (NLP) for Tamil to Hindi conversion. It introduces the Universal Networking Language (UNL) as an intermediate representation to express information across languages. UNL allows text to be converted to different languages like converting a webpage to various natural languages. The document then discusses the advantages of developing machine translation between Tamil and other languages, particularly English and Hindi. It outlines the components needed for a Tamil-Hindi machine translation system, including morphological analyzers for Tamil and Hindi, a word mapping unit, and generators.
Computational linguistics is defined as the scientific study and engineering of language from a computational perspective. It originated in the 1950s with the goal of using computers to automatically translate texts between languages. There are two main approaches: rule-based systems which explicitly encode linguistic rules and data-driven systems which use statistical and machine learning methods on large datasets. Computational linguistics is applied in many areas including machine translation, speech recognition, natural language interfaces, and information extraction from text documents.
Development of analysis rules to identify proper noun from bengali sentence f...Syeful Islam
- Today the regional economies, societies, cultures and educations are integrated through a globe-spanning network of communication and trade.
- This globalization trend calls for a common platform on which each member can understand what others communicate and carry the discussion forward smoothly.
- However, language barriers throughout the world continually prevent it from congregating into a single domain for sharing knowledge and information.
- Therefore researchers work on various languages and try to provide a platform through which multilingual people can communicate in their native languages.
- Researchers analyze language structure and formulate structural grammars and rules that are used to translate one language into another.
- From the last few years several language-specific translation systems have been proposed.
- Since these systems are based on specific source and target languages, these have their own limitations.
- As a consequence, the United Nations University/Institute of Advanced Studies (UNU/IAS) decided to develop an inter-language translation program.
- The corollary of their continuous research is a common form of language known as the Universal Networking Language (UNL), and with it the UNL system.
- The UNL system is an initiative to overcome the problem of language pairs in automated translation. UNL is an artificial language based on the interlingua approach. It acts as an intermediate computer semantic language whereby text written in any particular language can be converted into text in any other language.
- UNL system consists of major three components:
- language resources
- software for processing language resources (parser) and
- supporting tools for maintaining and operating language processing software or developing language resources.
- The parser of the UNL system takes an input sentence, parses it based on rules, and converts each word into the corresponding universal word from the word dictionary.
- The challenge in detecting named entities is that such expressions are hard to analyze using UNL because they belong to an open class of expressions, i.e., there is an infinite variety and new expressions are constantly being invented.
- Bengali is the seventh most spoken language in the world, the second most spoken in India, and the national language of Bangladesh.
- Proper-noun identification is therefore an important problem: the UNL dictionary is queried for proper nouns, yet all proper nouns (names) cannot be exhaustively maintained in the dictionary for automatic identification.
In this research project we address this task: proper-noun detection and conversion.
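The approach above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's actual rule set: the dictionary entries, universal words, and suffix list below are all hypothetical stand-ins. The idea is simply that a word found in neither the word dictionary nor derivable by a known morphological rule becomes a proper-noun candidate, so names need not be stored exhaustively.

```python
# Hypothetical stand-in word dictionary mapping Bengali words to universal words.
WORD_DICTIONARY = {
    "আমি": "i(icl>person)",
    "ভাত": "rice(icl>food)",
    "খাই": "eat(icl>do)",
}

# Illustrative suffixes only; a real analyzer would use the full rule set.
KNOWN_SUFFIXES = ("ছে", "কে", "রা")

def classify_word(word):
    """Return 'dictionary', 'inflected', or 'proper-noun-candidate'."""
    if word in WORD_DICTIONARY:
        return "dictionary"
    # A word whose stem (after stripping a known suffix) is in the
    # dictionary is treated as an inflected dictionary word.
    if any(word.endswith(s) and word[: -len(s)] in WORD_DICTIONARY
           for s in KNOWN_SUFFIXES):
        return "inflected"
    # Unknown to both the dictionary and the rules: candidate proper noun.
    return "proper-noun-candidate"

def find_proper_nouns(sentence):
    """Collect the proper-noun candidates in a whitespace-split sentence."""
    return [w for w in sentence.split()
            if classify_word(w) == "proper-noun-candidate"]
```

For example, every word of "আমি ভাত খাই" ("I eat rice") resolves through the stand-in dictionary, while an unseen name such as "ঢাকা" (Dhaka) is flagged as a proper-noun candidate without ever being stored.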
Syracuse University
SURFACE
The School of Information Studies Faculty
Scholarship
School of Information Studies (iSchool)
2001
Natural Language Processing
Elizabeth D. Liddy
Syracuse University, [email protected]
Follow this and additional works at: http://surface.syr.edu/istpub
Part of the Library and Information Science Commons, and the Linguistics Commons
This Book Chapter is brought to you for free and open access by the School of Information Studies (iSchool) at SURFACE. It has been accepted for
inclusion in The School of Information Studies Faculty Scholarship by an authorized administrator of SURFACE. For more information, please contact
[email protected]
Recommended Citation
Liddy, E.D. 2001. Natural Language Processing. In Encyclopedia of Library and Information Science, 2nd Ed. NY. Marcel Decker, Inc.
http://surface.syr.edu?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://surface.syr.edu/istpub?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://surface.syr.edu/istpub?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://surface.syr.edu/ischool?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://surface.syr.edu/istpub?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://network.bepress.com/hgg/discipline/1018?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://network.bepress.com/hgg/discipline/371?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
mailto:[email protected]
Natural Language Processing
1
INTRODUCTION
Natural Language Processing (NLP) is the computerized approach to analyzing text that
is based on both a set of theories and a set of technologies. And, being a very active area
of research and development, there is not a single agreed-upon definition that would
satisfy everyone, but there are some aspects, which would be part of any knowledgeable
person’s definition. The definition I offer is:
Definition: Natural Language Processing is a theoretically motivated range of
computational techniques for analyzing and representing naturally occurring texts
at one or more levels of linguistic analysis for the purpose of achieving human-like
language processing for a range of tasks or applications.
Several elements of this definition can be further detailed. Firstly the imprecise notion of
‘range of computational techniques’ is necessary because there are multiple methods or
techniques from which to choose to accomplish a particular type of language analysis.
‘Naturally occurring texts’ can be of any language, mode, genre, etc. The texts can be
oral or written. The only requirement is that they be in a language used by humans to
communicate to one another. Also, the text being analyzed should not be specifically
constru.
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONkevig
Phonetic typing using the English alphabet has become widely popular nowadays for social media and chat
services. As a result, a text containing various English and Bangla words and phrases has become
increasingly common. Existing transliteration tools display poor performance for such texts. This paper
proposes a robust Three-stage Hybrid Transliteration (THT) framework that can transliterate both English
words and phonetic typed Bangla words satisfactorily. This is achieved by adopting a hybrid approach of
dictionary-based and rule-based techniques. Experimental results confirm superiority of THT as it
significantly outperforms the benchmark transliteration tool.
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURESmlaij
The proposed methodology presented in the paper deals with solving the problem of multilingual speech
recognition. Current text and speech recognition and translation methods have a very low accuracy in
translating sentences which contain a mixture of two or more different languages. The paper proposes a
novel approach to tackling this problem and highlights some of the drawbacks of current recognition and
translation methods.
Design and Development of Morphological Analyzer for Tigrigna Verbs using Hyb...kevig
Morphological analyzer is the basic for various high level NLP applications such as information retrieval, spell checking, grammar checking, machine translation, speech recognition, POS tagging and automatic sentence construction. This paper is carefully designed for design and analysis of morphological analyzer Tigrigna verbs using hybrid of memory learning and rules based approaches. The experiment have conducted using python 3 where TiMBL algorithms IB2 and TRIBL2, and Finite State Transducer rules are used. The performance of the system has been evaluated using 10 fold cross validation technique. Testing conducted using optimized parameter settings for regular verbs and linguistic rules of the Tigrigna language allomorph and phonology for the irregular verbs. The accuracy on the memory based approach with optimized parameters of TiMBL algorithm IB2 and TRIBL2 was 93.24% and 92.31%, respectively. Finally, the hybrid approach had the actual performance of 95.6% using linguistic rules for handling irregular and copula verbs.
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEijnlc
Manipuri is both a minority and morphologically rich language with genetic features similar to Tibeto Burman languages. It has Subject-Object-Verb (SOV) order, agglutinative verb morphology and is monosyllabic. Morphology and syntax are not clearly distinguished in this language. Natural Language
Processing (NLP) is a useful research field of computer science that deals with processing of a large amount of natural language corpus. The NLP applications encompass E-Dictionary, Morphological Analyzer, Reduplicated Multi-Word Expression (RMWE), Named Entity Recognition (NER), Part of Speech
(POS) Tagging, Machine Translation (MT), Word Net, Word Sense Disambiguation (WSD) etc. In this paper, we present a study on the advancements in NLP applications for Manipuri language, at the same time presenting a comparison table of the approaches and techniques adopted and the results obtained of each of the applications followed by a detail discussion of each work.
DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...kevig
Morphological analyzer is the base for various high-level NLP applications such as information retrieval,
spell checking, grammar checking, machine translation, speech recognition, POS tagging and automatic
sentence construction. This paper is carefully designed for design and analysis of morphological analyzer
Tigrigna verbs using hybrid of memory learning and rules based approaches. The experiment have
conducted using Python 3 where TiMBL algorithms IB2 and TRIBL2, and Finite State Transducer rules
are used. The performance of the system has been evaluated using 10 fold cross validation technique.
Testing was conducted using optimized parameter settings for regular verbs and linguistic rules of the
Tigrigna language allomorph and phonology for the irregular verbs. The accuracy of the memory based
approach with optimized parameters of TiMBL algorithm IB2 and TRIBL2 was 93.24% and 92.31%,
respectively. Finally, the hybrid approach had an actual performance of 95.6% using linguistic rules for
handling irregular and copula verbs.
This document describes a tool to convert audio/text to Indian sign language using Python libraries. It discusses using natural language processing and machine learning algorithms to take text or audio as input and output the corresponding sign language video. The tool is being developed as a website to help deaf and hard of hearing people in India communicate. It covers related work on sign language recognition and conversion tools. It then describes the methodology which includes audio to text conversion, searching a database of sign language video clips, and combining clips to generate the output video. Screenshots of the frontend website and examples of inputs and outputs are provided. Future work discussed includes improving the UI and adding mobile apps to make the tool cross-platform.
Role of language engineering to preserve endangered languagesDr. Amit Kumar Jha
Role of Language Engineering to Preserve Endangered Languages discusses how language engineering can help preserve endangered languages through documentation and digitization. Language engineering is the application of computer science to develop language-related software and hardware. It involves techniques like speech and text processing to develop systems that can understand, interpret, and generate human language. Documenting endangered languages through recording speech samples and collecting texts is important for preservation. Language engineering makes this documentation process easier through tools like speech-to-text, text-to-speech, and transcription tools. It also allows for digital storage of language data, which helps preserve languages for longer as digital data is more durable than other forms of storage. Developing applications that use endangered languages, like translation systems,
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEkevig
Manipuri is both a minority and morphologically rich language with genetic features similar to Tibeto Burman languages. It has Subject-Object-Verb (SOV) order, agglutinative verb morphology and ismonosyllabic. Morphology and syntax are not clearly distinguished in this language. Natural Language Processing (NLP) is a useful research field of computer science that deals with processing of a large amount of natural language corpus. The NLP applications encompass E-Dictionary, Morphological
Analyzer, Reduplicated Multi-Word Expression (RMWE), Named Entity Recognition (NER), Part of Speech (POS) Tagging, Machine Translation (MT), Word Net, Word Sense Disambiguation (WSD) etc. In this paper, we present a study on the advancements in NLP applications for Manipuri language, at the same time presenting a comparison table of the approaches and techniques adopted and the results obtained of each of the applications followed by a detail discussion of each work.
UNL-ization of Numbers and Ordinals in Punjabi with IANijnlc
In the field of Natural Language Processing, Universal Networking Language (UNL) has been an area of
immense interest among researchers during last couple of years. Universal Networking Language (UNL) is
an artificial Language used for representing information in a natural-language-independent format. This
paper presents UNL-ization of Punjabi sentences with the help of different examples, containing numbers
and ordinals written in words, using IAN (Interactive Analyzer) tool. In UNL approach, UNL-ization is a
process of converting natural language resource to UNL and NL-ization, is a process of generating a
natural language resource out of a UNL graph. IAN processes input sentences with the help of TRules and
Dictionary entries. The proposed system performs the UNL-ization of up to fourteen digit number and
ordinals, written in words in Punjabi language, with the help of 104 dictionary entries and 67 TRules. The
system is tested on a sample of 150 random Punjabi Numbers and Ordinals, written in words, and its FMeasure
comes out to be 1.000 (on a scale of 0 to 1).
The document presents an unsupervised approach for developing stemmers for Urdu and Marathi languages. It uses frequency-based and length-based stripping approaches to automatically learn and generate suffix rules from word corpora without any linguistic knowledge. The frequency-based approach achieved 85.36% accuracy for Urdu and 82.5% for Marathi, outperforming the length-based approach. The stemmer could improve information retrieval for languages by reducing morphological variants to word stems.
Similar to A New Approach: Automatically Identify Naming Word from Bengali Sentence for Universal Networking language (20)
MobiManager helps you to manage, control, and secure company-owned Android devices. It allows you to distribute apps and contents across a wide range of mobile devices from the cloud.
MobiManager platform consists of overlapping defense and security mechanisms that protect against intrusion, malware, and more malicious threats.
Business benefit of Mobile device managementSyeful Islam
MobiManager - Cloud-based mobile device management solution which helps small and large businesses to
• Secure
• Monitor
• Manage
• Support mobile devices, deployed across organization for work purpose.
INfor- Mobile cloud Solution
Bring your essentials wherever your Mobile takes you. INforcloud puts your data at your fingertips, under your control. Store your documents, calendar, contacts and photos on a server at home, at one of our providers or in a data center you know.
Hidden Object Detection for Computer Vision Based Test Automation SystemSyeful Islam
This document proposes methods to help computer vision-based automation tools more easily detect and interact with hidden objects in graphical user interfaces (GUIs) during software testing. Current vision-based automation systems rely only on image recognition and mouse events, which makes it difficult to find and access hidden GUI elements like dropdown lists or slider positions. The proposed methods add the use of keyboard shortcuts along with image clicks to interact with visible objects and reveal hidden objects, allowing automation tools to more quickly and accurately access hidden objects as part of testing test cases. This should enhance the capabilities of vision-based automation systems.
Disordered Brain Modeling Using Artificial Network SOFMSyeful Islam
Autism is known as a neurobiological developmental disorder which affects language,
communication, and cognitive skill. In the case of autism attention shift impairment and
strong familiarity preference are considered to be prime deficiencies. Attention shift
impairment is one of the most seen behavioral disorders found in autistic patients. We
have model this behavior by employing self-organizing feature map (SOFM).
Disordered Brain Modeling Using Artificial Network SOFMSyeful Islam
This document summarizes a research paper that models attention shift impairment in autistic individuals using an artificial neural network called Self Organizing Feature Map (SOFM). SOFM is an unsupervised learning algorithm that can train autistic people to recognize and learn objects. The document provides background on autism as a neurodevelopmental disorder characterized by social and communication difficulties. It describes research showing differences in brain structure and function in autistic individuals, pointing to the cortex being differently wired. It also summarizes treatment approaches and long-term outcomes for autism.
It covers:
- What is jQuery?
- Why jQuery?
- How include jQuery in your web page
- Creating and manipulating elements
- Events
- Animations and effects
- Talking to the server
- jQuery UI
- Writing plugins
- Breaking news around new releases
- Using the CDN
An Algorithm for Electronic Money Transaction Security (Three Layer Security)...Syeful Islam
In the era ofinternet, most ofthe people all over the world completed their transaction
on internet. Though the user of electronic transaction or E-money transaction system
increase rapidly but the majority person are concern about the security of this system.
The growth in online transactions has resulted in a greater demand for fast and accurate
user identification and authentication. Conventional method of identification based on
possession of ID cards or exclusive knowledge like a social security number or a
password are not all together reliable. Identification and authentication by individuals'
biometric characteristics is becoming an accepted procedure that is slowly replacing the
most popular identification procedure – passwords. Among all the biometrics, fingerprint
based identification is one of the most mature and proven technique. Along with the
combination of conventional system, biometric security, Global positioning system(GPS)
and mobile messaging we have design an algorithm which increase security ofelectronic
transaction and more reliable to user. A three layer security model to enhancing security
ofelectronic transaction is proposed in this paper.
Performance Evaluation of Finite Queue Switching Under Two-Dimensional M/G/1...Syeful Islam
Abstract—In this paper we consider a local area network (LAN) of dual mode service
where one is a token bus and the other is a carrier sense multiple access with a collision
detection (CSMA/CD) bus. The objective of the paper is to find the overall cell/packet
dropping probability of a dual mode LAN for finitelength queue M/G/1(m) traffic. Here, the
offered traffic of the LAN is taken to be the equivalent carried traffic of a one-millisecond
delay. The concept of a tabular solution for two-dimensional Poisson’s traffic of circuit
switching is adapted here to find the cell dropping probability of the dual mode packet
service. Although the work is done for the traffic of similar bandwidth, it can be extended
for the case of a dissimilar bandwidth of a circuit switched network.
This tutorial will help you to understand about Java OOP’S concepts with examples. Let’s discuss about what are the features of Object Oriented Programming. Writing object-oriented programs involves creating classes, creating objects from those classes, and creating applications, which are stand-alone executable programs that use those objects.
Emergency Notification:
It is an emergency case to notify some important person or other concern person when we fall in any problem such as accident or attacked by criminal or any other incidence. On other hands it also needs to track victim location to help him from incidence.
Already some concept grows about emergency notification some people try to give solution. But they can’t think the whole scenario or their solution is not realistic or they use this concept to resolve other purpose.
To resolve this problem here I proposed an interactive, user friendly and more efficient solution by smartphone. I have given an idea about emergency notification system in Samsung phone, which can solve this.
Here I want to mention some common incidence and how our phone helps us in that incident. In our regular life we can fall into various incidents such as
- Accident
- Attacked by snatcher, criminal or terrorist
- Suddenly fall into sick (Heart attack, stroke or others) etc.
In the case of accident if victim fatally injured or there is any situation that, victim can’t get enough time to call or notify someone or group of people to help him. This situation can happen if anyone fall in sudden sick and he/she want to call his/her relatives to help him. But they cannot capable to talk. If anyone attacked by snatcher, criminal or terrorist, then it is emergency to informed police or other concern person. But it is impossible to inform police by call.
To overcome this problem I proposed a system which can notify concern person automatically if user fall any incident. There is also need to track the location of victims because the location of victims can be changed and it is emergency to know the latest location where victim exist otherwise it is difficult to find out the victim and help him.
Business Value:.
- This is more realistic feature and more users friendly and efficient.
- I hope this also minimize our life risk or our tension when we are out of home or away from family members or there’s.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
A New Approach: Automatically Identify Naming Word from Bengali Sentence for Universal Networking Language
International Journal of Information Technology Vol. 21 No. 1 2015
Abstract
Hundreds of millions of people of all levels of education and backgrounds from different countries communicate with each other using various languages. Machine translation is in high demand due to the increasing use of web-based communication. One of the major problems in Bengali translation is identifying a naming word in a sentence, a task that is relatively simple in English because such entities start with a capital letter. In Bangla we do not have the concept of small or capital letters, and a huge number of different named entities exist in the language. Thus it is difficult to determine whether a word is a naming word (proper noun) or not. Here we introduce a new approach to identifying naming words in a Bengali sentence for UNL without storing a huge number of named entities in the word dictionary. The goal is to make conversion of Bangla sentences to UNL, and vice versa, possible with minimal words stored in the dictionary.
Keywords: UNL, Rule based analysis, Morphological Analysis, Post Converter, Head word, Knowledge base.
1. Introduction
Today the demand for intercommunication among people of all levels in various countries has increased greatly. This globalization trend calls for a homogeneous platform, so that each member of the platform can apprehend what the others communicate and can carry on the discussion smoothly. However, the barriers of languages throughout the world continuously prevent the whole world from congregating into a single domain of sharing knowledge and information. Therefore researchers work on various languages and try to provide a platform where multilingual people can communicate in their native languages. Researchers analyze language structure and formulate structural grammars and rules which are used to translate one language into another. From the very
Syeful Islam and Jugal Krishna Das
Computer Science and Engineering, Jahangirnagar University, Savar, Dhaka-1213, Bangladesh.
syefulislam@yahoo.com
beginning, the Indian linguist Panini proposed vyaakaran (a set of rules by which a language is analyzed) and gave the structure of the Sanskrit language. After the era of Panini, various linguists worked on language and proposed various techniques. The most influential modern theory, proposed by the American linguist Noam Chomsky, is universal grammar, which is the basis of modern language translation programs.
Over the last few years, several language-specific translation systems have been proposed. Since these systems are based on specific source and target languages, they have their own limitations. As a consequence, the United Nations University/Institute of Advanced Studies (UNU/IAS) decided to develop an inter-language translation program [1]. The corollary of their continuous research is a common form of language known as the Universal Networking Language (UNL), and the UNL system was introduced. The UNL system is an initiative to overcome the problem of language pairs in automated translation. UNL is an artificial language based on the Interlingua approach. It acts as an intermediate computer semantic language whereby text written in a particular language can be converted into text in any other language [2]-[3].
The UNL system consists of three major components: language resources, software for processing language resources (the parser), and supporting tools for maintaining and operating the language processing software or developing language resources. The parser of the UNL system takes an input sentence, parses it based on rules, and converts each word into the corresponding universal word from the word dictionary. The challenge in the detection of names is that such expressions are hard to analyze using UNL, because they belong to the open class of expressions, i.e., there is an infinite variety and new expressions are constantly being invented. Bengali is the seventh most popular language in the world, the second in India, and the national language of Bangladesh. This is therefore an important problem, since the UNL dictionary must be searched for naming words while all naming words (proper nouns) cannot be exhaustively maintained in the dictionary for automatic identification.
This work aims at attacking this problem for the Bangla language, especially the detection of human names in Bengali sentences without storing naming words in the dictionary.
This paper is organized as follows: Section 2 deals with the problem domain, and Section 3 provides a theoretical analysis of the Universal Networking Language. Section 4 describes the functioning of the En-Converter. Section 5 introduces the new approach to identifying naming words in a Bengali sentence for UNL. Section 6 gives the results of the corresponding analysis. Finally, Section 7 draws conclusions with some remarks on future work.
2. Problem Domain
UNL is an electronic language for computers. It mediates understanding among different natural languages. UNL represents sentences in the form of logical expressions, without ambiguity. These expressions are not for humans to read, but for computers. They would be hard for users to understand, and users would not need to understand them unless they are UNL experts. Thus, UNL is an intermediate language to be used over the Internet, which allows communication among people of different languages using their mother tongues. Adding UNL to network platforms will change the existing communication landscape. The purpose of introducing UNL into communication networks is
to achieve accurate exchange of information between different languages. Information has to be readable and understandable by users. Information expressed in UNL can be converted into the user's native language with higher quality and fewer mistakes than computer translation systems produce. In addition, UNL, unlike natural language, is free from ambiguities. Currently, UNL includes 16 languages: the six official languages of the United Nations (Arabic, Chinese, English, French, Russian and Spanish) and ten other widely spoken languages (German, Hindi, Italian, Indonesian, Japanese, Latvian, Mongolian, Portuguese, Swahili and Thai).
Researchers have already started work on the Bengali language to include it in the UNL system. A human language like Bangla is very rich in inflections, vibhakties (suffixes) and karakas, and these are often ambiguous as well. That is why the Bangla parsing task is very difficult. At the same time, it is not easy to provide the semantic, pragmatic and world knowledge that we humans use while parsing and understanding Bangla sentences. Bangla has a total of eighty-nine part-of-speech tags. Bangla grammatical structure generally follows subject-object-verb (S-O-V) order [4-5]. We also obtain useful part-of-speech (POS) information from the various inflections during morphological parsing. But the major problem is that identifying a name in a sentence and converting it is very difficult. In this section we try to clarify the problem domain and explain why the processing of naming words is difficult.
In terms of native speakers, Bengali is the seventh most spoken language in the world, the second in India, and the national language of Bangladesh. Under the project of the UNL society, Bengali linguists are working on the Bangla language and have already introduced some rules for UNL.
Name identification is difficult and challenging in other languages in general, but in Bengali in particular, because:
- There is a huge number of naming words in Bangla, and it is not wise to store all of them in the word dictionary: it causes slow performance.
- Unlike English and most European languages, Bengali lacks capitalization information.
- Bengali person names are more diverse than those of other languages, and many of these words can be found in the dictionary with other specific meanings.
- Bengali is a highly inflectional language, providing one of the richest and most challenging sets of linguistic and statistical features, resulting in long and complex word forms.
- Bengali is a relatively free word order language.
In Bengali language conversion, the En-Converter parses the sentence word by word, finds each word in the dictionary, and applies rules. When the En-Converter does not find a word in the dictionary, it creates a temporary entry for that word. In most cases the temporary entry is a name word. We apply some rules to ensure that such words which are not in the dictionary (temporary entries) are naming words.
In later sections we discuss a new technique for identifying naming words in a Bangla sentence and define a post converter for converting Bangla names to universal words.
3. Universal Networking Language
The Internet has emerged as the global information infrastructure, revolutionizing access to information as well as the speed at which it is transmitted and received. With the technology of electronic mail, for example, people may communicate rapidly over long distances. Not all users, however, can use their own language for communication.
The Universal Networking Language (UNL) is an artificial language in the form of a semantic network, designed for computers to express and exchange every kind of information. Since the advent of computers, researchers around the world have worked towards developing a system that would overcome language barriers. While many different systems have been developed by various organizations, each has its own special representation of a given language. This results in incompatibilities between systems, and it is therefore impossible to break language barriers all over the world even if we gather all these results into one system.
Against this backdrop, the concept of UNL as a common language for all computer systems was born. With the UNL approach, the results of past research and development can be applied to present development and can form the infrastructure of future research and development.
UNL consists of Universal Words (UWs), Relations, Attributes, and the UNL Knowledge Base. The Universal Words constitute the vocabulary of UNL, the Relations and Attributes constitute its syntax, and the UNL Knowledge Base constitutes its semantics. UNL expresses information or knowledge in the form of a semantic network with hyper-nodes. A UNL semantic network is made up of a set of binary relations; each binary relation is composed of a relation label and the two UWs that hold the relation [6].
4. Function of En-Converter
To convert Bangla sentences into UNL form, we use the En-Converter (EnCo), a universal converter system provided by the UNL project. EnCo is a language-independent parser which provides a framework for morphological, syntactic and semantic analysis synchronously. Natural language texts are analyzed sentence by sentence, using a knowledge-rich lexicon and by interpreting analysis rules. The En-Converter generates UNL expressions from sentences (or lists of words of sentences) of a native language by applying En-conversion rules. In addition to the fundamental function of En-conversion, it checks the formats of rules and outputs messages for any errors. It also outputs the information required for each stage of En-conversion at different levels. With these facilities, a rule developer can easily develop and improve rules using the En-Converter [7].
First, the En-Converter converts rules from text format into binary format, or loads binary-format En-conversion rules. The rule checker works while the rules are being converted. Once the binary-format rules are made, they are stored automatically and can be used directly the next time without conversion. It is possible to choose between converting new text-format rules and using existing binary-format rules.
Convert or load the rules.
Second, the En-Converter inputs a string or a list of morphemes/words of a sentence in the source native language.
Input a sentence.
Then it starts to apply rules to the Node-list from the initial state (Figure 1).
The En-Converter applies En-conversion rules to the Node-list through its analysis windows. The process of rule application is to find a suitable rule and to take actions on the Node-list in order to create a syntactic tree and a UNL network using the nodes in the analysis windows. If a string appears in a window, the system retrieves the word dictionary and applies the rule to the candidate word entries. If a word satisfies the conditions required by the window of a rule, this word is selected and the rule application succeeds. This process continues until the tree and the UNL network are completed and only the entry node remains in the Node-list.
Apply the rules and retrieve the Word Dictionary.
Figure 1. Sentence information is represented as a hyper-graph.
Finally, the UNL network (Node-net) is output to the output file in the binary-relation format of UNL expressions.
Output the UNL expressions.
With the exception of the first process of rule conversion and loading, once the En-Converter starts to work it repeats the other processes for all input sentences. It is possible to choose which, and how many, sentences are to be En-converted.
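The overall En-conversion cycle described above (load rules, input a sentence, apply rules against the dictionary, output the expressions) can be sketched as follows. The dictionary and rule formats here are toy placeholders of our own, far simpler than the actual EnCo formats.

```python
# Hedged sketch of the En-Converter's outer loop. `dictionary` maps words
# to entries and each rule is a function returning a relation or None;
# neither corresponds to the real EnCo rule or dictionary syntax.

def enconvert(sentences, dictionary, rules):
    results = []
    for sentence in sentences:           # repeat for every input sentence
        nodes = sentence.split()         # build the initial Node-list
        relations = []
        for word in nodes:
            entry = dictionary.get(word) # retrieve the word dictionary
            for rule in rules:           # try each rule on the window
                rel = rule(word, entry)
                if rel:
                    relations.append(rel)
        results.append(relations)        # output the UNL expressions
    return results

# Toy rule: any dictionary noun becomes an "agt" candidate.
rule = lambda w, e: ("agt", w) if e == "N" else None
print(enconvert(["karim reads"], {"karim": "N"}, [rule]))
# [[('agt', 'karim')]]
```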
4.1. Temporary Entries
In the following two cases, the En-Converter creates a temporary entry by assigning the attribute "TEMP" ("TEMP" is an abbreviation for "Temporary"):
- Except for an Arabic numeral or a blank space, when the En-Converter cannot retrieve any dictionary entry from the Word Dictionary for the rest of the character string, or
- When a rule requiring the attribute "TEMP" is applied to the rest of the character string.
A temporary entry has the following format, and its UW, shown inside the double quotation marks, is assigned to be the same as its headword (HW):
[HW] {} "HW" (TEMP) <, 0, 0>;
In the next section we propose the technique for identifying naming words in a Bangla sentence and define a post converter for converting Bangla names to universal words.
5. Automatic Naming Word Identification Approach
Naming word conversion is relatively simple in English, because such entities start with a capital letter. In Bangla we do not have the concept of small or capital letters. Thus we find
difficulties in understanding whether a word is a naming word or not. For example, the Bangla word "BISWAS" can be a proper noun (i.e., a family name of a person) as well as an abstract noun (with the meaning of faith in English). Consequently, to understand such a Bangla sentence we need an intelligent parser. A parser takes a Bangla sentence as input and parses every sentence according to various rules [8]-[9].
Here we propose a method for naming word conversion for UNL which combines dictionary-based and rule-based approaches. In this approach, the UNL En-Converter identifies the naming word using rules and morphological analysis. The approaches are described sequentially here and demonstrated in Figure 2.
Figure 2. Flow chart of proper noun (নাম বাচক বিশেষ্য পদ) conversion
5.1. Naming word detection approach
The UNL system first takes an input sentence, parses it word by word, and searches the dictionary for each word. If a word is not found, the system tries to recognize the temporary word as a naming word based on defined rules. If this process fails, morphological analysis is used. The approaches to naming word detection are described sequentially here and demonstrated in Figure 3. We describe the process in three steps:
1) Dictionary based analysis for naming word detection
2) Rule-based analysis for naming word detection
3) Morphological Analysis
Figure 3. Naming word (proper noun) identification Technique
7. International Journal of Information Technology Vol. 21 No. 1 2015
7
5.1.1. Dictionary-based analysis for naming word detection: A dictionary is maintained in which we try to include the most commonly used naming words; it is known as the Name dictionary. Here we describe the dictionary-based naming word (proper noun) detection technique sequentially.
First, the input sentence is processed by the En-Converter, which looks each word up in the word dictionary. If the word is found, it is converted into the relative universal word. If it is not in the dictionary, the En-Converter creates a temporary entry for the word.
Second, the En-Converter looks the word flagged TEMP up in the Name dictionary. If it is found, the word is considered a possible noun and rules are applied to confirm this.
Finally, if the word is not in the Name dictionary either, it is sent to the morphological analyzer to confirm whether it is a naming word (proper noun).
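The three steps above can be sketched as a simple lookup cascade; both dictionaries here are toy placeholders rather than real EnCo dictionary files.

```python
# Sketch of the three-step dictionary-based detection described above.
# Step 1: word dictionary hit -> normal universal-word conversion.
# Step 2: name dictionary hit -> treat as a name candidate.
# Step 3: otherwise defer to the morphological analyzer.

def classify(word, word_dict, name_dict):
    if word in word_dict:
        return ("universal", word_dict[word])   # step 1
    if word in name_dict:
        return ("name", word)                   # step 2
    return ("morphology", word)                 # step 3

word_dict = {"পড়": "read(icl>do)"}
name_dict = {"করিম"}
print(classify("করিম", word_dict, name_dict))   # ('name', 'করিম')
print(classify("ঢাকা", word_dict, name_dict))   # ('morphology', 'ঢাকা')
```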
5.1.2. Rule-based analysis for naming word detection: Rule-based approaches rely on a set of rules, one or more of which must be satisfied by the test word. Some words are used in Bangla sentences as parts of names. Here we apply a technique that identifies naming words using such words (parts of names). First we make dictionary entries with pof (part of name) and other relevant attributes. Some typical rules that use pof words to identify naming words in a Bangla sentence are given below.
Rule 1- If the parser finds a word like ম োঃ, োঃ, ম স ম্মৎ, ম ছ ম্মৎ, আোঃ, আব্দল, আব্দর, ব োঃ, ব স, ব শসস, মিগ , বিবি, ডক্টর, ডোঃ, আলী, মেখ, স্ব ী, সসয়দ, মরভ শরন্ড, শ্রী, শ্রীযক্ত etc., then the word is considered the first word of a name, and its status is set to first part of name (FPN). The word or collection of words after the FPN with status TEMP is also considered part of the name.
Rule 2- If the parser finds a word (a title word or mid-name word attached to human names) like ম ৌধরী, ব য় , ব ঞ , শট প ধয য়, খপ ধয য়, খ ন, ম শসন, ম ছ ইন, র ন, ম স ইন, ম ষ্, মি স, িস, ব ত্র, র য়, সরক র, খ ন, আ শ দ, র ন, ক etc., or কু র, ন্দ্র, রঞ্জন, মেখর, প্রস দ, আলী, আল etc., after a temporary entry word, then it is the last part of the name (LPN), and the temporary entry word together with such words is likely to constitute a multi-word name (proper noun). For example, রবি িস ক and পল্লি কু র বল্লক are names.
Rule 3- If there are two or more words in a sequence that represent the characters, or spell like the characters, of Bangla or English, then they belong to a name. For example, বি এ (BA), এ এ (MA) and এ বি বি এস (MBBS) are all names. Note that this rule does not distinguish between proper names and common names.
Rule 4- If a substring like ি ি, দ দ , দ , স শ ি, ক কু, গঞ্জ, গ্র , পর, গড়, নগর occurs at the end of a temporary word, then the temporary word is likely to be a name.
Rule 5- If a temporary-entry word ends with এ- র, রা, এর, র, র, রা, এরা, কে, কের, কে, য়, then the word is likely to be a name.
Rule 6- If a word like সরনী, করাড, স্ট্রিট, কেন, থানা, স্কুে, স্ট্রিেযােয়, েলেজ, নেী, কেে, হ্রে, সাগর, মহাসাগর, পাহাড়, পিবত is found after a temporary word, then the temporary word together with such a word may belong to a name. For example, বিজয় সরনী, র শসল স্ট্রিট and ব ম্বক পাহাড় are all names.
Rule 7- If the sentence contains িশলন, িলশলন, িলল, শুনল, সল, বলখল, বলখশলন, মখশলন, মদখল after a temporary word, then the temporary word is likely to be a proper noun.
Rule 8- If a word ends with suffixes like ট , খ ন , খ বন, ট শত, ট য়, টিশক, ট শক, টকুন, গুল , গুশল , গুবল etc., then the word is usually not a proper noun.
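Rules 1 and 2 can be sketched as follows. The honorific and title word lists are tiny romanized placeholders standing in for the Bengali lists above, the function name is ours, and a word-dictionary miss stands in for the TEMP flag.

```python
# Sketch of Rules 1-2: honorific words mark the first part of a name
# (FPN), title/surname words mark the last part (LPN), and adjacent
# out-of-dictionary (TEMP) words join the name span. All word lists are
# illustrative placeholders, not the paper's actual Bengali lists.

FPN_WORDS = {"md", "dr", "shri"}     # precede a name (Rule 1)
LPN_WORDS = {"chowdhury", "khan"}    # follow a name (Rule 2)

def find_names(tokens, dictionary):
    names, i = [], 0
    while i < len(tokens):
        if tokens[i].lower() in FPN_WORDS:     # Rule 1: FPN + TEMP words
            span, i = [tokens[i]], i + 1
            while i < len(tokens) and tokens[i] not in dictionary:
                span.append(tokens[i])
                i += 1
            names.append(" ".join(span))
        elif (tokens[i] not in dictionary and i + 1 < len(tokens)
                and tokens[i + 1].lower() in LPN_WORDS):  # Rule 2: TEMP + LPN
            names.append(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            i += 1
    return names

print(find_names(["Dr", "Rahim", "went", "home"], {"went", "home"}))
# ['Dr Rahim']
```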
5.1.3. Morphological analysis for naming word detection: When the previous two steps fail to identify a naming word, or there is confusion about whether the word is a proper noun or not, we apply morphological analysis to make sure that the unknown word is a proper noun. In this case we consider the structure of the word and its position in the sentence, and identify the word type [10].
Rule 1- Proper nouns always end with the 1st, 2nd, 5th or 6th verbal inflexions (bibhakti). So when the parser finds an unknown word with a 1st, 2nd, 5th or 6th bibhakti, the word may be a proper noun [11-12].
Rule 2- From the sentence structure, if the parser finds an unknown word in the position of karti kaarak and the word ends with the 1st bibhakti, then it is considered a proper noun.
Rule 3- If the unknown word is in the position of karma kaarak, is an indirect object, and ends with the 2nd bibhakti, then it is considered a proper noun. For a direct object, however, it is not a proper noun.
Rule 4- If a sentence contains more than one word ending with the 1st bibhakti, and the first such word is flagged unknown, then it must be the karti kaarak and the word is a proper noun.
Rule 5- In the case of apaadaan kaarak, if any word in the sentence ends with the 5th bibhakti and the word is unknown, then it must be a noun. Since most other nouns are in the word dictionary, an unknown word ending with the 5th bibhakti must be a proper noun.
Rule 6- In the case of adhikaran kaarak, if any word in the sentence ends with the 7th bibhakti and the word is unknown, then it may be the name of a place. If the word is not in the dictionary, it is considered a proper noun (the name of a place).
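The bibhakti test of Rule 1 can be sketched as a suffix check on unknown words; the suffix set below is a small placeholder of ours, not the full bibhakti inventory.

```python
# Sketch of the morphological check: a word absent from the dictionary
# that ends in one of the listed bibhakti suffixes is flagged as a
# possible proper noun. The suffix list is an illustrative placeholder.

BIBHAKTI_SUFFIXES = ("কে", "ের", "র", "এ")  # placeholder case endings

def maybe_proper_noun(word, dictionary):
    return (word not in dictionary
            and any(word.endswith(s) for s in BIBHAKTI_SUFFIXES))

print(maybe_proper_noun("করিমকে", {"পড়"}))  # True: unknown + bibhakti ending
```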
When the En-Converter identifies a word or collection of words as a proper noun and converts it into a UNL expression, it keeps track of the temporary word using the custom UNL relations tpl and tpr. We use the two relations "tpr" and "tpl" to identify the word which should be converted. The relation "tpr" is used when the En-Converter finds the temporary word after a pof (part of name) attribute, and "tpl" is used when it finds the temporary word before a pof attribute. When the En-Converter finds a "tpr" relation, it converts the word that comes after the blank space; in the case of "tpl" it converts the first word. If a proper noun contains only one word with the attribute TEMP, then the converter converts it using this TEMP attribute.
5.2. Function of Post Converter
In the previous section we identified naming words in a Bangla sentence and converted the sentence into the corresponding intermediate UNL expression. But there is a small problem: the En-Converter converts only those words which are in the word dictionary. A temporary word which has already been identified as a proper noun, or as part of a proper noun, cannot be converted and remains as it was in the Bengali sentence. This is difficult for people who cannot read the Bangla language, so it must be converted into English for the
UNL expression. The post converter does this conversion. The function of the post converter is demonstrated in Figure 4.
Figure 4. Function of Post Converter
5.3. Proposed En-conversion process
Here we use a simple phonetic Bangla-to-English translation method, with the same En-Converter that is used in UNL. First a dictionary for Bengali-to-English conversion needs to be created. Then rules are defined for the converter, which uses them for the conversion. When the En-Converter finds a "tpr" relation, it converts the word that comes after the blank space; in the case of "tpl" it converts the first word. The functions of the post converter are described sequentially here, and the architecture of the post converter is demonstrated in Figure 5.
Figure 5. Architecture of Post converter (Bengali to English)
5.3.1 Algorithm: How the post converter converts a Bangla word in the intermediate UNL expression into English for the final UNL expression is described step by step:
Step 1: The UNL expression is the input of the post converter for finding the final UNL expression.
Step 2: The post converter reads the full expression and finds the relation tpr or tpl. The relations tpr and tpl are used to identify the word which should be converted.
Step 3: The post converter collects the Bangla words to be converted with the help of the above two relations. When it finds a "tpr" relation, it collects the word which comes after the blank space, and for tpl it collects the first word.
Step 4: The converter converts the word by applying rules and finding the corresponding English syllable or word from the syllable dictionary.
Step 5: We obtain the converted word, which is placed in the final UNL expression.
Step 6: The final step generates the final UNL expression.
Here we have listed some dictionary entries for the post converter. Table 1 shows the Bengali vowels, and Table 2 shows the Bengali consonants and the corresponding entries in the dictionary. Table 3 shows some dictionary entries for consonant-plus-vowel signs (kar). Here we only try to present how the post converter converts Bengali to English; in future work we will define the full phonetics for Bengali-to-English conversion.
Table 1. Dictionary entries for Bengali vowels

Bangla vowel    Dictionary entry
অ    [অ]{} "a" (TEMP) <.,0,0>
আ    [আ]{} "a" (TEMP) <.,0,0>
ই    [ই]{} "i" (TEMP) <.,0,0>
ঈ    [ঈ]{} "ei" (TEMP) <.,0,0>
উ    [উ]{} "oo" (TEMP) <.,0,0>
ঊ    [ঊ]{} "u" (TEMP) <.,0,0>
ঋ    [ঋ]{} "rri" (TEMP) <.,0,0>
এ    [এ]{} "a" (TEMP) <.,0,0>
ঐ    [ঐ]{} "oi" (TEMP) <.,0,0>
ও    [ও]{} "o" (TEMP) <.,0,0>
ঔ    [ঔ]{} "ou" (TEMP) <.,0,0>
Table 2. Dictionary entries for Bengali consonants

Bangla consonant    Dictionary entry
ক    [ক]{} "k" (TEMP) <.,0,0>
খ    [খ]{} "kh" (TEMP) <.,0,0>
গ    [গ]{} "g" (TEMP) <.,0,0>
ঘ    [ঘ]{} "gh" (TEMP) <.,0,0>
ঙ    [ঙ]{} "ng" (TEMP) <.,0,0>
য়    [য়]{} "y" (TEMP) <.,0,0>
ৎ    [ৎ]{} "t" (TEMP) <.,0,0>
Table 3. Dictionary entries for Bengali consonant plus vowel sign (kar)

Bangla part of word    Dictionary entry
কা    [কা]{} "ka" (TEMP) <.,0,0>
কি    [কি]{} "ki" (TEMP) <.,0,0>
কী    [কী]{} "kei" (TEMP) <.,0,0>
কু    [কু]{} "koo" (TEMP) <.,0,0>
কূ    [কূ]{} "ku" (TEMP) <.,0,0>
কৃ    [কৃ]{} "krriu" (TEMP) <.,0,0>
কে    [কে]{} "ka" (TEMP) <.,0,0>
কৈ    [কৈ]{} "koi" (TEMP) <.,0,0>
কো    [কো]{} "ko" (TEMP) <.,0,0>
কৌ    [কৌ]{} "kou" (TEMP) <.,0,0>
ক্র    [ক্র]{} "kra" (TEMP) <.,0,0>
কয    [কয]{} "kka" (TEMP) <.,0,0>
Similar entries need to be made in the word dictionary for all the consonants.
Example 1-
Consider an intermediate UNL expression:
{unl}
agt(read(icl>see>do,agt>person,obj>information).@entry.@present.@progress, করিম:TEMP:05)
{/unl}
The post converter first reads the full expression. When it finds the word "করিম" with the attribute TEMP, it collects this word and pushes it into the conversion step. Then, applying the rules, it converts it into the UNL word "karim". The converter parses "করিম" letter by letter:
ক = ক + অ -> Ka
রি -> ri
ম -> m
That means "করিম" is converted to "Karim".
Thus the post converter converts all proper nouns from Bengali to English. Here we have only tried to present how the post converter converts Bengali to English; in future work we will define the full phonetics for Bengali-to-English conversion.
6. Result Analysis
To convert any Bangla sentence we have used the following files:
- Input file
- Rules file
- Dictionary
We have used an Encoder (EnCoL33.exe), and here we present some screenshots of the En-conversion. The screenshot shows the Encoder that produces a UNL expression from Bangla, or Bangla from UNL (Figure 6).
Figure 6. Encoder for En-conversion
14. Syeful Islam and Jugal Krishna Das
14
When the user clicks the convert button, the corresponding UNL expression is generated. The screen below shows this operation (Figure 7).
Figure 7. En-conversion
Based on the three-step proper noun detection technique, we define rules for the UNL system which identify the naming word (proper noun) in a Bengali sentence and create the corresponding UNL expression from the sentence.
6.1. Example
This case is the form of a noun or pronoun used as the subject or predicate nominative. It denotes the agent of the action stated by the verb. For example, "করিম পড়িতেছে", pronounced Karim poritechhe, means "Karim is reading". Here the subject Karim initiates an action, so the agent (agt) relation is made between the subject "Karim" and the verb "read".
The following dictionary entries are needed for converting the sentence:
[করিম] {} "karim(icl>name,iof>person,com>male)" (N)
[পড়] {} "read(icl>see>do,agt>person,obj>information)" (ROOT,CEND,^ALT)
[ইতেছে] {} "" (VI, CEND, CEG1, PRS, PRG, 3P)
; where N denotes noun, ROOT a verb root, CEND a consonant-ended root, ^ALT not alternative, VI the attribute for verbal inflexion, CEG1 consonant-ended group 1, PRS present tense, PRG progressive (continuous) tense, and 3P third person.
But করিম is the name of a person, so it does not need an entry in the dictionary. When converting this sentence into a UNL expression, if the En-Converter finds a word which is not in any dictionary, it creates a temporary entry. So if করিম is not in the dictionary, the temporary entry looks like:
[করিম]{} "করিম" (TEMP) <.,0,0>; [ ]{}
Applied rule:
R{:::}{^PRON,^N,^VERB,^ROOT,^ADJ,^ADV,^ABY,^KBIV,^BIV:+N,+PROP::}(BLK)P10;
After applying this rule, the word is considered a proper noun, and the temporary entry is converted to:
[করিম]{} "করিম" (N,PROP,TEMP) <.,0,0>; [ ]{}
To convert this sentence into a UNL expression, morphological analysis is performed between "পড়" (por) and "ইতেছে" (itechhe), and semantic analysis between "করিম" (karim) and "পড়িতেছে" (poritechhe).
Rule of morphological analysis:
After applying some right-shift rules, when the verb root "পড়" (por) comes into the LAW (left analysis window) and the verbal inflexion "ইতেছে" (itechhe) comes into the RAW (right analysis window), the following rule is applied to perform morphological analysis:
+{ROOT,CEND,^ALT,^VERB:+VERB,-ROOT,+@::}{VI, CEND:::}P10;
Rule of semantic analysis:
After morphological analysis, when "করিম" appears in the LAW and the verb "পড়িতেছে" (poritechhe) appears in the RAW, the following agent (agt) relation is made between "করিম" (karim) and "পড়িতেছে" (poritechhe) to complete the semantic analysis:
> {N,SUBJ::agt:}{VERB,#AGT:+&@present,+&@progress::}P1;
where N denotes the attribute for nouns or pronouns.
The intermediate UNL form shows that the word "করিম" remains the same as before conversion. When the sentence "করিম পড়িতেছে" is converted into another language, the word "করিম" is not understandable, because in the UNL expression the word "করিম" is a Bengali word.
So this expression is passed into the Post Converter, which converts the word into English using the proposed simple phonetic method.
The Post Converter takes the UNL expression and converts the proper noun "করিম" from Bengali to English as "karim":
ক = ক + অ -> ka
রি -> ri
ম -> m
That means "করিম" is converted to "karim".
{unl}
agt(read(icl>see>do,agt>person,obj>information).@entry.@present.@progress, করিম:TEMP:05)
{/unl}
After conversion, the final UNL expression is:
{unl}
agt(read(icl>see>do,agt>person,obj>information).@entry.@present.@progress, karim 05)
{/unl}
Thus the UNL converter and the post converter can identify any naming word in a Bengali sentence and convert it into the corresponding UNL expression.
7. Conclusion
Here we have defined a procedure to identify naming words (proper nouns) in Bengali sentences and a conversion method from Bangla to UNL expressions. We have also demonstrated how the UNL converter identifies naming words in a Bengali sentence and how the UNL expression is created, taking a sentence as an example; in Section 5 we demonstrated the examples briefly. Our work has two parts: first, identifying a naming word (proper noun) in a Bengali sentence, and second, converting this naming word into UNL form. In the second part we use a converter named the post converter, which uses a simple phonetic method to convert Bengali to English; we have only outlined the simple procedure, not the complete one. Our future plan in this regard is to give the complete rules for the post converter. We will also work further on the Bengali language, giving the complete analysis rules for the En-conversion program to convert Bangla to UNL and the generation rules for the de-conversion program to convert any UNL document into Bangla.
9. References
[1] http://www.undl.org last accessed on July 23, 2014.
[2] H. Uchida, M. Zhu, T. Della Senta, “A Gift for a Millennium”, The United Nations University,
Tokyo, Japan, 2000.
[3] H. Uchida, M. Zhu, “The Universal Networking Language (UNL) Specification” Version 3.0,
Technical Report, United Nations University, Tokyo, 1998.
[4] D.C. Shuniti Kumar, “Bhasha-Prakash Bangala Vyakaran”, Rupa and Company Prokashoni,
Calcutta, July 1999, pp. 170-175.
[5] D.S. Rameswar, “Shadharan Vasha Biggan and Bangla Vasha”, Pustok Biponi Prokashoni,
November 1996, pp.358-377.
[6] R. T. Martins, L. H. M. Rino, M. D. G. V. Nunes, and O.N. Oliveira, “The UNL distinctive
features: interfaces from a NL-UNL enconverting task”.
[7] EnConverter Specifications, version 3.3, UNL Center/ UNDL Foundation, Tokyo, Japan 2002.
[8] S. Abdel-Rahim, A.A. Libdeh, F. Sawalha, M.K. Odeh, “Universal Networking
Language (UNL) a Means to Bridge the Digital Divide”, Computer Technology Training and
Industrial Studies Center, Royal Scientific Society, March 2002.
International Journal of Information Technology Vol. 21 No. 1 2015
[9] Md. N.Y. Ali, J.K. Das, S.M.A. Al-Mamun, A.M. Nurannabi, “Morphological Analysis of
Bangla Words for Universal Networking Language”, Third International Conference on Digital
Information Management (ICDIM 2008), London, England, pp. 532-537.
[10] S. Dashgupta, N. Khan, D.S.H. Pavel, A.I. Sarkar, M. Khan, “Morphological Analysis of
Inflecting Compound Words in Bangla”, International Conference on Computer and
Communication Engineering (ICCIT), Dhaka, 2005, pp. 110-117.
[11] H. Azad, “Bakkotottyo”, Second edition, 1994, Dhaka.
[12] J. Parikh, J. Khot, S. Dave, P. Bhattacharyya, “Predicate Preserving Parsing”, Department of
Computer Science and Engineering , Indian Institute of Technology, Bombay.
Authors
Md. Syeful Islam obtained his B.Sc. and M.Sc. in Computer Science and
Engineering from Jahangirnagar University, Dhaka, Bangladesh, in 2010 and 2011
respectively. He is now working as a Senior Software Engineer at Samsung R&D
Institute Bangladesh. Previously he worked as a software consultant in the
Micro-Finance Solutions Department of Southtech Ltd. in Dhaka, Bangladesh. His
research interests are in natural language processing, AI, embedded computer
systems and sensor networks, distributed computing, and big data analysis.
Dr. Jugal Krishna Das obtained his M.Sc. in Computer Engineering from
Donetsk Technical University, Ukraine, in 1989, and his Ph.D. from the Glushkov
Institute of Cybernetics, Kiev, in 1993. He works as a professor in the
Department of Computer Science and Engineering, Jahangirnagar University,
Bangladesh. His research interests are in natural language processing,
distributed computing, and computer networking.