Ali Akbar Dehkhoda, the prominent lexicographer, describes a person who has difficulty grasping knowledge as someone who “cannot understand something without knowing all its details.” If the knowledge somebody needs is in a language other than their mother tongue, access to it will face additional difficulties arising from their lack of mastery of the second language. Any project that can monitor knowledge sources written in English and render them in the user’s language through a simple, understandable model qualifies as a knowledge-based project with a text-simplification world view. This article builds a knowledge system, investigates algorithms for analyzing the content of complex texts, and presents solutions for turning such texts into simple, understandable ones. Texts are analyzed automatically and their ambiguous points are identified by software, but it is the author or another human agent who decides whether to remove the ambiguities or correct the texts.
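The flag-for-human-review workflow described above can be sketched minimally: software detects candidate problems, a human resolves them. The ambiguous-word list, the sentence-length threshold, and the example text below are all illustrative assumptions, not part of the described system.

```python
import re

# Hypothetical list of words the system treats as potentially ambiguous.
AMBIGUOUS_WORDS = {"bank", "charge", "lead", "right"}

def flag_ambiguities(text, max_sentence_words=25):
    """Flag long sentences and known ambiguous words for human review."""
    findings = []
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for i, sentence in enumerate(sentences, 1):
        words = re.findall(r"[A-Za-z']+", sentence.lower())
        if len(words) > max_sentence_words:
            findings.append((i, "long sentence", len(words)))
        for w in words:
            if w in AMBIGUOUS_WORDS:
                findings.append((i, "ambiguous word", w))
    return findings

report = flag_ambiguities("He sat by the bank. The charge was unclear.")
# Each finding is (sentence number, issue type, detail); the human
# editor, not the software, decides how to resolve each one.
```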
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES (csandit)
ABSTRACT
Natural language processing is an interdisciplinary branch of linguistics and computer science, studied under Artificial Intelligence (AI), that gave birth to an allied area called
‘Computational Linguistics’, which focuses on processing natural languages on computational devices. A natural language consists of a large number of sentences: linguistic units of one or more words linked together in accordance with a set of predefined rules called a grammar. Grammar checking is the task of validating sentences syntactically and is a prominent tool within language engineering. Our review draws on the recent development of various grammar checkers to look at the past, present and future in a new light. It covers grammar checkers of many languages with the aim of examining their approaches and methodologies for developing new tools and systems as a whole. The survey concludes with a discussion of the features included in existing grammar checkers of foreign languages as well as a few Indian languages.
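Many of the checkers surveyed are rule-based: a tagger assigns categories to words, and hand-written rules flag violations. A toy subject-verb agreement rule gives the flavor; the lexicon and the rule are invented stand-ins for the large rule sets real checkers use.

```python
# Toy rule-based grammar check: a third-person singular pronoun subject
# should take a verb ending in "s". Lexicon and rule are illustrative.
LEXICON = {
    "he": "PRON_3SG", "she": "PRON_3SG", "it": "PRON_3SG",
    "they": "PRON_PL", "we": "PRON_PL",
}

def check_agreement(sentence):
    tokens = sentence.lower().rstrip(".").split()
    errors = []
    for subj, verb in zip(tokens, tokens[1:]):
        tag = LEXICON.get(subj)
        if tag == "PRON_3SG" and not verb.endswith("s"):
            errors.append(f"'{subj} {verb}': expected 3rd-person singular verb")
        if tag == "PRON_PL" and verb.endswith("s"):
            errors.append(f"'{subj} {verb}': plural subject with singular verb")
    return errors

check_agreement("He walk to work.")   # flags 'he walk'
check_agreement("He walks to work.")  # no errors
```

Real systems replace the toy lexicon with a POS tagger and hundreds of such rules, but the validate-against-rules structure is the same.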
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS (ijnlc)
The evolution of information technology has led to the collection of large amounts of data, the volume of
which has increased to the extent that the data produced in the last two years is greater than all the data ever
recorded in human history. This has necessitated the use of machines to understand, interpret and apply data
without manual involvement. Many of these texts are available in transliterated code-mixed form, which, due
to its complexity, is very difficult to analyze. Work in this area is progressing at a great pace, and this
work hopes to push it further. The designed system classifies transliterated (Romanized) Hindi and Marathi
text documents automatically using supervised learning methods (KNN, Naïve Bayes and Support Vector
Machine (SVM)) and ontology-based classification; the results are compared in order to decide which
methodology is better suited to handling these documents. As we will see, the plain machine learning
approaches perform as well as, and in many cases much better than, the more analytical approach.
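One of the supervised methods named above, Naïve Bayes, fits in a few dozen lines. The sketch below is a generic multinomial Naïve Bayes with add-one smoothing, not the paper's implementation, and the Romanized training snippets are invented for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.priors = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            for word in doc.split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}

    def predict(self, doc):
        best, best_score = None, -math.inf
        for c in self.classes:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.priors[c])
            for word in doc.split():
                score += math.log(
                    (self.word_counts[c][word] + 1)
                    / (self.total[c] + len(self.vocab))
                )
            if score > best_score:
                best, best_score = c, score
        return best

# Invented Romanized snippets, purely illustrative training data.
nb = NaiveBayes()
nb.fit(
    ["film bahut acchi thi", "kya badhiya gaana",
     "bekaar film thi", "bahut kharab anubhav"],
    ["pos", "pos", "neg", "neg"],
)
nb.predict("acchi film")  # → "pos" on this toy data
```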
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Functional English Design for Domestic Migrant Workers (idhasaeful)
This paper aimed at: (1) describing the content of the Functional English Design (FED) materials and (2) describing the appropriateness of the FED as English training materials for migrant workers' candidates (MWC). The study used the ADDIE (Analysing, Designing, Developing, Implementing and Evaluating) model, involving a total of 200 MWC in 4 PPTKIS (authorized private boards whose duty is the placement and protection of Indonesian workers abroad). The data were taken from documentation, the trainees’ English training achievements using the FED, and peer debriefing, and were analyzed descriptively using content analysis and mean-difference computation of the trainees' test results. The study found: (1) the content of the FED, which developed “Imparting and seeking factual information” with “Minimum–adequate language functions”, matched the trainees' needs, and (2) the FED was appropriate for use as alternative English materials, since it was designed on the basis of a needs analysis and the test results showed significant improvement: the mean difference of the oral pre- and post-test was 2.25 on a scoring scale of 0-10, while that of the written pre- and post-test was 13.35 on a scale of 0-100. In addition, the peer debriefing recommended the FED for use in the 4 investigated PPTKIS.
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ... (Syeful Islam)
Hundreds of millions of people of almost all levels of education, from different countries, communicate with each other for different purposes using various languages. Machine translation is in high demand due to the increasing use of web-based communication. One of the major problems of Bengali translation is identifying a naming word in a sentence, which is relatively simple in English because such entities start with a capital letter. Bangla has no concept of small or capital letters, and there is a huge number of different named entities in Bangla, so it is difficult to tell whether a word is a naming word or not. Here we introduce a new approach to identify naming words in a Bengali sentence for a machine translation system without storing a huge number of named entities in the word dictionary. The goal is to make Bangla sentence conversion possible with minimal words stored in the dictionary.
Exploiting rules for resolving ambiguity in Marathi language text (eSAT Journals)
Abstract
Natural language ambiguity is a situation in which some words have multiple meanings or senses. This paper discusses natural language ambiguity and its types, and proposes a knowledge-based solution to resolve various types of ambiguity occurring in Marathi language text. The task of resolving semantic and lexical ambiguity in words to obtain the actual sense is referred to as Word Sense Disambiguation (WSD). Marathi is the official and commonly spoken language of Maharashtra state in India. Plenty of words in Marathi are spelled the same and pronounced the same but differ semantically (meaning-wise/sense-wise). During automatic translation, these words lead to ambiguity. Our method removes the ambiguity by identifying the correct sense of the given text from the predefined senses available in Marathi Wordnet, using word and sentence rules. The method is applicable only to word-level ambiguity; structural ambiguity is not handled by this system. The system may be used as a subsystem in other Natural Language Processing (NLP) applications.
Key Words: Word Sense Disambiguation, Natural Language Processing, Marathi, Marathi Wordnet, ambiguity, knowledge based
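Selecting a sense from a Wordnet by matching against the sentence context can be sketched with a simplified Lesk-style overlap measure. This is a generic illustration of gloss-overlap WSD, not the paper's word and sentence rules, and the English senses and glosses below are invented stand-ins for Marathi Wordnet entries.

```python
def lesk(context_words, senses):
    """Pick the sense whose gloss overlaps most with the sentence context.
    `senses` maps a sense name to its gloss (definition) text."""
    context = set(context_words)
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Invented stand-in for two Wordnet senses of an ambiguous word.
senses = {
    "bank_river": "sloping land beside a body of water river",
    "bank_money": "financial institution that accepts deposits money",
}
lesk("he sat on the river bank near the water".split(), senses)
# → "bank_river" ("river" and "water" from the gloss appear in context)
```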
Crowdsourcing in developing repository of phrase definition in Bahasa Indonesia (TELKOMNIKA JOURNAL)
A language repository is valuable as a reference for using a language, for its preservation, and for developing and implementing natural language processing algorithms. Bahasa Indonesia is one of the natural languages that hardly has such a repository, despite its large number of speakers and previous attempts to build one. We devised a way to develop a repository of phrase definitions in Bahasa using a form of crowdsourcing and investigated its implementation. An add-on was inserted into an information system that manages the final-year projects of undergraduate students. The add-on invites students to participate in writing keyword definitions and validating definitions. Investigation over a period of six months reveals that about 25% of application users take part in the voluntary activities as definition writers and/or validators. During that period, about 1200 phrase definitions were added to the repository, and on average each definition was validated by two participants. The activity is supported by users who are well aware of the tasks and have a positive perception of the work, despite the different reasons that motivate their contributions.
CITIZENSHIP ACT (591) OF GHANA AS A LOGIC THEOREM AND ITS SEMANTIC IMPLICATIONS (kevig)
This paper presents excerpts of the natural text of the Citizenship Act of the Republic of Ghana, Act 2000 (591), as a logic theorem in the language of First Order Logic (FOL). This formalization of the piece of law is done to allow machines to analyze the text semantically. The results of this research also deal with the problem of ambiguities in the legal text, and reveal the semantic consequences of the natural construction, or textual structure, of the piece of law. One major problem in the semantics of legal text has been ambiguity, which strongly affects the interpretations attributed to legal statements: constructed sentences sometimes do not reflect the semantic intentions of their composers, due to inherent ambiguities or technical faults in the structure of parts of the statements. The formalization of the Act in this paper provides logical proofs and deductions which are asserted to establish clarity and to reveal semantic errors, or otherwise preserve the semantic intentions of the Act.
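Mechanical deduction over a statutory rule can be illustrated with a tiny forward-chaining sketch. The rule, predicate names, and facts below are invented examples of the *kind* of provision such a formalization captures, not the Act's actual wording.

```python
# Toy forward chaining over Horn clauses, illustrating how a statutory
# rule such as "a person born in Ghana to a citizen parent is a citizen"
# can be checked mechanically. Facts and rule are invented examples.
facts = {("born_in", "ama", "ghana"), ("parent", "kofi", "ama"),
         ("citizen", "kofi")}

def derive_citizens(facts):
    derived = set(facts)
    changed = True
    while changed:          # repeat until no new fact can be derived
        changed = False
        for (_, parent, child) in [f for f in derived if f[0] == "parent"]:
            if (("citizen", parent) in derived
                    and ("born_in", child, "ghana") in derived
                    and ("citizen", child) not in derived):
                derived.add(("citizen", child))
                changed = True
    return derived

("citizen", "ama") in derive_citizens(facts)  # True under the toy rule
```

A full FOL treatment would use a theorem prover rather than this hand-rolled loop, but the point carries over: once the text is formal, its consequences follow mechanically.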
Automatic classification of bengali sentences based on sense definitions pres... (ijctcm)
Based on the sense definitions of words available in the Bengali WordNet, an attempt is made to classify Bengali sentences automatically into different groups in accordance with their underlying senses. The input sentences are collected from 50 different categories of the Bengali text corpus developed in the TDIL project of the Govt. of India, while information about the different senses of a particular ambiguous lexical item is collected from the Bengali WordNet. On an experimental basis we have used the Naive Bayes probabilistic model as a classifier of sentences. We applied the algorithm to 1747 sentences containing a particular Bengali lexical item which, because of its ambiguous nature, can trigger different senses that give the sentences different meanings. In our experiment we achieved around 84% accuracy in sense classification over the total input sentences. We analyzed the residual sentences that did not comply with our experiment and affected the results, and noted that in many cases wrong syntactic structures and sparse semantic information are the main hurdles in the semantic classification of sentences. The practical relevance of this study is attested in automatic text classification, machine learning, information extraction, and word sense disambiguation.
A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ... (Syeful Islam)
Hundreds of millions of people of almost all levels of education, from different countries, communicate with each other for different purposes using various languages. Machine translation is in high demand due to the increasing use of web-based communication. One of the major problems of Bengali translation is identifying a naming word in a sentence, which is relatively simple in English because such entities start with a capital letter. Bangla has no concept of small or capital letters, and there is a huge number of different named entities in Bangla, so it is difficult to tell whether a word is a proper noun or not. Here we introduce a new approach to identify proper nouns in a Bengali sentence for UNL without storing a huge number of named entities in the word dictionary. The goal is to make Bangla sentence conversion to UNL, and vice versa, possible with minimal words stored in the dictionary.
Natural Language Processing Theory, Applications and Difficulties (ijtsrd)
The promise of a powerful computing device to help people in productivity as well as in recreation can only be realized with proper human-machine communication. Automatic recognition and understanding of spoken language is the first step toward natural human-machine interaction. Research in this field has produced remarkable results, leading to many exciting expectations and new challenges. This field is known as Natural Language Processing. In this paper, natural language generation and natural language understanding are discussed, along with difficulties in NLU, applications, and a comparison with structured programming languages. Mrs. Anjali Gharat | Mrs. Helina Tandel | Mr. Ketan Bagade, "Natural Language Processing Theory, Applications and Difficulties", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-6, October 2019. URL: https://www.ijtsrd.com/papers/ijtsrd28092.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/28092/natural-language-processing-theory-applications-and-difficulties/mrs-anjali-gharat
An information retrieval (IR) system aims to retrieve documents relevant to a user query, where the query is a set of keywords. Cross-language information retrieval (CLIR) is a retrieval process in which the user issues queries in one language to retrieve information in another language. The growing requirement for Internet users to access information expressed in languages other than their own has established CLIR as a major topic in IR.
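The simplest CLIR strategy is dictionary-based query translation: translate each query keyword, then retrieve target-language documents containing any translated term. The sketch below illustrates that strategy generically; the tiny Hindi-to-English dictionary and the documents are invented samples.

```python
# Dictionary-based query translation for CLIR. The bilingual dictionary
# is a tiny invented sample; real systems use large lexicons and handle
# translation ambiguity with weighting or disambiguation.
HI_TO_EN = {"kitab": ["book"], "bacche": ["children", "kids"]}

def translate_query(query):
    terms = []
    for word in query.lower().split():
        terms.extend(HI_TO_EN.get(word, [word]))  # unknown words pass through
    return terms

def retrieve(query, documents):
    """Return documents sharing at least one term with the translated query."""
    terms = set(translate_query(query))
    return [doc for doc in documents
            if terms & set(doc.lower().split())]

docs = ["a book for children", "weather report for today"]
retrieve("kitab", docs)  # matches the first document only
```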
Robust extended tokenization framework for Romanian by semantic parallel text... (ijnlc)
Tokenization is considered a solved problem when reduced to mere word-boundary identification and punctuation and white-space handling. Obtaining a high-quality outcome from this process is essential for subsequent NLP pipeline processes (POS-tagging, WSD). In this paper we claim that, to obtain this quality, the tokenization disambiguation process needs to use all linguistic, morphosyntactic, and semantic-level word-related information as necessary. We also claim that semantic disambiguation performs much better in a bilingual context than in a monolingual one. We then show that, for disambiguation purposes, the bilingual text provided by high-profile on-line machine translation services performs almost at the same level as human-originated parallel texts (the gold standard). Finally, we claim that the tokenization algorithm incorporated in TORO can be used as a criterion for the comparative quality assessment of on-line machine translation services, and we provide a setup for this purpose.
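The "solved" baseline the abstract refers to, boundary identification plus punctuation and white-space handling, fits in one regular expression. The sketch below shows only that baseline layer; the morphosyntactic and semantic disambiguation TORO adds on top is not represented, and the Romanian example sentence is an invented illustration.

```python
import re

def tokenize(text):
    """Baseline tokenizer: words (incl. hyphenated ones) and individual
    punctuation marks, with white space discarded. This is only the
    boundary-identification step; semantic disambiguation of ambiguous
    token borders would operate on top of this output."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", text, re.UNICODE)

tokenize("Ana are mere, nu pere.")
# → ["Ana", "are", "mere", ",", "nu", "pere", "."]
```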
Design Analysis Rules to Identify Proper Noun from Bengali Sentence for Univ... (Syeful Islam)
Abstract—Nowadays hundreds of millions of people of almost all levels of education, from different countries, communicate with each other and perform their jobs on the Internet or other communication media using various languages. Not everyone knows every language, so it is very difficult to communicate or work across languages. In this situation computer scientists have introduced various inter-language translation programs (machine translation). UNL is one such inter-language translation program. One of the major problems in UNL is identifying a name in a sentence, which is relatively simple in English because such entities start with a capital letter. Bangla has no concept of small or capital letters, so it is difficult to tell whether a word is a proper noun or not. Here we propose analysis rules to identify proper nouns in a sentence, and we have built a post-converter which translates the named entity from Bangla to UNL. The goal is to make Bangla sentence conversion to UNL, and vice versa, possible. The UNL system demonstrates that the theoretical analysis of our proposed system can identify proper nouns in a Bangla sentence and produce the corresponding Universal Word for UNL.
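When a script offers no capitalization cue, analysis rules of the kind described above must fall back on context and morphology: honorifics preceding a name, characteristic name endings, and similar signals. The cue lists below are invented Romanized stand-ins for illustration, not the paper's actual rule set.

```python
# Without capital letters, proper nouns must be found from context.
# These cue lists are invented Romanized stand-ins for the kind of
# analysis rules the paper proposes, not its actual rules.
HONORIFICS = {"mr", "dr", "janab"}          # title preceding a name
NAME_SUFFIXES = ("pur", "ganj", "bari")     # common place-name endings

def find_proper_nouns(tokens):
    names = set()
    for prev, word in zip([""] + tokens, tokens):
        if prev in HONORIFICS:
            names.add(word)                 # word right after an honorific
        elif word.endswith(NAME_SUFFIXES):
            names.add(word)                 # place-name morphology
    return names

find_proper_nouns("janab rahim thake rangpur e".split())
# → {"rahim", "rangpur"}
```

Rules like these avoid storing every named entity in the dictionary, which is precisely the storage saving the abstract claims.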
incorporated in TORO can be used as a criterion for on-line machine translation services comparative
quality assessment and we provide a setup for this purpose.
Design Analysis Rules to Identify Proper Noun from Bengali Sentence for Univ...Syeful Islam
Abstract—Now-a-days hundreds of millions of people of
almost all levels of education and attitudes from different
country communicate with each other for different
purposes and perform their jobs on internet or other
communication medium using various languages. Not all
people know all language; therefore it is very difficult to
communicate or works on various languages. In this
situation the computer scientist introduce various inter
language translation program (Machine translation). UNL
is such kind of inter language translation program. One of
the major problem of UNL is identified a name from a
sentence, which is relatively simple in English language,
because such entities start with a capital letter. In Bangla
we do not have concept of small or capital letters. Thus
we find difficulties in understanding whether a word is a
proper noun or not. Here we have proposed analysis rules
to identify proper noun from a sentence and established
post converter which translate the name entity from
Bangla to UNL. The goal is to make possible Bangla
sentence conversion to UNL and vice versa. UNL system
prove that the theoretical analysis of our proposed system
able to identify proper noun from Bangla sentence and
produce relative Universal word for UNL.
Once we create a business model canvas for our business idea, how do we proceed in making a financial plan? How do we assess if it's in principle financially feasible? How do we create income statements, cash flows and do break-even analysis. This workshop was aimed to bridge the gap between BMC and financial plan aimed to aid entrepreneurs struggling with financial statements. If you found this helpful or have suggestions I would love to read it in the comments section.
Personally designed (content + graphics design), officially accredited AgilePM® (Agile Project Management) Foundation courseware.
AgilePM® is a Registered Trade Mark of Dynamic Systems Development Method Limited.
Trademarks are properties of the holders, who are not affiliated with courseware author.
Factors Responsible for Poor English Reading Comprehension at Secondary LevelBahram Kazemian
The present study shows factors responsible for poor English reading comprehension at secondary school level students. The purpose of this study is to explore those factors and to suggest remedies how to strengthen English reading comprehension of the students. English is the 2nd language of Pakistani students and Kachru (1996) places it in the outer circle. Test and interviews are conducted to get the data. Different factors like poor command of vocabulary, habit of cramming, no interest to learn creativity in reading but the sole goal is just to pass the examination which are found responsible for poor English reading comprehension. Motivation to learn reading can develop reading comprehension skill of students.
The Role of Electronic Dictionary in Learning Processijtsrd
Dictionary is the treasure house of any language. Language learning by acquiring language skills is assuming more important in learning. If one could acquire listening skills, one could achieve other skills such as listening, speaking, reading and writing. Listening skill can be developed by using technical gadgets. In this new scenario using technical gadgets such as e library, language laboratory and e dictionary to acquire knowledge in language felt easy. But learners are still frustrating in learning a language. If learners are attracted by teaching methodology, they will acquire the knowledge in language easily. Learners must be motivated to use technological gadgets to learn language and can be given opportunity to develop their vocabulary by using electronic dictionary. The Modernity in language learning depends on introducing electronic medium for the learners to get accuracy, aptness and to save their time. This paper emphasizes on using electronic dictionary to learn a language. V. Selvakumar | Dr. A. Munian "The Role of Electronic Dictionary in Learning Process" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-6 | Issue-5 , August 2022, URL: https://www.ijtsrd.com/papers/ijtsrd50687.pdf Paper URL: https://www.ijtsrd.com/humanities-and-the-arts/other/50687/the-role-of-electronic-dictionary-in-learning-process/v-selvakumar
Domain Specific Terminology Extraction (ICICT 2006)IT Industry
Imran Sarwar Bajwa, M. Imran Siddique, M. Abbas Choudhary, [2006], "Automatic Domain Specific Terminology Extraction using a Decision Support System", in IEEE 4th International Conference on Information and Communication Technology (ICICT 2006), Cairo, Egypt. pp:651-659
Ontology-Based Text Simplification for DyslexicsIJMTST Journal
Ensuring high school students, university students and adults, having learning disabilities, as dyslexia with actual and easy to read learning and scientific content is of great importance in our days. The main problem, discussed in this paper is how to make growing scientific and learning content more accessible for peoples, having dyslexia, or other learning disabilities. We analyze many recent researches, related to text simplification and Controlled Natural Languages and its application for development of textual learning resources for dyslexic learners in several languages. We propose ontology-based methodology for development of easy to use by dyslexics Bulgarian language textual resources using English language text, text simplification and ontology management.
6. vol 11 no 1 iwan fauzi_the effectiveness of skimming_77.92 - copyFaisal Pak
REGISTER JOURNAL has the perspectives of languages and language teachings. This journal aims at presenting and discussing some outstanding issues dealing with language and language teachings
This journal encompasses original research articles, and short communications, including:
Phonology
Morphology
Syntax
Semantics
Pragmatics
Psycholinguistics
Sociolinguistics
Discourse Analysis
Linguistics in Education
Linguistics in Literature
Language Acquisitions
English Language Teaching (ELT)
English as Second Language (ESL)
English as Foreign Language (EFL)
English for Specific Purpose (ESP)
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
Low power architecture of logic gates using adiabatic techniquesnooriasukmaningtyas
The growing significance of portable systems to limit power consumption in ultra-large-scale-integration chips of very high density, has recently led to rapid and inventive progresses in low-power design. The most effective technique is adiabatic logic circuit design in energy-efficient hardware. This paper presents two adiabatic approaches for the design of low power circuits, modified positive feedback adiabatic logic (modified PFAL) and the other is direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components in any digital circuit design. By improving the performance of basic gates, one can improvise the whole system performance. In this paper proposed circuit design of the low power architecture of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches and their results are analyzed for powerdissipation, delay, power-delay-product and rise time and compared with the other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with DC-DB PFAL technique outperform with the percentage improvement of 65% for NOR gate and 7% for NAND gate and 34% for XNOR gate over the modified PFAL techniques at 10 MHz respectively.
Online aptitude test management system project report.pdfKamal Acharya
The purpose of on-line aptitude test system is to take online test in an efficient manner and no time wasting for checking the paper. The main objective of on-line aptitude test system is to efficiently evaluate the candidate thoroughly through a fully automated system that not only saves lot of time but also gives fast results. For students they give papers according to their convenience and time and there is no need of using extra thing like paper, pen etc. This can be used in educational institutions as well as in corporate world. Can be used anywhere any time as it is a web based application (user Location doesn’t matter). No restriction that examiner has to be present when the candidate takes the test.
Every time when lecturers/professors need to conduct examinations they have to sit down think about the questions and then create a whole new set of questions for each and every exam. In some cases the professor may want to give an open book online exam that is the student can take the exam any time anywhere, but the student might have to answer the questions in a limited time period. The professor may want to change the sequence of questions for every student. The problem that a student has is whenever a date for the exam is declared the student has to take it and there is no way he can take it at some other time. This project will create an interface for the examiner to create and store questions in a repository. It will also create an interface for the student to take examinations at his convenience and the questions and/or exams may be timed. Thereby creating an application which can be used by examiners and examinee’s simultaneously.
Examination System is very useful for Teachers/Professors. As in the teaching profession, you are responsible for writing question papers. In the conventional method, you write the question paper on paper, keep question papers separate from answers and all this information you have to keep in a locker to avoid unauthorized access. Using the Examination System you can create a question paper and everything will be written to a single exam file in encrypted format. You can set the General and Administrator password to avoid unauthorized access to your question paper. Every time you start the examination, the program shuffles all the questions and selects them randomly from the database, which reduces the chances of memorizing the questions.
Online aptitude test management system project report.pdf
Software feasibility study to
International Journal of Database Management Systems ( IJDMS ) Vol.7, No.4, August 2015
DOI : 10.5121/ijdms.2015.7401
SOFTWARE FEASIBILITY STUDY TO
TRANSFORM COMPLEX SCIENTIFIC WRITTEN
KNOWLEDGE TO A CLEAR, RATIONALE AND
SIMPLE LANGUAGE
Manoj Khandelwal 1 and Mohammad Jafarabad 2
1 Faculty of Science and Technology, Federation University Australia, PO Box 663, Ballarat, Victoria 3353, Australia. Phone: +61 3 5327 9821
2 Department of Computer Engineering, Young Researchers Club, Robatkarim Branch, Islamic Azad University, RobatKarim, Iran.
ABSTRACT
Ali Akbar Dehkhoda, the prominent lexicographer, describes a person who has difficulty in grasping
knowledge as someone who “Cannot understand something without knowing all its details.” If the
knowledge required by somebody is in a language other than the person’s mother tongue, access to this
knowledge will surely meet special difficulties resulting from the person’s lack of mastery over the second
language. Any project that can monitor knowledge sources written in English and render them in the
user’s language through a simple, understandable model can serve as a knowledge-based project with a
global view of text simplification. This article creates a knowledge system, investigates some
algorithms for analyzing the contents of complex texts, and presents solutions for changing such texts
into simple and understandable ones. Texts are automatically analyzed and their ambiguous points are
identified by software, but it is the author or the human agent who decides on the removal of the
ambiguities or the correction of the texts.
KEYWORDS
Written knowledge, complex texts, text simplification, ambiguous words
1. INTRODUCTION
Knowledge that is implemented and published in written form is created in correspondence with the
literacy level of its author. Previous studies, the extent of proficiency in writing techniques, and
the viewpoints of the author of any knowledge play an important role in the process of designing
and implementing documents related to the knowledge. In most cases, before written knowledge
(created by an expert in the related industry or a scientist) is published, it passes through a
process in which the editorial board of the journal, conference, or publisher corrects its writing or
scientific errors. Moreover, the educational level or the areas of study that readers of the texts
must have are sometimes specified in the classification of some forms of
knowledge. The problems that will then arise for the readers are ambiguities and complexities in
the texts that make them difficult to understand.
Understanding knowledge written by a specialist is not an easy task for beginners, because they need
great mental activity and intelligence to adapt their level of knowledge to that of the specialists.
Difficult expressions or complex sentences containing unfamiliar words can prevent a grasp of the
written knowledge or lead to misunderstanding it, and special arrangements are required to prevent
this. Moreover, if knowledge is to be written in such a way as to follow all the details, conventions,
methods, and rules of formal writing, it will entail great costs for the organization publishing it.
Therefore, projects for the disambiguation of knowledge-containing texts must be executed in a way
that removes most of the ambiguities present in these texts at minimal cost.
Authors, scholars, and researchers present their personal knowledge through theoretical-applied
research articles in publications related to their field of study, or in book form, to be used by the
public. Disambiguation of educational texts can be performed through various processes. Knowledge
managers and users, or intelligent software, can discover ambiguities in texts. The creator of the
knowledge can also correct textual ambiguities and complexities at minimal cost. In this article, the
iterative and incremental method of software production is studied and analyzed to execute the
project for identifying ambiguities and literary complexities in knowledge-containing texts. The
project will make it possible to implement software that can identify ambiguous words or sentences
and, in addition, play a considerable role in removing these ambiguities by making suitable
recommendations to the author of the knowledge-containing text.
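As an illustration of the identify-and-recommend workflow described above, the following minimal Python sketch flags potentially ambiguous words and proposes simpler alternatives, leaving the final decision to the author. The lexicon of ambiguous terms and the suggested replacements are hypothetical placeholders, not part of the system proposed in this article.

```python
# Hypothetical lexicon mapping difficult words to simpler recommendations.
AMBIGUOUS_TERMS = {
    "utilize": "use",
    "commence": "begin",
    "terminate": "end",
}

def flag_ambiguities(text):
    """Return (word, suggestion, position) triples for review by the author."""
    findings = []
    for i, word in enumerate(text.lower().split()):
        token = word.strip(".,;:!?")
        if token in AMBIGUOUS_TERMS:
            findings.append((token, AMBIGUOUS_TERMS[token], i))
    return findings

# The software only recommends; the human agent decides whether to accept.
report = flag_ambiguities("We utilize complex terms and commence analysis.")
```

In keeping with the article's division of labour, the function produces a report rather than rewriting the text: removal or correction of each flagged item remains the author's decision.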
2. BACKGROUND OF TEXT SIMPLIFICATION
Language is one of the most important mental phenomena and has been in use for many years among
people living in various regions. At present, there are about 7,000 languages in the world; some
2,400 of these are on the verge of extinction, and 840 are used only in New Guinea. Linguists
investigate languages in the domains of syntax, phonology, morphology, and semantics. Syntax deals
with sentence recognition and studies the rules by which words are combined and placed next to each
other. Phonology is concerned with the positions of chain and sub-chain phonetic elements in the
language system. Morphology is the part of grammar that analyzes the structure of words by
recognizing their morphemes, while semantics studies meaning in human languages.
The famous linguist Darrow has said, “There is a completely explicit, clear, and standard-based
relationship with great importance between language and its content, and knowledge, cognition,
and language are the most important constituents of learning” [9]. The necessary condition for
reducing the time it takes to understand the content of a written text, and to increase the speed of
learning, is for the text to be fluent, simple, and without ambiguities. If a text is written fluently,
understanding the text and correct thinking about the subject happen simultaneously with reading
it. If a text is complicated and difficult to grasp, only partial thinking about the subject takes place
during the reading, and the reader completes the thinking process by asking questions. The effort
required to find answers to these questions makes the reader less inclined to receive the knowledge
(which is inconsistent with the principles of knowledge management).
Due to the complexity of language structure, most text processing systems do not achieve
satisfactory success when put to use. Google Website Translation is among the most successful
text processing systems. Another site active in content analysis is the Online Summarization Site.
These sites apply proprietary algorithms to analyze texts. So far, however, no online or offline
software has been successfully applied to turn complex texts into simple ones. A series of studies
has nevertheless been conducted by a team of researchers at the Lexile company, presented in the form
of the Lexile Framework.
Lexile offers a standard called LF that evaluates the difficulty of reading texts and the capacity of
individuals in reading them. The Lexile Level ranges from 0 to 1800, and each book is assigned
one of the numbers in this range. The difficulty of reading texts is measured based on a series of
mathematical formulae. Results of these calculations are based on two features: semantic difficulty,
measured from the frequency with which words of the same family appear in the text, and syntactic
complexity, measured from sentence length [2].
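The two features above can be illustrated with a toy scoring function. The common-word list and the weights in this sketch are invented for illustration only; they are not the proprietary Lexile formulae.

```python
# Toy readability score combining the two features described above:
# semantic difficulty (proportion of uncommon words) and syntactic
# complexity (mean sentence length). Weights are arbitrary.

COMMON_WORDS = {"the", "a", "is", "are", "and", "of", "to", "in", "it", "this"}

def toy_readability(text):
    # Treat '.', '!' and '?' as sentence terminators.
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    words = normalized.lower().replace(".", " ").split()
    mean_len = len(words) / len(sentences)        # syntactic complexity
    rare = sum(1 for w in words if w not in COMMON_WORDS)
    rarity = rare / len(words)                    # semantic difficulty
    return round(100 * rarity + 50 * mean_len)

score = toy_readability("The cat is in the hat.")
```

Longer sentences and rarer vocabulary both push the score upward, mirroring the direction, though of course not the calibration, of the Lexile measure.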
After the evaluations, the LF assigns a value to the difficulty of the text (Table 1). Based on the
Lexile indices, and considering the age and education level of each individual, it will be possible
to determine the capacity of each person in reading the text.
Table 1: Lexile standard

Rank   Difficulty of texts   Reading capacity of individuals
1      200-400               Up to 300
2      300-500               140-500
3      500-700               330-700
4      650-850               445-810
5      750-950               565-910
6      850-1050              665-1000
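A simple lookup against the difficulty bands of Table 1 can be sketched as follows; because the published bands overlap, a score may map to more than one rank.

```python
# Difficulty bands transcribed from Table 1 (rank -> inclusive score range).
DIFFICULTY_BANDS = {
    1: (200, 400),
    2: (300, 500),
    3: (500, 700),
    4: (650, 850),
    5: (750, 950),
    6: (850, 1050),
}

def ranks_for_score(score):
    """Return every rank whose difficulty band contains the score."""
    return [r for r, (lo, hi) in DIFFICULTY_BANDS.items() if lo <= score <= hi]
```

Combined with a reader's capacity range, such a lookup is how the framework matches texts to individuals, although the real assignment also weighs age and education level as described above.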
The LF recognizes the complexity of texts but has two main shortcomings. First, the whole text is not
processed: only one or a few sample paragraphs are analyzed and, based on this analysis, the
difficulty of the whole text is expressed as a single score. Second, LF does not specify the reasons
for the text's difficulty, whereas in the approach we propose those reasons must be reported to the
author. Other research on text simplification has been carried out as case studies in three areas:
mathematics, computers, and literature. Among the most important are projects for identifying the
role of pronoun antecedents in sentence complexity and studies of ambiguity in words that are spelled
the same but have different meanings.
3. STUDY OF THE IMPORTANCE AND THE GOAL OF SIMPLIFYING
TEXTS WRITTEN IN ENGLISH
English plays a basic and important role in the daily lives of people because it has become a
communication bridge between various societies and most international communications are
carried out in this language. Moreover, most reference books in various sciences are written in
English and specialists and students studying at various levels and in different fields need to have
sufficient knowledge of English to be able to use these references. Unfamiliarity with this language
therefore causes problems in carrying out various duties, one of the most important being the
inability to access up-to-date and needed knowledge.
The main purpose in presenting this text simplification project was to make reading texts in English
possible for those with a university education but a low level of familiarity with the English
language, although the project can also be useful, in many instances, for other sectors of society.
Since one of the main goals of UNESCO is the eradication of illiteracy, the simplification of
knowledge-containing texts can also be considered a tool for realizing that organization's high
goals. In this section, two main classes of people are studied as target groups for easier reading of
science-containing texts.
3.1. Those with low capacity in language learning
Research has shown that more than 25% of the population in the United States does not acquire the
expected level of reading ability after 9 years of formal education; this figure reaches 45-50% in
several other countries. Illiteracy, and the inability to apply the basic vocabulary and
comprehension skills taught in reading instruction, may be due to various factors such as
insufficient or non-standard education, social problems, and mild intellectual disability. People
with such problems are unable to read and understand the applied texts available in advanced
societies.
The goal in this article is to prepare the ground for faster comprehension of scientific texts by
people with university education. As expected, a native speaker of English with this level of
education is capable of sufficiently understanding such texts, but will certainly face difficulties
when reading a text in which the standard skills of the English language have not been observed.
The reason is that an academic education and native-speaker status are not sufficient conditions for
understanding every book written in a specialized field. Experience and continuous study are also
required for comprehending texts, and they help readers tackle complex material.
3.2. Non-native speakers
Any language learned after the first one (the mother tongue) is called a “second language.” Eric
Lenneberg, a pioneering linguist and neurologist, defined a second language as one consciously
learned or spoken after maturity. In most cases, people never reach the same level of fluency and
comprehension in a second language as in their mother tongue, and they have trouble finding correct
words and grammatical structures. Learning a second language can be a life-long process for many
people. Psychologists have concluded that the reason is that separate centers in the brain control
learning and skills in a second language. It must also be noted that people who are weak at learning
a new language are distinct from those with intellectual disabilities.
There are 1.3 billion people living in China and, in many cases, they access the required knowledge
only through sources in Chinese, so there is no great need to learn English in China. Since English
has been omitted from college entrance examinations, studying a scientific text in English is a
difficult and costly task even for graduates of the best universities in China [8]. Our proposed
project to simplify scientific texts written in English, and to make them understandable for people
with low proficiency in the language, can substantially help this section of the world population.
Understanding written text for the blind is like learning a foreign language. These people have
limited understanding of textual metaphors. However, blind people can use easy-to-read materials
in which the Braille form and format is employed. A blind French person learns French Braille
and, hence, using easy-to-read materials helps this individual to read and understand texts written
in English Braille [1].
4. LEVELS OF SIMPLIFYING TEXTS
The need of students and researchers to read specialized texts, together with their varying levels of
familiarity with the language, makes it difficult to set a defined level for simplifying texts.
There are various levels, classes, classifications, and models in each language. Because the level of
language skill a person has depends on many factors such as age, occupation, level of education,
personal interest in the language, and talent, classifying people and citing evidence for each
language level is difficult and sometimes seems impossible. Therefore, examples are often not
presented for each level. For example, it is not possible to say what level of language proficiency a
certain age group, or all graduates of a given level of education, has reached. Complex study and
research are therefore needed to assign a level of language proficiency to a group of people sharing
one or more common characteristics [9].
Determining the level of English language skills in people is a very complex process. Various models
have been designed for classifying levels of English proficiency. These models are conventional and
divide proficiency into three classes: advanced, intermediate, and beginner. The advanced level
denotes a high degree of command of the English language. People whose mother tongue is not English
but who have attained an advanced level of proficiency and speak much like native speakers are called
near-native speakers; they cannot easily be differentiated from native speakers.
People with advanced reading skills in English can study and understand English texts without simplification, although simplification accelerates their comprehension. In some cases, ambiguous and complex sentences make a text difficult to understand even for native speakers, and only a linguistics specialist can analyze such texts. Therefore, although text simplification will chiefly help people with intermediate or beginner proficiency in the language, in many cases it will also solve the problems that native speakers, or people with advanced proficiency in English, face when reading ambiguous or complex texts.
5. DESIGNING THE PROPOSED SOFTWARE SYSTEM
Studies on correcting the grammars of various languages, and especially of English, with the goal of removing literary ambiguities have been carried out for many years, and the grammatical rules of the language are presented in thousands of books.

International Journal of Database Management Systems (IJDMS) Vol.7, No.4, August 2015

The purpose of this project is therefore not to carry out new linguistic studies; rather, previous studies are analyzed to present an intelligent system that discovers the main factors contributing to text complexity by turning existing language rules into algorithms. Obviously, accurate recognition of all factors influencing text complexity requires complex text-processing systems and, if the realities of software and hardware infrastructure are not considered in the feasibility study, problems will arise when implementing this project. The whole idea behind this project is therefore to use new algorithms within the scope of previous linguistic studies.
The proposed project includes producing software for analyzing texts and for finding ambiguities
in them. The software must read the text and then point out the ambiguities and complexities to
its author. The author can then edit the text and omit the challenging points by paying attention to
the recommendations made. This will help to turn the contents of books and articles into fluent
knowledge before their publication. Factors causing text complexity are now studied in the
following four groups:
5.1. Presence of a word in the text that has more than one common meaning
The presence of words with several meanings is directly related to text complexity. Depending on their roles in sentences, some words take on special meanings, and sometimes a word with several meanings allows several interpretations of the text. In daily conversation, people naturally work out the meaning of such a word by paying attention to the topic of the conversation. However, in computer text processing it is not easy to determine accurately when a word with several meanings causes ambiguity and complexity [4]. Many applications have been developed for removing ambiguity from words in written texts, and insight into the domain of the text substantially helps in automatically recognizing whether a word is ambiguous for the reader or not.
Within a single sentence, the meaning assigned to each word is usually determined by two factors: the syntactic frame the word appears in, and the semantics of the neighboring words (syntactic dependency) [3]. For example, in sentence (a) the word “absorb” must have a meaning close to the financial domain because of its proximity to the word “cost.” In sentence (b), on the other hand, it precedes the word “information” and therefore takes a meaning associated with learning or being informed. In sentence (c), the word “treated” belongs to the medical field and means “cured,” while in sentence (d) the same word belongs to the domain of human relations and refers to behavior.
a. The customer will absorb this cost. [Asset] (“pay”)
b. The customer will absorb this information. [Information] (“learn”)
c. Peter treated Mary with antibiotics. [with Medication] (“medical”)
d. Peter treated Mary with respect. [with Quality] (“human relations”)
The above sentences illustrate that determining the domain associated with each word can play a major role in removing its ambiguity in the sentence. In this project, however, we are only after recognizing ambiguity: if a word with several meanings appears in a sentence and the domain close to it is not clear, we flag the word as ambiguous.
The important factor to take into account here is the frequency with which each meaning of the word is used in text. That is, if a word has two meanings, one of which is rarely used, it should not be counted among ambiguous words. However, if a phrase like “as well as” has two different meanings that are used with roughly equal frequency, it may be ambiguous under certain conditions [7].
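As an illustration only, the frequency criterion above can be sketched as a small check: a word is flagged as ambiguous only when its second most common sense is used with a frequency comparable to its most common one. The sense inventory and frequencies below are invented for the example, not drawn from a real lexicon.

```python
# Illustrative sense inventory: word -> relative frequency of each sense.
# These numbers are invented for the example, not taken from a corpus.
SENSE_FREQS = {
    "absorb": {"pay": 0.45, "learn": 0.55},
    "treat": {"medical": 0.50, "human relations": 0.50},
    "bank": {"financial institution": 0.97, "river edge": 0.03},
}

def is_ambiguous(word, sense_freqs=SENSE_FREQS, ratio=0.5):
    """Flag a word as ambiguous only if its second most frequent sense
    occurs at least `ratio` times as often as its most frequent one."""
    freqs = sorted(sense_freqs.get(word, {}).values(), reverse=True)
    if len(freqs) < 2:
        return False
    return freqs[1] >= ratio * freqs[0]

print(is_ambiguous("absorb"))  # True: both senses are common
print(is_ambiguous("bank"))    # False: one sense dominates
```

Under this rule, a word such as “bank,” whose secondary sense is rare, is not reported, while “absorb” and “treat” are.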
5.2. Complexities related to pronouns
Pronouns that are distant from their antecedents in the text make sentences ambiguous. For example, if the antecedent of a pronoun appears several sentences before the pronoun, it is very difficult for the reader to locate the antecedent. Moreover, it is sometimes unclear in a sentence exactly what an adjective refers to [5]. In the phrase “little girls and boys,” for instance, it is not clear whether “little” modifies only “girls” or both “girls” and “boys.” Many applications have been written for recognizing pronoun antecedents, and their reported accuracies vary across kinds of pronouns: accuracy is highest for the antecedent of the second-person pronoun “you” and lowest for the antecedents of the pronouns “it” and “they.”
In 1990, S. Lappin and M. McCord conducted a study restricted to pronouns marked for person, number, and gender, and determined a governor for the sentence containing each pronoun [5]. For example, in the sentence “I saw her,” the word “saw” is the governor of the words “I” and “her.” The relationships between the pronouns in the sentence are then determined according to the type of the governing element. In the pronoun-recognition stage, the data cannot be analyzed without labeling it. Table 2 offers an example of classifying verbs; this classification has been employed in data mining through thematic modeling.
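As a rough illustration of the distance problem described above (not the Lappin–McCord algorithm itself), the following sketch flags third-person pronouns whose nearest preceding mention of a known entity lies more than a fixed number of sentences back. The entity list, pronoun set, and distance threshold are all assumptions; a real system would use parsing and labeled data.

```python
import re

PRONOUNS = {"it", "they", "he", "she", "them", "him", "her"}

def flag_distant_pronouns(text, entities, max_distance=2):
    """Return (sentence_index, pronoun) pairs whose nearest preceding
    mention of any entity in `entities` is more than `max_distance`
    sentences away."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    last_mention = None  # index of the last sentence naming an entity
    warnings = []
    for i, sentence in enumerate(sentences):
        tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", sentence)]
        if any(e.lower() in tokens for e in entities):
            last_mention = i
        for token in tokens:
            if token in PRONOUNS:
                if last_mention is None or i - last_mention > max_distance:
                    warnings.append((i, token))
    return warnings

print(flag_distant_pronouns(
    "Peter bought a car. The weather was fine. Nothing happened. "
    "Nobody spoke. It was red.", ["Peter", "car"]))  # [(4, 'it')]
```

Here “It” in the last sentence is reported because four sentences separate it from the last mention of a candidate antecedent.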
5.3. Presence of several easy synonyms for a word
Simple synonyms and equivalents can be substituted for heavy, difficult words to simplify a text. For example, replacing “terminate” with “finish” makes the text easier for readers to understand. It is recommended that the text first be matched against a database of essential words from an English dictionary and that, where suitable synonyms are found, they replace the original words. This way, the text contains words familiar to most readers. Databases of essential words for each specific field must be prepared at this stage. In the thesis [10], two sources, everyday literature for children and search engines, were used to find the essential words.
Table 2: Thematic modeling

  General topics    Examples of verbs
  Cognition         Think, analyze, judge, …
  Communications    Tell, ask, teach, …
  Feelings          Feel, love, fear, …
  Social affairs    Participate, make, establish, …
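The synonym-substitution step of Section 5.3 might be sketched as a dictionary lookup over the text. The word list here is illustrative, standing in for the field-specific database of essential words that the text describes.

```python
import re

# Illustrative mapping from difficult words to simpler equivalents;
# a real system would draw these from an essential-words database.
SIMPLE_SYNONYMS = {
    "terminate": "finish",
    "utilize": "use",
    "commence": "begin",
}

def simplify(text, synonyms=SIMPLE_SYNONYMS):
    def repl(match):
        word = match.group(0)
        simple = synonyms.get(word.lower())
        if simple is None:
            return word  # no simpler equivalent known; keep as-is
        # Preserve the capitalization of the original word.
        return simple.capitalize() if word[0].isupper() else simple
    return re.sub(r"[A-Za-z]+", repl, text)

print(simplify("Terminate the process."))  # Finish the process.
```

Only whole words are replaced, and punctuation and capitalization are preserved, so the substitution does not disturb the rest of the sentence.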
5.4. The role played by passive sentences in text ambiguity
If we were to caution the author of an article to omit every passive sentence, our recommendations for the text would grow excessive. Moreover, in some cases the author may lack the necessary information and be forced to use passive constructions. Therefore, since passive sentences contribute to text complexity, the proposed solution is to compute the percentage of passive sentences in each paragraph; if that percentage is high, the author can be notified to revise the text. Text-processing software recognizes the percentage of passive sentences very accurately. Research has shown that passive sentences are least common in daily conversation and occur with increasing frequency, in ascending order, in stories, magazines, newspapers, and university textbooks. This shows that removing passive sentences from university texts must receive special attention in order to eliminate ambiguity.
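A minimal sketch of the per-paragraph passive-percentage check, assuming a crude “form of be + word ending in -ed/-en” pattern. Irregular participles and adjectival uses are not handled, and a real checker would rely on a part-of-speech tagger; the 40% threshold is likewise an arbitrary placeholder.

```python
import re

# Crude heuristic: an auxiliary "be" form followed by a word ending
# in -ed or -en. This misses irregular participles ("written" works,
# "made" does not via -ed alone) and over-matches some adjectives.
PASSIVE = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b",
    re.IGNORECASE,
)

def passive_ratio(paragraph):
    """Fraction of sentences in the paragraph that look passive."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    if not sentences:
        return 0.0
    passive = sum(1 for s in sentences if PASSIVE.search(s))
    return passive / len(sentences)

def warn_if_passive_heavy(paragraph, threshold=0.4):
    """Notify the author when passives exceed the chosen threshold."""
    return passive_ratio(paragraph) > threshold

print(passive_ratio("The report was written by the team."))  # 1.0
```

The author is warned only when the paragraph-level ratio crosses the threshold, matching the proposal above of reporting percentages rather than flagging every passive sentence.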
6. FEASIBILITY OF THE PROPOSED SYSTEM
The first point to consider is that it certainly will not be easy to simplify masses of articles (large volumes of data). For example, the IEEE Xplore site alone adds more than 20,000 articles to its research and university databases each year. It is therefore impossible to simplify the entire body of human knowledge into simple texts; the only thing that can be done is to work case by case. Since we plan to offer a system for making changes in texts to turn them into easy-to-read materials, we will investigate the possibility of implementing some of the processes included in the proposed simplification system.
Table 3: Feasibility of the proposed system
7. CONCLUSIONS
Google has reported that more than 85% of user searches are for finding useless information on the internet. Moreover, according to the International Publishers Association, only 60% of the books published in Europe in 2013 contained knowledge-enhancing material; the rest were novels or belonged to the entertainment or political domains. People's need for knowledge, combined with their comfort-seeking habits, has pushed them toward acquiring information that most often lacks knowledge but is consumed by the public because it is cheap (or free), rapidly accessible, enjoyable, and easy to read. Therefore, a project that turns knowledge-bearing complex texts into simple, explicit, expressive, and highly readable ones naturally takes a long step toward making knowledge acquisition easy. In this research, a preliminary project was studied for creating software that changes complex texts into simple ones and removes their ambiguities.
REFERENCES
[1] C. Blake, J. Kampov, A. K. Orphanides, D. West, and C. Lown, “Query expansion, lexical
simplification and sentence selection strategies for multi-document summarization,” in Document
understanding conference (DUC-2007), 2007.
[2] J. Allanson and S. H. Fairclough, “A research agenda for computing,” Interacting with Computers,
2004.
[3] J. Medero and M. Ostendorf, “Analysis of vocabulary difficulty using wiktionary,” in Proc. SLaTE
Workshop, 2009.
[4] A. Korhonen, Y. Krymolowski, and Z. Marx. 2003. Clustering polysemic subcategorization frame distributions semantically, pages 64–71.
[5] S. Lappin and M. McCord. 1990. A syntactic filter on pronominal anaphora for slot grammar. In
Proceedings of 28th ACL, pages 135–142.
[6] Max, A. (2006). “Writing for language-impaired readers”. In Proceedings of the 7th International
Conference on Intelligent Text Processing and Computational Linguistics, pages 567–570, Mexico
City. Springer-Verlag.
[7] Petersen, S. E. and Ostendorf, M. (2007). “Text simplification for language learners: A corpus
analysis”. In Proceedings of the Speech and Language Technology for Education Workshop (SLaTE-
2007), pages 69–72, Pennsylvania, USA.
[8] S. Crossley, D. Allen, and D. McNamara, “Text simplification and comprehensible input: A case for
an intuitive approach,” Language Teaching Research, vol. 16, no. 1, pp. 89–108, 2012.
[9] S. E. Petersen and M. Ostendorf, “Text simplification for language learners: A corpus analysis,” in
Speech and Language Technology for Education workshop, 2007.
[10] Siddharthan, A. (2003). Syntactic Simplification and Text Cohesion. PhD thesis, University of
Cambridge.
[11] V. Hatzivassiloglou and K.R. McKeown. 1997. Predicting the semantic orientation of adjectives. In
Proceedings of ACL/EACL, pages 174–181.
[12] Witten, I., Frank, E. 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd
Edition. Morgan Kaufmann