Text mining is a technique that helps users find useful information in large collections of text documents on the web or in databases. Most popular text mining and classification methods have adopted term-based approaches. Both term-based approaches and pattern-based methods are used to describe user preferences. This review paper analyses how text mining works at three levels: sentence level, document level and feature level. We review the related work done previously and describe the problems that arise when text mining is performed at the feature level. The paper also presents a text mining technique for compound sentences.
This paper proposes a natural-language-based discourse analysis method for extracting
information from news articles in different domains. The discourse analysis uses Rhetorical Structure
Theory (RST) to find the coherent groups of text that are most prominent for extracting information
from text. RST uses the nucleus-satellite concept to find the most prominent text in a
document. After discourse analysis, text analysis is performed to extract domain-related objects
and relate them to one another. To extract the information, a knowledge-based system is used that
consists of a domain dictionary. The domain dictionary holds a bag of words for the domain. The system is
evaluated according to gold-of-art analysis and human decision on the extracted information.
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE - Journal For Research
Natural Language Processing (NLP) techniques are among the most widely used techniques in computer applications, and the field has become vast and advanced. Language is the means of communication among humans, and now that almost everything depends on machines or is computerized, communication between computers and humans has become a necessity. To fulfil this necessity, NLP has emerged as the means of interaction that narrows the gap between machines (computers) and humans. It evolved from the study of linguistics, which was put through the Turing test to check the similarity between data, but it was limited to small sets of data. Later, various algorithms were developed along with the concept of Artificial Intelligence (AI) for the successful execution of NLP. This paper focuses on the different NLP techniques developed so far, their applications, and a comparison of those techniques on different parameters.
Mining Opinion Features in Customer Reviews - IJCERT JOURNAL
Nowadays, e-commerce systems have become extremely important. Large numbers of customers choose online shopping because of its convenience, reliability, and cost. Customer-generated information, especially product reviews, is a significant source of data for consumers making informed purchasing choices and for manufacturers keeping track of customers' opinions. It is difficult for customers to make purchasing decisions based only on pictures and short product descriptions. Mining product reviews has therefore become a hot research topic, and prior research mostly relies on pre-specified product features to analyse the opinions. Natural Language Processing (NLP) toolkits such as NLTK for Python can be applied to raw customer reviews to extract keywords. This paper presents a survey of the techniques used for designing software to mine opinion features in reviews. Eleven IEEE papers are selected and compared. These papers are representative of the significant improvements in opinion mining in the past decade.
A scalable, lexicon based technique for sentiment analysis - ijfcstjournal
The rapid increase in the volume of sentiment-rich social media on the web has resulted in increased
interest among researchers in sentiment analysis and opinion mining. However, with so much
social media available on the web, sentiment analysis is now considered a big data task. Hence
conventional sentiment analysis approaches fail to handle efficiently the vast amount of sentiment data
available nowadays. The main focus of the research was to find a technique that can efficiently
perform sentiment analysis on big data sets, that is, a technique that can categorize text as positive, negative
or neutral in a fast and accurate manner. In the research, sentiment analysis was performed on a large
data set of tweets using Hadoop, and the performance of the technique was measured in terms of speed and
accuracy. The experimental results show that the technique is very efficient in handling big
sentiment data sets.
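As an illustration of the lexicon-based idea, the following is a minimal sketch and not the authors' implementation. The tiny `LEXICON`, the `NEGATORS` set and the simple negation rule are assumptions made for the example; a real system would load a much larger sentiment lexicon.

```python
# Hypothetical toy lexicon: word -> sentiment score (assumed values).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
# Hypothetical negation words that flip the next sentiment-bearing word.
NEGATORS = {"not", "no", "never"}

def classify(text, lexicon=LEXICON):
    """Label text as positive, negative or neutral by summing word scores."""
    score = 0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?;:\"'")
        if word in NEGATORS:
            negate = True
            continue
        value = lexicon.get(word, 0)
        if value != 0:
            # A pending negation flips the sign of this sentiment word.
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because each text is scored independently of the others, a function of this shape could plausibly run as the map step of a distributed job, although how the paper maps its technique onto Hadoop is not stated in the abstract.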
Keyword Extraction Based Summarization of Categorized Kannada Text Documents - ijsc
The internet has caused enormous growth in the number of documents available online. Summaries of documents can help find the right information and are particularly effective when the document base is very large. Keywords are closely associated with a document, as they reflect the document's content and act as indices for it. In this work, we present a method to produce extractive summaries of documents in the Kannada language, given the number of sentences as a limit. The algorithm extracts keywords from pre-categorized Kannada documents collected from online resources. We use two feature selection techniques to obtain features from documents: we combine scores obtained from GSS (Galavotti, Sebastiani, Simi) coefficients and IDF (Inverse Document Frequency) with TF (Term Frequency) to extract keywords, and later use these keywords for summarization based on sentence rank. In the current implementation, a document from a given category is selected from our database and a summary is generated with the number of sentences requested by the user.
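The keyword-scoring and sentence-ranking steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: it uses only TF and a smoothed IDF (the GSS coefficient is omitted), the function names are hypothetical, and the regex tokenizer is only suitable for English-like text; Kannada would need a proper tokenizer.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumption: simple word tokenizer, adequate for English-like text only.
    return re.findall(r"\w+", text.lower())

def summarize(document, corpus, num_sentences):
    """Return the top-scoring sentences of `document`, in original order."""
    n_docs = len(corpus)
    # Document frequency: in how many corpus documents each term occurs.
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    tf = Counter(tokenize(document))

    def weight(term):
        # Smoothed IDF so that unseen terms do not cause division by zero.
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1
        return tf[term] * idf

    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    scored = []
    for idx, sentence in enumerate(sentences):
        terms = tokenize(sentence)
        # Average term weight, so long sentences are not favoured automatically.
        score = sum(weight(t) for t in terms) / len(terms) if terms else 0.0
        scored.append((score, idx))
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    return [sentences[idx] for _, idx in sorted(top, key=lambda pair: pair[1])]
```

The `num_sentences` argument plays the role of the user-supplied sentence limit; the selected sentences are returned in document order so the summary stays readable.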
Sentiment analysis is an important current research area, and the demand for sentiment analysis and classification is growing day by day. This paper presents a novel method to classify Urdu documents, since no previous work on sentiment classification for Urdu text has been recorded. We frame the problem as determining whether a review or sentence is positive, negative or neutral. For this purpose we use two machine learning methods, Naïve Bayes and Support Vector Machines (SVM). First the documents are preprocessed and the sentiment features are extracted; then the polarity is calculated, judged and classified through machine learning methods.
A hybrid composite features based sentence level sentiment analyzer - IAESIJAI
Current lexicon-based and machine-learning-based sentiment analysis approaches
still suffer from a two-fold limitation. First, manual lexicon construction and
machine training are time-consuming and error-prone. Second, accurate
prediction requires that sentences and their corresponding training text
fall under the same domain. In this article, we experimentally
evaluate four sentiment classifiers, namely support vector machines (SVMs),
Naive Bayes (NB), logistic regression (LR) and random forest (RF). We
quantify the quality of each of these models using three real-world datasets
that comprise 50,000 movie reviews, 10,662 sentences, and 300 generic
movie reviews. Specifically, we study the impact of a variety of natural
language processing (NLP) pipelines on the quality of the predicted
sentiment orientations. Additionally, we measure the impact of incorporating
lexical semantic knowledge captured by WordNet on expanding original
words in sentences. Findings demonstrate that utilizing different NLP
pipelines and semantic relationships impacts the quality of the sentiment
analyzers. In particular, results indicate that coupling lemmatization with
knowledge-based n-gram features produces higher accuracy.
With this coupling, the accuracy of the SVM classifier improved to
90.43%, while it was 86.83%, 90.11% and 86.20%, respectively, using the three
other classifiers.
Development of an intelligent information resource model based on modern na... - IJECEIAES
Currently, there is an avalanche-like increase in the need for automatic text processing, and accordingly new effective methods and tools for processing natural-language texts are emerging. Although these methods, tools and resources are mostly available on the internet, many of them remain inaccessible to developers, since they are not systematized and are scattered across various directories or separate sites of both humanities and technical orientation. All this greatly complicates finding them and using them in computational linguistics research and in developing applied systems for natural text processing. This paper aims to address that need. Its goal is to develop a model of an intelligent information resource based on modern methods of natural language processing (IIR NLP). The main purpose of IIR NLP is to provide specialists in computational linguistics with convenient and valuable access to these resources. The originality of the proposed approach is that the developed ontology of the subject area "NLP" will be used to systematize all the above knowledge, data and information resources and to organize meaningful access to them, with semantic web standards and technology tools as the software basis.
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio... - IJECEIAES
The heavy involvement of Arabic internet users has spread data written in the Arabic language and created a vast research area in natural language processing (NLP). Sentiment analysis is a growing field of research of great importance, given its high potential for decision-making and for predicting upcoming actions from texts produced in social networks. The Arabic used on microblogging websites, especially Twitter, is highly informal. It complies with neither standards nor spelling regulations, which makes it quite challenging for automatic machine-learning techniques. In this paper, we propose a new approach based on AutoML methods to improve the efficiency of the sentiment classification process for dialectal Arabic. The approach was validated through benchmark testing on three different datasets that represent three vernacular forms of Arabic. The results show that the presented framework achieves significantly higher accuracy than similar works in the literature.
A simplified classification computational model of opinion mining using deep ... - IJECEIAES
Opinion mining attempts to develop an automated system to determine people's viewpoints towards various units such as events, topics, products, services, organizations, individuals, and issues. Opinion analysis from natural text can be regarded as a text and sequence classification problem with a high-dimensional feature space, due to the involvement of dynamic information that needs to be addressed precisely. This paper introduces effective modelling of human opinion analysis from social media data with complex and dynamic content. First, a customized preprocessing operation based on natural language processing mechanisms is applied as a data treatment process for building quality-aware input data. Then a suitable deep learning technique, bidirectional long short-term memory (Bi-LSTM), is implemented for opinion classification, followed by a data modelling process in which truncating and padding are performed manually to achieve better data generalization in the training phase. The model is designed and developed in MATLAB. The performance analysis shows that the proposed system offers a significant advantage in classification accuracy and requires less training time, owing to the reduction in feature space achieved by the data treatment operation.
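The truncating and padding step mentioned above brings every token sequence to the same length before it is fed to a recurrent model. A minimal sketch follows, written in Python rather than the MATLAB used in the paper; the function name, the choice to pad and cut at the end of the sequence, and the padding value 0 are assumptions for illustration.

```python
def pad_or_truncate(sequence, length, pad_value=0):
    """Return a copy of `sequence` with exactly `length` items.

    Longer sequences are cut at the end; shorter ones are padded at the end
    with `pad_value` (both choices are assumptions of this sketch).
    """
    clipped = list(sequence[:length])
    return clipped + [pad_value] * (length - len(clipped))
```

For example, `pad_or_truncate([5, 3, 8], 5)` yields a five-item sequence ending in two padding values, while a six-item sequence cut to length 4 keeps only its first four items.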
Data mining is knowledge discovery in databases; its goal is to extract patterns and knowledge from large amounts of data. Text mining is an important part of data mining. It extracts high-quality information from text, typically by means of statistical pattern learning. High quality in text mining refers to a combination of relevance, novelty and interestingness. Tasks in text mining include text categorization, text clustering, entity extraction and sentiment analysis. Natural language processing and analytical methods are widely applied to turn text into data for analysis. This survey covers the various techniques and algorithms used in text mining.
A statistical model for gist generation a case study on hindi news article - IJDKP
Every day, a huge number of news articles are reported and disseminated on the Internet. With a generated
gist of an article, a reader can go through the main topics instead of reading the whole article, which takes
much more time. An ideal system would understand the document
and generate the appropriate theme(s) directly from the results of that understanding. In the absence of
a natural language understanding system, an appropriate alternative must be designed. Gist generation is a
difficult task because it requires both maximizing the text content of a short summary and maintaining
the grammaticality of the text. In this paper we present a statistical approach to generating a gist of a Hindi
news article. The experimental results are evaluated using standard measures such as precision, recall
and F1 measure for different statistical models and their combinations, on the article before pre-processing
and after pre-processing.
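Precision, recall and F1 for an extractive gist can be computed by comparing the selected units with a reference selection. The sketch below assumes that both are given as collections of sentence identifiers; the function name and this representation are illustrative choices, not details taken from the paper.

```python
def precision_recall_f1(selected, reference):
    """Compare system-selected items with reference items.

    precision = |selected ∩ reference| / |selected|
    recall    = |selected ∩ reference| / |reference|
    F1        = harmonic mean of precision and recall
    """
    selected, reference = set(selected), set(reference)
    true_positives = len(selected & reference)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With four selected sentences of which two appear in a three-sentence reference, precision is 2/4, recall is 2/3 and F1 is 4/7.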
The Process of Information extraction through Natural Language Processing - Waqas Tariq
Information Retrieval (IR) is the discipline that deals with the retrieval of unstructured data, especially textual documents, in response to a query or topic statement, which may itself be unstructured (e.g., a sentence or even another document) or structured (e.g., a boolean expression). The need for effective methods of automated IR has grown because of the tremendous explosion in the amount of unstructured data, both in internal corporate document collections and in the immense and growing number of document sources on the Internet. The topics covered include: formulation of structured and unstructured queries and topic statements, indexing (including term weighting) of document collections, methods for computing the similarity of queries and documents, classification and routing of documents in an incoming stream to users on the basis of topic or need statements, clustering of document collections on the basis of language or topic, and statistical, probabilistic, and semantic methods of analyzing and retrieving documents. Information extraction from text has therefore been pursued actively as an attempt to present knowledge from published material in a computer-readable format. An automated extraction tool would not only save time and effort, but also pave the way to discovering hitherto unknown information implicitly conveyed in this paper. Work in this area has focused on extracting a wide range of information, such as the chromosomal location of genes, protein functional information, associations between genes by functional relevance, and relationships between entities of interest. While clinical records provide a semi-structured, technically rich data source for mining information, publications, in their unstructured format, pose a greater challenge, which has been addressed by many approaches.
APPROXIMATE ANALYTICAL SOLUTION OF NON-LINEAR BOUSSINESQ EQUATION FOR THE UNS... - mathsjournal
For a one-dimensional, homogeneous, isotropic aquifer without accretion, the governing Boussinesq
equation under the Dupuit assumptions is a nonlinear partial differential equation. In the present paper,
an approximate analytical solution of the nonlinear Boussinesq equation is obtained using the homotopy
perturbation transform method (HPTM). The solution is compared with the exact solution. The
comparison shows that the HPTM is efficient, accurate and reliable. Two important aquifer
parameters, namely specific yield and hydraulic conductivity, are analysed to see their effects on the height
of the water table. The results agree well with the physical phenomena.
FEATURE SELECTION AND CLASSIFICATION APPROACH FOR SENTIMENT ANALYSIS - mlaij
Sentiment analysis and opinion mining have emerged as popular and efficient techniques for information retrieval and web data analysis. The exponential growth of user-generated content has opened new horizons for research in the field of sentiment analysis. This paper proposes a model for sentiment analysis of movie reviews using a combination of natural language processing and machine learning approaches. Firstly, different data pre-processing schemes are applied to the dataset. Secondly, the behaviour of two classifiers, Naive Bayes and SVM, is investigated in combination with different feature selection schemes to
obtain the results for sentiment analysis. Thirdly, the proposed model for sentiment analysis is extended to
obtain the results for higher order n-grams.
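Higher-order n-grams extend single-word features with short word sequences, which lets a classifier see cues such as "not good" as one feature. The following is a minimal sketch; the whitespace tokenization and the function names are assumptions made for illustration.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows of `tokens` as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_features(text, max_n=3):
    """Collect unigram up to `max_n`-gram features from whitespace tokens."""
    tokens = text.lower().split()
    features = []
    for n in range(1, max_n + 1):
        features.extend(" ".join(gram) for gram in ngrams(tokens, n))
    return features
```

The resulting feature strings could then be counted and passed to a classifier such as Naive Bayes or SVM after feature selection.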
Due to the exponential growth in the generation of textual data, the need for tools and mechanisms for automatic summarization of documents has become critical. Text documents are vital to any organization's day-to-day work, and long documents often hamper even trivial tasks. An automatic summarizer is therefore vital for reducing human effort. Text summarization is an important activity in the analysis of high volumes of text documents and is currently a major research topic in Natural Language Processing. It is the process of generating a summary of the input text by extracting representative sentences from it. In this project, we present a novel technique for summarizing domain-specific text using semantic analysis, which is a subset of Natural Language Processing.
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet - IJECEIAES
The field of Arabic sentiment analysis has been progressing at a slow pace compared with English and other languages. In addition, most contributions rely on supervised machine learning algorithms and compare the performance of different classifiers with different selected stylistic and syntactic features. In this paper, we present a novel framework for the concept-level sentiment analysis approach, which classifies text based on its semantics rather than its syntactic features. Moreover, we provide a lexicon dataset of around 69 k unique concepts that covers multi-domain reviews collected from the internet. We also tested the lexicon on a test sample from the dataset it was collected from and obtained an accuracy of 70%. The lexicon has been made publicly available for scientific purposes.
The sarcasm detection with the method of logistic regression - EditorIJAERD
Prediction analysis is an approach that predicts future possibilities. This research work is based on
sarcasm detection from text data. Previously, SVM classification was applied to sarcasm detection. The SVM
classifier classifies data based on a hyperplane, which gave low accuracy. To improve accuracy for sarcasm detection,
logistic regression is applied in this work. The existing and proposed techniques are implemented in Python and the results
are analysed in terms of accuracy and execution time. The proposed approach has higher accuracy and lower execution time
compared to the SVM classifier for sarcasm detection.
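For readers unfamiliar with logistic regression, the sketch below trains a binary classifier by batch gradient descent on the log loss. It is an illustration only: the abstract does not say which features or training settings were used, so the numeric feature vectors, the learning rate and the number of epochs here are all assumptions.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, dim = len(features), len(features[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            # Prediction error for one example: p(y=1 | x) - y.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(dim):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Return 1 (e.g. sarcastic) if the estimated probability is at least 0.5."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0
```

In a text setting, each feature vector would be derived from a sentence (for instance word or n-gram counts), and label 1 would mark the sarcastic class.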
In recent years digital data has grown dramatically, and knowledge discovery and data mining have attracted immense attention with the emerging need to turn such data into useful information and knowledge. Keyword extraction is considered an essential task in natural language processing (NLP) that facilitates the mapping of documents to a concise set of representative single-word and multi-word phrases. This paper investigates the use of Word2Vec and a Decision Tree for keyword extraction from textual documents. The Sem-Eval (2010) dataset is used as the main input for the proposed study. Words are represented as vectors with the Word2Vec technique after pre-processing operations are applied to the dataset. The method is based on word similarity between candidate keywords drawn from the keywords collected for each label and from one sample with the same label. An appropriate threshold is determined, and the similarity percentages that exceed this threshold are passed to the Decision Tree to decide on an appropriate classification of the text document.
Several similarity measures were used in the classification process. The efficiency and accuracy of the algorithm were measured during classification using precision, recall and F-score. The results indicate that using a vector representation for each keyword is an effective way to identify the most similar words, which increases the chance of recognizing the correct classification of the document. With Word2Vec CBOW, the F-score was 64% with the Gini method and the WordNet Lemmatizer. With Word2Vec SG, the F-score was 82% with the Gini index and English Porter stemming, which was the highest value across all our experiments.
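The similarity-and-threshold step can be illustrated with cosine similarity, a common measure for comparing word vectors. Whether the paper uses exactly this measure is not stated, so treat the following as a sketch under that assumption; the function names, the toy two-dimensional vectors in the usage note and the threshold value are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_keywords(candidate_vectors, label_vector, threshold):
    """Keep candidates whose similarity to the label vector reaches the threshold."""
    return [
        word
        for word, vector in candidate_vectors.items()
        if cosine_similarity(vector, label_vector) >= threshold
    ]
```

For instance, with candidates `{"network": [1.0, 0.1], "banana": [0.0, 1.0]}`, a label vector `[1.0, 0.0]` and a threshold of 0.8, only "network" is kept; in the paper's pipeline, the values that pass the threshold are what the Decision Tree receives.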
http://sites.google.com/site/ijcsis/
https://google.academia.edu/JournalofComputerScience
https://www.linkedin.com/in/ijcsis-research-publications-8b916516/
http://www.researcherid.com/rid/E-1319-2016
AN AUTOMATED MULTIPLE-CHOICE QUESTION GENERATION USING NATURAL LANGUAGE PROCE... - kevig
Automatic multiple-choice question generation (MCQG) is a useful yet challenging task in Natural Language
Processing (NLP). It is the task of automatically generating correct and relevant questions from textual data.
Manually creating sizeable, meaningful and relevant questions is a time-consuming
and challenging task for teachers. In this paper, we present an NLP-based system for automatic MCQG for
Computer-Based Testing Examination (CBTE). We used an NLP technique to extract keywords, that is, the
important words in a given lesson material. To validate that the system is not perverse, five lesson materials
were used to check the effectiveness and efficiency of the system. The keywords extracted manually by the
teacher were compared with the auto-generated keywords, and the result shows that the system was capable of
extracting keywords from lesson materials for setting examinable questions. This outcome is presented in a
user-friendly interface for easy accessibility.
Development of an intelligent information resource model based on modern na...IJECEIAES
Â
Currently, there is an avalanche-like increase in the need for automatic text processing, respectively, new effective methods and tools for processing texts in natural language are emerging. Although these methods, tools and resources are mostly presented on the internet, many of them remain inaccessible to developers, since they are not systematized, distributed in various directories or on separate sites of both humanitarian and technical orientation. All this greatly complicates their search and practical use in conducting research in computational linguistics and developing applied systems for natural text processing. This paper is aimed at solving the need described above. The paper goal is to develop model of an intelligent information resource based on modern methods of natural language processing (IIR NLP). The main goal of IIR NLP is to render convenient valuable access for specialists in the field of computational linguistics. The originality of our proposed approach is that the developed ontology of the subject area âNLPâ will be used to systematize all the above knowledge, data, information resources and organize meaningful access to them, and semantic web standards and technology tools will be used as a software basis.
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...IJECEIAES
The heavy involvement of Arabic internet users has resulted in the spread of data written in the Arabic language and has created a vast research area in natural language processing (NLP). Sentiment analysis is a growing field of research that is of great importance to everyone, considering its high added potential for decision-making and for predicting upcoming actions using the texts produced in social networks. The Arabic used on microblogging websites, especially Twitter, is highly informal. It complies with neither standards nor spelling regulations, making it quite challenging for automatic machine-learning techniques. In this paper, we propose a new approach based on AutoML methods to improve the efficiency of the sentiment classification process for dialectal Arabic. This approach was validated through benchmark testing on three different datasets that represent three vernacular forms of Arabic. The obtained results show that the presented framework achieves significantly higher accuracy than similar works in the literature.
A simplified classification computational model of opinion mining using deep ...IJECEIAES
Opinion mining attempts to develop an automated system to determine people's viewpoints towards various units such as events, topics, products, services, organizations, individuals, and issues. Opinion analysis from natural text can be regarded as a text and sequence classification problem, which poses a high-dimensional feature space due to the involvement of dynamic information that needs to be addressed precisely. This paper introduces effective modelling of human opinion analysis from social media data subject to complex and dynamic content. Firstly, a customized preprocessing operation based on natural language processing mechanisms is applied as an effective data treatment process towards building quality-aware input data. Secondly, a suitable deep learning technique, bidirectional long short-term memory (Bi-LSTM), is implemented for the opinion classification, followed by a data modelling process where truncating and padding are performed manually to achieve better data generalization in the training phase. The design and development of the model are carried out in MATLAB. The performance analysis has shown that the proposed system offers a significant advantage in terms of classification accuracy and shorter training time due to a reduction in the feature space by the data treatment operation.
Data mining is knowledge discovery in databases, and the goal is to extract patterns and knowledge from large amounts of data. An important area within data mining is text mining. Text mining extracts high-quality information from text. Statistical pattern learning is used to derive this high-quality information. High quality in text mining refers to a combination of relevance, novelty and interestingness. Tasks in text mining include text categorization, text clustering, entity extraction and sentiment analysis. Applications of natural language processing and analytical methods are highly preferred to turn text into data for analysis. This survey is about the various techniques and algorithms used in text mining.
A statistical model for gist generation a case study on hindi news articleIJDKP
Every day, a huge number of news articles are reported and disseminated on the Internet. By generating the gist
of an article, a reader can go through the main topics instead of reading the whole article, as it takes much
time for a reader to read the entire content of the article. An ideal system would understand the document
and generate the appropriate theme(s) directly from the results of the understanding. In the absence of
a natural language understanding system, it is required to design an appropriate system. Gist generation is a
difficult task because it requires both maximizing text content in a short summary and maintaining
the grammaticality of the text. In this paper we present a statistical approach to generate a gist of a Hindi
news article. The experimental results are evaluated using standard measures such as precision, recall
and F1 measure for different statistical models and their combination on the article before pre-processing
and after pre-processing.
The Process of Information extraction through Natural Language ProcessingWaqas Tariq
Information Retrieval (IR) is the discipline that deals with retrieval of unstructured data, especially textual documents, in response to a query or topic statement, which may itself be unstructured, e.g., a sentence or even another document, or which may be structured, e.g., a boolean expression. The need for effective methods of automated IR has grown in importance because of the tremendous explosion in the amount of unstructured data, both in internal corporate document collections and in the immense and growing number of document sources on the Internet. The topics covered include: formulation of structured and unstructured queries and topic statements, indexing (including term weighting) of document collections, methods for computing the similarity of queries and documents, classification and routing of documents in an incoming stream to users on the basis of topic or need statements, clustering of document collections on the basis of language or topic, and statistical, probabilistic, and semantic methods of analyzing and retrieving documents. Information extraction from text has therefore been pursued actively as an attempt to present knowledge from published material in a computer-readable format. An automated extraction tool would not only save time and effort, but also pave the way to discovering hitherto unknown information implicitly conveyed in this paper. Work in this area has focused on extracting a wide range of information such as chromosomal location of genes, protein functional information, associating genes by functional relevance and relationships between entities of interest. While clinical records provide a semi-structured, technically rich data source for mining information, publications, in their unstructured format, pose a greater challenge, addressed by many approaches.
APPROXIMATE ANALYTICAL SOLUTION OF NON-LINEAR BOUSSINESQ EQUATION FOR THE UNS...mathsjournal
For a one-dimensional homogeneous, isotropic aquifer without accretion, the governing Boussinesq
equation under Dupuit assumptions is a nonlinear partial differential equation. In the present paper an
approximate analytical solution of the nonlinear Boussinesq equation is obtained using the Homotopy
perturbation transform method (HPTM). The solution is compared with the exact solution. The
comparison shows that the HPTM is efficient, accurate and reliable. Two important aquifer
parameters, namely specific yield and hydraulic conductivity, are studied to see their effects on the height
of the water table. The results agree well with the physical phenomena.
FEATURE SELECTION AND CLASSIFICATION APPROACH FOR SENTIMENT ANALYSISmlaij
Sentiment analysis and opinion mining have emerged as popular and efficient techniques for information retrieval and web data analysis. The exponential growth of user-generated content has opened new horizons for research in the field of sentiment analysis. This paper proposes a model for sentiment analysis of movie reviews using a combination of natural language processing and machine learning approaches. Firstly, different data pre-processing schemes are applied to the dataset. Secondly, the behaviour of two classifiers, Naive Bayes and SVM, is investigated in combination with different feature selection schemes to
obtain the results for sentiment analysis. Thirdly, the proposed model for sentiment analysis is extended to
obtain the results for higher order n-grams.
Due to exponential growth in the generation of textual data, the need for tools and mechanisms for automatic summarization of documents has become critical. Text documents are vital to any organization's day-to-day working and, as such, long documents often hamper even trivial work. Therefore, an automatic summarizer is vital for reducing human effort. Text summarization is an important activity in the analysis of high volumes of text documents and is currently a major research topic in Natural Language Processing. It is the process of generating a summary of the input text by extracting the representative sentences from it. In this project, we present a novel technique for summarizing domain-specific text by using Semantic Analysis, which is a subset of Natural Language Processing.
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet IJECEIAES
The Arabic sentiment analysis research field has been progressing at a slow pace compared to English and other languages. In addition, most of the contributions are based on supervised machine learning algorithms, comparing the performance of different classifiers with different selected stylistic and syntactic features. In this paper, we present a novel framework for using the concept-level sentiment analysis approach, which classifies text based on its semantics rather than its syntactic features. Moreover, we provide a lexicon dataset of around 69k unique concepts that covers multi-domain reviews collected from the internet. We also tested the lexicon on a test sample from the dataset it was collected from and obtained an accuracy of 70%. The lexicon has been made publicly available for scientific purposes.
The sarcasm detection with the method of logistic regressionEditorIJAERD
Prediction analysis is an approach that can predict future possibilities. This research work is based on
sarcasm detection from text data. Previously, SVM classification was applied for sarcasm detection. The SVM
classifier classifies data based on a hyperplane, which gives low accuracy. To improve accuracy for sarcasm detection,
logistic regression is applied in this work. The existing and proposed techniques are implemented in Python and the results
are analysed in terms of accuracy and execution time. The proposed approach has high accuracy and low execution time
compared to the SVM classifier for sarcasm detection.
In recent years digital data has grown dramatically, and knowledge discovery and data mining have attracted immense attention with the emerging need to turn such data into useful information and knowledge. Keyword extraction is considered an essential task in natural language processing (NLP) that facilitates mapping documents to a concise set of representative single and multi-word phrases. This paper investigates the use of Word2Vec and Decision Tree for keyword extraction from textual documents. The Sem-Eval (2010) dataset is used as the main input for the proposed study. After pre-processing operations are applied to the dataset, the words are represented by vectors with the Word2Vec technique. This method is based on word similarity between candidate keywords from both collecting keywords for each label and one sample from the same label. An appropriate threshold has been determined, and the percentages that exceed this threshold are exported to the Decision Tree in order to decide on an appropriate classification for the text document.
Some similarity measurements were used for the classification process. The efficiency and accuracy of the algorithm were measured in the process of classification using precision, recall and F-score rates. The obtained results indicated that using a vector representation for each keyword is an effective way to identify the most similar words, so that the chance of recognizing the correct classification of the document increases. When using Word2Vec CBOW the F-score was 64% with the Gini method and WordNet Lemmatizer. Meanwhile, when using Word2Vec SG the F-score was 82% with the Gini Index and English Porter Stemming, which was the highest score across all our experiments.
A Review Of Text Mining Techniques And Applications
International Journal of Computer (IJC)
ISSN 2307-4523 (Print & Online)
© Global Society of Scientific Research and Researchers
http://ijcjournal.org/
A Review of Text Mining Techniques and Applications
Kanak Sharma (a,*), Ashish Sharma (b), Dhananjay Joshi (c), Nikhil Vyas (d), Arpit Bapna (e)
(a,b,d,e) B.Tech. (CE), NMIMS University, Maharashtra 425405, India.
(c) Asst. Prof., CE Department, NMIMS University, Maharashtra 425405, India.
(a) Email: sharmakanak33@gmail.com
Abstract
Due to the ever-increasing rate at which information is generated, text mining and its automated analysis have
become the need of the hour. The paper discusses some of the developments in text mining applications,
primarily reviewing techniques for the classification, summarization and analysis of text, as advocated by
academia. The goal is, in essence, to turn unstructured text into useful data and information for
analysis using critical methods. We begin by introducing the concept of "textual analysis", which is similar
to text mining and is carried out through the analysis of natural language texts, together with the techniques
and open source tools used for it. We survey varied topics that use NLP, and also expand the horizons of this
domain by devising new techniques for improving efficiency even with limited amounts of data, improved accuracy,
new methods, novel approaches, and new application areas, all relating to text summarization and text
classification. Various text mining techniques used in text classification and summarization are reviewed,
followed by the application areas of text mining being worked upon by businesses. Finally, the paper concludes
by introducing "organizational text mining" and emphasizing the need for it.
Keywords: natural language processing; text mining; text classification; text summarization.
1. Introduction
NLP is defined as a domain of computer science in which algorithms and techniques are used to comprehend and
generate natural language. Like every other branch of computer science, NLP has been paired with Machine Learning
(ML) to automate classification and pattern discovery in electronic documents and other unstructured text
mined from various sources, probably for the best.
-----------------------------------------------------------------------
* Corresponding author.
2. International Journal of Computer (IJC) (2017) Volume 24, No 1, pp 170-176
Advanced algorithms, refined NLP techniques and enormous amounts of data have ensured that Natural
Language Processing as a field of study is progressing rapidly, now more than ever. Text mining is the process
of extracting text from various sources, converting it into structured information, determining relationships
within it, and subsequently analysing those relationships between the lemmas to find patterns, solve
problems and create useful applications. "Textual Analysis" can be formulated as an umbrella
under which the evaluation, use and applications of Natural Language Processing and computational linguistics lie.
It encompasses text extraction [11], pre-processing, classification, summarization and many other activities
that can be performed with NLP. Text mining is carried out by various algorithms, ranging from the widely cited
Naïve Bayes to the relatively unknown techniques of back propagation in Artificial Neural Networks (ANN).
2. Text Mining
Text mining is the process of extracting invaluable information from text [15]. Research on text mining has been
steadily gaining attention because of the growing variety of accessible sources and the growing number
of electronic documents. The sources of semi-structured and unstructured information in the world include the
WWW, news articles, biological records, governmental electronic repositories, online forums, digital libraries,
chat rooms, and electronic mail and blog repositories. Hence proper knowledge discovery from
these resources is a research area of some importance. The paper on document classification methodology [1]
aims at attaining high accuracy in classifying documents, focusing mainly on small training
data. It does so by building on results from previous work on large data sets that use Bayesian
classification and statistical decision theory. The first half uses a large data set and estimates the unknown class x
of a new document, given the string of keywords y` n` in the document and the learning data
named doc `L, where a loss function gives a binary output, i.e. 1 or 0, when the decision function gives an estimate.
The second half works on small training data and on documents originating from different sources,
estimating the data using a Dirichlet distribution. With the new classification method, accuracy is higher
than before when the number of documents in the training data is small, and almost the same when the
training data is large, but the parameters A1, A2 and A3 of the method have to be chosen heuristically.
In another paper, on sentiment expression via emoticons [2], Hao Wang argues that
emoticons are a strong signal of sentiment on social media and examines the clarity they bring to sentiment
polarity and expression. First, to show how often emoticons occur, it reports that out of 1.5 billion tweets,
8,625,753 emoticons were found. Four analyses are done to examine the relationship between emoticons and the
contexts in which they are used. The first study charted the most frequent emoticons. The second inspected the
clusters of words and the meanings conveyed by the emoticons. The third analyzed the sentiment distribution of
texts before and after emoticons were deleted from the text. The fourth tested the hypothesis that deleting
emoticons from text affects sentiment classification. The results established that only a few emoticons are strong
sentiment signals, and that a large group of emoticons convey complex sentiments and should therefore be treated
with caution. A paper by Shweta and Sonal
Patil [3] describes techniques for automatic marking of free-text responses using Natural Language Processing,
which they divide into three main categories. The first is the straightforward Statistical
Technique based on keyword matching. It lacks the ability to tackle problems on various fronts such as
synonyms, accounting for the order of words, or dealing with lexical variability. The Information Extraction (IE)
Technique, on the other hand, consists of obtaining structured information from free text: the text is broken
into concepts and their relationships in order to extract dependencies between concepts, and these
dependencies are then compared against those of human experts to reach a decision. Finally, the Full Natural
Language Processing technique involves parsing the text, finding its semantic meaning, comparing it with the
subject text and assigning final scores. In Design of an Automated Essay Grading (AEG) system in the Indian Context
[4], the researchers deal with scoring systems, also known as automated essay grading systems, which
automatically assess the answers of students in exams like TOEFL, where students write essays that are
presently assessed by both an AEG system and a human, with the average taken. Dealing with linguistics has an
inherent problem that is really complex to deal with: multilingual contextual recognition. Currently, the systems
used for grading are essay grading systems that deal with essays in pure English or in pure European
languages. India has 21 regional languages, and the influence of these local languages on English is widely observed.
Newspapers in Hyderabad, India sometimes print: "Now the time has come to say 'albida' (good bye) to
monsoon" [4]. Due to the influence of regional languages such as Hindi or Bangla on non-native English speakers,
TOEFL results have shown lower scores among Indian students and other Asian students
as far as the essay section is concerned. A review paper on text summarization defines text summarization
as the process of extracting or collecting significant information from the original text and presenting that
information in the form of a summary. This paper is an effort to present text summarization from every facet in a
historical review paper format [5]. The methods used for summarization range from structured ones, which were
used first and are the simplest and most intuitive to understand, to linguistic ones. Further, it brings to light
that in India multilingual techniques are being explored and work has been done, but presently it is in its
infancy. This paper gives a theoretical view of the present state of research on text summarization
[9-10]. Another informative paper, on feature-based sentiment classification for hotel reviews
using Bayesian classification, talks about facts and opinions and how sentiment analysis and text mining on the
data on the Internet, specifically in hotel reviews, can be used to classify positive and negative reviews [6]. The
paper discusses in detail the following techniques for achieving the automated classification objective [8]:
firstly, semantic orientation or synonym-based review classification; secondly, ML-based classification
using techniques such as kNN, SVM and Naïve Bayes; thirdly, NLP
techniques such as NER and POS tagging; and finally JAPE rules, which are essentially a set of pattern-action rules.
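
The ML-based route to review classification mentioned above can be illustrated with a small Naïve Bayes classifier. The sketch below is not the method of the cited paper; it is a minimal multinomial Naïve Bayes with add-one smoothing, and the function names (`train_naive_bayes`, `classify`) and the four toy hotel reviews are assumptions made purely for illustration.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def classify(model, text):
    """Return the label with the highest log-posterior under the model."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior: fraction of training documents carrying this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing keeps unseen word/class pairs non-zero.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data, not taken from any of the cited papers.
TOY_REVIEWS = [
    ("clean room and friendly staff", "positive"),
    ("great location friendly service", "positive"),
    ("dirty room and rude staff", "negative"),
    ("noisy location terrible service", "negative"),
]

model = train_naive_bayes(TOY_REVIEWS)
print(classify(model, "friendly staff great room"))
```

A real system would add tokenization, stop-word handling and a much larger labeled corpus; the sketch only shows how word counts per class turn into a positive or negative decision.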
2.1. Text Mining Techniques
Text Summarization and classification being the most popular applications of text mining, especially amongst
businesses, it is only fair that we discuss some of the techniques used to implement them.
2.1.1. Text Summarization
Various research papers discuss text summarization and the urgent need for it in business process automation and
intelligent systems implementations [9-10]. The two main techniques of text summarization are abstractive and
extractive summarization. Abstractive summarization methods are broadly
classified into two groups: the structure-based approach and the semantic-based approach.
⢠Structured Based Approach: Structured based approach codes most vital substance or information from
the text during cognitive schemes such as patterns, extraction policy and other arrangements such as
hierarchy, ontology, and guide and body phrase makeup.
⢠Semantic Based Approach: In Semantic support approach on the other hand, semantic depiction of text
is used as input into natural language generation (NLG) system. This system focus on recognizing the
variety of noun and verb phrases by dispensation of linguistic data.
Extractive summarization consists of selecting important sentences, paragraphs etc. from the original
text and concatenating them into a shorter form. The importance of sentences is determined from statistical
and linguistic features of the sentences, frequently on the basis of an assigned priority.
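The extractive idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the cited papers: sentence importance is approximated by average word frequency, the stop-word list is a tiny hypothetical one, and the names `tokenize`, `split_sentences` and `summarize` are introduced here only for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on", "are"}

def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def split_sentences(text):
    """Naive sentence splitter: breaks on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize(text, num_sentences=2):
    """Return the highest-scoring sentences, concatenated in their original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average frequency, so that long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

Calling `summarize(text, 2)` on a short passage keeps the two sentences whose words recur most often in the passage. A fuller system would add linguistic features such as sentence position or cue phrases to the score.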
2.1.2. Text Classification
The main types of classification models are decision trees, neural networks (NN) and genetic algorithms (GA)
[7]. Classification using decision trees can be done with three major techniques:
• C4.5 Algorithm: Generates a classification decision tree for the given data set by recursively
partitioning the input data. The algorithm considers all the possible tests that can split the data
set and selects the test that gives the best information gain. Note that the decision tree is
grown using a depth-first search (DFS) strategy.
• Sequential Decision Tree based Classification: A decision-tree model consisting of internal decision
nodes and leaves. Building it consists of tree induction and pruning. Pruning makes the inferred decision
tree more robust and concise by removing statistical dependencies on the original training data set.
• Synchronous Tree Construction Approach: Uses a multi-processor architecture for fast and efficient
decision tree construction and expansion.
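The split-selection step described for C4.5 above can be illustrated with a short Python sketch. It computes plain information gain, as the text describes; C4.5 itself normalizes this quantity into a gain ratio, which is omitted here for brevity. The function names and the toy review attributes (`has_great`, `is_long`) are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting `rows` on `attribute`.

    rows: list of dicts mapping attribute name to value; labels: parallel list of classes.
    """
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

def best_split(rows, labels, attributes):
    """Pick the attribute whose split yields the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Toy, hypothetical data: does a review contain "great", and is it long?
ROWS = [
    {"has_great": "yes", "is_long": "yes"},
    {"has_great": "yes", "is_long": "no"},
    {"has_great": "no", "is_long": "yes"},
    {"has_great": "no", "is_long": "no"},
]
LABELS = ["positive", "positive", "negative", "negative"]
```

On this toy data `has_great` separates the classes perfectly, so it is the attribute a tree builder would test first, while `is_long` carries no information about the class.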
In classification using neural networks, interlinked processing nodes perform the classification. An
artificial neural network is a mathematical model inspired by biological neurons; it consists of an interconnected
group of artificial neurons that process information using a connectionist approach to computation. We are
given a set of sample (input, output) pairs and the aim is to find a function that matches the samples, that is,
we wish to infer the mapping implied by the data. A cost function is used, which relates the mapping to the data
and implicitly contains prior knowledge about the problem domain. The function must be such that it correctly
classifies all input data up to a certain degree of error. Two concepts are central to neural network
classification:
• Multi-Layer Perceptron (MLP): a simple form of feed-forward NN. The neurons
are stacked layer-wise, with outputs always flowing toward the output layer. If only one layer exists, it
is called a perceptron.
• Back Propagation: a training technique that adjusts the weights in a neural network by propagating
weight changes backwards from the output nodes to the input nodes.
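As a minimal sketch of the single-layer case mentioned above, the following Python code trains a perceptron on the logical AND function. It uses the classic perceptron learning rule, in which the error at the output directly adjusts the input weights; this is the simplest relative of back propagation, not a full multi-layer implementation. The function names, the learning rate and the AND data set are assumptions chosen for illustration.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron learning rule: shift the weights in proportion to the output error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# Sample (input, output) pairs for logical AND, which is linearly separable.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
```

Because AND is linearly separable, the rule converges and the trained perceptron classifies all four samples correctly. A function such as XOR is not linearly separable and would need a hidden layer trained with back propagation.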
Classification using genetic algorithms works on the same principle as biological evolution in species, as
described in theories of evolution. In a genetic algorithm, the units or individuals are called chromosomes.
After the initial population is generated randomly, selection and permutation functions are run repeatedly until
the termination criterion is reached. The selection operator is intended to improve the average "quality" of
the population by giving individuals of higher quality a higher probability of being passed on to the next round
of selection and mating, as is the case with humans. Each execution of this loop is called a generation. The
quality of an individual is measured by a fitness function. Genetic operators, namely mutation and crossover, are
applied to generate offspring from the existing population. Genetic algorithms need a termination criterion to
stop the process: if no "significant" improvement is observed over consecutive generations, the entire
process is stopped. The sufficiency criterion is decided by the developer according to the problem domain.
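The loop described above can be sketched as follows. This is an illustrative toy, not a classifier from the cited work: chromosomes are bit lists, the fitness function simply counts 1-bits, and the population size, mutation rate, `patience` threshold and use of elitism are all assumptions made for the example.

```python
import random

def fitness(chromosome):
    """Toy fitness function (an assumption): the number of 1-bits."""
    return sum(chromosome)

def evolve(pop_size=20, length=16, max_generations=100, patience=10, seed=0):
    """Run a small genetic algorithm; return (best chromosome, initial best fitness)."""
    rng = random.Random(seed)
    # Initial population is generated randomly.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    initial_best = fitness(best)
    stale = 0
    for _ in range(max_generations):  # one loop iteration is one generation
        # Selection: fitter individuals get a higher probability of mating.
        weights = [fitness(c) + 1 for c in population]
        offspring = [best[:]]  # elitism: keep the best individual unchanged
        while len(offspring) < pop_size:
            mother, father = rng.choices(population, weights=weights, k=2)
            cut = rng.randint(1, length - 1)  # single-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < 0.02 else g for g in child]  # mutation
            offspring.append(child)
        population = offspring
        champion = max(population, key=fitness)
        if fitness(champion) > fitness(best):
            best, stale = champion, 0
        else:
            stale += 1
        # Termination: stop when no improvement is seen for `patience` generations.
        if stale >= patience:
            break
    return best, initial_best
```

In a text classification setting, the chromosome might instead encode a feature subset or rule set and the fitness function would measure classification accuracy on held-out data.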
2.2. Text Mining Applications
Text mining is now used to meet a wide array of research, business and government needs. Applications can be
sorted into a variety of categories by business function or analysis type. Classified in this manner, the
application categories currently in use include:
• Enterprise Business Intelligence: analyzing data to predict market trends and to improve enterprise
performance [12].
• Sentiment Analysis: the use of text analysis, natural language processing and
computational linguistics to find and extract subjective information from various sources relevant to the
behavioral sciences [2].
• Natural Language/Semantic Toolkit or Service: large open source libraries such as the Natural Language
Toolkit (NLTK) [14] and the Apache OpenNLP library provide NLP functionality in a comprehensive
package, with features ranging from Part-of-Speech (POS) tagging to Named Entity Recognition
(NER), lemmatization, chunking and so on.
• Social Media Monitoring: Term frequency-inverse document frequency (TF-IDF) analysis and relative
normalized term frequency analysis are commonly used to identify trending topics on social media
web sites such as Twitter and Facebook [13].
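To make the TF-IDF weighting mentioned in the last item concrete, here is a minimal Python sketch. Several TF-IDF variants exist; this one assumes relative term frequency and an unsmoothed natural-log IDF, and it expects non-empty, already tokenized documents. The function name `tf_idf` and the sample posts are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of non-empty token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical, already tokenized social media posts.
POSTS = [
    "election results trending tonight".split(),
    "new phone launch tonight".split(),
    "election debate tonight".split(),
]
SCORES = tf_idf(POSTS)
```

A term that appears in every post, such as "tonight" here, gets weight zero, while terms concentrated in few posts score higher, which is why the measure is useful for spotting topic-specific vocabulary.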
3. Conclusion
After reading, summarizing, categorizing and reflecting on research in the field of Natural
Language Processing, with a primary focus on text mining and analysis, we find that the majority of the research
is consumer focused. The textual data being collected is produced by consumers: by users on social media, or
by test-takers in the case of examinations. Hence, more research should focus on using text
mining and NLP techniques to analyze the hosts of this textual data, such as Facebook, Twitter or TOEFL.
Various social media alerts are generated automatically by the recommendation algorithms these companies
employ, and as far as examination grading is concerned, the question paper itself could be analyzed.
Such analysis can be used to find out how efficient a system is. For example, the friends' posts
recommendation notifications given by Facebook could be improved via sentiment analysis of users' posts, or a
complete Bloom's taxonomy evaluation of major competitive exams such as the SAT, TOEFL, IELTS and GRE could
be carried out and the efficiency with which they map to the careers of their respective test takers could be
calculated. Hence, what we would call "organizational text mining" is currently the need of the hour. Whether
the data is produced by large social media corporations or by other large organizations and bodies, what it says
about them, as opposed to what it says about their users, needs to be researched with greater
interest and attention.
References
[1] Yasunari Maeda, Hideki Yoshida, and Toshiyasu Matsushima. "Document classification method with
small training data," in Proc. ICCAS-SICE, 2009.
[2] Hao Wang and Jorge A. Castanon. "Sentiment Expression via Emoticons on Social Media," in Proc.
IEEE International Conference on Big Data, 2015.
[3] Shweta Patil and Sonal Patil. "Intelligent Tutoring System for Evaluating Student Performance in
Descriptive Answers Using Natural Language Processing." International Journal of Science and
Research, 2014.
[4] Siddhartha Ghosh and Dr. Sameen S Fatima. "Design of an Automated Essay Grading (AEG) system in
Indian Context." International Journal of Computer Application, vol.1, No.11, 2010.
[5] Deepali K. Gaikwad and C. Namrata Mahender. "A Review Paper on Text Summarization."
International Journal of Advanced Research in Computer and Communication Engineering, Vol. 5,
Issue 3, Mar. 2016.
[6] Tushar Ghorpade and Lata Ragha. "Featured Based Sentiment Classification for Hotel Reviews using
NLP and Bayesian Classification," presented at the International Conference on Communication,
Information & Computing Technology (ICCICT), Mumbai, India, Oct. 2012.
[7] Bhumika, Prof Sukhjit Singh Sehra and Prof Anand Nayyar. "A Review Paper On Algorithms Used For
Text Classification." International Journal of Application or Innovation in Engineering & Management,
Vol. 2, Issue 3, March 2013.
[8] Mita K. Dalal and Mukesh A. Zaveri. "Automatic Text Classification: A Technical Review", 2011.
[9] N. Moratanch and Dr. S. Chitrakala. "A Survey on Abstractive Text Summarization", in Proc.
International Conference on Circuit, Power and Computing Technologies, 2016.
[10] Urmila Shrawankar and Kranti Wankhede. "Construction of News Headline from Detailed News
Article", 2016.
[11] Manju Khari, Amita Jain, Sonakshi Vij and Manoj Kumar. "Analysis of Various Information Retrieval
Models", 2016.
[12] B. Azvine, Z. Cui, D.D. Nauck and B. Majeed. "Real Time Business Intelligence for the Adaptive
Enterprise", 2006.
[13] James Benhardus and Jugal Kalita. "Streaming Trend Detection in Twitter." International Journal of
Web Based Communities, pp. 122-139, 2013.
[14] Steven Bird. "NLTK: The Natural Language Toolkit", Proc. COLING/ACL on Interactive presentation
sessions, pp. 69-72, 2006.
[15] Chetan Botre, Saad Patel, Shrinivas Kunjir and Swapnil Shinde. "NoteMate - A Note Making System
Using OCR and Text Mining," in International Journal of Advanced Research in Computer Science and
Software Engineering, Volume 5, Issue 3, Mar. 2015.