A number of benefits have been reported for computer-based assessments over traditional paper-based exams, including IT support for question development, reduced distribution and test-administration costs, and automated support for marking. However, existing computerized assessment systems do not support all question types, notably open-ended questions that require written solutions. To overcome the limitations of existing systems, the objective of this work is to build an intelligent evaluation system (IES) that responds to the problems identified and adapts to different question types, especially open-ended questions whose answers require sentence writing or programming.
Seeds Affinity Propagation Based on Text Clustering (IJRES Journal)
The objective is to find, among all partitions of the data set, the best partitioning according to some quality measure. Affinity propagation is a low-error, high-speed, flexible, and remarkably simple clustering algorithm that may be used in forming teams of participants for business simulations and experiential exercises, and in organizing participants' preferences for the parameters of simulations. This paper proposes an efficient affinity propagation algorithm that guarantees the same clustering result as the original algorithm after convergence. The heart of the approach is (1) to prune unnecessary message exchanges during the iterations and (2) to compute the convergence values of pruned messages after the iterations to determine the clusters.
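As an illustration of the message-passing scheme this abstract builds on, here is a minimal NumPy sketch of plain affinity propagation, without the paper's message-pruning optimization. The function name, parameters, and toy data are all hypothetical, not the authors' code.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, iters=300):
    """Plain affinity propagation on a similarity matrix S (n x n).

    Returns, for each point, the index of its chosen exemplar.
    A sketch of the classic responsibility/availability updates only.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first_max = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second_max = np.max(AS, axis=1)
        R_new = S - first_max[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second_max
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())  # r(k,k) enters without the max(0, .)
        A_new = np.sum(Rp, axis=0)[None, :] - Rp
        dA = A_new.diagonal().copy()        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, dA)
        A = damping * A + (1 - damping) * A_new
    return np.argmax(A + R, axis=1)
```

On two well-separated groups of points (with the preference, i.e. the diagonal of S, set near the median off-diagonal similarity), the evidence matrix A + R settles so that each group selects its own exemplar.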
ONTOLOGY INTEGRATION APPROACHES AND THEIR IMPACT ON TEXT CATEGORIZATION (IJDKP)
This article introduces approaches for improving text categorization models by integrating previously imported ontologies. From the Reuters Corpus Volume I (RCV1) dataset, categories very similar in content and related to the telecommunications, Internet, and computer areas were selected for the model experiments. Several domain ontologies covering these areas were built and integrated into the categorization models to improve them.
A hybrid composite features based sentence level sentiment analyzer (IAES IJAI)
Current lexicon- and machine-learning-based sentiment analysis approaches still suffer from a two-fold limitation. First, manual lexicon construction and machine training are time consuming and error-prone. Second, prediction accuracy requires that sentences and their corresponding training text fall under the same domain. In this article, we experimentally evaluate four sentiment classifiers, namely support vector machines (SVM), naive Bayes (NB), logistic regression (LR), and random forest (RF). We quantify the quality of each of these models using three real-world datasets that comprise 50,000 movie reviews, 10,662 sentences, and 300 generic movie reviews. Specifically, we study the impact of a variety of natural language processing (NLP) pipelines on the quality of the predicted sentiment orientations. Additionally, we measure the impact of incorporating lexical semantic knowledge captured by WordNet when expanding original words in sentences. Findings demonstrate that utilizing different NLP pipelines and semantic relationships impacts the quality of the sentiment analyzers. In particular, results indicate that coupling lemmatization with knowledge-based n-gram features produces higher accuracy. With this coupling, the accuracy of the SVM classifier improved to 90.43%, compared with 86.83%, 90.11%, and 86.20%, respectively, for the three other classifiers.
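To make the n-gram feature setup concrete, here is a small, self-contained sketch combining word n-gram features with one of the four evaluated classifier families (naive Bayes, chosen because it fits in a few lines; the paper's best results came from an SVM, which is not shown). The class, the helper, and the toy data are illustrative assumptions, not the authors' pipeline.

```python
import math
from collections import Counter

def ngrams(tokens, n=2):
    """Unigrams plus word n-grams; a stand-in for the paper's
    knowledge-based n-gram features."""
    grams = ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return list(tokens) + grams

class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one smoothing over n-gram features."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.labels}
        self.counts = {c: Counter() for c in self.labels}
        for doc, c in zip(docs, labels):
            self.counts[c].update(ngrams(doc.lower().split()))
        self.vocab = set().union(*self.counts.values())
        self.total = {c: sum(self.counts[c].values()) for c in self.labels}
        return self

    def predict(self, doc):
        feats = ngrams(doc.lower().split())
        V = len(self.vocab)
        return max(self.labels, key=lambda c: self.prior[c] + sum(
            math.log((self.counts[c][f] + 1) / (self.total[c] + V)) for f in feats))
```

A real run would add lemmatization (e.g. via an NLP toolkit) before the `ngrams` step, which is the coupling the abstract credits for the accuracy gain.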
Review of Various Text Categorization Methods (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind, peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publication of high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
Supreme court dialogue classification using machine learning models (IJECE IAES)
Legal classification models help lawyers identify the relevant documents required for a study. This study focuses on sentence-level classification; more precisely, the work focuses on conversations in the Supreme Court between the justices and other correspondents. Both the naive Bayes classifier and logistic regression are used to classify conversations at the sentence level, and performance is measured using the area under the curve (AUC) score. The study found that a model trained on a specific case yielded better results than a model trained on a larger number of conversations; case specificity is found to be crucial for obtaining better results from the classifier.
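The area-under-the-curve score used for evaluation can be computed directly from the rank statistic, without tracing the ROC curve. This sketch (function name and data are illustrative) uses the Mann-Whitney formulation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting one half.

```python
def auc_score(labels, scores):
    """AUC via the rank statistic: P(score(pos) > score(neg)), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```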
A simplified classification computational model of opinion mining using deep ... (IJECE IAES)
Opinion mining attempts to develop an automated system to determine people's viewpoints towards various units such as events, topics, products, services, organizations, individuals, and issues. Opinion analysis of natural text can be regarded as a text and sequence classification problem that poses a high feature space due to the dynamic information involved, which needs to be addressed precisely. This paper introduces effective modelling of human opinion analysis from social media data with complex and dynamic content. First, a customized preprocessing operation based on natural language processing mechanisms serves as an effective data treatment process for building quality-aware input data. A suitable deep learning technique, bidirectional long short-term memory (Bi-LSTM), is then implemented for opinion classification, followed by a data modelling process in which truncating and padding are performed manually to achieve better data generalization in the training phase. The design and development of the model are carried out in MATLAB. The performance analysis shows that the proposed system offers a significant advantage in classification accuracy and training time, due to the reduction of the feature space by the data treatment operation.
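The manual truncating-and-padding step mentioned above is the part most easily shown in code: every token-id sequence is cut or right-padded to a common length before being fed to the recurrent model. This is a generic sketch (names and values assumed), not the paper's MATLAB code.

```python
def pad_or_truncate(seqs, max_len, pad_value=0):
    """Cut each sequence to max_len, or right-pad it with pad_value,
    so that all sequences in a batch share the same length."""
    out = []
    for s in seqs:
        s = s[:max_len]                              # truncate long sequences
        out.append(s + [pad_value] * (max_len - len(s)))  # pad short ones
    return out
```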
A scoring rubric for automatic short answer grading system (TELKOMNIKA Journal)
During the past decades, research on automatic grading has become an interesting issue. These studies focus on how to enable machines to help humans assess students' learning outcomes. Automatic grading enables teachers to assess students' answers more objectively, consistently, and quickly. The essay model has two different types, i.e., the long essay and the short answer. Most previous research developed automatic essay grading (AEG) rather than automatic short answer grading (ASAG). This study aims to assess the sentence similarity of short answers to questions and reference answers in Indonesian without any semantic language tool. The pre-processing steps consist of case folding, tokenization, stemming, and stopword removal. The proposed approach is a scoring rubric obtained by measuring sentence similarity using string-based similarity methods and a keyword matching process. The dataset consists of 7 questions, 34 alternative reference answers, and 224 student answers. The experimental results show that the proposed approach achieves a Pearson correlation between 0.65419 and 0.66383, with a mean absolute error (MAE) between 0.94994 and 1.24295. The proposed approach also raises the correlation value and decreases the error value of each method.
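A rubric of this shape can be sketched in a few lines: take the best string-based similarity of the student's answer against any reference answer, combine it with the fraction of matched keywords, and scale to the maximum score. The weighting, the use of difflib as the string-similarity method, and all names are assumptions for illustration; the paper's exact rubric may differ.

```python
from difflib import SequenceMatcher

def rubric_score(student_answer, reference_answers, keywords, max_score=5):
    """Hypothetical ASAG rubric: 50% best string similarity to any
    reference answer + 50% keyword-match ratio, scaled to max_score."""
    def norm(text):
        return text.lower().split()
    sim = max(SequenceMatcher(None, norm(student_answer), norm(r)).ratio()
              for r in reference_answers)
    tokens = set(norm(student_answer))
    kw = sum(k.lower() in tokens for k in keywords) / len(keywords)
    return round(max_score * (0.5 * sim + 0.5 * kw), 2)
```

In the paper's setting, stemming and stopword removal would replace the bare `lower().split()` normalization used here.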
A PROPOSED MULTI-DOMAIN APPROACH FOR AUTOMATIC CLASSIFICATION OF TEXT DOCUMENTS (ijsc)
Classification is an important technique used in information retrieval. Supervised classification suffers from certain limitations concerning the collection and labeling of the training dataset. When facing multi-domain classification, multiple training datasets and classifiers are needed, which is relatively difficult. In this paper, an unsupervised classification system is proposed that can also manage the multi-domain classification problem. It is a multi-domain system in which each domain is represented by an ontology. A document is mapped onto each ontology based on the weights of the mutual tokens between them with the help of fuzzy sets, resulting in a mapping degree of the document with each domain. An experiment shows satisfying classification results, with an improvement in the evaluation results of the proposed system compared to Apache Lucene.
Text Mining and Classification of Product Reviews Using Structured Support Vector Machine (csandit)
Text mining and text classification are two prominent and challenging tasks in the field of machine learning. Text mining refers to the process of deriving high-quality and relevant information from text, while text classification deals with the categorization of text documents into different classes. The real challenge in these areas is to address problems like handling large text corpora, similarity of words in text documents, and association of text documents with a subset of class categories. The feature extraction and classification of such text documents require an efficient machine learning algorithm which performs automatic text classification. This paper describes the classification of product review documents as a multi-label classification scenario and addresses the problem using a structured support vector machine. The work also explains the flexibility and performance of the proposed approach for efficient text classification.
Machine Learning Techniques with Ontology for Subjective Answer Evaluation (ijnlc)
Computerized evaluation of English essays is performed using machine learning techniques such as Latent Semantic Analysis (LSA), Generalized LSA, Bilingual Evaluation Understudy, and Maximum Entropy. An ontology, a concept map of domain knowledge, can enhance the performance of these techniques. Using an ontology makes the evaluation process holistic, as the presence of keywords, synonyms, the right word combinations, and the coverage of concepts can all be checked. In this paper, the above-mentioned techniques are implemented both with and without an ontology and tested on common input data consisting of technical answers in Computer Science. A domain ontology of computer graphics is designed and developed. The implementation uses the Java programming language and tools such as MATLAB and Protégé. Ten questions from computer graphics, with sixty answers per question, are used for testing. The results are analyzed, and it is concluded that they are more accurate with the use of the ontology.
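The LSA step named above reduces a term-document matrix to a low-rank concept space via truncated SVD and compares answers by cosine similarity there. A minimal NumPy sketch (names, rank `k`, and the toy matrix are assumptions; the paper's implementation is in Java/MATLAB):

```python
import numpy as np

def lsa_similarity(term_doc, k=2):
    """Latent Semantic Analysis sketch: truncated SVD of a term-document
    matrix, then cosine similarity between documents in the k-dim
    concept space. Returns the doc-by-doc similarity matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    docs = (np.diag(s[:k]) @ Vt[:k]).T          # documents in concept space
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    docs = docs / np.where(norms == 0, 1, norms)
    return docs @ docs.T
```

In an essay-grading setting, one column would hold a model answer and another a student answer; their cosine in the concept space stands in for a semantic match score.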
Feature selection, optimization and clustering strategies of text documents (IJECE IAES)
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors, including consumer segmentation, categorization, shared filtering, document management, and indexing. Research on the clustering task must be performed prior to its adaptation to the text environment. Conventional approaches typically emphasized quantitative information, where the selected features are numbers. Efforts have also been made towards efficient clustering in the context of categorical information, where the selected features can assume nominal values. This manuscript presents an in-depth analysis of the challenges of clustering in the text environment. Further, it details prominent models proposed for clustering, along with the pros and cons of each model. In addition, it focuses on the latest developments in the clustering task in social networks and associated environments.
A novel approach for text extraction using effective pattern matching technique (eSAT Journals)
Many data mining techniques have been proposed for mining useful patterns from documents. Still, how to effectively use and update discovered patterns remains open for future research, especially in the field of text mining. As most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy (words have multiple meanings) and synonymy (multiple words have the same meaning). The hypothesis that pattern-based approaches should perform better than term-based ones has long been held, but many experiments do not support it. This paper presents an innovative and effective pattern discovery technique that includes the processes of pattern deploying and pattern matching to improve the effective use of discovered patterns.
Keywords: Pattern Mining, Pattern Taxonomy Model, Inner Pattern Evolving, TF-IDF, NLP etc.
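The TF-IDF weighting listed in the keywords is the term-based baseline the paper argues pattern-based methods should improve upon. A self-contained sketch (function name and toy corpus assumed):

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF vectors for a list of tokenized documents:
    tf(t, d) * log(N / df(t)), with tf normalized by document length."""
    N = len(docs)
    df = Counter(t for d in docs for t in set(d))   # document frequency
    idf = {t: math.log(N / df[t]) for t in df}
    return [{t: (c / len(d)) * idf[t] for t, c in Counter(d).items()}
            for d in docs]
```

Note that a term occurring in every document gets weight zero, which is exactly the behavior that makes TF-IDF discard uninformative common words.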
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi... (iosrjce)
Most text classification problems are associated with multiple class labels, and hence automatic text classification is one of the most challenging and prominent research areas. Text classification is the problem of categorizing text documents into different classes. In the multi-label classification scenario, each document may be associated with more than one label. The real challenge in multi-label classification is the labelling of a large number of text documents with a subset of class categories. The feature extraction and classification of such text documents require an efficient machine learning algorithm which performs automatic text classification. This paper describes the multi-label classification of product review documents using a structured support vector machine.
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATION (IJNSA Journal)
Much previous research has proven that the use of rhetorical relations can enhance many applications, such as text summarization, question answering, and natural language generation. This work proposes an approach that extends the benefit of rhetorical relations to address the redundancy problem in cluster-based text summarization of multiple documents. Rhetorical relations existing between sentences are exploited to group similar sentences into multiple clusters and identify themes of common information, from which candidate summary sentences are extracted. Then, cluster-based text summarization is performed using a Conditional Markov Random Walk Model to measure the saliency scores of the candidate summary sentences. The method is evaluated by measuring the cohesion and separation of the clusters constructed by exploiting rhetorical relations, and the ROUGE score of the generated summaries. The experimental results show that the method performs well, which indicates the promising potential of applying rhetorical relations to text clustering for the benefit of multi-document summarization.
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION (cscpconf)
The internet has caused humongous growth in the amount of data available to the common man. Summaries of documents can help find the right information and are particularly effective when the document base is very large. Keywords are closely associated with a document, as they reflect the document's content and act as indexes for it. In this work, we present a method to produce extractive summaries of documents in the Kannada language. The algorithm extracts keywords from pre-categorized Kannada documents collected from online resources. We combine GSS (Galavotti, Sebastiani, Simi) coefficients and IDF (inverse document frequency) along with TF (term frequency) to extract keywords, and later use these for summarization. In the current implementation, a document from a given category is selected from our database and, depending on the number of sentences requested by the user, a summary is generated.
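The GSS coefficient used for keyword selection measures how strongly a term indicates a category: GSS(t, c) = P(t, c) * P(~t, ~c) - P(t, ~c) * P(~t, c), estimated from the labeled corpus. A sketch with assumed names and toy data (the real system applies this to Kannada tokens):

```python
def gss(term, category, docs):
    """GSS (Galavotti-Sebastiani-Simi) coefficient of a term for a category,
    estimated from a list of (token_set, label) pairs."""
    N = len(docs)
    tc    = sum(1 for toks, c in docs if term in toks and c == category) / N
    nt_nc = sum(1 for toks, c in docs if term not in toks and c != category) / N
    t_nc  = sum(1 for toks, c in docs if term in toks and c != category) / N
    nt_c  = sum(1 for toks, c in docs if term not in toks and c == category) / N
    return tc * nt_nc - t_nc * nt_c
```

Terms concentrated in the target category score positive; terms concentrated elsewhere score negative, so ranking by GSS (optionally weighted by TF and IDF, as in the paper) surfaces category-specific keywords.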
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING (dannyijwest)
Social networks have become one of the most popular platforms allowing users to communicate and share their interests without being in the same geographical location. The great and rapid growth of social media sites such as Facebook, LinkedIn, and Twitter produces a huge amount of user-generated content. Thus, improving information quality and integrity becomes a great challenge for all social media sites, allowing users to get the desired content or be linked via the best relation using improved search/link techniques. Introducing semantics to social networks widens the representation of the network. In this paper, a new model of social networks based on semantic tag ranking is introduced, built on the concept of multi-agent systems. In the proposed model, the representation of social links is extended by the semantic relationships found in the vocabularies known as tags in most social networks. The proposed model for the social media engine is based on enhanced Latent Dirichlet Allocation (E-LDA) as a semantic indexing algorithm, combined with Tag Rank as the social network ranking algorithm. The E-LDA phase is improved by running the LDA algorithm with optimal parameters, and a filter is introduced to enhance the final indexing output. In the ranking phase, applying Tag Rank on top of the indexing phase improves the ranking output. Simulation results of the proposed model show improvements in both indexing and ranking output.
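The ranking phase can be illustrated with a PageRank-style iteration over a tag graph; this is a generic stand-in for the Tag Rank step (the E-LDA indexing phase is not reproduced), and the function, parameters, and toy graph are all assumptions.

```python
def tag_rank(graph, damping=0.85, iters=50):
    """PageRank-style scores over a tag link graph.
    `graph` maps each tag to the list of tags it links to."""
    tags = list(graph)
    rank = {t: 1.0 / len(tags) for t in tags}
    for _ in range(iters):
        new = {t: (1 - damping) / len(tags) for t in tags}
        for t, outs in graph.items():
            if outs:
                share = damping * rank[t] / len(outs)
                for u in outs:
                    new[u] += share
            else:  # dangling tag: spread its rank evenly
                for u in tags:
                    new[u] += damping * rank[t] / len(tags)
        rank = new
    return rank
```

Tags that receive many semantic links from other tags accumulate rank, which is what lets the engine surface the most central tags first.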
In this study, machine learning is examined in relation to commercial machine learning's resilience to the COVID-19 pandemic-related crisis. Two approaches are used to assess the pandemic's impact on machine learning risk, together with a method to prioritize sectors according to the crisis's potential negative consequences. The study was conducted to determine the resilience of Santander's machine learning. The data mining area offers prospects for COVID-19's future. A total of 13 machine learning demos were selected for the organization. The Hellwig method and the technique for order preference by similarity to ideal solution (TOPSIS) were utilized as direct ordering strategies. Parametric assessment of machine learning versatility in business was based on capital sufficiency, liquidity ratio, market benefits, share in an arrangement of openings with a recognized impairment, and the sensitivity of the credit portfolio to monetary hazard. These enterprises were then ranked according to their threat from the COVID-19 pandemic. Based on the findings, machine learning worked best for the pandemic, while it suffered the most during the downturn. This can be seen, for example, in conversations about the pandemic's impact on developing-market soundness and on managing financial-framework solidity risk.
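TOPSIS, one of the two ordering methods named above, ranks alternatives by their relative closeness to an ideal solution. A minimal NumPy sketch (function name, weights, and decision matrix are illustrative, not the study's data):

```python
import numpy as np

def topsis(X, weights, benefit):
    """TOPSIS closeness scores (higher = better).
    X: alternatives x criteria decision matrix.
    benefit[j] is True when larger values of criterion j are better."""
    X = np.asarray(X, float)
    norm = X / np.linalg.norm(X, axis=0)          # vector normalization
    V = norm * np.asarray(weights, float)          # weighted normalized matrix
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti  = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)      # distance to ideal
    d_neg = np.linalg.norm(V - anti, axis=1)       # distance to anti-ideal
    return d_neg / (d_pos + d_neg)                 # relative closeness
```

With, say, capital sufficiency as a benefit criterion and portfolio risk sensitivity as a cost criterion, the alternative scoring high on the former and low on the latter receives the highest closeness.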
Agriculture has long been a major source of livelihood for Nigerians, and it accounts for over 85% of the total food consumed within the country's borders. The sector has maintained improved productivity and profitability via concerted efforts to address critical issues such as an unorganized regulatory system, lack of food safety data, no standards for agricultural produce, non-adoption of precision farming, and lack of harmonized inventory trace support. This study proposes blockchain-based trace support in a continued effort to ensure food quality, consumer safety, and trading of food assets. It uses radio frequency identification (RFID) sensors to register and track livestock, farms/farmers, and abattoir processes, and it provisions a databank to trace livestock data. Results show the model adequately performs about 1,101 transactions per second, with a response time of 0.21 s for queries and 0.28 s for HTTPS pages for 2,500 users. It yields slightly longer times of 0.32 s for queries and 0.38 s for HTTPS pages with an increased load of 5,000 users, via the world state stored in the blockchain's Hyperledger Fabric ledger. Overall, the framework can directly query and retrieve data without traversing the whole ledger, which in turn improves the efficiency and effectiveness of the traceability system.
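The traceability guarantee rests on hash chaining: each recorded livestock event carries the hash of the previous record, so tampering with any block breaks every link after it. This is a generic illustration of that property (plain Python, not Hyperledger Fabric; all names and the sample events are assumptions).

```python
import hashlib
import json

def make_block(data, prev_hash):
    """One hash-chained record: the block's hash commits to its data
    and to the hash of the previous block."""
    block = {"data": data, "prev": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps({"data": data, "prev": prev_hash}, sort_keys=True).encode()
    ).hexdigest()
    return block

def verify_chain(chain):
    """Recompute every hash and check every back-link; any tampering fails."""
    for i, b in enumerate(chain):
        expected = hashlib.sha256(
            json.dumps({"data": b["data"], "prev": b["prev"]},
                       sort_keys=True).encode()
        ).hexdigest()
        if b["hash"] != expected:
            return False
        if i and b["prev"] != chain[i - 1]["hash"]:
            return False
    return True
```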
A PROPOSED MULTI-DOMAIN APPROACH FOR AUTOMATIC CLASSIFICATION OF TEXT DOCUMENTS (ijsc)
Classification is an important technique used in information retrieval. Supervised classification suffers
from certain limitations concerning the collection and labeling of the training dataset. When facing multi-
domain classification, multiple training datasets and classifiers are needed, which is relatively difficult. In
this paper an unsupervised classification system is proposed that can also manage the multi-domain
classification problem. It is a multi-domain system in which each domain is represented by an ontology.
A document is mapped onto each ontology based on the weights of the mutual tokens between them, with
the help of fuzzy sets, resulting in a mapping degree of the document with each domain. An experiment
carried out shows satisfying classification results, with an improvement in the evaluation results of the
proposed system compared to Apache Lucene.
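A fuzzy mapping degree of this kind can be sketched as the weight mass of tokens a document shares with a domain ontology, normalized to [0, 1]. The normalization, weights, and example vocabularies below are our assumptions for illustration, not the paper's exact membership function:

```python
def mapping_degree(doc_tokens, ontology_weights):
    """Hypothetical fuzzy mapping: the document's degree of membership
    in a domain is the summed weight of tokens shared with the domain
    ontology, normalized by the ontology's total weight."""
    shared = sum(w for t, w in ontology_weights.items() if t in doc_tokens)
    total = sum(ontology_weights.values())
    return shared / total if total else 0.0

doc = {"router", "packet", "protocol", "goal"}
networking = {"router": 0.9, "packet": 0.8, "protocol": 0.7, "latency": 0.6}
sports = {"goal": 0.9, "match": 0.8, "league": 0.7}
degrees = {name: mapping_degree(doc, o)
           for name, o in [("networking", networking), ("sports", sports)]}
print(degrees)
```

The document is then assigned to the domain (or domains) with the highest mapping degree, which is what makes the scheme unsupervised: no labeled training set is needed, only the ontologies.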
Text Mining and Classification of Product Reviews Using Structured Su... (csandit)
Text mining and text classification are two prominent and challenging tasks in the field of machine learning. Text mining refers to the process of deriving high-quality and relevant information from text, while text classification deals with the categorization of text documents into different classes. The real challenge in these areas is to address problems such as handling large text corpora, similarity of words in text documents, and association of text documents with a subset of class categories. The feature extraction and classification of such text documents require an efficient machine learning algorithm that performs automatic text classification. This paper describes the classification of product review documents as a multi-label classification scenario and addresses the problem using a Structured Support Vector Machine. The work also explains the flexibility and performance of the proposed approach for efficient text classification.
Machine Learning Techniques with Ontology for Subjective Answer Evaluation (ijnlc)
Computerized Evaluation of English Essays is performed using Machine learning techniques like Latent
Semantic Analysis (LSA), Generalized LSA, Bilingual Evaluation Understudy and Maximum Entropy.
Ontology, a concept map of domain knowledge, can enhance the performance of these techniques. Use of
Ontology makes the evaluation process holistic as presence of keywords, synonyms, the right word
combination and coverage of concepts can be checked. In this paper, the above-mentioned techniques are
implemented both with and without ontology and tested on common input data consisting of technical
answers from Computer Science. A domain ontology of Computer Graphics is designed and developed. The
software used for implementation includes Java Programming Language and tools such as MATLAB,
Protégé, etc. Ten questions from Computer Graphics with sixty answers for each question are used for
testing. The results are analyzed and it is concluded that the results are more accurate with use of
Ontology.
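Of the techniques listed, LSA is the most self-contained to sketch: build a term-document matrix from reference essays, truncate its SVD to a few latent concepts, and score a student answer by cosine similarity in concept space. The matrix, dimensions, and function names below are illustrative assumptions, not the paper's data:

```python
import numpy as np

# Tiny term-document matrix: rows = terms, columns = reference essays.
# All names and numbers here are illustrative, not taken from the paper.
A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 2., 1.],
              [1., 0., 2.]])

# LSA: a truncated SVD keeps only the k strongest latent "concepts".
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2

def fold_in(tf_vector):
    """Project an answer's term-frequency vector into concept space
    (the usual Sigma^-1 scaling is omitted for brevity)."""
    return tf_vector @ U[:, :k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

student = np.array([2., 1., 0., 1.])   # same term profile as essay 0
similarity = cosine(fold_in(student), fold_in(A[:, 0]))
print(round(similarity, 3))
```

An ontology enters such a pipeline by expanding the term vocabulary with synonyms and related concepts before the matrix is built, which is how keyword, synonym, and concept coverage can all be checked at once.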
Feature selection, optimization and clustering strategies of text documents (IJECE, IAES)
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide-ranging sectors including consumer segmentation, categorization, collaborative filtering, document management, and indexing. Research on the clustering task must be performed prior to its adaptation to the text environment. Conventional approaches typically emphasized quantitative information, where the selected features are numbers. Efforts have also been put forward toward achieving efficient clustering in the context of categorical information, where the selected features can assume nominal values. This manuscript presents an in-depth analysis of the challenges of clustering in the text environment. Further, this paper also details prominent models proposed for clustering, along with the pros and cons of each model. In addition, it focuses on various recent developments in the clustering task in social networks and associated environments.
A novel approach for text extraction using effective pattern matching technique (eSAT Journals)
Abstract
Many data mining techniques have been proposed for mining useful patterns from documents. However, how to effectively use and update discovered patterns remains open for future research, especially in the field of text mining. As most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy (words with multiple meanings) and synonymy (multiple words with the same meaning). It has long been hypothesized that pattern-based approaches should perform better than term-based ones, but many experiments do not support this hypothesis. This paper presents an innovative and effective pattern discovery technique which includes the processes of pattern deploying and pattern matching to improve the effective use of discovered patterns.
Keywords: Pattern Mining, Pattern Taxonomy Model, Inner Pattern Evolving, TF-IDF, NLP etc.
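TF-IDF, listed among the keywords, is the term-based baseline that pattern-based methods are compared against. A minimal sketch of the classic formulation (the corpus below is illustrative):

```python
import math

def tf_idf(term, doc, corpus):
    """Classic TF-IDF: term frequency in the document, damped by how
    many documents of the corpus contain the term (inverse doc. freq.)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

docs = [["pattern", "mining", "text"],
        ["pattern", "taxonomy", "model"],
        ["inner", "pattern", "evolving"],
        ["text", "mining"]]
print(tf_idf("taxonomy", docs[1], docs))  # rare term: higher weight
print(tf_idf("pattern", docs[1], docs))   # common term: lower weight
```

Because TF-IDF weighs single terms in isolation, it cannot distinguish word senses, which is exactly the polysemy/synonymy weakness the pattern-based approach targets.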
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi... (iosrjce)
Most text classification problems are associated with multiple class labels, and hence automatic text
classification is one of the most challenging and prominent research areas. Text classification is the
problem of categorizing text documents into different classes. In the multi-label classification scenario,
each document may be associated with more than one label. The real challenge in multi-label
classification is the labelling of a large number of text documents with a subset of class categories. The
feature extraction and classification of such text documents require an efficient machine learning
algorithm that performs automatic text classification. This paper describes the multi-label classification
of product review documents using a Structured Support Vector Machine.
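The Structured SVM itself is beyond a short sketch, but the multi-label setting it operates in can be illustrated with label sets and Hamming loss, a standard multi-label metric. The labels and predictions below are invented for illustration:

```python
def hamming_loss(true_labels, pred_labels, all_labels):
    """Multi-label evaluation: the fraction of (document, label)
    decisions on which prediction and ground truth disagree."""
    errors = 0
    for t, p in zip(true_labels, pred_labels):
        for lab in all_labels:
            if (lab in t) != (lab in p):
                errors += 1
    return errors / (len(true_labels) * len(all_labels))

# Each product review may carry several aspect labels at once.
labels = ["battery", "screen", "price", "shipping"]
truth = [{"battery", "price"}, {"screen"}]
preds = [{"battery"}, {"screen", "shipping"}]
print(hamming_loss(truth, preds, labels))
```

Representing each document's labels as a subset (equivalently, a binary vector over the label alphabet) is what separates multi-label classification from the ordinary single-label case.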
EXPLOITING RHETORICAL RELATIONS FOR MULTIPLE-DOCUMENT TEXT SUMMARIZATION (IJNSA Journal)
Much previous research has shown that the use of rhetorical relations can enhance many applications such as text summarization, question answering, and natural language generation. This work proposes an approach that extends the benefit of rhetorical relations to address the redundancy problem in cluster-based text summarization of multiple documents. We exploited the rhetorical relations existing between sentences to group similar sentences into multiple clusters and identify themes of common information, from which the candidate summary sentences were extracted. Then, cluster-based text summarization is performed using a Conditional Markov Random Walk Model to measure the saliency scores of the candidate summary sentences. We evaluated our method by measuring the cohesion and separation of the clusters constructed by exploiting rhetorical relations, and the ROUGE scores of the generated summaries. The experimental results show that our method performed well, demonstrating the promising potential of applying rhetorical relations to text clustering for the summarization of multiple documents.
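ROUGE, used here to score the generated summaries, compares system output against a human reference by n-gram overlap. A sketch of ROUGE-1 recall, the simplest variant (the sentences below are illustrative, not from the paper's corpus):

```python
from collections import Counter

def rouge1_recall(reference, candidate):
    """ROUGE-1 recall: the fraction of reference unigrams that also
    appear in the candidate summary (clipped by candidate counts)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(n, cand[w]) for w, n in ref.items())
    return overlap / sum(ref.values())

ref = "rhetorical relations help cluster similar sentences"
cand = "clusters use rhetorical relations between similar sentences"
print(round(rouge1_recall(ref, cand), 3))
```

Full ROUGE evaluations also report bigram (ROUGE-2) and longest-common-subsequence (ROUGE-L) variants, but all follow this overlap-against-reference pattern.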
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION (cscpconf)
The internet has caused a humongous growth in the amount of data available to the common
man. Summaries of documents can help find the right information and are particularly effective
when the document base is very large. Keywords are closely associated to a document as they
reflect the document's content and act as indexes for the given document. In this work, we
present a method to produce extractive summaries of documents in the Kannada language. The
algorithm extracts key words from pre-categorized Kannada documents collected from online
resources. We combine GSS (Galavotti, Sebastiani, Simi) coefficients and IDF (Inverse
Document Frequency) methods along with TF (Term Frequency) for extracting key words and
later use these for summarization. In the current implementation, a document from a given category is selected from our database and, depending on the number of sentences specified by the user, a summary is generated.
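The GSS coefficient mentioned above scores how strongly a term indicates a category from document counts. A sketch of the standard formulation, with invented counts (the paper's Kannada corpus statistics are not given in the abstract):

```python
def gss(tp, fp, fn, tn):
    """GSS (Galavotti-Sebastiani-Simi) coefficient from document counts:
    tp = in-category docs containing the term, fp = out-of-category docs
    containing it, fn/tn = the analogous counts for its absence.
    GSS = P(t,c)*P(~t,~c) - P(t,~c)*P(~t,c)."""
    n = tp + fp + fn + tn
    return (tp / n) * (tn / n) - (fp / n) * (fn / n)

# Illustrative: a term in 40 of 50 in-category docs, 5 of 150 others
score = gss(tp=40, fp=5, fn=10, tn=145)
print(round(score, 4))
```

Terms with high GSS scores (and high TF-IDF) are kept as keywords, and sentences containing many of them are then selected for the extractive summary.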
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING (IJWEST)
Social networks have become one of the most popular platforms that allow users to communicate and share their interests without being in the same geographical location. The great and rapid growth of social media sites such as Facebook, LinkedIn, and Twitter causes a huge amount of user-generated content. Thus, improving information quality and integrity becomes a great challenge for all social media sites, allowing users to get the desired content or be linked to the best relation using improved search and linking techniques. Introducing semantics to social networks will therefore widen the representation of social networks. In this paper, a new model of social networks based on semantic tag ranking is introduced. This model is based on the concept of multi-agent systems. In the proposed model, the representation of social links is extended by the semantic relationships found in the vocabularies known as tags in most social networks. The proposed model for the social media engine is based on enhanced Latent Dirichlet Allocation (E-LDA) as a semantic indexing algorithm, combined with TagRank as a social network ranking algorithm. The improvement in the E-LDA phase is achieved by optimizing the LDA algorithm using optimal parameters; a filter is then introduced to enhance the final indexing output. In the ranking phase, using TagRank on top of the indexing phase has improved the ranking output. Simulation results of the proposed model have shown improvements in indexing and ranking output.
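The abstract does not give TagRank's update rule, but PageRank-style ranking over a tag graph conveys the idea: a tag is important if important tags link (co-occur) with it. The graph, damping factor, and iteration count below are illustrative assumptions:

```python
def tag_rank(adj, damping=0.85, iters=50):
    """PageRank-style power iteration over a tag co-occurrence graph:
    adj[t] lists the tags that tag t links to."""
    tags = list(adj)
    rank = {t: 1.0 / len(tags) for t in tags}
    for _ in range(iters):
        new = {t: (1 - damping) / len(tags) for t in tags}
        for t, outs in adj.items():
            if outs:
                share = damping * rank[t] / len(outs)
                for u in outs:
                    new[u] += share   # t passes rank to its neighbours
        rank = new
    return rank

graph = {"python": ["ml", "data"], "ml": ["python"],
         "data": ["python", "ml"], "misc": ["python"]}
ranks = tag_rank(graph)
print(max(ranks, key=ranks.get))  # the most central tag
```

In the paper's pipeline the edge structure would come from the E-LDA indexing phase, so that semantically related tags reinforce each other's rank.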
In this study, machine learning is examined in relation to commercial machine learning's resilience to the COVID-19 pandemic-related crisis. Two approaches are used to assess the pandemic's impact on machine learning risk, together with a method to prioritize sectors according to the crisis's potential negative consequences. I conducted the study to determine Santander machine learning's resilience. The data mining area offers prospects for COVID-19's future. A total of 13 machine learning demos were selected for the organization. The Hellwig method and the technique for order preference by similarity to ideal solution (TOPSIS) were utilized as direct ranking strategies. Parametric assessment of machine learning versatility in business was based on capital sufficiency, liquidity proportion, market benefits, share in an arrangement of openings with a perceived disability, and the sensitivity of machine learning's credit portfolio to monetary hazard. As a result of the COVID-19 pandemic, these enterprises were ranked according to their threat. Based on the findings of the research, machine learning worked best during the pandemic; meanwhile, machine learning suffered the most during the downturn. This can be seen, for example, in conversations about the impact of the pandemic on developing business-sector soundness and managing financial-framework solidity risk.
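TOPSIS, one of the two ranking methods named above, is a well-defined procedure: normalize and weight the decision matrix, find the ideal and anti-ideal solutions, and rank alternatives by relative closeness to the ideal. The criteria, weights, and scores below are invented for illustration, not the study's data:

```python
import math

def topsis(matrix, weights, benefit):
    """TOPSIS: rank alternatives by relative closeness to the ideal.
    matrix[i][j] = score of alternative i on criterion j;
    benefit[j] is True when larger values of criterion j are better."""
    m, n = len(matrix), len(matrix[0])
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(m)))
             for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)]
         for i in range(m)]
    ideal = [max(col) if benefit[j] else min(col)
             for j, col in enumerate(zip(*v))]
    worst = [min(col) if benefit[j] else max(col)
             for j, col in enumerate(zip(*v))]
    scores = []
    for row in v:
        d_pos = math.sqrt(sum((x - a) ** 2 for x, a in zip(row, ideal)))
        d_neg = math.sqrt(sum((x - a) ** 2 for x, a in zip(row, worst)))
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Illustrative: two criteria (capital adequacy: benefit, risk: cost)
scores = topsis([[0.9, 0.2], [0.6, 0.6], [0.8, 0.3]],
                weights=[0.5, 0.5], benefit=[True, False])
print(scores.index(max(scores)))  # index of the best-ranked alternative
```

A closeness score of 1 means the alternative coincides with the ideal on every weighted criterion; ranking the closeness scores yields the crisis-resilience ordering.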
Agriculture has long been a major source of livelihood for Nigerians. It also accounts for over 85% of the total food consumed within the country's borders. The sector has maintained improved productivity and profitability through a concerted effort to address critical issues such as an unorganized regulatory system, a lack of food safety data, the absence of standards for agricultural produce, non-adoption of precision farming, and the lack of harmonized inventory trace supports. This study proposes blockchain-based trace support in a continued effort to ensure food quality, consumer safety, and the trading of food assets. It uses radio frequency identification (RFID) sensors to register and track livestock, farms/farmers, and abattoir processes, and provisions a databank for tracing livestock data. Results show the model adequately performs about 1,101 transactions per second, with response times of 0.21 s for queries and 0.28 s for HTTPS pages for 2,500 users. It yields slightly longer times of 0.32 s for queries and 0.38 s for HTTPS pages with an increased 5,000 users via the world state stored in the blockchain's Hyperledger Fabric ledger. Overall, the framework can directly query and retrieve data without traversing the whole ledger, which in turn improves the efficiency and effectiveness of the traceability system.
Diabetic retinopathy (DR) is one of the most common causes of blindness. The necessity for a robust and automated DR screening system for regular examination has long been recognized in order to identify DR at an early stage. In this paper, an embedded DR diagnosis system based on convolutional neural networks (CNNs) has been proposed to assess the proper stage of DR. We coupled the power of CNN with transfer learning to design our model based on state-of-the-art architecture. We preprocessed the input data, which is color fundus photography, to reduce undesirable noise in the image. After training many models on the dataset, we chose the adopted ResNet50 because it produced the best results, with a 92.90% accuracy. Extensive experiments and comparisons with other research work show that the proposed method is effective. Furthermore, the CNN model has been implemented on an embedded target to be a part of a medical instrument diagnostic system. We have accelerated our model inference on a field programmable gate array (FPGA) using Xilinx tools. Results have confirmed that a customized FPGA system on chip (SoC) with hardware accelerators is a promising target for our DR detection model with high performance and low power consumption.
As many countries experience the emergence of new waves of Covid-19, many governments around the world have reminded their citizens of the need for an engaging intervention that could improve compliance with Covid-19-safe behaviors through the general public media or social media. In the face of the serious threat of Covid-19, immunity issues are currently the subject of various research and studies. A promising approach is to use video game culture to educate and train citizens to adopt healthy eating habits that strengthen the immune system. The objective of this study is to develop a prototype of a serious game (SG) on how to strengthen the immune defenses in order to fight a coronavirus infection and form an antivirus barrier. After defining the learning objectives by interviewing the stakeholders, we searched the scientific literature to establish the relevant theoretical bases. The learning contents were validated by biology teachers, and the learning mechanisms were then determined based on the learning objectives. The experimental results show that 92% of the participants in the study appreciated the quality of the scenario and the way in which the concept of interaction between the different game elements was presented.
The need for automated speech recognition has expanded as a result of significant industrial expansion for a variety of automation and human-machine interface applications. Speech impairment brought on by communication disorders, neurogenic speech disorders, or psychological speech disorders limits the performance of various artificial intelligence-based systems. The dysarthric condition is a neurogenic speech disease that restricts the articulation capacity of the human voice. This article presents a comprehensive survey of recent advances in automatic dysarthric speech recognition (DSR) using machine learning (ML) and deep learning (DL) paradigms. It focuses on the methodology, databases, evaluation metrics, and major findings of previous approaches. From the literature survey, it identifies the gaps between existing and previous work on DSR and provides future directions for the improvement of DSR. The performance of the various ML and DL schemes is evaluated for DSR on the UASpeech dataset based on accuracy, precision, recall, and F1-score. It is observed that the DL-based DSR schemes outperform the ML-based DSR schemes.
The router is a network device used to connect subnetworks and perform packet-switched networking by directing data packets to the intended IP addresses. It manages the traffic between different systems and allows several devices to share an internet connection. The router is applicable for effective communication in system on chip (SoC) modules for network on chip (NoC) communication. This research paper emphasizes the design of a two-dimensional (2D) router hardware chip in the Xilinx integrated system environment (ISE) 14.7 software, with further logic verification using data packets transmitted from all input/output ports. The design evaluation is based on the pre-synthesis device utilization summary for different field programmable gate array (FPGA) boards such as Spartan-3E (XC3S500E), Spartan-6 (XC6SLX45), Virtex-4 (XC4VFX12), Virtex-5 (XC5VSX50T), and Virtex-7 (XC7VX550T). The 64-bit data logic is verified on the different ports of the router configuration in the Xilinx and ModelSim waveform simulators. The Virtex-7 has proven the fastest switching speed and optimal hardware parameters in comparison to the other FPGAs.
In this work, an internet of things (IoT) sensing and monitoring box has been developed. The proposed low-cost system is a portable device for smart buildings that measures vibrations and monitors and controls the noise caused by industrial machines. We present an instrument and a method to measure the vibration and tilt of a mechanical system (an air conditioner). The primary goal is to create a signal acquisition and monitoring system that is both user-friendly and affordable, while also delivering exceptional precision. The key concept centers on acquiring and processing signals through the Raspberry Pi. As a novel application, we use a conversion method to remotely control and monitor the noise generated by the machines. Once the noise reaches a high value or the air conditioner is tilted too much, the system sends an alert in the form of an email. We use the Python language to acquire and process the signal and send the alerts. The proposed approach is straightforward to implement, and the obtained results demonstrate a high level of accuracy that is consistent with the existing literature.
The use of a convolutional neural network (CNN) to analyze and classify electroencephalogram (EEG) signals has recently attracted the interest of researchers to identify epileptic seizures. This success has come with an enormous increase in the computational complexity and memory requirements of CNNs. For the sake of boosting the performance of CNN inference, several hardware accelerators have been proposed. The high performance and flexibility of the field programmable gate array (FPGA) make it an efficient accelerator for CNNs. Nevertheless, for resource-limited platforms, the deployment of CNN models poses significant challenges. For an ease of CNN implementation on such platforms, several tools and frameworks have been made available by the research community along with different optimization techniques. In this paper, we proposed an FPGA implementation for an automatic seizure detection approach using two CNN models, namely VGG-16 and ResNet-50. To reduce the model size and computation cost, we exploited two optimization approaches: pruning and quantization. Furthermore, we presented the results and discussed the advantages and limitations of two implementation alternatives for the inference acceleration of quantized CNNs on Zynq-7000: an advanced RISC machine (ARM) software implementation-based ARM, NN, software development kit (SDK) and a software/hardware implementation-based deep learning processor unit (DPU) accelerator and DNNDK toolkit.
Convolutional neural networks (CNN) trained using deep learning (DL) have advanced dramatically in recent years. Researchers from a variety of fields have been motivated by the success of CNNs in computer vision to develop better CNN models for use in other visually rich settings. Successes in image classification research have been achieved in a wide variety of domains over the past year. Among the many popularized image classification applications, the detection of plant leaf diseases has received extensive research attention. As a result of the nature of the procedure, image quality is often degraded and distortions are introduced during image capture. In this study, we look into how various CNN models are affected by distortions. Corn/maize leaf photos from the 4,188-image corn or maize leaf dataset (split into four categories) are under consideration. To evaluate how well they handle noise and blur, pre-trained deep CNN models such as visual geometry group (VGG), InceptionV3, ResNet50, and EfficientNetB0 have been deployed. Classification accuracy and metrics such as recall and F1-score are used to evaluate CNN performance.
In this research, we present a low phase noise (PN) and wide tuning range 175 GHz inductor-capacitor (LC) voltage-controlled oscillator (VCO) based on a differential Colpitts oscillator, designed and simulated in a 0.13 μm bipolar complementary metal oxide semiconductor (BiCMOS) process. The square of the tank Q-factor and the square of the oscillation amplitude were both maximized to reduce PN. With an extensive examination of parasitics, mathematical analysis of load impedances, and implementation of a differential design, the PN was reduced and the output power was enhanced. Using a supply voltage of 1.6 V, the VCO draws 41.9 mA, resulting in a total power consumption of 67 mW. To prevent undesirable PN deterioration, an inter-stage LC filter at the VCO-buffer interface increases the swing at the buffer input, and a buffer is used to isolate the load from the VCO core for a better output. In addition, the VCO has high linearity. Overall, the VCO presented in this study demonstrates excellent performance and has the potential to be used in a wide range of applications that require a high-performance, low-power VCO.
Diseases in edible and industrial plants remain a major concern, affecting producers and consumers. The problem is further exacerbated as there are different species of plants with a wide variety of diseases that reduce the effectiveness of certain pesticides while increasing our risk of illness. Timely, accurate, and automated detection of diseases can be beneficial. Our work focuses on evaluating deep learning (DL) approaches using transfer learning to automatically detect diseases in plants. To enhance the capabilities of our approach, we compiled a novel image dataset containing 87,570 records encompassing 32 different plants and 74 types of diseases. The dataset consists of leaf images from both laboratory setups and cultivation fields, making it more representative. To the best of our knowledge, no such dataset has been used for DL models. Four pretrained computer vision models, namely VGG-16, VGG-19, ResNet-50, and ResNet-101, were evaluated on our dataset. Our experiments demonstrate that the VGG-16 and VGG-19 models proved more efficient, yielding an accuracy of approximately 86% and an F1-score of 87%, compared to ResNet-50 and ResNet-101. ResNet-50 attains an accuracy and an F1-score of 46.9% and 45.6%, respectively, while ResNet-101 reaches an accuracy of 40.7% and an F1-score of 26.9%.
In this proposed work, we identified the significant research issues concerning lung cancer risk factors. Capturing and defining symptoms at an early stage is one of the most difficult phases for patients. Based on the history of patient records, we reviewed a number of current research studies on lung cancer and its various stages. We identified that lung cancer is one of the significant research issues in predicting the early stages of the disease. This research aimed to develop a model that can detect lung cancer with a remarkably high level of accuracy using a deep learning approach (a convolutional neural network). This method considers and resolves significant gaps in previous studies. We compared the accuracy levels and loss values of our model with VGG16, InceptionV3, and ResNet50, and found that our model achieved an accuracy of 94% and a minimum loss of 0.1%. Hence, physicians can use our convolutional neural network model for predicting lung cancer risk factors in the real world. Moreover, this investigation reveals that squamous cell carcinoma, normal, adenocarcinoma, and large cell carcinoma are the most significant risk factors. In addition, the remaining attributes are also crucial for achieving the best performance.
Wireless security applications have exploded as a result of modern technology. Many types of wireless communication technologies have been deployed to build and/or implement security access control systems. The quick response (QR) code is a contactless technology that is extensively utilised in a variety of sectors, including access control, library book tracking, supply chains, and tollgate systems, among others. This paper combines QR code technology with Arduino and Python to construct an automated QR-code-based access management system. After detecting a QR code, the QR scanner at the entry collects and compares the user's unique identifier (UID) with the UID recorded in the system. The results show that this system is capable of granting or denying access to a protected environment in a timely, effective, and reliable way. Security systems can protect physical and intellectual property by preventing unauthorized persons from entering the area. Many door locks, such as mechanical and electrical locks, were created to meet basic security needs, but the system also helps to create a data file structure of the authorized persons.
Localization is a critical concern in many wireless sensor network (WSN) applications. Correct information regarding the geographic placement of nodes (sensors) is critical for making the collected data valuable and relevant. Because of their benefits, such as simplicity and acceptable accuracy, connectivity-based algorithms attempt to localize multi-hop WSNs. However, due to environmental factors, the precision of localization may be rather low. This publication describes an extreme learning machine (ELM) technique for minimizing localization error in range-free WSNs. In this paper, we propose a Cascade-ELM to increase localization accuracy in range-free WSNs. We tested the proposed approaches in a variety of multi-hop WSN scenarios, focusing on isotropic and irregular environments. The simulation results show that the proposed Cascade-ELM algorithm considerably improves localization accuracy when compared to previous algorithms derived from smart computing approaches. Compared to previous work, isotropic environments show improved localization results.
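The core of an ELM is simple enough to sketch: the input-to-hidden weights are random and never trained, and only the hidden-to-output weights are solved in closed form by least squares. The hidden-layer size, toy regression target, and function names below are our assumptions, not the paper's Cascade-ELM:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, y, hidden=40):
    """Extreme learning machine: random input weights, sigmoid hidden
    layer, output weights solved in closed form by least squares."""
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # random feature map
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy stand-in for localization: learn y = x0 + 2*x1 from samples
X = rng.uniform(size=(200, 2))
y = X[:, 0] + 2 * X[:, 1]
model = elm_train(X, y)
err = np.mean(np.abs(elm_predict(model, X) - y))
print(round(float(err), 4))
```

In the localization setting, X would hold connectivity features such as hop counts to anchor nodes and y the node coordinates; the appeal of ELM here is that training is a single least-squares solve rather than iterative gradient descent.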
The study evaluated interference in a dense heterogeneous network using third-generation universal mobile telecommunication system (UMTS) and fourth-generation long term evolution (LTE) networks. The UMTS/LTE heterogeneous network study determines the level of interference when the two communication systems coexist and how to improve the network by migrating from UMTS to LTE, which has a faster download speed and larger capacity. A Techno lite 8 on third generation (3G) and an Infinix Pro 6 on fourth generation (4G) were used to measure the received signal strength (RSS) of the network during the site investigation. UE interference was detected and traced using a spectrum analyzer. The UMTS and LTE path loss exponents are 2.6 and 3.2, respectively. Shannon's capacity theorem was used to calculate LTE and UMTS capacity. When the signal to interference and noise ratio (SINR) was used as a quality of service (QoS) indicator, the MATLAB channel capacity plots did not match Shannon's due to neighboring interference. UMTS had an R2 of 0.54 and LTE 0.57 for the Shannon channel capacity equation. Adjacent channel interference (ACI) from user devices reduces network capacity, lowering QoS for other customers.
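The Shannon capacity calculation referred to above is C = B log2(1 + SINR), with SINR as a linear ratio. A quick sketch showing how adjacent-channel interference (a drop in SINR) erodes capacity; the 20 MHz bandwidth and SINR values are illustrative, not the study's measurements:

```python
import math

def shannon_capacity(bandwidth_hz, sinr_linear):
    """Shannon capacity C = B * log2(1 + SINR), SINR as a linear ratio."""
    return bandwidth_hz * math.log2(1 + sinr_linear)

def db_to_linear(db):
    return 10 ** (db / 10)

# Illustrative 20 MHz channel: interference drops SINR from 20 dB to 10 dB
for sinr_db in (20, 10):
    c = shannon_capacity(20e6, db_to_linear(sinr_db))
    print(f"SINR {sinr_db} dB -> {c / 1e6:.1f} Mbit/s")
```

Because capacity grows only logarithmically in SINR, every dB lost to interference costs roughly a fixed slice of throughput, which is why measured capacity curves fall below the interference-free Shannon bound.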
The course activity log is where a learning management system (LMS) like Moodle keeps track of the various learning activities. The instructor may examine the log directly or use more complex methods such as data mining to conduct a quicker and more in-depth examination of the students' behaviors. Most previous studies on analyzing this log data rely on predictive analysis. Instead, this study investigates cluster analysis and association analysis. Cluster analysis based on k-means++ is utilized to organize students into groups given their engagement in the learning course module, and association analysis based on Apriori is utilized to extract the relationships between various student activities. A dashboard presentation of the findings is provided to facilitate clearer comprehension. Based on the analysis findings, it can be concluded that the structure of the student clusters is of medium quality, while the association between student activities is positively correlated and well-balanced. The subjective review of the dashboard reveals that the visualization is already sufficient, but there are some recommendations for making it even better.
The software-defined network (SDN) controller adds and removes the contents of the flow table through secure channels to determine how packets are processed and how the flow table is managed. The controller concentrates the network intelligence and becomes the central component, managing the data plane traffic delivered via the OpenFlow (OF) switch. To this end, the controller provides an interface for managing and controlling this switch flow table. Tests were run to calculate controller throughput and latency levels using the Cbench tool, which can test the transmission control protocol (TCP) and user datagram protocol (UDP) protocols. The tests were run by forcing the controller to run at maximum without any additional settings (default settings) in order to obtain accurate information about the controller's capabilities. Because of this need, the performance of each controller must be tested. In this study, the tests were run on three popular controllers. Test results show that the Floodlight controller is more stable than the open network operating system (ONOS) and open daylight (ODL) controllers in managing switch and host loads.
This study examined the influence of electronic government (e-government) implementation for the Ministry of Transport on fulfilling Saudi Vision 2030 by transforming the Kingdom of Saudi Arabia (KSA) into a logistics center linking three continents. Saudi Vision 2030 aims to cut transportation costs by improving infrastructure, shorten importing and exporting times by streamlining and automating operations, and increase supply chain transparency through sector reform. Implementing e-government would improve government services and engagement through information and communication technology (ICT). This article focuses on three primary areas: i) making KSA a logistics center; ii) improving the quality of living throughout the Kingdom; and iii) promoting long-term financial sustainability. The study is founded on the idea that logistics is a crucial component of competitive advantage and that transportation (by land, sea, or air) is a logistical sub-process for Saudi enterprises that benefit from transport networks comparable to the best in the world. The Kingdom's strategic location at the junction of three continents gives its transport sector a geographical competitive advantage that provides access to important emerging markets and critical sea lanes. Despite the optimistic future of the transport and logistics industries in KSA, some important hurdles must be overcome.
Chronic diseases (CD) such as kidney disease cause severe challenges to people all around the world. Chronic kidney disease (CKD) and diabetes mellitus (DM) are considered in this paper. Predicting these diseases at an earlier stage enables better preventive measures for the people concerned, and in the healthcare domain leads to tremendous cost savings and improved health status for society. The main objective of this paper is to develop an algorithm to predict CKD occurrence using machine learning (ML) techniques. The commonly used classification algorithms, namely logistic regression (LR), random forest (RF), conditional random forest (CRF), and recurrent neural networks (RNN), are considered to predict the disease at an earlier stage. The proposed algorithm uses medical code data to predict disease at an earlier stage.
Suppression of noise in a noisy speech signal is required in many speech enhancement applications, like signal recording and transmission from one place to another. In this paper, a novel single-channel noise cancellation system is proposed using a derivative of the normalized least mean square algorithm. The proposed system has two phases. The first phase is the generation of a secondary reference signal from the incoming primary signal itself, at the initial silence period and at pauses between words, which is essential when using an adaptive filter as a noise canceller. The second phase is noise cancellation using the proposed modified error data normalized step size (EDNSS) algorithm. The performance of the proposed algorithm is compared with the normalized least mean square (NLMS) algorithm and the original EDNSS algorithm using a standard IEEE sentence (SP23) of the NOIZEUS database with different types of real-world noise at different levels of signal to noise ratio (SNR). The outputs of the proposed, NLMS, and EDNSS algorithms are measured with output SNR, excess mean square error (EMSE), and misadjustment (M). The results clearly illustrate that the proposed algorithm gives improved results over the conventional NLMS and EDNSS algorithms, while the speed of convergence is maintained at the same level as the conventional NLMS algorithm.
Forklift Classes Overview, by Intella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news, to celebrate the 13 years since the group was created we have articles including:
A case study of the use of Advanced Process Control at the wastewater treatment works at Lleida in Spain
A look back at an article on smart wastewater networks, to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in the theory, methodology, and applications of machine learning.
Sachpazis: Terzaghi Bearing Capacity Estimation in simple terms with Calculati..., by Dr. Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has been the subject of fewer comprehensive studies and sustainability assessments.
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
Power plants release a large amount of water vapor into the atmosphere through the stack. The flue gas can be a potential source for obtaining much-needed cooling water for a power plant. If a power plant could recover and reuse a portion of this moisture, it could reduce its total cooling water intake requirement. One of the most practical ways to recover water from flue gas is to use a condensing heat exchanger. The power plant could also recover latent heat due to condensation as well as sensible heat due to lowering the flue gas exit temperature. Additionally, harmful acids released from the stack can be reduced in a condensing heat exchanger by acid condensation.
Condensation of vapors in flue gas is a complicated phenomenon, since heat and mass transfer of water vapor and various acids occur simultaneously in the presence of noncondensable gases such as nitrogen and oxygen. Design of a condenser depends on knowledge and understanding of the heat and mass transfer processes. A computer program for numerical simulations of water (H2O) and sulfuric acid (H2SO4) condensation in a flue gas condensing heat exchanger was developed using MATLAB. Governing equations based on mass and energy balances for the system were derived to predict variables such as flue gas exit temperature, cooling water outlet temperature, mole fractions, and condensation rates of water and sulfuric acid vapors. The equations were solved using an iterative solution technique with calculations of heat and mass transfer coefficients and physical properties.
An Approach to Detecting Writing Styles Based on Clustering Techniques
Authors:
-Devkinandan Jagtap
-Shweta Ambekar
-Harshit Singh
-Nakul Sharma (Assistant Professor)
Institution:
VIIT Pune, India
Abstract:
This paper proposes a system to differentiate between human-generated and AI-generated texts using stylometric analysis. The system analyzes text files and classifies writing styles by employing various clustering algorithms, such as k-means, k-means++, hierarchical, and DBSCAN. The effectiveness of these algorithms is measured using silhouette scores. The system successfully identifies distinct writing styles within documents, demonstrating its potential for plagiarism detection.
Introduction:
Stylometry, the study of linguistic and structural features in texts, is used for tasks like plagiarism detection, genre separation, and author verification. This paper leverages stylometric analysis to identify different writing styles and improve plagiarism detection methods.
Methodology:
The system includes data collection, preprocessing, feature extraction, dimensional reduction, machine learning models for clustering, and performance comparison using silhouette scores. Feature extraction focuses on lexical features, vocabulary richness, and readability scores. The study uses a small dataset of texts from various authors and employs algorithms like k-means, k-means++, hierarchical clustering, and DBSCAN for clustering.
Results:
Experiments show that the system effectively identifies writing styles, with silhouette scores indicating reasonable to strong clustering when k=2. As the number of clusters increases, the silhouette scores decrease, indicating a drop in accuracy. K-means and k-means++ perform similarly, while hierarchical clustering is less optimized.
Conclusion and Future Work:
The system works well for distinguishing writing styles with two clusters but becomes less accurate as the number of clusters increases. Future research could focus on adding more parameters and optimizing the methodology to improve accuracy with higher cluster values. This system can enhance existing plagiarism detection tools, especially in academic settings.
Cosmetic shop management system project report, by Kamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's tough to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. The system includes various function programs to do the above-mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of the general workflow and administration process of the shop. The main processes of the system focus on the customer's request, where the system is able to search for the most appropriate products and deliver them to the customer. It should help the employees to quickly identify the list of cosmetic products that have reached the minimum quantity and also keep track of the expiry date of each cosmetic product. It should help the employees to find the rack number in which the product is placed. It is also a faster and more efficient way of working.
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines, by Christina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
International Journal of Informatics and Communication Technology (IJ-ICT)
Vol. 12, No. 3, December 2023, pp. 284~292
ISSN: 2252-8776, DOI: 10.11591/ijict.v12i3.pp284-292
Journal homepage: http://ijict.iaescore.com
Realization of an intelligent evaluation system
Otman Maarouf1, Rachid El Ayachi1, Mohamed Biniz2
1 Department of Computer Science, Faculty of Science and Technology, Sultan Moulay Slimane University, Beni Mellal, Morocco
2 Department of Computer Science, Faculty of Polydisciplinary, Sultan Moulay Slimane University, Beni Mellal, Morocco
Article Info
Article history: Received May 17, 2021; Revised Dec 24, 2021; Accepted Mar 9, 2023
ABSTRACT
A number of benefits have been reported for computer-based assessments over traditional paper-based exams, in terms of IT support for question development, reduced distribution and test administration costs, and automated support for grading. However, existing computerized assessment systems do not provide all kinds of questions, namely open questions that require written solutions. To overcome the challenges of existing systems, the objective of this work is to realize an intelligent evaluation system (IES) responding to the problems identified, which adapts to the different types of questions, especially open-ended questions whose answer requires sentence writing or programming.
Keywords: Intelligent evaluation system; Open questions; Programming questions; Semantic similarity; Syntactical similarity
This is an open access article under the CC BY-SA license.
Corresponding Author:
Otman Maarouf
Department of Computer Science, Faculty of Science and Technology, Sultan Moulay Slimane University
Beni Mellal, Morocco
Email: maarouf.otman94@gmail.com
1. INTRODUCTION
More and more scientific findings are becoming available as a result of the ongoing advancements
in natural science, humanities, and science and technology. Modern educational ideas, concepts, and the
utilization of contemporary educational technology are necessary in order to address the modernization,
cosmopolitanization, and future-oriented nature of all subject matter instruction today [1]. The delivery of
customized education would be ineffective without the constant observation and evaluation of the student,
both for the purpose of an efficient educational assessment and to enable the system to adjust to the demands
of the learners. It can be difficult to monitor and evaluate students' performance in a classroom setting,
especially when it's required to be done in real time.
Classical assessments have encountered several problems, namely printing, monitoring, space management, and correction. This calls for automating the evaluation process using computerized assessment systems. According to the research carried out, existing evaluation systems are limited in terms of the questions asked. They are based on multiple-choice questions (MCQ), true or false, fill-in-the-blank, and multiple-answer questions. But we note the absence of open-type questions with written answers, which are of great importance in the learner's theoretical and practical assessments. Attempts have been made to overcome the problems encountered in the classical evaluation system of the software engineering institute (SEI). This system offers two other types of evaluation: i) using the system compiler to answer the questions and ii) questions that require writing answers in the form of sentences; the correction of this kind of answer adopts the notion of semantic similarity.
An intelligent evaluation system (IES) in the form of a website has been developed to enhance the automated correction of assessments and allow for the assessment of learners' learning levels. It provides a number of options, including the ability for students to take assessments online, to prepare examinations with a variety of topics, including open-ended and programming-related questions, and to automatically correct answers based on teachers' responses, after which it gives the student a score. Regarding the paper's structure, there are two sections: the first one contains the different methods for calculating the semantic and syntactic similarity between two sentences; the second section gives the general architecture of the realized system and mentions all the tools used in this project. Finally, the conclusion is given.
2. CONCEPT AND TYPES OF SIMILARITIES
2.1. Introduction
Measuring the similarity between two sentences (or short texts) consists in evaluating to what extent the meanings of these sentences are close [2]. This semantic textual similarity (STS) task is often used in several important areas of natural language processing (NLP), among which we can mention information retrieval, text categorization [3], text summarization [4], and machine translation [5]. When we talk about similarity, we talk about classification and clustering to describe data partitioning; a cluster is then a set of data or elements with similarities. The description language of the objects of a database must make it possible to define the distance of each object from the others.
In the field of artificial intelligence, similarity is one of the criteria for computer analysis of clusters and for data partitioning. This automatic classification step is necessary for the implementation of machine learning methods. Expert software also seeks to take into account the context, according to which the similarity may vary [6]. The software will do much more relevant work when the attributes of the data are useful and relevant in the context.
2.2. Syntactic similarity
Syntactic similarity concerns the form of sentences: in the linguistic sense, "syntactic" denotes a method of classifying languages according to the order in which words appear in the sentence. Measuring this kind of similarity between words and texts plays an important role in the field of data mining [7].
2.2.1. Term frequency-inverse document frequency method
Term frequency-inverse document frequency (TF-IDF) is a weighting approach that is often used in information retrieval and text mining [8]. This weight is a numerical statistical measure intended to reflect the importance of a word to a document in a collection or corpus [9]. Typically, the TF-IDF weight is composed of two terms: the first calculates the normalized TF, which is the number of times a word appears in a document, divided by the total number of words in that document [10]. The second term is the IDF, calculated as the logarithm of the number of documents in the corpus divided by the number of documents containing the specific term. TF is defined as [11]:
− Term frequency
We note Tf(w, d) = |{w′ ∈ d : w′ = w}|, where w is a word and d = {w1, …, wm} is a document, such that:

tf_i,j = n_i,j / Σ_k n_k,j   (1)

where n_i,j is the number of appearances of the word we want to weight, and Σ_k n_k,j is the sum of all the words existing in the document, eliminating punctuation, spaces, and apostrophes.
− Inverse document frequency

idf_i = log( |D| / |{d_j : t_i ∈ d_j}| )   (2)

where |D| is the total number of documents in the corpus and |{d_j : t_i ∈ d_j}| is the number of documents where the term t_i appears (i.e., n_i,j ≠ 0).
− Calculation of TF-IDF
Finally, to put it all together, the total TF-IDF weight of a token in a document is the product of its TF and IDF weights:

tfidf_i,j = tf_i,j × idf_i   (3)
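Equations (1)-(3) can be sketched in a few lines of Python (a minimal sketch; the two-document toy corpus is an assumption for illustration, not data from this paper):

```python
import math
from collections import Counter

def tf(word, doc):
    # Normalized term frequency (eq. 1): occurrences of `word`
    # divided by the total number of words in the document.
    return Counter(doc)[word] / len(doc)

def idf(word, corpus):
    # Inverse document frequency (eq. 2): log of the corpus size
    # over the number of documents containing `word`.
    n_docs_with_word = sum(1 for doc in corpus if word in doc)
    return math.log(len(corpus) / n_docs_with_word)

def tf_idf(word, doc, corpus):
    # Total weight (eq. 3): the product of the TF and IDF terms.
    return tf(word, doc) * idf(word, corpus)

# Toy corpus of two tokenized "documents".
corpus = [
    ["the", "student", "writes", "an", "answer"],
    ["the", "teacher", "writes", "a", "question"],
]
weight = tf_idf("student", corpus[0], corpus)  # tf = 1/5, idf = log(2/1)
```

A word such as "the" that appears in every document gets idf = log(2/2) = 0, so its TF-IDF weight vanishes, which is exactly the discriminative behavior the weighting is designed for.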
2.2.2. Cosine similarity
Cosine similarity is frequently used as a measure of similarity between two documents [12]. It may be a question of comparing texts from a corpus for classification or information retrieval (in this case, a vectorized document is constituted by the words of the query and is compared, by measuring the cosine of the angle, with the vectors corresponding to all the documents present in the corpus, so we evaluate which ones are closest) [13]. As the angle between two vectors can only be measured with numerical values, we must find a way to convert the words of a document into numbers. This is why we rely on the results of the previous TF-IDF method, which gives the weight of each word in a document, so that a document can be considered as a vector. The cosine similarity between two documents d1 and d2 is a measure of similarity: it consists of calculating the cosine of the angle between the vector representations of the documents to compare [14].
The similarity sim_cos(d1, d2) ∈ [0, 1] is:

sim_cos(d1, d2) = (d1 · d2) / (‖d1‖ ‖d2‖)   (4)
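Equation (4) can be sketched directly on such weight vectors (a minimal sketch; the two small vectors are hypothetical TF-IDF weights over a shared vocabulary, assumed for illustration):

```python
import math

def cosine_similarity(v1, v2):
    # Cosine of the angle between two vectors (eq. 4): dot product
    # divided by the product of the Euclidean norms.
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

# Hypothetical TF-IDF vectors for two short documents.
d1 = [0.30, 0.00, 0.12, 0.05]
d2 = [0.28, 0.10, 0.00, 0.05]
score = cosine_similarity(d1, d2)
```

Since TF-IDF weights are non-negative, the score lies in [0, 1] as stated above, and identical documents give exactly 1.0.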
2.3. Semantic similarity
Semantic similarity is a concept in which a set of documents or terms is given a metric based on the similarity of their meaning/semantic content [15]. Concretely, this can be done by defining a topological similarity, for example by using ontologies to define a distance between words, or by defining a statistical similarity, for example by using a vector space model to correlate terms from their contexts in an appropriate body of text (co-occurrence) [16], [17]. Text similarity is a field of research whereby two terms or expressions are assigned a score based on the likeness of their meaning. As Kocoń and Maziarz [18] note, short text similarity measures have an important role in many applications such as word sense disambiguation, synonymy detection, spell checking, thesauri generation, machine translation, information retrieval, and question answering [19].
2.3.1. Measure of Wu & Palmer [20]
In a domain of concepts, similarity is defined with respect to the distance between two concepts in the hierarchy and their position relative to the root. This similarity takes into account the length of the path between the origin c_i and the extremity c_j, but also the depth of their most specific common subsumer c_0 [21]. The similarity between C1 and C2 is:

ConSim(C1, C2) = 2·N3 / (N1 + N2 + 2·N3)   (5)

where N1 is the distance between the concept C1 and the concept C3, N2 is the distance between the concept C2 and the concept C3, and N3 is the distance between the concept C3 and the root.
This measure has the advantage of being simple to implement and of performing better than other similarity measures [22]. Figure 1 describes the relationships between the concepts C1, C2, C3, and the root.
Figure 1. Conceptual relationships [20]
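Equation (5) reduces to a one-line computation once the three distances are known (the distances below describe a hypothetical hierarchy, not a figure from this paper):

```python
def wu_palmer(n1, n2, n3):
    # ConSim(C1, C2) = 2*N3 / (N1 + N2 + 2*N3) (eq. 5):
    # n1, n2 are the distances from C1 and C2 to their most specific
    # common subsumer C3; n3 is the distance from C3 to the root.
    return 2 * n3 / (n1 + n2 + 2 * n3)

# Hypothetical hierarchy: C1 and C2 each sit one edge below C3,
# and C3 sits three edges below the root.
sim = wu_palmer(1, 1, 3)  # 6 / 8 = 0.75
```

The deeper the common subsumer (larger N3), the closer the score is to 1; concepts whose only common ancestor is near the root score close to 0.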
2.3.2. Similarity of Mihalcea [23]
Simple lexical correspondence is described in [23]. Word-for-word similarity measures and a word specificity measure are used to estimate the semantic similarity of sentence pairs [24]. The following scoring function was used [25]:

Sim(T1, T2) = 1/2 · ( Σ_{w∈T1} maxSim(w, T2)·idf(w) / Σ_{w∈T1} idf(w) + Σ_{w∈T2} maxSim(w, T1)·idf(w) / Σ_{w∈T2} idf(w) )   (6)

where maxSim(w, T) is the maximum score between the word w and the words in T according to a word-for-word similarity measure, and idf(w) is the inverse document frequency of the word. A threshold of 0.5 was used for classification: a score above the threshold was classified as a paraphrase, otherwise as a non-paraphrase.
The approach of Mihalcea et al. [23] takes into account the syntactic nature of terms and restricts similarity comparisons to terms of the same syntactic nature: verbs, nouns, and adjectives between them. They tested several types of restrictions, more or less binding, related to the syntactic nature of the terms: from nouns/nouns, adjectives/adjectives, and verbs/verbs to only proper nouns/proper nouns. In spite of what could be predicted by the similarity tests between terms according to their syntactic natures, these different tests all led to a very marked deterioration of the results. One might think, for example, that restricting a named entity to being comparable only to another named entity cannot damage the results, but experience has shown that this discrimination leads to a bad similarity between expressions such as "the Japanese president…" and "in Japan, the president…" [26]. The system whose results are given in the evaluations thus operates without any restriction as to the syntactic nature of the terms compared [27]. Below is an example of calculating the similarity between two sentences P1 and P2, such that: i) P1: eventually, a huge cyclone hit the entrance of my house; and ii) P2: finally, a massive hurricane attacked my home. Table 1 shows the similarity matrix of Wu and Palmer [20].
Table 1. Similarity matrix of Wu and Palmer [20] between two sentences

               Cyclone/NN  Hit/VBD  Entrance/NN  House/NN
Hurricane/NN     0.9565      -        0.2857      0.3158
Attacked/VBD       -       0.8571       -           -
Home/NN          0.3529      -        0.6667      1.0000
Sim(P1, P2) = 1/2 · ( [0.9565·log(2/1) + 0.857·log(2/1) + 1·log(2/1)] / (3·log(2/1)) + [0.9565·log(2/1) + 0.857·log(2/1) + 0.6667·log(2/1) + 1·log(2/1)] / (4·log(2/1)) ) = (0.938 + 0.870) / 2 ≈ 0.90

From these results, we find that the two sentences P1 and P2 are similar.
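As a check, the worked example can be reproduced with a small sketch of equation (6); the content-word lists after stop-word removal and the uniform idf = log(2) are assumptions read off the calculation above, and the word-for-word scores come from Table 1:

```python
import math

def mihalcea_sim(t1, t2, word_sim, idf):
    # Eq. (6): for each word of one sentence take its best match in the
    # other sentence, weight by idf, then average the two directed scores.
    def directed(src, dst):
        num = sum(max(word_sim(w, v) for v in dst) * idf[w] for w in src)
        den = sum(idf[w] for w in src)
        return num / den
    return 0.5 * (directed(t1, t2) + directed(t2, t1))

# Wu & Palmer scores from Table 1; pairs of different POS score 0.
sim_table = {
    ("hurricane", "cyclone"): 0.9565, ("hurricane", "entrance"): 0.2857,
    ("hurricane", "house"): 0.3158, ("attacked", "hit"): 0.8571,
    ("home", "cyclone"): 0.3529, ("home", "entrance"): 0.6667,
    ("home", "house"): 1.0,
}

def word_sim(w, v):
    # Symmetric lookup in the similarity table.
    return sim_table.get((w, v), sim_table.get((v, w), 0.0))

p1 = ["cyclone", "hit", "entrance", "house"]
p2 = ["hurricane", "attacked", "home"]
idf = {w: math.log(2) for w in p1 + p2}  # each word occurs in 1 of 2 documents
score = mihalcea_sim(p1, p2, word_sim, idf)
```

Because the idf weights are all equal here, they cancel, and the score is simply the average of the best-match similarities in each direction, roughly 0.90, well above the 0.5 paraphrase threshold.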
3. DESCRIPTION OF THE REALIZED EVALUATION SYSTEM
3.1. Architecture of IES
The IES developed is a website for assessing learners' level of learning. It offers several opportunities, namely: the online taking of assessment tests by students, the preparation of exams that contain all kinds of questions, including programming questions and open questions, and automatic correction based on the answers provided by teachers. IES is an evaluation system developed to automate the process of
assessing learners’ competencies, with several parties communicating with each other. Figure 2 shows the
architecture of IES. The IES system has a set of components that are:
− Teacher space: this space allows teachers to register and authenticate in the platform to build an exam
(questions, answers, and notation) and to enter the codes of students who are allowed to take the exam.
− Database: the database contains the information of the professors and students, as well as the exams
(questions and answers proposed by the professors).
− Student area: this space allows students to register and to authenticate in the platform to pass exams.
3.2. Mechanism for correcting the open questions
To correct the open questions (the questions that have written responses), the IES system makes use of the notion of semantic similarity; this similarity is calculated between the answer provided by the learner and the answer given by the teacher. The correction operates in several steps: i) the syntactic correction of the answers; ii) sentence segmentation, part-of-speech labeling, and extraction of named entities using OpenNLP; iii) the elimination of stop words and punctuation; and iv) the calculation of semantic similarity using the approach of Mihalcea et al. [23].
Figure 2. IES architecture developed
3.2.1. Syntax correction
Sentence syntax correction is a technique that corrects syntax errors based on a corpus and a dictionary. To correct a sentence syntactically, we need to determine the correction center of the sentence (the position of the word in the sentence that has a maximum frequency in the corpus) using (7) [28]:

pos = { i : maxfreq_w = freq_w_i }   (7)

where maxfreq_w is the maximum frequency of the words of the sentence.
The second step is the detection of the erroneous words of the sentence, by searching for and comparing the words of the sentence with the words of the dictionary, then calculating the distance between the erroneous words and the words of the dictionary, in order to recover all the words close to the erroneous word, using (8) and (9) [28]:
dist(w1, w2) = |{ c_i : (c_i ∈ w1^k ∧ c_i ∈ w2^k) ∨ (c_i ∈ w1^k ∧ c_i ∈ w2^(k±1)) ∨ (c_i ∈ w1^(k±1) ∧ c_i ∈ w2^k) ∨ (c_i ∈ w1^(k±1) ∧ c_i ∈ w2^(k±1)) }|   (8)

correct_words(w) = { w_i : dist(w, w_i) = max_dist(w) }   (9)
Equation (8) is used to compute the distance between the erroneous word and all the words in the dictionary, and (9) to retrieve all the words close to the erroneous word, where 𝑚𝑎𝑥_𝑑𝑖𝑠𝑡 is the maximum distance between the wrong word and the dictionary words (with this distance, a larger value indicates a closer word).
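The distance of (8) and the candidate set of (9) can be sketched as follows. This is one plausible reading of (8): a character of one word counts as matched if it appears at the same or an adjacent position in the other word, which folds the four disjuncts into a single ±1 window check. All names are illustrative.

```python
def char_distance(w1, w2):
    """Count characters of w1 found at the same or an adjacent position
    in w2 (one reading of Eq. 8). Higher means closer."""
    score = 0
    for k, c in enumerate(w1):
        # match allowed at positions k-1, k, or k+1 of w2
        if any(0 <= j < len(w2) and w2[j] == c for j in (k - 1, k, k + 1)):
            score += 1
    return score

def close_words(wrong, dictionary):
    """All dictionary words at maximal 'distance' from the wrong word (Eq. 9)."""
    best = max(char_distance(wrong, w) for w in dictionary)
    return [w for w in dictionary if char_distance(wrong, w) == best]

# e.g. the transposition error "modle" is closest to "model"
candidates = close_words("modle", ["model", "mode", "cat"])
```

Because matches at offset ±1 count, simple transpositions like "modle"/"model" score as highly as exact matches.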
Afterwards, the correction of the wrong word relies on the correct words of the sentence, the correction center, and the list of words close to the wrong word. This processing applies a recursive technique [12] and n-gram correction to the left and right to choose the correct word from the list, by calculating the frequency of each candidate word followed or preceded by n correct words of the sentence [29]. The chosen correction is the candidate with a non-zero frequency for the largest n.
𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝑅(𝑤̅) = {𝑤𝑖 : 𝑚𝑎𝑥𝑓𝑟𝑒𝑞^𝑅(𝑖/𝑝𝑜𝑠) = 𝑓𝑟𝑒𝑞^𝑅(𝑤𝑖/𝑝𝑜𝑠) and 𝑤𝑖 ∈ 𝑤𝑜𝑟𝑑𝑠_𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝑠(𝑤̅)} (10)
𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝐿(𝑤̅) = {𝑤𝑖 : 𝑚𝑎𝑥𝑓𝑟𝑒𝑞^𝐿(𝑖/𝑝𝑜𝑠) = 𝑓𝑟𝑒𝑞^𝐿(𝑤𝑖/𝑝𝑜𝑠) and 𝑤𝑖 ∈ 𝑤𝑜𝑟𝑑𝑠_𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝑠(𝑤̅)} (11)
If the erroneous word is located after the correction center, then (10) is used; otherwise, (11) is adopted, where 𝑤̅ represents the wrong word. Finally, the sentence is corrected, using the correct word on the left or right and the recursive technique, by (12):
𝑐𝑜𝑟𝑟𝑒𝑐𝑡(𝑝ℎ𝑟𝑎𝑠𝑒) = {𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝐿(𝑤𝑖<𝑝𝑜𝑠); 𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝑅(𝑤𝑖>𝑝𝑜𝑠)} (12)
where 𝑤𝑖<𝑝𝑜𝑠 denotes the words of the sentence located before the correction center and 𝑤𝑖>𝑝𝑜𝑠 those located after it.
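The left/right sweep of (10)-(12) can be sketched as below, with n = 1 (each wrong word conditioned on its single adjacent correct word). This is a simplified illustration: `cooc` holds hypothetical unordered co-occurrence counts, and the `is_wrong`/`close` callbacks stand in for the dictionary lookup and Eq. (9).

```python
def pick_correction(candidates, context_word, cooc):
    """Candidate co-occurring most often with the adjacent correct word
    (Eqs. 10-11 with n = 1); None if every count is zero."""
    best = max(candidates, key=lambda w: cooc.get(frozenset((context_word, w)), 0))
    return best if cooc.get(frozenset((context_word, best)), 0) > 0 else None

def correct_sentence(words, pos, is_wrong, close, cooc):
    """Sweep right then left from the correction center (Eq. 12)."""
    out = list(words)
    # right of the center: condition each wrong word on its left neighbour
    for i in range(pos + 1, len(out)):
        if is_wrong(out[i]):
            fixed = pick_correction(close(out[i]), out[i - 1], cooc)
            if fixed:
                out[i] = fixed
    # left of the center: condition on the right neighbour
    for i in range(pos - 1, -1, -1):
        if is_wrong(out[i]):
            fixed = pick_correction(close(out[i]), out[i + 1], cooc)
            if fixed:
                out[i] = fixed
    return out
```

Because already-corrected words become the context for the next position, the sweep realizes the recursive aspect of the technique.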
Int J Inf & Commun Technol ISSN: 2252-8776
Realization of an intelligent evaluation system (Otman Maarouf)
3.2.2. Segmentation of sentences and labeling of parts of speech
In this step, we used natural language processing tools (OpenNLP) to segment the sentence (the answer to the question) and to tag each of its words; this is done in order to calculate the semantic similarity between the answer proposed by the teacher and the one given by the student, using the approach of Mihalcea et al. [23]. We generate the Wu & Palmer similarity matrix between the words of the two sentences that share the same grammatical category, using (6). If we find a similarity greater than or equal to 0.75, we consider the sentences similar, i.e., the answer given by the student is correct, and the student receives the full mark for the question. Otherwise, we calculate the question score with (13), by multiplying the similarity of the two answers by the scale of the question.
𝑛𝑜𝑡𝑒(𝑞) = 𝑆𝑖𝑚(𝑅, 𝑅𝑝) ∗ 𝑏𝑎𝑟 (13)
where 𝑅 is the student's answer, 𝑅𝑝 is the answer proposed by the professor, and 𝑏𝑎𝑟 is the full mark of the question.
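The sentence-level aggregation of Mihalcea et al. can be sketched over a precomputed word-similarity matrix: each word is matched to its best counterpart in the other sentence, and the two directions are averaged. This simplified sketch omits the idf weighting of the original approach, and the toy matrix stands in for Wu & Palmer scores that would come from WordNet.

```python
def mihalcea_similarity(sim):
    """Average, in both directions, of each word's best match in the other
    sentence, given sim[i][j] = word-to-word similarity (idf omitted)."""
    rows, cols = len(sim), len(sim[0])
    s1_to_s2 = sum(max(sim[i][j] for j in range(cols)) for i in range(rows)) / rows
    s2_to_s1 = sum(max(sim[i][j] for i in range(rows)) for j in range(cols)) / cols
    return 0.5 * (s1_to_s2 + s2_to_s1)

# Toy 2x2 Wu & Palmer matrix between two two-word answers
score = mihalcea_similarity([[1.0, 0.2], [0.1, 0.8]])
```

Averaging both directions keeps the measure symmetric even when the two answers have different lengths.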
Figure 3 shows the different steps of the similarity calculation, starting with sentence cleaning and segmentation, followed by generation of the similarity matrix, and ending with the Mihalcea similarity calculation.
Figure 3. Flowchart that presents the semantic similarity calculation process
− Apache OpenNLP: the Apache OpenNLP library is a toolbox supporting: i) tokenization, ii) sentence segmentation, iii) part-of-speech tagging, iv) named-entity extraction, v) chunking, and vi) parsing.
− WordNet: a wide-ranging lexical database of English, developed over more than 20 years by linguists of the Cognitive Science Laboratory at Princeton University. It is freely usable, and many resources have been created manually or automatically from, as extensions of, or in addition to WordNet. Programs from the field of artificial intelligence have also built bridges to WordNet. It is a semantic network of the English language, based on a psychological theory of language. The first version was released in June 1991 [13].
3.2.3. Application example
The statement of the question is: “give the definition of a Java class”. The answer proposed by the professor is: “a class is a definition model for objects with the same set of attributes, and the same set of operations”. The answer written by the student is: “a class is a model to generate and define the objects to have the same attributes and method”. The mark reserved for this question is 3 points. The correction of this type of question starts with the verification of spelling errors; in this example we found two errors, “defined” and “th”. After the syntax correction of this answer [28], the new answer is “a class is a model to generate and define the objects for the same attributes and method”. Figure 4 demonstrates an example of calculating the similarity between the teacher's and the student's responses.
Figure 4. Example of calculating the semantic similarity between two sentences: sentence (student) and sentence (teacher)
If the similarity between the proposed answer and the student's response is less than 0.25, we consider the answer false, i.e., the mark given to the student for this question is 0. If the similarity lies in [0.25, 0.75), we consider the answer partially correct, and the student's mark for this question equals sim × mark. On the other hand, if the similarity is greater than or equal to 0.75, we consider the answer correct and the student receives the full mark. The similarity between the proposed answer and the student's answer in the example equals 0.71, so the student's score is 0.71 × 3 = 2.13. Figure 5 explains the process of calculating the score of the student's response.
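The three-band grading rule can be written directly from the thresholds above. This is a minimal sketch with illustrative names, reproducing the worked example of a 0.71 similarity on a 3-point question.

```python
def grade(similarity, full_mark):
    """Map Mihalcea similarity to a mark: 0 below 0.25, the full mark at
    0.75 or above, proportional (Eq. 13) in between."""
    if similarity < 0.25:
        return 0.0
    if similarity >= 0.75:
        return full_mark
    return similarity * full_mark

# Worked example from the text: similarity 0.71 on a 3-point question -> 2.13
score = grade(0.71, 3)
```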
Figure 5. Flow chart that explains the calculation process
4. CONCLUSION
In this project, we have developed an intelligent evaluation system (IES) that makes it possible to assess learners effectively; this effectiveness lies in the diversification of the types of questions that can be proposed in an evaluation. The particularity of this system compared with existing ones is that it adds other types of questions, namely: i) programming questions, where the learner must answer by writing a program in a programming language, and ii) open questions, where the learner must write the answer as free text. The realization of this system required several steps: a study of existing systems, the search for improvement and resolution approaches, and finally the choice of tools and the development of the system. Several perspectives are conceivable, namely: i) adapting this evaluation system to correct all types of questions in all subjects, and ii) using multilingual dictionaries and corpora to correct answers to questions in different languages.
REFERENCES
[1] B. Gao, “Research and Implementation of Intelligent Evaluation System of Teaching Quality in Universities Based on Artificial
Intelligence Neural Network Model,” Math. Probl. Eng., vol. 2022, p. e8224184, Mar. 2022, doi: 10.1155/2022/8224184.
[2] S. Abujar, M. Hasan, and S. A. Hossain, “Sentence Similarity Estimation for Text Summarization Using Deep Learning,” in Proc. of the 2nd Int. Conf. on Data Eng. and Commun. Technol., Springer Singapore, 2018, pp. 155–164, doi: 10.1007/978-981-13-1610-4_16.
[3] X. Guo, H. Mirzaalian, E. Sabir, A. Jaiswal, and W. Abd-Almageed, “CORD19STS: COVID-19 Semantic Textual Similarity
Dataset,” 2020, doi: 10.48550/ARXIV.2007.02461.
[4] Y. Wang et al., “MedSTS: a resource for clinical semantic textual similarity,” Lang. Resour. Eval., vol. 54, no. 1, pp. 57–72, Oct.
2018, doi: 10.1007/s10579-018-9431-1.
[5] C. Lo and M. Simard, “Fully Unsupervised Crosslingual Semantic Textual Similarity Metric Based on BERT for Identifying
Parallel Data,” in Proc. of the 23rd Conference on Computational Natural Language Learning (CoNLL), 2019, doi:
10.18653/v1/k19-1020.
[6] R. Yan, D. Qiu, and H. Jiang, “Sentence Similarity Calculation Based on Probabilistic Tolerance Rough Sets,” Math. Probl. Eng.,
vol. 2021, pp. 1–9, Jan. 2021, doi: 10.1155/2021/1635708.
[7] N. Gali, R. Mariescu-Istodor, D. Hostettler, and P. Fränti, “Framework for syntactic string similarity measures,” Expert Syst.
Appl., vol. 129, pp. 169–185, Sep. 2019, doi: 10.1016/j.eswa.2019.03.048.
[8] A. Thakkar and K. Chaudhari, “Predicting stock trend using an integrated term frequency–inverse document frequency-based
feature weight matrix with neural networks,” Appl. Soft Comput., vol. 96, Nov. 2020, doi: 10.1016/j.asoc.2020.106684.
[9] A. Mahmoud and M. Zrigui, “Semantic Similarity Analysis for Paraphrase Identification in Arabic Texts,” in Proceedings of the
31st Pacific Asia Conference on Language, Information and Computation, Nov. 2017, pp. 274–281. Accessed: Aug. 18, 2022.
[Online]. Available: https://aclanthology.org/Y17-1037
[10] L. Suryani and K. Edy, “Application development ‘Lost & Found’ android-based using term frequency-inverse document
frequency (TF-IDF) and Cosine similarity method,” Electro Luceat, vol. 6, no. 2, pp. 190–204, Nov. 2020, doi:
10.32531/jelekn.v6i2.232.
[11] R. G. Pawar, S. U. Ghumbre, and R. R. Deshmukh, “Visual Similarity Using Convolution Neural Network over Textual
Similarity in Content- Based Recommender System,” Int. J. Adv. Sci. Technol., vol. 27, pp. 137–147, Sep. 2019.
[12] L. Zheng, K. Jia, W. Wu, Q. Liu, T. Bi, and Q. Yang, “Cosine Similarity Based Line Protection for Large Scale Wind Farms Part
II—The Industrial Application,” IEEE Trans. Ind. Electron., vol. 69, no. 3, pp. 2599–2609, Mar. 2022, doi:
10.1109/tie.2021.3069400.
[13] R. Zhang, Z. Xu, and X. Gou, “ELECTRE II method based on the cosine similarity to evaluate the performance of financial
logistics enterprises under double hierarchy hesitant fuzzy linguistic environment,” Fuzzy Optim. Decis. Mak., Feb. 2022, doi:
10.1007/s10700-022-09382-3.
[14] D. Liu, X. Chen, and D. Peng, “Some cosine similarity measures and distance measures between q-rung orthopair fuzzy sets,” Int.
J. Intell. Syst., vol. 34, no. 7, pp. 1572–1587, Mar. 2019, doi: 10.1002/int.22108.
[15] M. AlMousa, R. Benlamri, and R. Khoury, “Exploiting non-taxonomic relations for measuring semantic similarity and relatedness
in WordNet,” Knowl.-Based Syst., vol. 212, Jan. 2021, doi: 10.1016/j.knosys.2020.106565.
[16] M. Farouk, “Measuring Sentences Similarity: A Survey,” Indian J. Sci. Technol., vol. 12, no. 25, pp. 1–11, Jul. 2019, doi:
10.17485/ijst/2019/v12i25/143977.
[17] M. Farouk, “Sentence Semantic Similarity based on Word Embedding and WordNet,” in 2018 13th International Conference on
Computer Engineering and Systems (ICCES), Dec. 2018, doi: 10.1109/icces.2018.8639211.
[18] J. Kocoń and M. Maziarz, “Mapping WordNet onto human brain connectome in emotion processing and semantic similarity
recognition,” Inf. Process. Manag., vol. 58, no. 3, p. 102530, May 2021, doi: 10.1016/j.ipm.2021.102530.
[19] W. H. Gomaa and A. A. Fahmy, “A Survey of Text Similarity Approaches,” Int. J. Comput. Appl., vol. 68, no. 13, pp. 13–18,
Apr. 2013, doi: 10.5120/11638-7118.
[20] Z. Wu and M. Palmer, “Verb semantics and lexical selection,” 32nd Annu. Meet. Assoc. Comput. Linguist., pp. 133–138, 1994, doi: 10.3115/981732.981751.
[21] H. Zargayouna and S. Salotti, “Mesure de similarité dans une ontologie pour l’indexation sémantique de documents XML,” in
15èmes Journées francophones d’Ingénierie des Connaissances, Lyon, France, May 2004, pp. 249–260.
[22] I. Atoum and A. Otoom, “Efficient Hybrid Semantic Text Similarity using Wordnet and a Corpus,” Int. J. Adv. Comput. Sci.
Appl., vol. 7, no. 9, 2016, doi: 10.14569/IJACSA.2016.070917.
[23] R. Mihalcea, C. Corley, and C. Strapparava, “Corpus-based and Knowledge-based Measures of Text Semantic Similarity,” Am. Assoc. Artif. Intell. (AAAI), 2006.
[24] M. Mohler and R. Mihalcea, “Text-to-text semantic similarity for automatic short answer grading,” in Proceedings of the 12th Conference
of the European Chapter of the Association for Computational Linguistics on-EACL ’09, 2009, doi: 10.3115/1609067.1609130.
[25] M. Biniz, R. E. Ayachi, and M. Fakir, “Ontology Matching using BabelNet Dictionary and Word Sense Disambiguation
Algorithms,” Indones. J. Electr. Eng. Comput. Sci., vol. 5, no. 1, p. 196, Jan. 2017, doi: 10.11591/ijeecs.v5.i1.pp196-205.
[26] H. Vu, “Random indexing and inter-sentences similarity applied to automatic summarization,” 2016.
[27] C. Welch, C. Gu, J. Kummerfeld, V. Perez-Rosas, and R. Mihalcea, “Leveraging Similar Users for Personalized Language
Modeling with Limited Data,” Proc. 60th Annu. Meet. Assoc. Comput. Linguist., vol. 1, pp. 1742–1752, 2022.
[28] O. Maarouf, R. E. Ayachi, and M. Biniz, “Correcting optical character recognition result via a novel approach,” Int. J. Inform.
Commun. Technol. IJ-ICT, vol. 11, no. 1, p. 8, Apr. 2022, doi: 10.11591/ijict.v11i1.pp8-19.
[29] O. Maarouf, M. Biniz, and R. El Ayachi, “A new Approach of Natural Language Processing to correct the Result of an OCR,”
CBI Beni Mellal, p. 6, Apr. 2018.
BIOGRAPHIES OF AUTHORS
Otman Maarouf received his master's degree in business intelligence in 2018
from the Faculty of Science and Technology, University Sultan Moulay Sliman Beni-Mellal.
He is currently a PhD student. His research activities are located in the area of natural language processing; specifically, he deals with part-of-speech tagging, named-entity recognition, and lemmatization-stemming of the Amazigh language. He can be contacted at email:
Maarouf.otman94@gmail.com.
Rachid El Ayachi obtained a Master's degree in Informatics, Telecom and Multimedia (ITM) in 2006 from the Faculty of Sciences, Mohammed V University
(Morocco) and a Ph.D. degree in computer science from the Faculty of Science and
Technology, Sultan Moulay Slimane University (Morocco). He is currently a member of
laboratory TIAD and a professor at the Faculty of Science and Technology, Sultan Moulay
Slimane University, Morocco. His research focuses on image processing, pattern recognition,
machine learning, and semantic web. He can be contacted at email:
rachid.elayachi@usms.ma.
Mohamed Biniz received his master's degree in business intelligence in 2014
and Ph.D. degree in computer science in 2018 from the Faculty of Science and Technology,
University Sultan Moulay Sliman Beni Mellal. He is a professor at polydisciplinary faculty
University Sultan Moulay Slimane, Beni Mellal, Morocco. His research activities are located in the area of semantic web engineering and deep learning; specifically, he deals with the research questions of ontology evolution, big data, natural language processing, and dynamic programming. He can be contacted at email: mohamedbiniz@gmail.com.