Sentiment analysis (SA) uses machine learning techniques to infer the attitude a writer expresses in a text. While SA is challenging in itself, it is especially challenging for the Arabic language. In this paper, we enhance sentiment analysis for Arabic. Our approach begins with dedicated pre-processing steps. We then adopt the sentiment keywords co-occurrence measure (SKCM) as a sentiment-based feature selection method. This feature selection method is applied to three sentiment corpora using an SVM classifier. We compare our approach with the traditional methods followed by most SA work. The experimental results are very promising for enhancing SA accuracy.
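The abstract does not give the SKCM formula, so the following is only a minimal sketch of a co-occurrence-based sentiment feature scorer in that spirit: candidate terms are scored by how often they appear within a fixed window of seed sentiment keywords. The window size, seed list, and simple counting rule are all illustrative assumptions, not the paper's method:

```python
from collections import Counter

def skcm_scores(docs, seed_keywords, window=3):
    """Score candidate terms by how often they co-occur with seed
    sentiment keywords within a fixed window (a simplified proxy for
    the paper's SKCM measure; the exact formula is not given here)."""
    scores = Counter()
    for doc in docs:
        tokens = doc.split()
        for i, tok in enumerate(tokens):
            if tok in seed_keywords:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i and tokens[j] not in seed_keywords:
                        scores[tokens[j]] += 1
    return scores

docs = ["the movie was good and fun", "bad plot but good acting"]
s = skcm_scores(docs, {"good", "bad"})
```

High-scoring terms would then be kept as features for the SVM, discarding the rest of the vocabulary.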
Following are the questions I tried to answer in this presentation:
What is text summarization?
What is automatic text summarization?
How has it evolved over time?
What are the different methods?
How is deep learning used for text summarization?
What are the business applications?
The first few slides explain extractive summarization with its pros and cons; the next section explains abstractive summarization. The last section highlights the business applications of each.
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
Source and target word segmentation and alignment is a primary step in the statistical learning of a transliteration system. Here, we analyze the benefit of a syllable-like segmentation approach for learning a transliteration from English to an Indic language, which aligns the training-set word pairs in terms of sub-syllable-like units instead of individual character units. While this has been found useful for handling out-of-vocabulary words in English-Chinese transliteration in the presence of multiple target dialects, we asked whether it would hold for Indic languages, which are simpler in their phonetic representation and pronunciation. We expected the syllable-like method to perform marginally better, but found instead that even though our proposed approach improved the Top-1 accuracy, the individual-character-unit alignment model
somewhat outperformed it when the Top-10 results of the system were re-ranked using language modeling approaches. Our experiments were conducted for English-to-Telugu transliteration (the method applies equally well to most written Indic languages). Our training consisted of a syllable-like segmentation and alignment of a large training set, on which we built a statistical model by modifying a previous character-level maximum-entropy-based transliteration learning system due to Kumaran and Kellner; our testing consisted of applying the same segmentation to a test English word, applying the model, and re-ranking the resulting top 10 Telugu words. We also report the dataset creation and selection, since standard datasets are not available.
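The paper's actual segmentation scheme is not reproduced in this abstract, but a deliberately crude sub-syllable splitter conveys the idea of aligning in units larger than single characters. The consonant-run-plus-vowel-run grouping below is an illustrative assumption, not the published algorithm:

```python
def syllable_like_units(word, vowels="aeiou"):
    """Crude sub-syllable segmentation: group each maximal consonant
    run with the vowel run that follows it (an illustrative stand-in
    for the paper's segmentation scheme)."""
    units, cur = [], ""
    for ch in word.lower():
        if ch in vowels:
            cur += ch
        else:
            # a consonant after a vowel starts a new unit
            if cur and cur[-1] in vowels:
                units.append(cur)
                cur = ch
            else:
                cur += ch
    if cur:
        units.append(cur)
    return units
```

Aligning units like these, rather than single characters, is what lets the statistical model learn mappings closer to pronounceable chunks.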
The document describes the Columbia-GWU system submitted to the 2016 TAC KBP BeSt Evaluation. It discusses several approaches used for different languages and genres, including:
1) A sentiment system based on identifying the target only, adapted for English, Chinese, and Spanish.
2) An English sentiment system based on relation extraction, treating sentiment as a relation between source and target.
3) English and Chinese belief systems that combine high-precision word tagging with a high-recall default system.
4) A Spanish belief system based on weighted random choice of tags.
The document provides details on the data, approaches, and results for each language-specific system.
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
Machine transliteration has emerged as an important research area within machine translation. Transliteration aims to preserve the phonological structure of words, and proper transliteration of named entities plays a significant role in improving the quality of machine translation. In this paper we perform machine transliteration for the English-Punjabi language pair using a rule-based approach. We construct rules for syllabification, the process of separating words into syllables, and calculate probabilities for named entities (proper names and locations). For words that do not fall under the category of named entities, separate probabilities are calculated using relative frequencies obtained through the statistical machine translation toolkit Moses. Using these probabilities, we transliterate input text from English to Punjabi.
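The relative-frequency estimation the abstract attributes to Moses can be sketched at the syllable level as follows. The aligned pairs and the direct maximum-likelihood estimate are illustrative; Moses derives such probability tables internally during phrase-table training:

```python
from collections import Counter, defaultdict

def relative_freq(pairs):
    """Estimate P(target | source) for aligned syllable pairs by
    relative frequency, the same maximum-likelihood estimate a
    phrase-based toolkit computes from alignment counts."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    probs = {}
    for src, c in counts.items():
        total = sum(c.values())
        probs[src] = {t: n / total for t, n in c.items()}
    return probs

# toy aligned data: one English syllable seen with two Punjabi renderings
p = relative_freq([("sin", "ਸਿਨ"), ("sin", "ਸੀਨ"), ("sin", "ਸਿਨ")])
```

At decode time, the highest-probability rendering of each syllable would be chosen (subject to any overriding hand-written rules).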
This document summarizes an experiment comparing different character-level embedding approaches for Korean sentence classification tasks. Dense character-level embeddings using pre-trained fastText vectors outperformed sparse one-hot encodings. Character-level embeddings preserved local semantics around character boundaries better than Jamo-level encodings, which performed best with self-attention. While Jamo-level features may be useful for syntax-semantic tasks, character-level approaches had better performance and computation efficiency. These findings provide insights for character-rich languages beyond Korean.
The document summarizes different techniques for automatic document summarization including extractive and abstractive approaches. It discusses simple techniques like frequency-based methods and cue phrases. Graph-based approaches like TextRank and LexRank that model text as a graph are explained. Linguistic methods involving lexical chains and rhetorical structure are covered. Finally, it summarizes WordNet-based semantic approaches and techniques for evaluating summaries.
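The graph-based methods mentioned above (TextRank, LexRank) both reduce to running PageRank over a sentence-similarity graph. A minimal power-iteration sketch, assuming a precomputed similarity matrix (real systems also normalise similarities and handle dangling nodes more carefully):

```python
def textrank(sim, d=0.85, iters=50):
    """Power-iteration PageRank over a sentence-similarity matrix:
    sim[j][i] is the similarity between sentences j and i. Returns
    one importance score per sentence."""
    n = len(sim)
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if i == j or sim[j][i] == 0:
                    continue
                # j distributes its score proportionally to its edges
                out = sum(sim[j][k] for k in range(n) if k != j)
                if out > 0:
                    rank += sim[j][i] / out * scores[j]
            new.append((1 - d) / n + d * rank)
        scores = new
    return scores

# sentence 1 is linked to both others, so it should rank highest
scores = textrank([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
```

The top-scoring sentences are then extracted, in document order, as the summary.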
2010 PACLIC - pay attention to categories
This document summarizes a research paper on a proposed method called Metadata Projection Matrix (MPM) for sentence modeling that allows controlling attention to certain syntactic categories. The method uses a projection matrix to incorporate syntactic category information when calculating attention weights. Experimental results on several datasets show MPM outperforms baselines on tasks where attention to specific categories is important, like detecting terms or irony, but is weaker on more context-dependent tasks. The method is best suited to applications where syntactic structure significantly informs predictions.
This document summarizes a research paper that investigates authorship attribution on short historical Arabic texts using naive Bayes classification. It explores using n-gram word and character level features and the naive Bayes classifier to attribute 30 short texts by 10 authors. The experiments show the naive Bayes classifier achieves high accuracy, up to 96%, especially using word 1-gram features. It also finds punctuation marks can help distinguish authors and increase performance.
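A minimal version of the classifier described above, multinomial naive Bayes over character n-grams with add-one smoothing, can be sketched as follows (the tiny training texts are illustrative):

```python
import math
from collections import Counter

def char_ngrams(text, n=2):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def train_nb(docs_by_author, n=2):
    """Count character n-grams per author; the counts and the shared
    vocabulary are all a multinomial naive Bayes model needs."""
    models, vocab = {}, set()
    for author, docs in docs_by_author.items():
        c = Counter(g for d in docs for g in char_ngrams(d, n))
        models[author] = c
        vocab |= set(c)
    return models, vocab

def classify(text, models, vocab, n=2):
    """Pick the author maximising the add-one-smoothed log-likelihood."""
    best, best_lp = None, float("-inf")
    for author, c in models.items():
        total = sum(c.values())
        lp = sum(math.log((c[g] + 1) / (total + len(vocab)))
                 for g in char_ngrams(text, n))
        if lp > best_lp:
            best, best_lp = author, lp
    return best

models, vocab = train_nb({"a": ["abab abab"], "b": ["xyxy xyxy"]})
```

Word 1-gram features, which the paper found strongest, would use the same machinery with `text.split()` in place of `char_ngrams`.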
The project is developed as a part of IRE course work @IIIT-Hyderabad.
Team members:
Aishwary Gupta (201302216)
B Prabhakar (201505618)
Sahil Swami (201302071)
Links:
https://github.com/prabhakar9885/Text-Summarization
http://prabhakar9885.github.io/Text-Summarization/
https://www.youtube.com/playlist?list=PLtBx4kn8YjxJUGsszlev52fC1Jn07HkUw
http://www.slideshare.net/prabhakar9885/text-summarization-60954970
https://www.dropbox.com/sh/uaxc2cpyy3pi97z/AADkuZ_24OHVi3PJmEAziLxha?dl=0
Extractive summarization selects objects from the entire collection without modifying the objects themselves; the goal is to select whole sentences verbatim.
An exploratory research on grammar checking of Bangla sentences using statist...
N-gram based language models are popular and extensively used statistical methods for solving various natural language processing problems, including grammar checking. Smoothing is one of the most effective techniques used in building a language model to deal with the data sparsity problem, and Kneser-Ney is one of the most prominent and successful smoothing techniques for language modelling. In our previous work, we presented a Witten-Bell smoothing based language modelling technique for checking the grammatical correctness of Bangla sentences, which showed promising results, outperforming previous methods. In this work, we propose an improved method using a Kneser-Ney smoothed n-gram language model for grammar checking and perform a comparative performance analysis between the Kneser-Ney and Witten-Bell smoothing techniques for the same purpose. We also provide an improved technique for calculating the optimum threshold, which further enhances the results. Our experimental results show that Kneser-Ney outperforms Witten-Bell as a smoothing technique when used with n-gram LMs for checking the grammatical correctness of Bangla sentences.
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
This document discusses developing a statistical approach for measuring confidence intervals in translation quality evaluation and post-editing distance. It proposes modeling errors as independent binomial distributions and using Monte Carlo simulations to determine confidence intervals for different sample sizes. The simulations show that with samples of 100 sentences or less, the 95% confidence interval is too broad to reliably measure quality. A minimum sample of 30 pages is recommended to achieve a reasonable confidence level and narrower interval. Understanding confidence intervals provides a measure of reliability for translation quality scores.
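The simulation idea, treating each sentence as an independent binomial error trial and reading the confidence interval off the simulated distribution, can be sketched as follows (the error probability and trial counts are illustrative choices, not the paper's):

```python
import random

def binomial_ci(n_sentences, p_err=0.2, trials=4000, level=0.95):
    """Monte Carlo confidence interval for the observed error rate
    when each sentence independently contains an error with
    probability p_err."""
    random.seed(0)  # reproducible simulation
    rates = sorted(
        sum(random.random() < p_err for _ in range(n_sentences)) / n_sentences
        for _ in range(trials)
    )
    lo = rates[int((1 - level) / 2 * trials)]
    hi = rates[int((1 + level) / 2 * trials)]
    return lo, hi

lo_small, hi_small = binomial_ci(100)
lo_large, hi_large = binomial_ci(900)
```

The interval for 100 sentences is markedly wider than for 900, which is the paper's argument for larger evaluation samples.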
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
The document provides an overview of machine translation evaluation (MTE). It discusses existing MTE methods like BLEU, METEOR, WER, and their weaknesses. The author's thesis proposes a new metric called LEPOR that incorporates additional factors to address weaknesses. The additional factors include an enhanced length penalty, n-gram position difference penalty, and tunable parameters to handle cross-language performance differences. The thesis will experiment with LEPOR on various language pairs and shared tasks to evaluate its performance.
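The enhanced length penalty is the easiest LEPOR component to illustrate: as commonly stated, it penalises candidates that are either shorter or longer than the reference, unlike BLEU's one-sided brevity penalty. Treat the exact formula below as a sketch from the published description rather than a definitive implementation:

```python
import math

def length_penalty(c, r):
    """LEPOR-style enhanced length penalty for a candidate of length c
    against a reference of length r: 1 when lengths match, and an
    exponential penalty on whichever side deviates."""
    if c == r:
        return 1.0
    if c < r:
        return math.exp(1 - r / c)
    return math.exp(1 - c / r)
```

The full metric multiplies this penalty by an n-gram position-difference penalty and a harmonic precision-recall term, with tunable weights per language pair.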
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
This document discusses deep learning approaches for identifying phrase structures in sentences. It begins with an introduction to natural language processing and phrase structure grammar. Traditional n-gram and rule-based approaches to phrase structure identification are described. Recent deep learning methods for natural language tasks that have been applied to phrase structure identification are then summarized, including word embeddings, convolutional neural networks, recurrent neural networks and recursive neural networks. The document concludes that deep learning requires less manual feature engineering and has achieved good performance on many NLP tasks, but still has room for improvement, especially on tasks involving unlabeled data.
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...
This document summarizes a research paper that proposed improving sentiment analysis of short informal Indonesian product reviews using synonym-based feature expansion. The paper developed an automatic sentiment analysis system using Naive Bayes classification and feature expansion. It first preprocessed texts through normalization, then used an API to find synonyms and expand text features. Experiments showed the proposed method improved sentiment analysis accuracy of short reviews to 98%, and that feature expansion helped more with small training datasets. The best performance was with 400 training examples using expansion.
The document discusses two neural network models for reading comprehension tasks: the Attentive Reader model proposed by Herman et al. in 2015 and the Stanford Reader model proposed by Chen et al. in 2016. The author implemented a two-layer attention model inspired by these previous models that achieves a 1.5% higher accuracy on reading comprehension tasks compared to the Stanford Reader.
This document provides an overview of natural language processing (NLP) research trends presented at ACL 2020, including shifting away from large labeled datasets towards unsupervised and data augmentation techniques. It discusses the resurgence of retrieval models combined with language models, the focus on explainable NLP models, and reflections on current achievements and limitations in the field. Key papers on BERT and XLNet are summarized, outlining their main ideas and achievements in advancing the state-of-the-art on various NLP tasks.
Anaphora resolution in Hindi language using gazetteer method
Anaphora resolution is one of the active research areas within natural language processing. Resolving anaphoric references is among the most challenging and complex tasks to handle. This paper focuses on pronominal anaphora resolution for the Hindi language. Among the various methodologies for resolving anaphora, this paper presents a computational model for Hindi based on the gazetteer method, which builds lists and then applies operations to classify the elements present in them. Of the many salient factors for resolving anaphora, the proposed model uses two: animacy and recency. The animacy factor distinguishes living from non-living referents, while recency captures the fact that referents mentioned in the current sentence tend to receive higher weights than those in previous sentences. The paper describes experiments conducted on short Hindi stories, news articles, and biography content from Wikipedia, along with results and future directions for improving accuracy.
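The two-factor scoring the abstract describes can be sketched as follows: the animacy test is gazetteer membership, and closer sentences get larger recency weights. The specific weights are illustrative assumptions, not the paper's:

```python
def resolve_pronoun(pronoun_is_animate, candidates, animate_gazetteer):
    """Choose an antecedent using the paper's two factors: animacy
    (membership in a gazetteer of living things) and recency
    (referents in closer sentences score higher)."""
    best, best_score = None, float("-inf")
    for noun, sentences_back in candidates:
        # animacy agreement between pronoun and candidate
        score = 2.0 if (noun in animate_gazetteer) == pronoun_is_animate else 0.0
        # recency: weight decays with distance in sentences
        score += 1.0 / (1 + sentences_back)
        if score > best_score:
            best, best_score = noun, score
    return best

# an animate pronoun prefers the gazetteer-listed "Ram" over a
# nearer inanimate noun
antecedent = resolve_pronoun(True, [("Ram", 1), ("table", 0)], {"Ram"})
```

In the paper's setup the gazetteer lists would be built for Hindi nouns, and the candidate set would come from the preceding discourse.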
A survey on phrase structure learning methods for text classification
Text classification is a task of automatic classification of text into one of the predefined categories. The
problem of text classification has been widely studied in different communities like natural language
processing, data mining and information retrieval. Text classification is an important constituent in many
information management tasks like topic identification, spam filtering, email routing, language
identification, genre classification, readability assessment etc. The performance of text classification
improves notably when phrase patterns are used. The use of phrase patterns helps in capturing non-local
behaviours and thus helps in the improvement of text classification task. Phrase structure extraction is the
first step towards phrase pattern identification. In this survey, a detailed study of phrase structure
learning methods has been carried out. This will enable future work in several NLP tasks that use
syntactic information from phrase structure, such as grammar checking, question answering, information
extraction, machine translation, and text classification. The paper also provides different levels of classification
and detailed comparison of the phrase structure learning methods.
SYLLABLE-BASED NEURAL NAMED ENTITY RECOGNITION FOR MYANMAR LANGUAGE
Named Entity Recognition (NER) for the Myanmar language is essential to Myanmar natural language processing research. In this work, NER for Myanmar is treated as a sequence tagging problem, and the effectiveness of deep neural networks on this task is investigated. Experiments apply deep neural network architectures to syllable-level Myanmar contexts. The first manually annotated NER corpus for the Myanmar language is also constructed and proposed; in developing this in-house corpus, sentences from online news websites and sentences from the ALT-Parallel-Corpus were used. The ALT corpus is part of the Asian Language Treebank (ALT) project under ASEAN IVO. This paper contributes the first evaluation of neural network models on the NER task for Myanmar. The experimental results show that neural sequence models can produce promising results compared to a baseline CRF model; among the neural architectures, a bidirectional LSTM network with a CRF layer on top gives the highest F-score. This work also aims to assess the effectiveness of neural network approaches to Myanmar text processing and to promote further research on this understudied language.
Automatic text summarization is the process of reducing the text content of a document while retaining its important points. Generally, there are two approaches: extractive and abstractive. Extractive text summarization can be divided into two phases: pre-processing and processing. In this paper, we discuss some of the extractive text summarization approaches used by researchers and describe the features used for the extractive summarization process. We also present the available linguistic pre-processing tools, with their features, that are used for automatic text summarization, and discuss the tools and parameters useful for evaluating the generated summary. Moreover, we explain our proposed lexical chain analysis approach for extractive automatic text summarization, with sample generated lexical chains, and provide the evaluation results of our system-generated summary. The proposed lexical chain analysis approach can also be applied to other text mining problems such as topic classification, sentiment analysis, and summarization.
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
This document summarizes the paper "Supervised Learning of Universal Sentence Representations from Natural Language Inference Data". It discusses how the researchers trained sentence embeddings using supervised data from the Stanford Natural Language Inference dataset. They tested several sentence encoder architectures and found that a BiLSTM network with max pooling produced the best performing universal sentence representations, outperforming prior unsupervised methods on 12 transfer tasks. The sentence representations learned from the natural language inference data consistently achieved state-of-the-art performance across multiple downstream tasks.
Duration for Classification and Regression Tree for Marathi Text-to-Speech Syn...
This research paper reports preliminary results of data-driven modeling of segmental phoneme duration for
Marathi. Classification and Regression Tree based data driven duration modeling for segmental duration
prediction is presented. A number of features are considered and their usefulness and relative contribution for
segmental duration prediction is assessed. Objective evaluation of the duration model, by root mean squared
prediction error and correlation between actual and predicted durations, is performed.
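The two objective measures named above are standard and easy to state exactly: root mean squared prediction error and the Pearson correlation between actual and predicted durations. A small self-contained sketch:

```python
import math

def rmse(actual, predicted):
    """Root mean squared prediction error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Low RMSE and high correlation between predicted and actual segment durations are what the CART duration model is evaluated on.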
Indexing of Arabic documents automatically based on lexical analysis
The continuous information explosion through the Internet and other information sources makes it necessary to perform all information processing activities automatically, quickly, and reliably. In this paper, we propose and implement a method to automatically create an index for books written in the Arabic language. The process relies largely on text summarization and abstraction to collect the main topics and statements in a book. The process is evaluated in terms of accuracy and performance, and the results show that it can effectively replace the effort of manually indexing books and documents, which can be very useful in all information processing and retrieval applications.
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
This article introduces a methodology for analyzing sentiment in Arabic text using a foreign lexical source. Our method leverages a resource available in another language, such as SentiWordNet in English, to supplement the limited language resources of Arabic. The knowledge taken from the external resource is injected into the feature model while the machine-learning-based classifier is trained. The first step of our method builds the bag-of-words (BOW) model of the Arabic text. The second step calculates polarity scores using machine translation and the English SentiWordNet; the scores for each text are added to the model as three values, for objective, positive, and negative. The last step trains the ML classifier on that model to predict the sentiment of the Arabic text. Our method increases performance compared with the baseline BOW model in most cases. In addition, it appears to be a viable approach to sentiment analysis of Arabic text, where available resources are limited.
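The feature-injection step, appending three aggregate lexicon scores to the bag-of-words vector, can be sketched as follows. The lexicon entries and the simple summation are illustrative assumptions; the paper obtains its scores via machine translation into SentiWordNet:

```python
def augment_bow(bow, tokens, senti_lexicon):
    """Append aggregate (positive, negative, objective) scores from a
    SentiWordNet-style lexicon to a bag-of-words feature vector.
    Unknown words default to fully objective: (0, 0, 1)."""
    pos = sum(senti_lexicon.get(t, (0, 0, 1))[0] for t in tokens)
    neg = sum(senti_lexicon.get(t, (0, 0, 1))[1] for t in tokens)
    obj = sum(senti_lexicon.get(t, (0, 0, 1))[2] for t in tokens)
    return bow + [pos, neg, obj]

# made-up lexicon entry for illustration
lex = {"good": (0.8, 0.0, 0.2)}
features = augment_bow([1, 0], ["good", "film"], lex)
```

The classifier is then trained on the augmented vectors instead of the plain BOW features.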
A COMPARATIVE STUDY OF FEATURE SELECTION METHODS
Text analysis has been attracting increasing attention in this data era. Selecting effective features from datasets is a particularly important part of text classification studies. Feature selection excludes irrelevant features from the classification task, reduces the dimensionality of a dataset, and improves the accuracy and performance of identification. Many feature selection methods have been proposed so far, yet it remains unclear which is the most effective in practice. This article evaluates and compares the available feature selection methods in terms of general versatility with respect to authorship attribution problems and tries to identify the most effective method. It discusses the general versatility of feature selection methods and its connection to selecting appropriate features for varying data. In addition, different languages, different types of features, different systems for calculating the accuracy of SVM (support vector machine) classifiers, and different criteria for ranking feature selection methods were used together to measure the general versatility of these methods. The analysis results indicate that the best feature selection method differs for each dataset; however, some methods can always extract useful information to discriminate the classes, and chi-square proved to be the best method overall.
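The chi-square method the study favours scores a feature for a class from a 2x2 contingency table of feature presence versus class membership. The standard formula, as a self-contained sketch:

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square score of a feature for one class from 2x2
    contingency counts: n11 = docs in the class containing the
    feature, n10 = docs outside the class containing it, n01 = docs
    in the class without it, n00 = docs outside the class without it.
    High scores mark features whose presence depends on the class."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return num / den if den else 0.0
```

Features are ranked by this score per class and the top-ranked ones are retained, which is how chi-square reduces dimensionality before classification.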
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
Distributed language representation has become the most widely used technique for representing language in various natural language processing tasks. Most natural language processing models based on deep learning techniques use pre-trained distributed word representations, commonly called word embeddings. Determining the highest-quality word embeddings is of crucial importance for such models. However, selecting appropriate word embeddings is a perplexing task, since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods, analysing their performance in capturing word similarities against existing benchmark datasets of word-pair similarities. The paper conducts a correlation analysis between ground-truth word similarities and similarities obtained by the different word embedding methods.
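The evaluation described, comparing embedding-derived similarities against human word-pair judgements, typically uses cosine similarity and a rank correlation such as Spearman's. Both fit in a short sketch; the closed-form Spearman formula below assumes no tied values:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def spearman(x, y):
    """Spearman rank correlation (tie-free case): Pearson on ranks,
    via the closed form 1 - 6*sum(d^2) / (n*(n^2-1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

An embedding method is then judged by how strongly its cosine similarities rank-correlate with the human similarity scores in benchmarks such as WordSim-353 or SimLex-999.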
This project was developed as part of the IRE coursework at IIIT-Hyderabad.
Team members:
Aishwary Gupta (201302216)
B Prabhakar (201505618)
Sahil Swami (201302071)
Links:
https://github.com/prabhakar9885/Text-Summarization
http://prabhakar9885.github.io/Text-Summarization/
https://www.youtube.com/playlist?list=PLtBx4kn8YjxJUGsszlev52fC1Jn07HkUw
http://www.slideshare.net/prabhakar9885/text-summarization-60954970
https://www.dropbox.com/sh/uaxc2cpyy3pi97z/AADkuZ_24OHVi3PJmEAziLxha?dl=0
Extractive summarization extracts objects from the entire collection without modifying the objects themselves; the goal is to select whole sentences verbatim.
An exploratory research on grammar checking of Bangla sentences using statist... - IJECEIAES
N-gram based language models are popular and extensively used statistical methods for solving various natural language processing problems, including grammar checking. Smoothing is one of the most effective techniques used in building a language model to deal with the data-sparsity problem, and Kneser-Ney is one of the most prominent and successful smoothing techniques for language modelling. In our previous work, we presented a Witten-Bell smoothing based language modelling technique for checking the grammatical correctness of Bangla sentences, which showed promising results outperforming previous methods. In this work, we propose an improved method using a Kneser-Ney smoothing based n-gram language model for grammar checking and perform a comparative performance analysis between Kneser-Ney and Witten-Bell smoothing techniques for the same purpose. We also provide an improved technique for calculating the optimum threshold, which further enhances the results. Our experimental results show that Kneser-Ney outperforms Witten-Bell as a smoothing technique when used with n-gram LMs for checking the grammatical correctness of Bangla sentences.
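The core idea (not the paper's code) can be sketched in a few lines: an interpolated Kneser-Ney bigram model assigns each bigram a smoothed probability, and a sentence is flagged as ungrammatical when any of its bigrams falls below a threshold. The toy corpus and threshold here are illustrative assumptions:

```python
# Illustrative sketch: an interpolated Kneser-Ney bigram model used as a
# crude grammaticality check. A sentence is flagged if any bigram's smoothed
# probability falls below a (hand-picked, illustrative) threshold.
from collections import Counter

D = 0.75  # absolute-discounting constant

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat ate the fish".split(),
]

bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    toks = ["<s>"] + sent + ["</s>"]
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

continuation = Counter(w for (_, w) in bigrams)  # distinct contexts w follows
followers = Counter(v for (v, _) in bigrams)     # distinct words following v
total_bigram_types = len(bigrams)

def p_kn(w, v):
    """Interpolated Kneser-Ney bigram probability P(w | v)."""
    p_cont = continuation[w] / total_bigram_types
    if unigrams[v] == 0:
        return p_cont
    discounted = max(bigrams[(v, w)] - D, 0) / unigrams[v]
    backoff_weight = D * followers[v] / unigrams[v]
    return discounted + backoff_weight * p_cont

def looks_grammatical(sentence, threshold=0.05):
    toks = ["<s>"] + sentence.split() + ["</s>"]
    return all(p_kn(w, v) > threshold for v, w in zip(toks, toks[1:]))

print(looks_grammatical("the cat sat on the mat"))  # True: all bigrams seen
print(looks_grammatical("mat the on sat cat"))      # False: unseen bigrams
```

The paper's contribution is essentially the choice of smoothing (Kneser-Ney vs. Witten-Bell) and an optimised threshold; both slot into this same structure.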
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati... - Lifeng (Aaron) Han
This document discusses developing a statistical approach for measuring confidence intervals in translation quality evaluation and post-editing distance. It proposes modeling errors as independent binomial distributions and using Monte Carlo simulations to determine confidence intervals for different sample sizes. The simulations show that with samples of 100 sentences or less, the 95% confidence interval is too broad to reliably measure quality. A minimum sample of 30 pages is recommended to achieve a reasonable confidence level and narrower interval. Understanding confidence intervals provides a measure of reliability for translation quality scores.
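The simulation idea above can be sketched directly: treat each sentence as an independent Bernoulli trial of having an error, simulate many samples, and measure how wide the 95% interval for the estimated error rate is at each sample size. The error probability and sample sizes below are illustrative, not the document's figures:

```python
# Monte Carlo sketch: the 95% interval for an estimated per-sentence error
# rate narrows as the sample size grows. p_error and sizes are illustrative.
import random

def ci_width(n_sentences, p_error=0.1, trials=1000):
    rng = random.Random(0)  # fixed seed for reproducibility
    estimates = []
    for _ in range(trials):
        errors = sum(1 for _ in range(n_sentences) if rng.random() < p_error)
        estimates.append(errors / n_sentences)
    estimates.sort()
    lo = estimates[int(0.025 * trials)]
    hi = estimates[int(0.975 * trials)]
    return hi - lo

print(ci_width(100))    # wide interval with only 100 sentences
print(ci_width(2000))   # much narrower with a larger sample
```

This is exactly why the document recommends a minimum sample size: below it, the simulated interval is too broad for the quality score to be meaningful.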
LEPOR: an augmented machine translation evaluation metric - Thesis PPT - Lifeng (Aaron) Han
The document provides an overview of machine translation evaluation (MTE). It discusses existing MTE methods like BLEU, METEOR, WER, and their weaknesses. The author's thesis proposes a new metric called LEPOR that incorporates additional factors to address weaknesses. The additional factors include an enhanced length penalty, n-gram position difference penalty, and tunable parameters to handle cross-language performance differences. The thesis will experiment with LEPOR on various language pairs and shared tasks to evaluate its performance.
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification... - IRJET Journal
This document discusses deep learning approaches for identifying phrase structures in sentences. It begins with an introduction to natural language processing and phrase structure grammar. Traditional n-gram and rule-based approaches to phrase structure identification are described. Recent deep learning methods for natural language tasks that have been applied to phrase structure identification are then summarized, including word embeddings, convolutional neural networks, recurrent neural networks and recursive neural networks. The document concludes that deep learning requires less manual feature engineering and has achieved good performance on many NLP tasks, but still has room for improvement, especially on tasks involving unlabeled data.
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi... - TELKOMNIKA JOURNAL
This document summarizes a research paper that proposed improving sentiment analysis of short informal Indonesian product reviews using synonym-based feature expansion. The paper developed an automatic sentiment analysis system using Naive Bayes classification and feature expansion. It first preprocessed texts through normalization, then used an API to find synonyms and expand text features. Experiments showed the proposed method improved sentiment analysis accuracy of short reviews to 98%, and that feature expansion helped more with small training datasets. The best performance was with 400 training examples using expansion.
The document discusses two neural network models for reading comprehension tasks: the Attentive Reader model proposed by Hermann et al. in 2015 and the Stanford Reader model proposed by Chen et al. in 2016. The author implemented a two-layer attention model inspired by these previous models that achieves 1.5% higher accuracy on reading comprehension tasks than the Stanford Reader.
This document provides an overview of natural language processing (NLP) research trends presented at ACL 2020, including shifting away from large labeled datasets towards unsupervised and data augmentation techniques. It discusses the resurgence of retrieval models combined with language models, the focus on explainable NLP models, and reflections on current achievements and limitations in the field. Key papers on BERT and XLNet are summarized, outlining their main ideas and achievements in advancing the state-of-the-art on various NLP tasks.
Anaphora resolution in Hindi language using gazetteer method - ijcsa
Anaphora resolution is one of the active research areas within the realm of natural language processing. Resolution of anaphoric reference is one of the most challenging and complex tasks to be handled. This paper focuses entirely on pronominal anaphora resolution for the Hindi language. There are various methodologies for resolving anaphora; this paper presents a computational model for anaphora resolution in Hindi based on the gazetteer method, in which lists are created and operations are then applied to classify the elements present in them. Among the many salient factors for resolving anaphora, the proposed model uses two: animacy and recency. The animacy factor distinguishes living things from non-living things, whereas recency captures the fact that referents mentioned in the current sentence tend to receive higher weights than those in previous sentences. This paper reports experiments conducted on short Hindi stories, news articles and biography content from Wikipedia, along with their results and future directions for improving accuracy.
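A minimal sketch of the two-factor resolution just described: candidate antecedents are filtered by an animacy gazetteer, and the most recent compatible candidate wins. The word lists below are hypothetical English stand-ins for the paper's Hindi gazetteers:

```python
# Gazetteer-style pronominal anaphora resolution sketch: the animacy factor
# filters candidates, the recency factor prefers the most recent mention.
# ANIMATE/INANIMATE are invented stand-ins for the paper's Hindi gazetteers.
ANIMATE = {"Rama", "Sita", "teacher"}
INANIMATE = {"book", "Delhi", "table"}

def resolve(pronoun_is_animate, candidates):
    """candidates: noun mentions in textual order (later = more recent)."""
    gazetteer = ANIMATE if pronoun_is_animate else INANIMATE
    # Recency factor: scan from the most recent mention backwards.
    for mention in reversed(candidates):
        if mention in gazetteer:  # animacy factor: class must match
            return mention
    return None

print(resolve(True, ["Rama", "book", "Sita"]))   # animate pronoun -> Sita
print(resolve(False, ["Rama", "book", "Sita"]))  # inanimate pronoun -> book
```

A real system would weight multiple factors rather than hard-filtering, but this shows how the two factors interact.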
A survey on phrase structure learning methods for text classification - ijnlc
Text classification is the task of automatically classifying text into one of a set of predefined categories. The problem has been widely studied in communities such as natural language processing, data mining and information retrieval. Text classification is an important constituent of many information management tasks, including topic identification, spam filtering, email routing, language identification, genre classification and readability assessment. The performance of text classification improves notably when phrase patterns are used, since phrase patterns help capture non-local behaviours. Phrase structure extraction is the first step toward phrase pattern identification. In this survey, a detailed study of phrase structure learning methods has been carried out. This will enable future work in several NLP tasks that use syntactic information from phrase structure, such as grammar checkers, question answering, information extraction, machine translation and text classification. The paper also provides different levels of classification and a detailed comparison of the phrase structure learning methods.
SYLLABLE-BASED NEURAL NAMED ENTITY RECOGNITION FOR MYANMAR LANGUAGE - ijnlc
Named Entity Recognition (NER) for the Myanmar language is essential to Myanmar natural language processing research. In this work, NER for Myanmar is treated as a sequence tagging problem, and the effectiveness of deep neural networks on the task is investigated. Experiments are performed by applying deep neural network architectures to syllable-level Myanmar contexts. The very first manually annotated NER corpus for the Myanmar language is also constructed and proposed; in developing this in-house corpus, sentences from an online news website as well as sentences from the ALT-Parallel-Corpus are used. The ALT corpus is part of the Asian Language Treebank (ALT) project under ASEAN IVO. This paper contributes the first evaluation of neural network models on the NER task for the Myanmar language. The experimental results show that neural sequence models can produce promising results compared to the baseline CRF model; among the neural architectures, a bidirectional LSTM network with a CRF layer on top gives the highest F-score. This work also aims to discover the effectiveness of neural network approaches to Myanmar textual processing and to promote further research on this understudied language.
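The CRF layer in the best-performing architecture chooses the highest-scoring tag sequence with Viterbi decoding over emission scores (from the BiLSTM) and tag-transition scores. This standalone sketch runs Viterbi over made-up scores for a 3-tag scheme; the scores and tag set are illustrative, not the paper's:

```python
# Viterbi decoding, the inference step of a CRF tagging layer. Emission and
# transition scores here are invented; in a BiLSTM-CRF the emissions come
# from the network and the transitions are learned parameters.
TAGS = ["O", "B-PER", "I-PER"]

def viterbi(emissions, transitions):
    """emissions: list of {tag: score} per token; transitions: {(prev, cur): score}."""
    best = {t: (emissions[0][t], [t]) for t in TAGS}  # first-token scores
    for emit in emissions[1:]:
        new_best = {}
        for cur in TAGS:
            # Best previous tag for reaching `cur` at this position.
            prev, (score, path) = max(
                ((p, best[p]) for p in TAGS),
                key=lambda kv: kv[1][0] + transitions[(kv[0], cur)],
            )
            new_best[cur] = (score + transitions[(prev, cur)] + emit[cur],
                             path + [cur])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]

# I-PER may only follow B-PER or I-PER: enforce with a large negative score.
trans = {(a, b): 0.0 for a in TAGS for b in TAGS}
trans[("O", "I-PER")] = -100.0

emissions = [
    {"O": 2.0, "B-PER": 0.5, "I-PER": 0.1},
    {"O": 0.2, "B-PER": 1.5, "I-PER": 1.4},
    {"O": 0.3, "B-PER": 0.2, "I-PER": 1.6},
]
print(viterbi(emissions, trans))  # ['O', 'B-PER', 'I-PER']
```

The transition constraint is what makes the CRF layer worth adding: it prevents per-token classifiers from emitting invalid sequences such as O followed by I-PER.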
Automatic text summarization is the process of reducing the text content of a document while retaining its important points. Generally, there are two approaches to automatic text summarization: extractive and abstractive. The process of extractive text summarization can be divided into two phases: pre-processing and processing. In this paper, we discuss some of the extractive text summarization approaches used by researchers and describe the features used in the extractive summarization process. We also present the available linguistic pre-processing tools, with their features, that are used for automatic text summarization, and discuss the tools and parameters useful for evaluating the generated summary. Moreover, we explain our proposed lexical chain analysis approach for extractive automatic text summarization, with sample generated lexical chains, and provide the evaluation results of our system-generated summary. The proposed lexical chain analysis approach can be used to solve different text mining problems such as topic classification, sentiment analysis, and summarization.
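The two-phase extractive pipeline above (pre-processing, then sentence scoring and selection) can be sketched with simple word-frequency scoring standing in for the paper's lexical-chain analysis; the stop-word list and text are illustrative:

```python
# Minimal extractive summarizer: pre-process (tokenize, lowercase, remove
# stop words), then score sentences by content-word frequency and select the
# top ones. Frequency scoring is a stand-in for lexical-chain analysis.
from collections import Counter

STOP = {"the", "a", "is", "of", "and", "in", "to", "it", "was"}

def summarize(text, n_sentences=1):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Pre-processing phase: content words only.
    words = [w for s in sentences for w in s.lower().split() if w not in STOP]
    freq = Counter(words)
    # Processing phase: score each sentence, keep the best n.
    def score(s):
        return sum(freq[w] for w in s.lower().split() if w not in STOP)
    return sorted(sentences, key=score, reverse=True)[:n_sentences]

text = ("Text summarization shortens a document. "
        "Extractive summarization selects important sentences from the document. "
        "The weather was pleasant.")
print(summarize(text))
```

The off-topic third sentence scores lowest because its words form no chain with the rest of the document, which is the intuition lexical chains formalize.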
[Paper Reading] Supervised Learning of Universal Sentence Representations fro... - Hiroki Shimanaka
This document summarizes the paper "Supervised Learning of Universal Sentence Representations from Natural Language Inference Data". It discusses how the researchers trained sentence embeddings using supervised data from the Stanford Natural Language Inference dataset. They tested several sentence encoder architectures and found that a BiLSTM network with max pooling produced the best performing universal sentence representations, outperforming prior unsupervised methods on 12 transfer tasks. The sentence representations learned from the natural language inference data consistently achieved state-of-the-art performance across multiple downstream tasks.
Duration for Classification and Regression Tree for Marathi Text-to-Speech Syn... - IJERA Editor
This research paper reports preliminary results of data-driven modeling of segmental phoneme duration for Marathi. Classification and Regression Tree based data-driven modeling for segmental duration prediction is presented. A number of features are considered, and their usefulness and relative contribution to segmental duration prediction are assessed. Objective evaluation of the duration model is performed using the root mean squared prediction error and the correlation between actual and predicted durations.
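The two objective evaluation metrics named above can be computed as follows; the durations (in ms) are made-up examples, not the paper's data:

```python
# RMSE and Pearson correlation between actual and predicted segmental
# durations, the objective evaluation used for the duration model.
# The duration values below are invented illustrative numbers.
import math

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

actual    = [80.0, 120.0, 95.0, 60.0, 140.0]   # hypothetical durations (ms)
predicted = [85.0, 110.0, 100.0, 70.0, 130.0]
print(round(rmse(actual, predicted), 2))
print(round(pearson(actual, predicted), 3))
```

A good duration model drives the RMSE down while keeping the correlation close to 1; reporting both guards against a model that predicts the right mean but the wrong shape.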
Indexing of Arabic documents automatically based on lexical analysis - kevig
The continuous information explosion through the Internet and all other information sources makes it necessary to perform information processing activities automatically, in a quick and reliable manner. In this paper, we propose and implement a method to automatically create an index for books written in the Arabic language. The process depends largely on text summarization and abstraction to collect the main topics and statements in the book. The method was evaluated in terms of accuracy and performance, and the results showed that it can effectively replace the effort of manually indexing books and documents, which can be very useful in all information processing and retrieval applications.
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG... - ijnlc
This article introduces a methodology for analyzing sentiment in Arabic text using a global foreign lexical source. Our method leverages a resource available in another language, such as the English SentiWordNet, to supplement the limited language resources available for Arabic. The knowledge taken from the external resource is injected into the feature model while the machine-learning-based classifier is trained. The first step of our method is to build the bag-of-words (BOW) model of the Arabic text. The second step calculates polarity scores using machine translation and the English SentiWordNet; for each text, three scores are added to the model, for objective, positive, and negative. The last step trains the ML classifier on that model to predict the sentiment of the Arabic text. Our method improves performance over the baseline BOW model in most cases, and it appears to be a viable approach to sentiment analysis of Arabic text where available resources are limited.
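The feature model described above can be sketched directly: build a BOW vector, then append objective/positive/negative scores looked up from a polarity lexicon (standing in for translated SentiWordNet entries). The lexicon values and vocabulary are invented for illustration:

```python
# Sketch of the augmented feature model: a bag-of-words vector with three
# appended polarity scores. POLARITY stands in for (machine-translated)
# SentiWordNet lookups; its values are invented.
POLARITY = {  # word -> (objective, positive, negative)
    "excellent": (0.1, 0.8, 0.1),
    "bad":       (0.2, 0.0, 0.8),
    "hotel":     (1.0, 0.0, 0.0),
}
NEUTRAL = (1.0, 0.0, 0.0)  # fallback for out-of-lexicon words

def features(tokens, vocab):
    bow = [tokens.count(w) for w in vocab]            # step 1: BOW model
    scores = [POLARITY.get(w, NEUTRAL) for w in tokens]
    obj = sum(s[0] for s in scores)                   # step 2: polarity scores
    pos = sum(s[1] for s in scores)
    neg = sum(s[2] for s in scores)
    return bow + [obj, pos, neg]                      # step 3: classifier input

vocab = ["excellent", "bad", "hotel", "service"]
print(features(["excellent", "hotel", "service"], vocab))
```

The classifier then sees both the raw word counts and an aggregate polarity signal, which is how the external English resource reaches the Arabic model.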
A COMPARATIVE STUDY OF FEATURE SELECTION METHODS - kevig
Text analysis has been attracting increasing attention in this data era, and selecting effective features from datasets is a particularly important part of text classification studies. Feature selection excludes irrelevant features from the classification task, reduces the dimensionality of a dataset, and improves the accuracy and performance of identification. Many feature selection methods have been proposed so far; however, it remains unclear which method is the most effective in practice. This article evaluates and compares the available feature selection methods for their general versatility on authorship attribution problems and tries to identify which method is most effective. The general versatility of feature selection methods, and its connection to selecting appropriate features for varying data, is discussed. In addition, different languages, different types of features, different systems for calculating the accuracy of an SVM (support vector machine), and different criteria for ranking feature selection methods were used to measure the general versatility of these methods. The analysis results indicate that the best feature selection method differs for each dataset; however, some methods can always extract useful information to discriminate the classes, and chi-square proved to be the better method overall.
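Chi-square, the method the study found strongest overall, scores each feature by comparing the observed class counts of documents containing it against the counts expected under independence. A self-contained sketch with toy documents (not the study's data):

```python
# Chi-square feature scoring for a binary class: build the 2x2 contingency
# table for each word and compute the one-degree-of-freedom statistic.
# The toy review "documents" below are invented.
def chi_square(word, docs, labels):
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if word in d and y == 1)
    b = sum(1 for d, y in zip(docs, labels) if word in d and y == 0)
    c = sum(1 for d, y in zip(docs, labels) if word not in d and y == 1)
    d = sum(1 for d, y in zip(docs, labels) if word not in d and y == 0)
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0

docs = [{"great", "plot"}, {"great", "acting"}, {"dull", "plot"}, {"dull", "pace"}]
labels = [1, 1, 0, 0]  # 1 = positive review, 0 = negative

for w in ["great", "dull", "plot"]:
    print(w, chi_square(w, docs, labels))
```

Words perfectly aligned with a class ("great", "dull") score high; words spread evenly across classes ("plot") score zero, so ranking features by this statistic keeps the discriminative ones.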
SENTIMENT ANALYSIS FOR MODERN STANDARD ARABIC AND COLLOQUIAL - ijnlc
The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations; therefore many are now looking to the field of sentiment analysis. In this paper, we present a feature-based sentence-level approach for Arabic sentiment analysis. Our approach uses an Arabic idioms/saying-phrases lexicon as a key resource for improving the detection of sentiment polarity in Arabic sentences, together with a number of novel and rich linguistically motivated features (contextual intensifiers, contextual shifters and negation handling) and syntactic features for conflicting phrases, which enhance the sentiment classification accuracy. Furthermore, we introduce an automatically expandable wide-coverage polarity lexicon of Arabic sentiment words. The lexicon is built from a manually collected and annotated seed of gold-standard sentiment words, and it expands and detects the sentiment orientation of new sentiment words automatically using a synset aggregation technique and free online Arabic lexicons and thesauruses. Our data focus on modern standard Arabic (MSA) and Egyptian dialectal Arabic tweets and microblogs (hotel reservations, product reviews, etc.). The experimental results using our resources and techniques with an SVM classifier indicate high performance levels, with accuracies of over 95%.
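The contextual-shifter and intensifier features above can be illustrated minimally: lexicon scores are flipped by a preceding negator and scaled by a preceding intensifier. The English word lists below are hypothetical placeholders for the Arabic lexicons:

```python
# Minimal illustration of negation handling and contextual intensifiers in
# lexicon-based polarity scoring. All word lists are invented stand-ins for
# the paper's Arabic resources.
LEXICON = {"good": 1.0, "terrible": -1.0}
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very": 2.0, "slightly": 0.5}

def sentence_polarity(tokens):
    score, flip, scale = 0.0, False, 1.0
    for tok in tokens:
        if tok in NEGATORS:
            flip = True
        elif tok in INTENSIFIERS:
            scale = INTENSIFIERS[tok]
        elif tok in LEXICON:
            s = LEXICON[tok] * scale
            score += -s if flip else s
            flip, scale = False, 1.0  # context applies to the next word only
    return score

print(sentence_polarity("the food was very good".split()))  # intensified: 2.0
print(sentence_polarity("the food was not good".split()))   # negated: -1.0
```

In the paper these signals become classifier features rather than a direct score, but the contextual mechanics are the same.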
DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE) - ijnlc
Extracting key phrases from documents is a common task in many applications. In general, a noun phrase extractor consists of three modules: tokenization, part-of-speech tagging, and noun phrase identification; these are the three main steps in building the new system, ANPE. This paper aims at extracting Arabic noun phrases from a corpus of documents, with the relevant criteria (recall and precision) used as evaluation measures. On the one hand, when using NPs rather than single terms, the system yields more relevant documents among those retrieved; on the other hand, it gives lower precision because the number of retrieved documents decreases. At the end, the researchers conclude and recommend improvements for more effective and efficient research in the future.
A hybrid composite features based sentence level sentiment analyzer - IAESIJAI
Current lexica and machine learning based sentiment analysis approaches
still suffer from a two-fold limitation. First, manual lexicon construction and
machine training is time consuming and error-prone. Second, the
prediction’s accuracy entails sentences and their corresponding training text
should fall under the same domain. In this article, we experimentally
evaluate four sentiment classifiers, namely support vector machines (SVMs),
Naive Bayes (NB), logistic regression (LR) and random forest (RF). We
quantify the quality of each of these models using three real-world datasets
that comprise 50,000 movie reviews, 10,662 sentences, and 300 generic
movie reviews. Specifically, we study the impact of a variety of natural
language processing (NLP) pipelines on the quality of the predicted
sentiment orientations. Additionally, we measure the impact of incorporating
lexical semantic knowledge captured by WordNet on expanding original
words in sentences. Findings demonstrate that utilizing different NLP pipelines and semantic relationships impacts the quality of the sentiment analyzers. In particular, results indicate that coupling lemmatization with knowledge-based n-gram features produces higher-accuracy results. With this coupling, the accuracy of the SVM classifier improved to 90.43%, while it was 86.83%, 90.11%, and 86.20%, respectively, using the three other classifiers.
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME... - ijnlc
One of the main difficulties in sentiment analysis of the Arabic language is the presence of colloquialism. In this paper, we examine the effect of using objective words in conjunction with sentimental words on sentiment classification for colloquial Arabic reviews, specifically Jordanian colloquial reviews. Reviews often include both sentimental and objective words; however, most existing sentiment analysis models ignore the objective words, as they are considered useless. In this work, we created two lexicons: the first includes the colloquial sentimental words and compound phrases, while the other contains the objective words associated with sentiment-tendency values based on a particular estimation method. We used these lexicons to extract sentiment features that serve as training input to Support Vector Machines (SVM) for classifying the sentiment polarity of the reviews. The reviews dataset has been collected manually from the JEERAN website. The experimental results show that the proposed approach improves polarity classification in comparison to two baseline models, with an accuracy of 95.6%.
GENDER AND AUTHORSHIP CATEGORISATION OF ARABIC TEXT FROM TWITTER USING PPM - ijcsit
In this paper we present gender and authorship categorisation using the Prediction by Partial Matching (PPM) compression scheme for Arabic text from Twitter. The PPMD variant of the compression scheme with different orders was used to perform the categorisation. We also applied different machine learning algorithms, such as Multinomial Naïve Bayes (MNB), K-Nearest Neighbours (KNN), and an implementation of Support Vector Machines (LIBSVM), applying the same processing steps for all the algorithms. PPMD shows significantly better accuracy in comparison to all the other machine learning algorithms, with order-11 PPMD working best, achieving 90% and 96% accuracy for gender and authorship respectively.
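Compression-based categorisation assigns a test text to whichever class's training data models it best, i.e. compresses it most cheaply. PPM itself is not in the standard library, so this sketch substitutes zlib as the compressor and uses invented English snippets in place of the Arabic tweets; the idea, not the exact model, is what is illustrated:

```python
# Compression-based text categorisation sketch: assign a test text to the
# class whose concatenated training text compresses it with the fewest extra
# bytes. zlib stands in for PPMD; training snippets are invented stand-ins.
import zlib

def compressed_size(text):
    return len(zlib.compress(text.encode("utf-8"), 9))

def classify(test_text, training):
    """training: {label: concatenated training text for that label}."""
    def extra_bytes(label):
        base = compressed_size(training[label])
        return compressed_size(training[label] + " " + test_text) - base
    return min(training, key=extra_bytes)

training = {
    "author_A": "the match was brilliant and the team played with great spirit " * 5,
    "author_B": "stock prices fell sharply as markets reacted to the report " * 5,
}
print(classify("the team played a brilliant match", training))
```

A better statistical model of an author's text yields fewer extra bytes for new text by that author, which is why a strong compressor like order-11 PPMD doubles as a strong classifier.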
Arabic tweeps dialect prediction based on machine learning approach - IJECEIAES
In this paper, we present our approach for profiling Arabic authors on Twitter based on their tweets. We consider the dialect of an Arabic author an important trait to be predicted. For this purpose, many indicators, feature vectors and machine learning-based classifiers were implemented, and their results were compared to find the best dialect prediction model, which was obtained using a random forest classifier with full word forms and their stems as the feature vector.
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text - kevig
The article presents Sentiment Analysis (SA) and Tense Classification for Nepali using the Skip-gram model for word-to-vector encoding. The experiment on SA for positive-negative classification is carried out in two ways. In the first experiment, the vector representation of each sentence is generated by the Skip-gram model, followed by Multi-Layer Perceptron (MLP) classification; an F1 score of 0.6486 is achieved for positive-negative classification, with an overall accuracy of 68%. In the second experiment, verb chunks are extracted using a Nepali parser and a similar experiment is carried out on the verb chunks, yielding an F1 score of 0.6779 with an overall accuracy of 85%. Hence, chunker-based sentiment analysis proves better than sentence-based sentiment analysis. The paper also proposes using the Skip-gram model to identify the tenses of Nepali sentences and verbs. In the third experiment, the vector representation of each verb chunk is generated by the Skip-gram model followed by MLP classification, and the verb chunks give a very low overall accuracy of 53%. The fourth experiment, conducted for tense classification using sentences, results in improved efficiency with an overall accuracy of 89%; past tenses were identified and classified more accurately than other tenses. Hence, sentence-based tense classification proves better than verb-chunk-based tense classification.
Using automated lexical resources in Arabic sentence subjectivity
This document describes research on developing an Arabic sentiment lexicon using an existing Arabic lexical semantic database called RDI. The researchers generated a sentiment lexicon called SentiRDI by determining the subjectivity and polarity of words and semantic fields in the RDI database. They used a seed list of sentiment words and semantic relations in RDI, like synonyms and antonyms, to propagate sentiment scores. The system achieved 76.1% accuracy in classifying sentences as subjective or objective using multiple machine learning algorithms and an Arabic language corpus. Key challenges in building Arabic sentiment resources included the lack of semantic databases and the morphological complexity of the Arabic language.
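The propagation over synonym and antonym relations described above can be sketched as a simple graph walk: synonyms inherit a seed word's polarity, antonyms flip it. The word graph and seed list below are hypothetical stand-ins, not entries from the RDI database.

```python
# Seed-based polarity propagation over synonym/antonym relations (sketch).
def propagate(seeds, synonyms, antonyms, iterations=2):
    polarity = dict(seeds)  # word -> +1 (positive) or -1 (negative)
    for _ in range(iterations):
        for w, nbrs in synonyms.items():
            for n in nbrs:
                if w in polarity and n not in polarity:
                    polarity[n] = polarity[w]   # synonyms share polarity
        for w, nbrs in antonyms.items():
            for n in nbrs:
                if w in polarity and n not in polarity:
                    polarity[n] = -polarity[w]  # antonyms flip polarity
    return polarity

# Hypothetical seed and relations for illustration
lex = propagate({"good": +1},
                synonyms={"good": ["fine"]},
                antonyms={"fine": ["poor"]})
```

A real lexicon builder would also resolve conflicts when a word is reachable with both signs; that is omitted here.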
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
Neural machine translation is a new approach to machine translation that has shown effective results for high-resource languages. Recently, attention-based neural machine translation with large-scale parallel corpora has played an important role in achieving high translation performance. In this research, a parallel corpus for the Myanmar-English language pair is prepared, and attention-based neural machine translation models are introduced at the word-to-word, character-to-word, and syllable-to-word levels. We run experiments with the proposed models to translate long sentences and to address morphological problems. To mitigate the low-resource problem, source-side monolingual data are also used. This work thus investigates how to improve a Myanmar-to-English neural machine translation system. The experimental results show that the syllable-to-word level neural machine translation model obtains an improvement over the baseline systems.
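At the core of any attention-based NMT model of the kind described above is a softmax over decoder-encoder alignment scores, followed by a weighted sum of encoder states. A minimal pure-Python sketch (with toy scores, not values from a trained Myanmar-English model):

```python
import math

def attention_weights(scores):
    """Softmax over alignment scores between one decoder state and the encoder states."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(weights, encoder_states):
    """Attention context: weighted sum of encoder hidden states (lists of floats)."""
    dim = len(encoder_states[0])
    return [sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)]

w = attention_weights([2.0, 1.0, 0.1])
ctx = context_vector(w, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The highest-scoring source position dominates the context vector, which is what lets the decoder "attend" to the relevant syllable or word.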
Sentiment analysis is an important current research area, and the demand for sentiment analysis and classification is growing day by day. This paper presents a novel method to classify Urdu documents, as no prior work on sentiment classification for Urdu text has been recorded. We consider the problem of determining whether a review or sentence is positive, negative, or neutral. For this purpose we use two machine learning methods, Naïve Bayes and Support Vector Machines (SVM). First the documents are preprocessed and the sentiment features are extracted; then the polarity is calculated, judged, and classified by the machine learning methods.
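One of the two classifiers named above, multinomial Naïve Bayes, can be sketched in a few lines with Laplace smoothing. The toy English tokens below are placeholders for preprocessed Urdu features; this is an illustration of the classifier family, not the paper's exact pipeline.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label). Returns class priors, per-class word counts, vocab."""
    priors, counts, vocab = Counter(), {}, set()
    for tokens, label in docs:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return priors, counts, vocab

def classify_nb(tokens, priors, counts, vocab):
    total_docs = sum(priors.values())
    best, best_lp = None, float("-inf")
    for label in priors:
        lp = math.log(priors[label] / total_docs)
        denom = sum(counts[label].values()) + len(vocab)
        for t in tokens:
            lp += math.log((counts[label][t] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best

docs = [(["good", "great"], "pos"), (["bad", "awful"], "neg"), (["good"], "pos")]
model = train_nb(docs)
pred = classify_nb(["good"], *model)
```

Smoothing keeps unseen words from zeroing out a class's probability, which matters for the small training sets typical of a new language's first sentiment corpus.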
Similar to Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sentiments Keywords Based Technique
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
The progressive development of Synthetic Aperture Radar (SAR) systems has diversified the exploitation of the images these systems generate across different geoscience applications. The detection and monitoring of surface deformations produced by various phenomena have benefited from this evolution and have been realized with interferometry (InSAR) and differential interferometry (DInSAR) techniques. Nevertheless, the spatial and temporal decorrelation of the interferometric pairs used strongly limits the precision of the analysis results obtained with these techniques. In this context, we propose a methodological approach for detecting and analyzing surface deformation from differential interferograms, in order to show the limits of this technique according to noise quality and level. The detectability model is generated from deformation signatures by simulating a linear fault merged into image pairs from the ERS1/ERS2 sensors acquired over a region of the Algerian south.
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFICATION
A trajectory-guided, concatenative approach for synthesizing high-quality, real-sample rendered video is proposed. The automatic lip-reading system searches a library of video data for the real image sample sequence closest to the HMM-predicted trajectory. The trajectory is obtained by projecting the face patterns into a KDA feature space, where it is estimated. Speaker face identification is approached by synthesizing the identity surface of a subject's face from a small sample of patterns that sparsely cover the view sphere. A KDA algorithm is used to discriminate the lip-reading images, after which the fundamental lip feature vector is reduced to a low-dimensional representation using the 2D-DCT. The dimensionality of the mouth area is further reduced with PCA to obtain the eigen-lips approach proposed in [33]. The subjective performance results of the cost function under the automatic lip-reading model did not illustrate the superior performance of the method.
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
Universities offer a software engineering capstone course to simulate a real-world working environment in which students work in a team for a fixed period to deliver a quality product. The objective of this paper is to report on our experience in moving from a Waterfall process to an Agile process in conducting the software engineering capstone project. We present the capstone course designs for both the Waterfall-driven and Agile-driven methodologies, highlighting the structure, deliverables, and assessment plans. To evaluate the improvement, we conducted a survey in two different sections taught by two different instructors to evaluate students' experience in moving from the traditional Waterfall model to an Agile-like process. Twenty-eight students filled out the survey, which consisted of eight multiple-choice questions and an open-ended question to collect feedback. The survey results show that students were able to attain hands-on experience that simulates a real-world working environment. The results also show that the Agile approach helped students produce an overall better design and avoid mistakes they had made in the initial design completed in the first phase of the capstone project. In addition, they were able to assess their team capabilities and training needs, and thus learn the required technologies earlier, which is reflected in the final product quality.
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
This document discusses using social media technologies to promote student engagement in a software project management course. It describes the course and objectives of enhancing communication. It discusses using Facebook for 4 years, then switching to WhatsApp based on student feedback, and finally introducing Slack to enable personalized team communication. Surveys found students engaged and satisfied with all three tools, though less familiar with Slack. The conclusion is that social media promotes engagement but familiarity with the tool also impacts satisfaction.
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
Using a computer to answer questions has been a human dream since the beginning of the digital era. Question-answering (QA) systems are referred to as intelligent systems that provide responses to users' questions based on facts or rules stored in a knowledge base, and they can generate answers to questions asked in natural language. One of the first main ideas of fuzzy logic was to work on the problem of computer understanding of natural language. This survey paper therefore provides an overview of what question answering is, its system architecture, and its possible relationship with fuzzy logic, as well as previous related research with respect to the approaches that were followed. At the end, the survey provides an analytical discussion of the proposed QA models, alone or combined with fuzzy logic, and their main contributions and limitations.
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
Human beings generate different speech waveforms when speaking the same word at different times. Moreover, different speakers have different accents and generate significantly varying speech waveforms for the same word. There is a need to measure the distances between various pronunciations, which facilitates the preparation of pronunciation dictionaries. A new algorithm called Dynamic Phone Warping (DPW) is presented in this paper. It uses dynamic programming for global alignment and shortest-distance measurement. The DPW algorithm can be used to enhance the pronunciation dictionaries of well-known languages like English, or to build pronunciation dictionaries for less-known, sparsely resourced languages. The precision measurement experiments show 88.9% accuracy.
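The dynamic-programming global alignment described above can be sketched as an edit distance over phone sequences. The actual DPW algorithm presumably uses phone-specific substitution costs; the 0/1 costs below are a simplifying assumption, and the ARPAbet-style phone labels are illustrative.

```python
# Global alignment (Levenshtein-style) distance between two phone sequences.
def phone_distance(a, b):
    """Minimal-cost alignment of phone sequence a to b via dynamic programming."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                               # i deletions
    for j in range(m + 1):
        d[0][j] = j                               # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete a phone
                          d[i][j - 1] + 1,        # insert a phone
                          d[i - 1][j - 1] + sub)  # match or substitute
    return d[n][m]

# Two pronunciations of "tomato" differing in one vowel
dist = phone_distance(["T", "AH", "M", "EY", "T", "OW"],
                      ["T", "AH", "M", "AA", "T", "OW"])
```

A distance of 1 here reflects the single vowel substitution between the two pronunciations.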
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
In education, the use of electronic (E-) examination systems is not a novel idea, as E-examination systems have been used to conduct objective assessments for the last few years. This research deals with randomly designed E-examinations and proposes an E-assessment system that can be used for subjective questions. The system assesses answers to subjective questions by finding a matching ratio between the keywords in the instructor's and the student's answers. The matching ratio is computed based on semantic and document similarity. The assessment system is composed of four modules: preprocessing, keyword expansion, matching, and grading. A survey and a case study were used in the research design to validate the proposed system. The examination assessment system will help instructors save time, costs, and resources, while increasing efficiency and improving the productivity of exam setting and assessment.
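The matching-ratio idea can be sketched as the fraction of instructor keywords found in the student's answer. The real system also performs keyword expansion and semantic similarity, which are omitted here; the example keywords are hypothetical.

```python
# Keyword matching ratio between an instructor answer key and a student answer.
def matching_ratio(instructor_keywords, student_answer):
    """Share of instructor keywords that appear in the tokenized student answer."""
    student_tokens = set(student_answer.lower().split())
    keywords = [k.lower() for k in instructor_keywords]
    hits = sum(1 for k in keywords if k in student_tokens)
    return hits / len(keywords) if keywords else 0.0

ratio = matching_ratio(["photosynthesis", "chlorophyll", "light"],
                       "Plants use light and chlorophyll in photosynthesis")
```

A grading module could then map this ratio (after expansion and similarity scoring) onto a mark scale.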
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
African Buffalo Optimization (ABO) is one of the most recent swarm-intelligence-based metaheuristics, inspired by the buffalo's behavior and lifestyle. Unfortunately, the standard ABO algorithm was proposed only for continuous optimization problems. In this paper, the authors propose two discrete binary ABO algorithms to deal with binary optimization problems. The first version (called SBABO) uses the sigmoid function and a probability model to generate binary solutions; the second version (called LBABO) uses logical operators on the binary solutions. Computational results on instances of two knapsack problems (KP and MKP) show the effectiveness of the proposed algorithms and their ability to reach good and promising solutions.
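The sigmoid-plus-probability step that SBABO-style binarization relies on can be sketched as follows: each continuous coordinate is squashed through a sigmoid and compared with a random draw. The fixed draws below stand in for the algorithm's stochastic sampling, so the mapping is reproducible here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, draws):
    """Map a continuous position vector to bits: bit_i = 1 iff draw_i < sigmoid(x_i)."""
    return [1 if r < sigmoid(x) else 0 for x, r in zip(position, draws)]

# Strongly positive coordinates almost always yield 1, strongly negative ones 0.
bits = binarize([4.0, -4.0, 0.0], draws=[0.5, 0.5, 0.9])
```

In a full SBABO run the draws come from a uniform RNG each iteration, so a coordinate near zero flips between 0 and 1 with roughly equal probability.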
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
In recent years, many malware writers have relied on Dynamic Domain Name Services (DDNS) to maintain their Command and Control (C&C) network infrastructure and ensure a persistent presence on a compromised host. Among the various DDNS techniques, the Domain Generation Algorithm (DGA) is often perceived as the most difficult to detect using traditional methods. This paper presents an approach for detecting DGA using frequency analysis of the character distribution and weighted scores of the domain names. The approach's feasibility is demonstrated using a range of legitimate domains and a number of malicious, algorithmically generated domain names. Findings from this study show that domain names made up of the English characters "a-z" that achieve a weighted score of < 45 are often associated with DGA. When a weighted score of < 45 is applied to the Alexa one-million list of domain names, only 15% of the domain names were treated as non-human-generated.
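As a rough illustration of the frequency-analysis idea, a toy weighted-score function might look like the following. The letter weights here are illustrative, loosely based on English letter frequency; they are NOT the paper's actual weighting table, and the 45 threshold mentioned above applies to the paper's scores, not these.

```python
# Score a domain by how typical its character distribution is of English.
COMMON = "etaoinshrdlu"  # roughly the most frequent English letters, in order

def weighted_score(domain):
    """Average per-character weight: common letters score high, rare ones low."""
    name = domain.split(".")[0].lower()
    scores = []
    for ch in name:
        if ch in COMMON:
            scores.append(100 - 5 * COMMON.index(ch))  # 'e' -> 100 ... 'u' -> 45
        elif ch.isalpha():
            scores.append(20)                           # rare letter
        else:
            scores.append(0)                            # digits, hyphens, etc.
    return sum(scores) / len(scores) if scores else 0.0

human = weighted_score("google.com")    # pronounceable, frequent letters
dga = weighted_score("xqzkvjqw.com")    # random-looking, rare letters
```

Human-registered names concentrate on frequent letters and score markedly higher than random-looking DGA output, which is the separation the detection threshold exploits.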
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
The document proposes a blockchain-based digital currency and streaming platform called GoMAA to address issues of piracy in the online music streaming industry. Key points:
- GoMAA would use a digital token on the iMediaStreams blockchain to enable secure dissemination and tracking of streamed content. Content owners could control access and track consumption of released content.
- Original media files would be converted to a Secure Portable Streaming (SPS) format, embedding watermarks and smart contract data to indicate ownership and enable validation on the blockchain.
- A browser plugin would provide wallets for fans to collect GoMAA tokens as rewards for consuming content, incentivizing participation and addressing royalty discrepancies by recording
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
This document discusses the importance of verb suffix mapping in discourse translation from English to Telugu. It explains that after anaphora resolution, the verbs must be changed to agree with the gender, number, and person features of the subject or anaphoric pronoun. Verbs in Telugu inflect based on these features, while verbs in English only inflect based on number and person. Several examples are provided that demonstrate how the Telugu verb changes based on whether the subject or pronoun is masculine, feminine, neuter, singular or plural. Proper verb suffix mapping is essential for generating natural and coherent translations while preserving the context and meaning of the original discourse.
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
In this paper, based on the definition of the conformable fractional derivative, the functional variable method (FVM) is proposed to seek exact traveling wave solutions of two higher-dimensional space-time fractional KdV-type equations in mathematical physics, namely the (3+1)-dimensional space-time fractional Zakharov-Kuznetsov (ZK) equation and the (2+1)-dimensional space-time fractional Generalized Zakharov-Kuznetsov-Benjamin-Bona-Mahony (GZK-BBM) equation. Some new solutions are procured and depicted. These solutions, which contain kink-shaped, singular kink, bell-shaped soliton, singular soliton, and periodic wave solutions, have many potential applications in mathematical physics and engineering. The simplicity and reliability of the proposed method are verified.
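For reference, the conformable fractional derivative on which such methods are commonly based is defined (following Khalil et al.) as:

```latex
T_\alpha(f)(t) \;=\; \lim_{\varepsilon \to 0}
\frac{f\!\left(t + \varepsilon\, t^{\,1-\alpha}\right) - f(t)}{\varepsilon},
\qquad t > 0,\; \alpha \in (0,1].
```

In particular it satisfies $T_\alpha(t^{p}) = p\, t^{\,p-\alpha}$, which is what allows the fractional equations to be reduced to ordinary differential equations via a traveling-wave variable.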
AUTOMATED PENETRATION TESTING: AN OVERVIEW
The document discusses automated penetration testing and provides an overview. It compares manual and automated penetration testing, noting that automated testing allows for faster, more standardized and repeatable tests but has limitations in developing new exploits. It also reviews some current automated penetration testing methodologies and tools, including those using HTTP/TCP/IP attacks, linking common scanning tools, a Python-based tool targeting databases, and one using POMDPs for multi-step penetration test planning under uncertainty. The document concludes that automated testing is more efficient than manual for known vulnerabilities but cannot replace manual testing for discovering new exploits.
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
Since the mid-1990s, functional connectivity study using fMRI (fcMRI) has drawn increasing attention from neuroscientists and computer scientists, since it opens a new window to explore the functional network of the human brain with relatively high resolution. The BOLD technique provides an almost accurate state of the brain. Past research shows that neurological diseases damage brain network interaction, protein-protein interaction, and gene-gene interaction, and a number of neurological research papers also analyze the relationships among the damaged parts. With computational methods, especially machine learning techniques, we can produce such classifications. In this paper we used the OASIS fMRI dataset of patients affected by Alzheimer's disease together with a normal patients' dataset. After properly processing the fMRI data, we use the processed data to build classifier models with SVM (Support Vector Machine), KNN (K-nearest neighbour), and Naïve Bayes. We also compare the accuracy of our proposed method with existing methods. In future work, we will try other combinations of methods for better accuracy.
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
The document proposes a new validation method for fuzzy association rules based on three steps: (1) applying the EFAR-PN algorithm to extract a generic base of non-redundant fuzzy association rules using fuzzy formal concept analysis, (2) categorizing the extracted rules into groups, and (3) evaluating the relevance of the rules using structural equation modeling, specifically partial least squares. The method aims to address issues with existing fuzzy association rule extraction algorithms such as large numbers of extracted rules, redundancy, and difficulties with manual validation.
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
In many applications of data mining, class imbalance is observed when examples of one class are overrepresented. Traditional classifiers yield poor accuracy on the minority class due to this imbalance. Further, the presence of within-class imbalance, where classes are composed of multiple sub-concepts with different numbers of examples, also affects the performance of a classifier. In this paper, we propose an oversampling technique that handles between-class and within-class imbalance simultaneously and also takes into consideration the generalization ability in the data space. The proposed method is based on two steps: performing model-based clustering with respect to the classes to identify the sub-concepts, and then computing the separating hyperplane based on equal posterior probability between the classes. The proposed method is tested on 10 publicly available data sets, and the results show that it is statistically superior to other existing oversampling methods.
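The "equal posterior probability" boundary can be illustrated in one dimension: for two Gaussian sub-concepts with equal variance and equal priors, the posteriors match exactly at the midpoint of the means. This is a hedged sketch of that single idea; the paper's model-based clustering and multivariate hyperplane computation are not reproduced.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def equal_posterior_point(mu1, mu2, sigma, lo, hi, steps=10000):
    """Grid-search the x where P(c1|x) = P(c2|x), assuming equal priors and variance."""
    best_x, best_gap = lo, float("inf")
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        gap = abs(gaussian_pdf(x, mu1, sigma) - gaussian_pdf(x, mu2, sigma))
        if gap < best_gap:
            best_x, best_gap = x, gap
    return best_x

# Two sub-concepts centered at 0 and 4 with unit variance: boundary at 2
boundary = equal_posterior_point(0.0, 4.0, 1.0, -2.0, 6.0)
```

Synthetic minority examples generated on the minority side of such a boundary stay in the region the classifier would already assign to that class, which is the generalization-ability consideration mentioned above.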
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
Data collection is an essential but manpower-intensive procedure in ecological research. The author developed an algorithm that incorporates two important computer vision techniques to automate data cataloging for butterfly measurements: Optical Character Recognition is used for character recognition, and contour detection is used for image processing. Proper pre-processing is first done on the images to improve accuracy. Although there are limitations to Tesseract's detection of certain fonts, overall it can successfully identify words in basic fonts. Contour detection is an advanced technique that can be utilized to measure an image. Shapes and mathematical calculations are crucial in determining the precise location of the points on which to draw the body and forewing lines of the butterfly. Overall, 92% accuracy was achieved by the program for the set of butterflies measured.
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
Smart cities utilize Internet of Things (IoT) devices and sensors to enhance the quality of city services, including energy, transportation, health, and much more, and they generate massive volumes of structured and unstructured data on a daily basis. Social networks, such as Twitter, Facebook, and Google+, are also becoming a new source of real-time information in smart cities, with social network users acting as social sensors. These datasets are so large and complex that they are difficult to manage with conventional data management tools and methods. To become valuable, this massive amount of data, known as 'big data,' needs to be processed and comprehended to hold the promise of supporting a broad range of urban and smart city functions, including, among others, transportation, water and energy consumption, pollution surveillance, and smart city governance. In this work, we investigate how social media analytics help analyze smart city data collected from various social media sources, such as Twitter and Facebook, to detect events taking place in a smart city and to identify the importance of events and the concerns of citizens regarding some events. A case scenario analyzes the opinions of users concerning traffic in the three largest cities in the UAE.
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
The anonymity of social networks makes them attractive to hate speakers seeking to mask their criminal activities online, posing a challenge to the world and to Ethiopia in particular. With this ever-increasing volume of social media data, hate speech identification becomes a challenge, aggravating conflict between citizens of nations. At this high rate of production, it has become difficult to collect, store, and analyze such big data using traditional detection methods. This paper proposes the application of Apache Spark to hate speech detection to reduce these challenges. The authors developed an Apache Spark-based model to classify Amharic Facebook posts and comments into hate and not hate, employing Random Forest and Naïve Bayes for learning and Word2Vec and TF-IDF for feature selection. Tested by 10-fold cross-validation, the model based on Word2Vec embeddings performed best, with 79.83% accuracy. The proposed method achieves a promising result with the unique big-data features of Spark.
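One of the two feature-selection schemes named above, TF-IDF, weights a term by its frequency in a document against its rarity across the corpus. A minimal sketch (the toy documents are placeholders, not Amharic Facebook data):

```python
import math

def tf_idf(term, doc, docs):
    """TF-IDF weight of term in doc relative to the corpus docs (lists of tokens)."""
    tf = doc.count(term) / len(doc)                 # term frequency in this document
    df = sum(1 for d in docs if term in d)          # number of documents containing term
    idf = math.log(len(docs) / df) if df else 0.0   # inverse document frequency
    return tf * idf

docs = [["spark", "big", "data"], ["spark", "hate"], ["hate", "speech", "hate"]]
w_hate = tf_idf("hate", docs[2], docs)
w_speech = tf_idf("speech", docs[2], docs)
```

"speech" outweighs "hate" in the third document because, although less frequent there, it is rarer across the corpus; it is this discriminative weighting that feeds the Random Forest and Naïve Bayes learners.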
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
This article presents part-of-speech tagging for Nepali text using a General Regression Neural Network (GRNN). The corpus is divided into two parts, training and testing, and the network is trained and validated on both. It is observed that 96.13% of words are correctly tagged on the training set, whereas 74.38% of words are tagged correctly on the testing set using the GRNN. The result is compared with the traditional Viterbi algorithm based on a Hidden Markov Model, which yields classification accuracies of 97.2% and 40% on the training and testing data sets respectively. The GRNN-based POS tagger is thus more consistent than the traditional Viterbi decoding technique.
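At its core a GRNN is a Nadaraya-Watson kernel regression: the prediction for an input is a Gaussian-kernel-weighted average of the training targets. A minimal one-dimensional sketch (the feature encoding of Nepali words and the tag decoding used in the paper are not shown):

```python
import math

def grnn_predict(x, train_x, train_y, sigma=0.5):
    """GRNN output: kernel-weighted average of training targets, Gaussian kernel."""
    weights = [math.exp(-((x - xi) ** 2) / (2 * sigma ** 2)) for xi in train_x]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# Toy 1-D data: targets near 0.0 around x=0, near 1.0 around x=10
train_x = [0.0, 0.2, 10.0, 10.2]
train_y = [0.0, 0.0, 1.0, 1.0]
pred_low = grnn_predict(0.1, train_x, train_y)
pred_high = grnn_predict(9.9, train_x, train_y)
```

Because every training example contributes (with distance-decayed weight) to every prediction, the GRNN degrades gracefully on unseen inputs, which is consistent with its more stable test-set accuracy compared to Viterbi decoding above.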
Skybuffer SAM4U tool for SAP license adoption
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
HCL Notes and Domino License Cost Reduction in the World of DLAU
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
HCL Notes and Domino License Cost Reduction in the World of DLAU
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you certainly want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, for example using a person document instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It will give you the tools and know-how to keep an overview. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
FREE A4 Cyber Security Awareness Posters - Social Engineering part 3
Free A4 downloadable and printable cyber security and social engineering safety training posters. Promote security awareness in the home or workplace. Lock them out. From training provider datahops.com.
5th LF Energy Power Grid Model Meet-up Slides
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
- Insightful presentations covering two practical applications of the Power Grid Model.
- An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
- An interactive brainstorming session to discuss and propose new feature requests.
- An opportunity to connect with fellow Power Grid Model enthusiasts and users.
TrustArc Webinar - 2024 Global Privacy Survey
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Taking AI to the Next Level in Manufacturing.pdf
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
108 Computer Science & Information Technology (CS & IT)
'negative', respectively. Some papers add a third class, 'neutral', to these two polarities, yielding three classes [1]. In this work, we apply our proposed technique only to the first two polarities, i.e., positive and negative.
SA is achieved using two popular approaches: machine learning sentiment analysis (MLSA) and lexicon-based sentiment analysis (LBSA). MLSA covers supervised learning approaches that need a pre-annotated text data set to train and test on selected features. LBSA, on the other hand, is an unsupervised learning approach based on a pre-made sentiment lexicon, which is used to count the appearances of sentiment terms in the target text [1] [2].
Sentiment mining in the Arabic language faces numerous difficulties. These relate to several factors: first, the morphologically rich nature of Arabic, and the variety of Arabic dialects that overlap with Modern Standard Arabic (MSA), especially on social network web pages; second, the scarcity of research in these new fields for Arabic, which has led to a lack of resources and tools that could help build larger and better techniques. Although Arabic researchers have made good efforts in NLP, text mining, and sentiment analysis, more work is needed.
In this paper, we present an SA approach that implements MLSA on the Arabic language, using resources annotated with the two polarities 'negative' and 'positive'. We exploited LBSA in order to improve MLSA: a sentiment lexicon was used as a sentiment keywords list and implemented as a feature selection method, which increased the performance of the supervised learning task on three corpora. We compared our approach with other traditional and recent approaches, and the results were very promising.
The literature includes several Arabic text pre-processing methods. Our approach begins with a highly recommended pre-processing stage, which removes corpus noise and, at the same time, improves computational complexity by reducing the size of the feature set. Our methodology, explained further in Section 3, is based on the sentiment keywords co-occurrence measure (SKCM), an algorithm we use to measure the polarity of each corpus term. We do not treat this value as the term's polarity as such, but as a sentiment weight, which gives better results when it replaces term appearance and outperforms the term frequency used in various forms in the literature [3] [4].
Our feature selection method was tested with a support vector machine (SVM) classifier. We classified each of our three corpora three times: first via the traditional feature selection method, then with our SKCM results, and finally with the combination of these two FS methods. On all three corpora, the results showed that SKCM outperformed both of the other methods.
The remainder of this paper is organized as follows: related work is briefly reviewed in Section 2; the methodology is explained in Section 3; Section 4 details the experiments and illustrates the results; and Section 5 concludes the paper.
2. RELATED WORKS
The work of [5] automatically built a domain-specific sentiment dictionary for movie reviews in the Korean language. They used a huge set of online movie reviews to find the joint probability between dictionary words and their occurrence in either negatively or positively annotated reviews. They then gave each word a polarity equal to the difference between those two joint probability values. Finally, they normalized these values and obtained 135,082 words constituting their Korean sentiment dictionary. The method of [5] gives each term a polarity value, as our approach does, but we do not need an annotated data set to find that polarity, since our approach does not derive it from prior knowledge of the polarity of the documents in which the term appears.
The work of [6] collected and prepared the largest Arabic opinion mining data set to date: the large Arabic book reviews corpus (LABR), which includes over 63,000 text segments distributed over five rating values, with 1 and 2 for negative reviews, 3 for neutral reviews, and 4 and 5 for positive reviews. They then studied the data set characteristics and tested the corpus for sentiment analysis. Here we could not use the whole LABR corpus, so we randomly extracted two parts of LABR, one as a balanced corpus and another as an unbalanced corpus.
[4] presented one of the popular Arabic sentiment corpora, which they called OCA, an opinion corpus for Arabic. OCA consists of just 500 movie reviews gathered from different Arabic blogs and web pages. OCA does not include short text segments; its full-length articles lead to a large number of unique words. They carried out SA experiments on OCA using SVM and other classifiers. The feature selection used was traditional, based on term frequency (TF), and they concluded that TF gave the same results as TF-IDF, whether in the form of unigrams, bigrams, or trigrams.
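For reference, n-gram features of the kind compared in [4] can be extracted as follows (a generic sketch, not the original authors' code):

```python
def ngrams(tokens, n):
    """Extract contiguous n-grams from a token list
    (n=1 gives unigrams, n=2 bigrams, n=3 trigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Each extracted n-gram then serves as one feature whose frequency (TF) or TF-IDF weight feeds the classifier.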
The work of [7] utilized a mathematical tool based on rough set (RS) theory, focusing on the differences between RS reducts in Arabic SA. They used term weighting with TF-IDF as a feature selection method, based on counting terms per text segment over all corpora. The Rosetta toolkit, designed for RS, was used with cross-validation evaluation. They achieved an accuracy of 57% as the best result among the compared RS reducts, and concluded that RS is applicable to Arabic SA but needs further enhancement, especially regarding reduct methods.
[3] presented Arabic SA for Arabic tweets, collecting an Arabic data set from Twitter. Their results showed that the SVM classifier outperformed the NB classifier, using traditional feature selection based on term frequency in unigram and bigram form. They also showed that eliminating stop-words improved sentiment classification accuracy. In our work, we re-implemented feature selection based on these traditional methods and compared the resulting accuracy with ours.
The work of [8] introduced the concept of semantic orientation (SO) for text segments. SO was calculated by counting the adjective or adverb terms appearing in the text, and derived using the mutual information between the text and terms of known polarity, such as "excellent" for positive polarity or "poor" for negative polarity. A text is considered positive or negative based on the calculated SO value. They achieved different accuracies in different English-language domains. The concept of semantic orientation has been used in the literature in various applications, such as [9] [10] [8]. In our work, we did not use SO as in [8]; we settled for a term co-occurrence value, which is easier to calculate than mutual information, and we replaced the adjective (or adverb) usage with the sentiment keywords list that we generated in [2] from a pre-existing sentiment lexicon [6]. We also calculated the value for terms rather than for text segments, then distributed the resulting term values over the binary matrix that records the appearance of each term in the corpus. Finally, we compared these results with the results of multiplying them by the term frequency values of each term in each text segment of the corpus.
In [11], we implemented a number of classifiers with a number of feature selection methods, comparing the resulting accuracies over several corpus states. The different corpus states were generated from the same original corpus after applying various NLP operations. We obtained the top accuracy results with the fifth corpus state and the feature selection method based on the LBSA approach, which gives each text segment a weight. This weight is based on the LBSA calculation, which counts the sentiment lexicon terms appearing in each text segment and takes the difference between the positive and negative word counts.
The feature selection in this work differs from, and actually improves on, the one used in [11]: in [11] we settled for LBSA weights, whereas here we take into account the occurrence of all corpus terms, using the list of sentiment terms to assign a polarity to each term instead of to each text segment. Then, for each text segment, we employ these term polarities as the FS vector.
In [12], seven feature selection methods were compared, using an SVM classifier with each of them: principal components analysis, chi-squared, Relief-F, Gini index, support vector machines (SVMs), uncertainty, and information gain. The OCA corpus was tested. In all SA classification experiments, they found that SA performance depends on the feature selection method and the number of features selected. They also found that SVM was better than the other methods, whether as a feature selection method or in classification accuracy.
[13] dealt with Arabic Twitter for opinion mining, presenting a framework to analyze Arabic tweets into three sentiment classes: negative, positive, and/or neutral. Handling Arabizi, emoticons, and Arabic dialects was one aspect of that sentiment framework, applied after collecting a data set from Twitter. They used SVM along with other classifiers, and compared the results of stop-word elimination and stemming of the corpus, using traditional feature selection methods based on counting term appearances in each tweet of the corpus.
In [14], they tried to discover useful knowledge about the opinions of Tunisian users from their Facebook posts, and constructed an emoticon lexicon to be used in sentiment analysis tasks. SVM and NB were utilized in the SA tasks, via a feature selection method based on traditional term counting, where they used the popular term appearance matrix with the aid of part-of-speech (POS) tagging, which defines the grammatical type of each corpus term.
3. METHODOLOGY
Our hypothesis is that, in a set of text segments from the same domain, each term has its own polarity, and that this polarity can be discovered by utilizing a list of sentiment keywords and finding the bigram relation between text terms and this keyword list. We do not treat the resulting value as the term's polarity as such; we use it only in a feature selection method, which leads to an enhanced SA task.
In this work, we increased SA accuracy using feature selection based on the above hypothesis, achieving results that outperform traditional methods in two opinion mining domains. The first domain is movie reviews, using the OCA corpus collected by [4]; since we used the same corpus and achieved higher accuracy than the traditional methods applied in [4], we re-implemented those methods here for comparison. The second domain is book reader reviews, using part of the huge LABR corpus [6]; we re-implemented the traditional method used in [3] and compared it with our approach, which achieved better results.
Different works have used different text pre-processing schemes; we determined the best pre-processing scheme as a rule of thumb. This scheme is applied to each corpus before any sentiment analysis is done, as illustrated in Fig. 1. We noticed the benefit of this scheme both in feature set size reduction, which decreases computation time, and in enhanced SA results.

Our approach generally consists of a number of phases, illustrated in Fig. 1; these phases include the pre-processing steps detailed in the following subsections.
Figure 1. Proposed approach: general phases
3.1. Stop-words elimination

Stop-word elimination is performed without eliminating negation terms. Stop-words are terms known to have no semantic effect on written text [3], especially with respect to opinion expressions. However, we preserved some words as a special exception: the so-called negation terms, which are considered a sort of stop-word too, but have an obvious impact on the polarity of the terms that follow them. Examples of negation terms in English are 'no', 'but', 'more', 'less', etc., with counterpart terms in any language; some negation words invert the polarity of the following terms, whereas others affect the degree of the polarity.
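This negation-preserving filter can be sketched as follows (the word sets are illustrative English placeholders, not the Arabic lists used in this work):

```python
# Stop-word elimination that preserves negation/intensity terms.
# Illustrative English word sets; the paper applies the same idea to Arabic.
STOP_WORDS = {"the", "a", "is", "of", "no", "not", "but", "more", "less"}
NEGATION_TERMS = {"no", "not", "but", "more", "less"}

def remove_stop_words(tokens):
    """Drop stop-words, except those that carry negation or intensity."""
    removable = STOP_WORDS - NEGATION_TERMS
    return [t for t in tokens if t not in removable]
```

Keeping the negation terms means a later polarity step can still detect that, e.g., "no good" differs from "good".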
3.2. Stemming

We performed root stemming: all unique terms, after stop-word elimination, are processed by a pre-made tool that converts each term into its origin (stem or root). Stemming reduces the number of terms, more precisely merging those terms which share the same root. Stemming also helps to enhance feature selection, as in [4], because terms with the same root obtain the same statistics, which improves the semantic content of the selected features, since most feature selection techniques in text mining depend on term occurrences in the text.

We used the stemming tool adopted by [15], named Alkhalil Morpho Sys, which takes the list of corpus unique terms as input and generates the root of each input term. We then applied some refinement to these root stems before inputting them into the SKCM algorithm.
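The role of this phase can be sketched as below; here the morphological analyzer (Alkhalil Morpho Sys in our pipeline) is modeled as a precomputed term-to-root map, and the map contents are hypothetical placeholders, not the tool's actual output:

```python
# Sketch of the stemming phase. ROOT_MAP stands in for the output of a
# morphological analyzer; its entries are illustrative placeholders.
ROOT_MAP = {"writers": "write", "writing": "write", "books": "book"}

def stem_terms(unique_terms):
    """Map each unique term to its root; unknown terms pass through."""
    return {t: ROOT_MAP.get(t, t) for t in unique_terms}
```

Terms that share a root collapse to a single feature, which is exactly what shrinks the feature set and pools their occurrence statistics.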
3.3. Singleton terms pruning

The third pre-processing step prunes the corpus content by removing low-impact terms from the whole corpus. We chose to prune terms that occur only once across all corpus portions; this pruning phase removes the least significant terms in the corpus.
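Singleton pruning amounts to one corpus-wide frequency pass, for example:

```python
from collections import Counter

def prune_singletons(reviews):
    """Remove terms that occur exactly once across the whole corpus.
    reviews: list of token lists (one list per review)."""
    counts = Counter(t for review in reviews for t in review)
    return [[t for t in review if counts[t] > 1] for review in reviews]
```

Note that the count is global: a term kept in one review is kept everywhere, and a singleton disappears from the feature set entirely.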
3.4. Traditional feature selection

The most popular feature selection method for text mining uses term frequency (TF), although TF can be used in various ways: single-term frequencies (unigrams), two-term frequencies (bigrams), etc. A better-performing variant of TF is TF-IDF, which has largely replaced plain TF.

Both TF and TF-IDF were tested in [4], with unigram, bigram, and trigram variations, and the results suggest no big differences between them in SA operations. This is consistent with our empirical experiments. Our objective here, however, is to demonstrate the advantage of SKCM weights compared with the traditional TF-weight feature selection method.
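For completeness, the TF-IDF weighting used as the traditional baseline can be computed as follows (a minimal unigram sketch using the standard tf × log(N/df) form; real toolkits add smoothing variants):

```python
import math

def tf_idf(reviews):
    """Unigram TF-IDF weights, one {term: weight} dict per review.
    reviews: list of token lists."""
    n = len(reviews)
    # Document frequency: number of reviews containing each term.
    df = {}
    for review in reviews:
        for t in set(review):
            df[t] = df.get(t, 0) + 1
    matrix = []
    for review in reviews:
        tf = {}
        for t in review:
            tf[t] = tf.get(t, 0) + 1
        matrix.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return matrix
```

A term appearing in every review gets weight zero, which is why TF-IDF down-weights uninformative common terms relative to plain TF.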
3.5. SKCM algorithm

As illustrated in Fig. 1, the SKCM algorithm is run after the third pre-processing phase has finished, in parallel with the traditional FS method. The SKCM algorithm is listed in Fig. 2.

The SKCM algorithm gives each term a weight. This weight differs from the polarity values found by semantic orientation (SO) as adopted in [8] or [9]: SO is used for constructing sentiment lexicons from seed terms of predefined polarity [9], or for determining text polarity as in [8], whereas we use this weight as a feature selection method for SA. Moreover, SKCM finds term weights in a different way from the mathematical approach in [8]. Our approach discriminates corpus terms using the sentiment keywords co-occurrence measure (SKCM), which derives the selected features from the number of co-occurrences of corpus terms with the sentiment keywords used for this purpose.
SKCM algorithm
Input:
  crps;   // target corpus reviews.
  utrms;  // corpus unique terms.
  plst;   // positive keywords list.
  nlst;   // negative keywords list.
  tfm;    // term frequency matrix.
  tdm;    // term appearance matrix.
Output:
  skcm1;  // SKCM with the tdm matrix.
  skcm2;  // SKCM with the tfm matrix.
  tplrty; // term polarity vector.
Begin
  nt = count of corpus unique terms.
  nr = count of corpus reviews.
  for i = 1 to nt
    neg = 0;  // negative co-occurrences with the ith term.
    pos = 0;  // positive co-occurrences with the ith term.
    for j = 1 to nr
      if utrms(i) exists in crps(j)
        n = count of nlst terms in crps(j).
        neg = neg + n.
        p = count of plst terms in crps(j).
        pos = pos + p.
      end if;
    end for;
    tplrty(i) = pos - neg;
  end for;
  skcm1 = {replace each term's 1 in tdm with the tplrty value of the same term}.
  skcm2 = {multiply each term's frequency in tfm by the tplrty value of the same term}.
END.
Figure 2. SKCM algorithm
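The pseudocode of Fig. 2 can be sketched in Python as follows; this is a minimal illustration that assumes reviews are already tokenized into lists of stemmed terms, and uses per-review dictionaries in place of the tdm/tfm matrices:

```python
def skcm(reviews, unique_terms, plst, nlst):
    """Sentiment Keywords Co-occurrence Measure, following Fig. 2.
    reviews: list of token lists; plst/nlst: positive/negative keyword sets.
    Returns (tplrty, skcm1, skcm2): the term-polarity vector, the appearance
    matrix with 1s replaced by term weights, and the TF matrix multiplied
    by term weights."""
    tplrty = {}
    for term in unique_terms:
        pos = neg = 0
        for review in reviews:
            if term in review:
                # Count co-occurring sentiment keywords in this review.
                pos += sum(1 for t in review if t in plst)
                neg += sum(1 for t in review if t in nlst)
        tplrty[term] = pos - neg
    # skcm1: binary appearance entries replaced by the term's weight.
    skcm1 = [{t: tplrty[t] for t in set(r) if t in tplrty} for r in reviews]
    # skcm2: term frequencies multiplied by the term's weight.
    skcm2 = []
    for review in reviews:
        tf = {}
        for t in review:
            if t in tplrty:
                tf[t] = tf.get(t, 0) + 1
        skcm2.append({t: c * tplrty[t] for t, c in tf.items()})
    return tplrty, skcm1, skcm2
```

As in the pseudocode, a term that co-occurs mostly with positive keywords ends up with a positive weight, one that co-occurs mostly with negative keywords with a negative weight.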
The sentiment keywords are taken from a pre-made sentiment lexicon, which we improved in [2]. This lexicon is stemmed, and it is not used to count lexicon words as in LBSA [9]. Rather, the keywords extracted from the sentiment lexicon are stemmed and looked up in each corpus portion, and each term in that corpus is given a counter that is increased for both polarities, positive and negative. Each corpus term then has its own weight, obtained by subtracting the sum of the target term's co-occurrences with negative keywords from the sum of its co-occurrences with positive keywords. Finally, as illustrated in Fig. 2, the SKCM algorithm outputs two selected feature sets: the first contains the pure term weights, and the second is the product of multiplying these weights by the term frequencies calculated in the traditional FS phase, as illustrated in Fig. 1.
4. EXPERIMENTS AND RESULTS
We used a support vector machine (SVM) classifier, which has achieved good results in SA experiments. It is also a binary classifier, splitting data into two categories, which suits SA tasks very well.
Three data sets were used. The first is the opinion corpus in Arabic (OCA) collected by [4]. The other two corpora were generated from the very large Arabic book reviews corpus (LABR) [6], which consists of 64,000 reviews; we therefore randomly extracted two small corpora from it, one balanced and one unbalanced. unbSlabr is an unbalanced corpus of 2,500 reviews: 1,500 positive and 1,000 negative. blnSlabr is a balanced corpus of 1,000 positive and 1,000 negative reviews. Corpora statistics are given in Table 1.
The SA keywords list is based on an SA lexicon of positive-bearing and negative-bearing terms, improved in [2]. The list terms are stemmed and used directly with each of the three corpora.
Our experiments begin with the pre-processing steps, applied to all three corpora. Then we ran the SVM classifier using the traditional selected features based on term frequency (TF).
Table 1. Corpora statistics

Corpus     Reviews  Positive  Negative  Unique terms  Domain
unbSlabr   2500     1500      1000      14955         Book reviews
blnSlabr   2000     1000      1000      17801         Book reviews
OCA        500      250       250       42586         Movie reviews
Then we applied the SKCM algorithm to assign each term the weight output by SKCM. After that, we distributed each resulting value over the text segments according to the appearance of the term inside each text segment (i.e., the review text); these values express the relation between each term and the text segment it belongs to. We then carried out the classification process using this selected feature set.

Finally, for all corpora, we multiplied the SKCM selected features by the TF values used previously. This step measures whether SKCM results are better alone or combined with the traditional term-frequency feature selection.
In all classification operations, we used SVM with k-fold cross-validation. We chose k = 5 folds instead of 10 due to computational complexity considerations.
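The classifier training itself is done with SVM via a standard toolkit; as a pure-Python sketch of the 5-fold evaluation protocol alone, the index splits can be produced like this:

```python
def k_fold_indices(n_samples, k=5):
    """Partition sample indices into k contiguous folds for
    cross-validation; each fold serves as the test set once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    # One (train, test) split per fold.
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        splits.append((train, test))
    return splits
```

With k = 5 each model is trained on 80% of the reviews and tested on the remaining 20%, and the reported accuracy is the average over the five runs.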
Table 2 shows the results of our experiments. The SKCM results are better than the traditional method, i.e., term-frequency-based selected features; even the combination of SKCM and TF, shown in the third column of Table 2, achieved good results, although SKCM used alone remains the best.
Table 2. SVM accuracy results comparison

Corpus     TF       SKCM     SKCM x TF
unbSlabr   74.20 %  79.30 %  78.00 %
blnSlabr   70.30 %  73.25 %  73.00 %
OCA        89.00 %  93.00 %  90.80 %
Regarding the accuracy percentages, it is noticeable that the movie reviews corpus achieved higher figures than the two book reviews corpora. This may be explained by OCA having fewer reviews, as shown in Table 1, while at the same time having the largest number of unique terms; both factors help supervised learning, according to our practical experience. Another factor relates to the nature of book reader reviews: as observed in [11] and [6], LABR did not achieve high accuracy results because the language used in that corpus is more complicated than in the other domain. This relates to the richness of Arabic morphology, which makes book reviews harder than the movie reviews domain, perhaps because their language is further from dialectal or colloquial Arabic.
5. CONCLUSION
We have adopted a new approach for enhancing sentiment analysis in the Arabic language. Our approach is based first on pre-processing steps, which reduce the size of the data set. We then implemented the SKCM algorithm to apply our new feature selection method, and used this new FS method on three corpora via an SVM classifier.

Our approach outperformed the traditional methods followed by most related works, which demonstrates the significant impact of the three pre-processing steps adopted in this work, and illustrates the advantage of using the sentiment keywords co-occurrence measure (SKCM) for discovering term weights that enhance SA accuracy.
REFERENCES
[1] B. Liu, Sentiment Analysis and Opinion Mining, Synthesis Lectures on Human Language
Technologies, Morgan & Claypool, 2012.
[2] F. Alqasemi, A. Abdelwahab, and H. Abdelkader, "Adapting Domain-specific Sentiment Lexicon
using New NLP-based Method in Arabic Language," International Journal of Computer Systems
(IJCS), 2016, pp. 188-193.
[3] A. Shoukry and A. Rafea, "Sentence-level Arabic sentiment analysis," Collaboration Technologies
and Systems (CTS), 2012 International Conference on. IEEE, 2012.
[4] M. Rushdi-Saleh, M. Teresa Martín-Valdivia, L. Alfonso Ureña-López, and José M. Perea-Ortega, "OCA: Opinion Corpus for Arabic," Journal of the American Society for Information Science and Technology, 2011, pp. 2045-2054.
[5] H. Cho and S. Choi, "Automatic construction of movie domain Korean sentiment dictionary using
online movie reviews," International Journal of Software Engineering and Its Applications 9.2, 2015,
pp. 251-260.
[6] M. Nabil, M. Aly, and A. F. Atiya, "LABR: a large scale arabic book reviews dataset," CoRR, 2014.
[7] Q. A. Al-Radaideh and L. M. Twaiq, "Rough set theory for Arabic sentiment classification," Future Internet of Things and Cloud (FiCloud), 2014 International Conference on. IEEE, 2014.
[8] P. D. Turney, "Thumbs up or thumbs down?: semantic orientation applied to unsupervised
classification of reviews," Proceedings of the 40th annual meeting on association for computational
linguistics. Association for Computational Linguistics, 2002.
[9] A Bai, H. Hammer, A. Yazidi, and P. Engelstad, "Constructing sentiment lexicons in Norwegian from
a large text corpus," IEEE 17th International Conference on Computational Science and Engineering,
2014.
[10] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, "Lexicon-based methods for sentiment
analysis," Computational linguistics, 2011, pp. 37(2), 267-307.
[11] F. Alqasemi, A. Abdelwahab, and H. Abdelkader, "An enhanced feature extraction technique for improving sentiment analysis in Arabic language," IEEE conference CSIST2016, Morocco, in press.
[12] N. Omar, M. Albared, T. Al-Moslmi, and A. Al-Shabi, "A Comparative Study of Feature Selection
and Machine Learning Algorithms for Arabic Sentiment Classification," Asia Information Retrieval
Symposium. Springer International Publishing, 2014.
[13] R. M. Duwairi, R. Marji, N. Sha’ban, and S. Rushaidat, "Sentiment analysis in arabic tweets,"
Information and communication systems (icics), 2014 5th international conference on. IEEE, 2014.
[14] J. Akaichi, Z. Dhouioui, and M. J. Pérez, "Text mining facebook status updates for sentiment
classification," System Theory, Control and Computing (ICSTCC), 2013 17th International
Conference. IEEE, 2013.
[15] A. L. Boudlal, and et al, "Alkhalil Morpho Sys1: A Morphosyntactic analysis system for Arabic
texts," In International Arab conference on information technology, 2010.
AUTHORS
Amira Abdelwahab received her BSc degree in computer science and information systems from the Faculty of Computers and Information, Helwan University, Egypt, in 2000, and her PhD in information systems from Chiba University, Japan, in 2012. In 2013, she was a postdoctoral fellow at Chiba University, Japan. Since 2012, she has been an assistant professor in the Information Systems Department, Faculty of Computers and Information, Menofia University, Egypt. Her research interests include software engineering, decision support systems, database systems, data mining, machine learning, recommendation systems, intelligent web applications, e-commerce, and knowledge discovery in big data.
Fahd A. Alqasemi received his BSc in mathematics and computer science from Ibb University, Yemen, and his Master of Computer Information Systems from the Arabic Academy in Sana'a, Yemen. He worked as an instructor at UST, Sana'a, Yemen. He is currently pursuing a PhD at Menofia University, Menofia, Egypt. His major fields of interest are image retrieval, information retrieval, and data and text mining.
Prof. Hatem Abdelkader obtained his BSc and MSc, both in electrical engineering, from the Faculty of Engineering, Alexandria University, in 1990 and 1995, respectively. He obtained his PhD in electrical engineering, also from the Faculty of Engineering, Alexandria University, Egypt, in 2001. His areas of interest are data security, web applications, and artificial intelligence, and he specializes in neural networks. He is currently a professor in the Information Systems Department, Faculty of Computers and Information, Menofia University, Egypt.