This paper introduces a syntactic analysis model that we propose to parse and process the Vietnamese questions about tablets in V-DLG~TABL system, which is a Vietnamese Question – Answering system working based on automatic dialog mechanism. The V-DLG~TABL system is built to support clients using Vietnamese questions for searching tablets based on interaction between the clients and the system. We apply the “Phrase Structure Grammar” of Noam Chomsky to develop a syntactic analysis model that is specific and suitable for the V-DLG~TABL system. This syntactic analysis model is used to implement the “V-DLG~TABL Syntactic Parsing and Processing” component of the system.
Building a vietnamese dialog mechanism for v dlg~tabl systemijnlc
This paper introduces a Vietnamese automatic dialog mechanism which allows the V-DLG~TABL system to automatically communicate with clients. This dialog mechanism is based on the Question – Answering engine of V-DLG~TABL system, and composes the following supplementary mechanisms: 1) the mechanism of choosing the suggested question; 2) the mechanism of managing the conversations and suggesting information; 3) the mechanism of resolving the questions having anaphora; 4) the scenarios of dialog (in this stage, there is only one simple scenario defined for the system).
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...ijseajournal
With the emergence of XML as de facto format for storing and exchanging information over the Internet, the search for ever more innovative and effective techniques for their querying is a major and current concern of the XML database community. Several studies carried out to help solve this problem are mostly oriented towards the evaluation of so-called exact queries which, unfortunately, are likely (especially in the case of semi-structured documents) to yield abundant results (in the case of vague queries) or empty results (in the case of very precise queries). From the observation that users who make requests are not necessarily interested in all possible solutions, but rather in those that are closest to their needs, an important field of research has been opened on the evaluation of preferences queries. In this paper, we propose an approach for the evaluation of such queries, in case the preferences concern the structure of the document. The solution investigated revolves around the proposal of an evaluation plan in three phases: rewriting-evaluation-merge. The rewriting phase makes it possible to obtain, from a partitioningtransformation operation of the initial query, a hierarchical set of preferences path queries which are holistically evaluated in the second phase by an instrumented version of the algorithm TwigStack. The merge phase is the synthesis of the best results.
Resolving the semantics of vietnamese questions in v news qaict systemijaia
Recently we have built a VNewsQA/ICT system which can read the titles of Vietnamese news in the domain
of information and communication technology, then process and use them to answer the Vietnamese
questions of users. The architecture of VNewsQA/ICT system has two main components: 1) the first
component treats the simple Vietnamese sentences as its natural language textual data which is used to
answer the user’s questions; 2) the second component resolves the semantics of Vietnamese questions
which query the system. This paper introduces a semantic representation model and a processing model to
revolve the Vietnamese questions in VNewsQA/ICT system. These semantic representation and processing
models are able to resolve the semantics of eight Vietnamese question classes which are used in our
system.
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTIONijistjournal
The user generated content on the web grows rapidly in this emergent information age. The evolutionary changes in technology make use of such information to capture only the user’s essence and finally the useful information are exposed to information seekers. Most of the existing research on text information processing, focuses in the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis for text data available in each forum. This approach analyses the forum text data and computes value for each word of text. The proposed approach combines K-means clustering and Support Vector Machine with PSO (SVM-PSO) classification algorithm that can be used to group the forums into two clusters forming hotspot forums and non-hotspot forums within the current time span. The proposed system accuracy is compared with the other classification algorithms such as Naïve Bayes, Decision tree and SVM. The experiment helps to identify that K-means and SVM-PSO together achieve highly consistent results.
Building a vietnamese dialog mechanism for v dlg~tabl systemijnlc
This paper introduces a Vietnamese automatic dialog mechanism which allows the V-DLG~TABL system to automatically communicate with clients. This dialog mechanism is based on the Question – Answering engine of V-DLG~TABL system, and composes the following supplementary mechanisms: 1) the mechanism of choosing the suggested question; 2) the mechanism of managing the conversations and suggesting information; 3) the mechanism of resolving the questions having anaphora; 4) the scenarios of dialog (in this stage, there is only one simple scenario defined for the system).
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...ijseajournal
With the emergence of XML as de facto format for storing and exchanging information over the Internet, the search for ever more innovative and effective techniques for their querying is a major and current concern of the XML database community. Several studies carried out to help solve this problem are mostly oriented towards the evaluation of so-called exact queries which, unfortunately, are likely (especially in the case of semi-structured documents) to yield abundant results (in the case of vague queries) or empty results (in the case of very precise queries). From the observation that users who make requests are not necessarily interested in all possible solutions, but rather in those that are closest to their needs, an important field of research has been opened on the evaluation of preferences queries. In this paper, we propose an approach for the evaluation of such queries, in case the preferences concern the structure of the document. The solution investigated revolves around the proposal of an evaluation plan in three phases: rewriting-evaluation-merge. The rewriting phase makes it possible to obtain, from a partitioningtransformation operation of the initial query, a hierarchical set of preferences path queries which are holistically evaluated in the second phase by an instrumented version of the algorithm TwigStack. The merge phase is the synthesis of the best results.
Resolving the semantics of vietnamese questions in v news qaict systemijaia
Recently we have built a VNewsQA/ICT system which can read the titles of Vietnamese news in the domain
of information and communication technology, then process and use them to answer the Vietnamese
questions of users. The architecture of VNewsQA/ICT system has two main components: 1) the first
component treats the simple Vietnamese sentences as its natural language textual data which is used to
answer the user’s questions; 2) the second component resolves the semantics of Vietnamese questions
which query the system. This paper introduces a semantic representation model and a processing model to
revolve the Vietnamese questions in VNewsQA/ICT system. These semantic representation and processing
models are able to resolve the semantics of eight Vietnamese question classes which are used in our
system.
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTIONijistjournal
The user generated content on the web grows rapidly in this emergent information age. The evolutionary changes in technology make use of such information to capture only the user’s essence and finally the useful information are exposed to information seekers. Most of the existing research on text information processing, focuses in the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis for text data available in each forum. This approach analyses the forum text data and computes value for each word of text. The proposed approach combines K-means clustering and Support Vector Machine with PSO (SVM-PSO) classification algorithm that can be used to group the forums into two clusters forming hotspot forums and non-hotspot forums within the current time span. The proposed system accuracy is compared with the other classification algorithms such as Naïve Bayes, Decision tree and SVM. The experiment helps to identify that K-means and SVM-PSO together achieve highly consistent results.
Purpose of the data base system, data abstraction, data model, data independence, data definition
language, data manipulation language, data base manager, data base administrator, data base users,
overall structure.
ER Models, entities, mapping constrains, keys, E-R diagram, reduction E-R diagrams to tables,
generatio, aggregation, design of an E-R data base scheme.
Oracle RDBMS, architecture, kernel, system global area (SGA), data base writer, log writer, process
monitor, archiver, database files, control files, redo log files, oracle utilities.
SQL: commands and data types, data definition language commands, data manipulation commands,
data query language commands, transaction language control commands, data control language
commands.
Joins, equi-joins, non-equi-joins, self joins, other joins, aggregate functions, math functions, string
functions, group by clause, data function and concepts of null values, sub-querries, views.
PL/SQL, basics of pl/sql, data types, control structures, database access with PL/SQL, data base
connections, transaction management, data base locking, cursor management.
The important problem of word segmentation in Thai language is sentential noun phrase. The existing
studies try to minimize the problem. But there is no research that solves this problem directly. This study
investigates the approach to resolve this problem using conditional random fields which is a probabilistic
model to segment and label sequence data. The results present that the corrected data of noun phrase was
detected more than 78.61 % based on our technique.
Semantic Based Model for Text Document Clustering with IdiomsWaqas Tariq
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data which is available in various forms in online forums such as the web, social networks, and other information networks. Clustering is a very powerful data mining technique to organize the large amount of information on the web. Traditionally, document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method to improve the semantic structure of the text documents. A method has been developed that performs the following: tag the documents for parsing, replacement of idioms with their original meaning, semantic weights calculation for document words and apply semantic grammar. The similarity measure is obtained between the documents and then the documents are clustered using Hierarchical clustering algorithm. The method adopted in this work is evaluated on different data sets with standard performance measures and the effectiveness of the method to develop in meaningful clusters has been proved.
A Survey on Sentiment Categorization of Movie ReviewsEditor IJMTER
Sentiment categorization is a process of mining user generated text content and determine
the sentiment of the users towards that particular thing. It is the approach of detecting the sentiment of
the author in regard to some topics. It also known as sentiment detection, sentiment analysis and opinion
mining. It is very useful for movie production companies that interested in knowing how users feel
about their movies. For example word “excellent” indicates that the review gives positive emotion about
particular movie. The same applies to movies, songs, cars, holiday destinations, Political parties, social
network sites, web blogs, discussion forum and so on. Sentiment categorization can be carried out by
using three approaches. First, Supervised machine learning based text classifier on Naïve Bayes,
Maximum Entropy, SVM, kNN classifier, hidden marcov model. Second, Unsupervised Semantic
Orientation scheme of extracting relevant N-grams of the text and then labelling. Third, SentiWordNet
based publicly available library.
Conceptual similarity measurement algorithm for domain specific ontology[Zac Darcy
This paper presents the similarity measurement algorithm for domain specific terms collected in the
ontology based data integration system. This similarity measurement algorithm can be used in ontology
mapping and query service of
ontology based data integration sy
stem. In this paper, we focus
o
n the web
query service to apply
this proposed algorithm
. Concepts similarity is important for web query service
because the words in user input query are not
same wholly with the concepts in
ontology. So, we need to
extract the possible concepts that are match or related to the input words with the help of machine readable
dictionary WordNet. Sometimes, we use the generated mapping rules in query generation procedure for
some words that canno
t be
confirmed the similarity of these words
by WordNet. We prove the effect
of this
algorithm with two degree semantic result of web minin
g by generating
the concepts results obtained form
the input query
Chunking means splitting the sentences into tokens and then grouping them in a meaningful way. When it comes to high-performance chunking systems, transformer models have proved to be the state of the art benchmarks. To perform chunking as a task it requires a large-scale high quality annotated corpus where each token is attached with a particular tag similar as that of Named Entity Recognition Tasks. Later these tags are used in conjunction with pointer frameworks to find the final chunk. To solve this for a specific domain problem, it becomes a highly costly affair in terms of time and resources to manually annotate and produce a large-high-quality training set. When the domain is specific and diverse, then cold starting becomes even more difficult because of the expected large number of manually annotated queries to cover all aspects. To overcome the problem, we applied a grammar-based text generation mechanism where instead of annotating a sentence we annotate using grammar templates. We defined various templates corresponding to different grammar rules. To create a sentence we used these templates along with the rules where symbol or terminal values were chosen from the domain data catalog. It helped us to create a large number of annotated queries. These annotated queries were used for training the machine learning model using an ensemble transformer-based deep neural network model [24.] We found that grammar-based annotation was useful to solve domain-based chunks in input query sentences without any manual annotation where it was found to achieve a classification F1 score of 96.97% in classifying the tokens for the out of template queries.
STATISTICAL FUNCTION TAGGING AND GRAMMATICAL RELATIONS OF MYANMAR SENTENCEScscpconf
This paper describes a context free grammar (CFG) based grammatical relations for Myanmar
sentences which combine corpus-based function tagging system. Part of the challenge of
statistical function tagging for Myanmar sentences comes from the fact that Myanmar has freephrase-order
and a complex morphological system. Function tagging is a pre-processing step to
show grammatical relations of Myanmar sentences. In the task of function tagging, which tags
the function of Myanmar sentences with correct segmentation, POS (part-of-speech) tagging
and chunking information, we use Naive Bayesian theory to disambiguate the possible function
tags of a word. We apply context free grammar (CFG) to find out the grammatical relations of
the function tags. We also create a functional annotated tagged corpus for Myanmar and propose the grammar rules for Myanmar sentences. Experiments show that our analysis achieves a good result with simple sentences and complex sentences.
Rhetorical Sentence Classification for Automatic Title Generation in Scientif...TELKOMNIKA JOURNAL
In this paper, we proposed a work on rhetorical corpus construction and sentence classification
model experiment that specifically could be incorporated in automatic paper title generation task for
scientific article. Rhetorical classification is treated as sequence labeling. Rhetorical sentence classification
model is useful in task which considers document’s discourse structure. We performed experiments using
two domains of datasets: computer science (CS dataset), and chemistry (GaN dataset). We evaluated the
models using 10-fold-cross validation (0.70-0.79 weighted average F-measure) as well as on-the-run
(0.30-0.36 error rate at best). We argued that our models performed best when handled using SMOTE
filter for imbalanced data.
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Networkkevig
In recent years, there has been an increasing use of social media among people in Myanmar and writing
review on social media pages about the product, movie, and trip are also popular among people. Moreover,
most of the people are going to find the review pages about the product they want to buy before deciding
whether they should buy it or not. Extracting and receiving useful reviews over interesting products is very
important and time consuming for people. Sentiment analysis is one of the important processes for extracting
useful reviews of the products. In this paper, the Convolutional LSTM neural network architecture is
proposed to analyse the sentiment classification of cosmetic reviews written in Myanmar Language. The
paper also intends to build the cosmetic reviews dataset for deep learning and sentiment lexicon in Myanmar
Language.
Feature selection, optimization and clustering strategies of text documentsIJECEIAES
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, shared filtering, document management, and indexing. The research of clustering task is to be performed prior to its adaptation in the text environment. Conventional approaches typically emphasized on the quantitative information where the selected features are numbers. Efforts also have been put forward for achieving efficient clustering in the context of categorical information where the selected features can assume nominal values. This manuscript presents an in-depth analysis of challenges of clustering in the text environment. Further, this paper also details prominent models proposed for clustering along with the pros and cons of each model. In addition, it also focuses on various latest developments in the clustering task in the social network and associated environments.
AUTOMATED SHORT ANSWER GRADER USING FRIENDSHIP GRAPHScsandit
The paper proposes a method to assess short answer written by student using friendship matrix,
representation of friendship graph. The Short Answer is a type of answer which is based on
facts. These answers are quite different from long answers and Multiple Choice Question
(MCQ) type answers. The friendship graph is a graph which is based on friendship condition
i.e. the nodes have only one common neighbor. Friendship matrix is the matrix form of the
friendship graph. The student answer is stored in a friendship matrix and the teacher answer is
stored in another friendship matrix and both the matrices are compared. Based on the number
of errors encountered from student answer an error marks is calculated and that number is
subtracted from full marks to get student grade.
Review of Various Text Categorization Methodsiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORKijnlc
In recent years, there has been an increasing use of social media among people in Myanmar and writing review on social media pages about the product, movie, and trip are also popular among people. Moreover, most of the people are going to find the review pages about the product they want to buy before deciding whether they should buy it or not. Extracting and receiving useful reviews over interesting products is very important and time consuming for people. Sentiment analysis is one of the important processes for extracting useful reviews of the products. In this paper, the Convolutional LSTM neural network architecture is proposed to analyse the sentiment classification of cosmetic reviews written in Myanmar Language. The paper also intends to build the cosmetic reviews dataset for deep learning and sentiment lexicon in Myanmar Language.
SL5 OBJECT: SIMPLER LEVEL 5 OBJECT EXPERT SYSTEM LANGUAGE ijscmcjournal
This paper introduces SL5 Object, the Simpler Level 5 Object Expert System Language. SL5 Object is a
rule-based language for specifying expert systems.This paper first introduces the concept of expert systems
and production systems, as well as the typical architecture of such a system. Then it presents a thorough
outline of the SL5 Object Language: the syntax of rules and Objects, allowed constructs, the module
structure. It also presents the execution cycle of the SL5 Object engine, as well as a number of methods to
influence the default progress of this cycle.Finally, this paper introduces an example of Cars diagnoses
problems to illustrate the capabilities of SL5 Object and the concepts presented.
Generation of Question and Answer from Unstructured Document using Gaussian M...IJACEE IJACEE
Question Answering (QA) system is one of the ever growing applications in Natural Language Processing. The purpose of Automatic Question and Answer Generation system is to generate all possible questions and its relevant answers from a given unstructured document. Complex sentences are simplified to make question generation easier. The accuracy of the generated questions is measured by identifying the subtopics from the text using Gaussian Mixture Neural Topic Model (GMNTM).The similarity between generated questions and text are calculated using Extended String Subsequence Kernel (ESSK). The syntactic correctness of the questions is measured by Syntactic Tree Kernel which computes the similarity scores between each sentence in the given context and generated questions. Based on the similarity score, questions are ranked. The answers for the generated questions are extracted using Pattern Matching Approach. This system is expected to produce better accuracy when compared with the system using Latent Dirichlet Allocation (LDA) for subtopic identification.
An in-depth exploration of Bangla blog post classificationjournalBEEI
Bangla blog is increasing rapidly in the era of information, and consequently, the blog has a diverse layout and categorization. In such an aptitude, automated blog post classification is a comparatively more efficient solution in order to organize Bangla blog posts in a standard way so that users can easily find their required articles of interest. In this research, nine supervised learning models which are Support Vector Machine (SVM), multinomial naïve Bayes (MNB), multi-layer perceptron (MLP), k-nearest neighbours (k-NN), stochastic gradient descent (SGD), decision tree, perceptron, ridge classifier and random forest are utilized and compared for classification of Bangla blog post. Moreover, the performance on predicting blog posts against eight categories, three feature extraction techniques are applied, namely unigram TF-IDF (term frequency-inverse document frequency), bigram TF-IDF, and trigram TF-IDF. The majority of the classifiers show above 80% accuracy. Other performance evaluation metrics also show good results while comparing the selected classifiers.
SEMANTIC PROCESSING MECHANISM FOR LISTENING AND COMPREHENSION IN VNSCALENDAR ...ijnlc
This paper presents some generalities about the VNSCalendar system, a tool able to understand users’
voice commands, would help users with managing and querying their personal calendar by Vietnamese
speech. The main feature of this system consists in the fact that it is equipped with a mechanism of
analyzing syntax and semantics of Vietnamese commands and questions. The syntactic and semantic
processing of Vietnamese sentences is solved by using DCG (Definite Clause Grammar) and the methods of
formal semantics. This is the first system in this field of voice application, which is equipped an effective
semantic processing mechanism of Vietnamese language. Having been built and tested in PC environment,
our system proves its accuracy attaining more than 91%.
Purpose of the data base system, data abstraction, data model, data independence, data definition
language, data manipulation language, data base manager, data base administrator, data base users,
overall structure.
ER Models, entities, mapping constrains, keys, E-R diagram, reduction E-R diagrams to tables,
generatio, aggregation, design of an E-R data base scheme.
Oracle RDBMS, architecture, kernel, system global area (SGA), data base writer, log writer, process
monitor, archiver, database files, control files, redo log files, oracle utilities.
SQL: commands and data types, data definition language commands, data manipulation commands,
data query language commands, transaction language control commands, data control language
commands.
Joins, equi-joins, non-equi-joins, self joins, other joins, aggregate functions, math functions, string
functions, group by clause, data function and concepts of null values, sub-querries, views.
PL/SQL, basics of pl/sql, data types, control structures, database access with PL/SQL, data base
connections, transaction management, data base locking, cursor management.
The important problem of word segmentation in Thai language is sentential noun phrase. The existing
studies try to minimize the problem. But there is no research that solves this problem directly. This study
investigates the approach to resolve this problem using conditional random fields which is a probabilistic
model to segment and label sequence data. The results present that the corrected data of noun phrase was
detected more than 78.61 % based on our technique.
Semantic Based Model for Text Document Clustering with IdiomsWaqas Tariq
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data which is available in various forms in online forums such as the web, social networks, and other information networks. Clustering is a very powerful data mining technique to organize the large amount of information on the web. Traditionally, document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method to improve the semantic structure of the text documents. A method has been developed that performs the following: tag the documents for parsing, replacement of idioms with their original meaning, semantic weights calculation for document words and apply semantic grammar. The similarity measure is obtained between the documents and then the documents are clustered using Hierarchical clustering algorithm. The method adopted in this work is evaluated on different data sets with standard performance measures and the effectiveness of the method to develop in meaningful clusters has been proved.
A Survey on Sentiment Categorization of Movie ReviewsEditor IJMTER
Sentiment categorization is a process of mining user generated text content and determine
the sentiment of the users towards that particular thing. It is the approach of detecting the sentiment of
the author in regard to some topics. It also known as sentiment detection, sentiment analysis and opinion
mining. It is very useful for movie production companies that interested in knowing how users feel
about their movies. For example word “excellent” indicates that the review gives positive emotion about
particular movie. The same applies to movies, songs, cars, holiday destinations, Political parties, social
network sites, web blogs, discussion forum and so on. Sentiment categorization can be carried out by
using three approaches. First, Supervised machine learning based text classifier on Naïve Bayes,
Maximum Entropy, SVM, kNN classifier, hidden marcov model. Second, Unsupervised Semantic
Orientation scheme of extracting relevant N-grams of the text and then labelling. Third, SentiWordNet
based publicly available library.
Conceptual similarity measurement algorithm for domain specific ontology[Zac Darcy
This paper presents the similarity measurement algorithm for domain specific terms collected in the
ontology based data integration system. This similarity measurement algorithm can be used in ontology
mapping and query service of
ontology based data integration sy
stem. In this paper, we focus
o
n the web
query service to apply
this proposed algorithm
. Concepts similarity is important for web query service
because the words in user input query are not
same wholly with the concepts in
ontology. So, we need to
extract the possible concepts that are match or related to the input words with the help of machine readable
dictionary WordNet. Sometimes, we use the generated mapping rules in query generation procedure for
some words that canno
t be
confirmed the similarity of these words
by WordNet. We prove the effect
of this
algorithm with two degree semantic result of web minin
g by generating
the concepts results obtained form
the input query
Chunking means splitting the sentences into tokens and then grouping them in a meaningful way. When it comes to high-performance chunking systems, transformer models have proved to be the state of the art benchmarks. To perform chunking as a task it requires a large-scale high quality annotated corpus where each token is attached with a particular tag similar as that of Named Entity Recognition Tasks. Later these tags are used in conjunction with pointer frameworks to find the final chunk. To solve this for a specific domain problem, it becomes a highly costly affair in terms of time and resources to manually annotate and produce a large-high-quality training set. When the domain is specific and diverse, then cold starting becomes even more difficult because of the expected large number of manually annotated queries to cover all aspects. To overcome the problem, we applied a grammar-based text generation mechanism where instead of annotating a sentence we annotate using grammar templates. We defined various templates corresponding to different grammar rules. To create a sentence we used these templates along with the rules where symbol or terminal values were chosen from the domain data catalog. It helped us to create a large number of annotated queries. These annotated queries were used for training the machine learning model using an ensemble transformer-based deep neural network model [24.] We found that grammar-based annotation was useful to solve domain-based chunks in input query sentences without any manual annotation where it was found to achieve a classification F1 score of 96.97% in classifying the tokens for the out of template queries.
STATISTICAL FUNCTION TAGGING AND GRAMMATICAL RELATIONS OF MYANMAR SENTENCEScscpconf
This paper describes a context free grammar (CFG) based grammatical relations for Myanmar
sentences which combine corpus-based function tagging system. Part of the challenge of
statistical function tagging for Myanmar sentences comes from the fact that Myanmar has freephrase-order
and a complex morphological system. Function tagging is a pre-processing step to
show grammatical relations of Myanmar sentences. In the task of function tagging, which tags
the function of Myanmar sentences with correct segmentation, POS (part-of-speech) tagging
and chunking information, we use Naive Bayesian theory to disambiguate the possible function
tags of a word. We apply context free grammar (CFG) to find out the grammatical relations of
the function tags. We also create a functional annotated tagged corpus for Myanmar and propose the grammar rules for Myanmar sentences. Experiments show that our analysis achieves a good result with simple sentences and complex sentences.
Rhetorical Sentence Classification for Automatic Title Generation in Scientif...TELKOMNIKA JOURNAL
In this paper, we proposed a work on rhetorical corpus construction and sentence classification
model experiment that specifically could be incorporated in automatic paper title generation task for
scientific article. Rhetorical classification is treated as sequence labeling. Rhetorical sentence classification
model is useful in task which considers document’s discourse structure. We performed experiments using
two domains of datasets: computer science (CS dataset), and chemistry (GaN dataset). We evaluated the
models using 10-fold-cross validation (0.70-0.79 weighted average F-measure) as well as on-the-run
(0.30-0.36 error rate at best). We argued that our models performed best when handled using SMOTE
filter for imbalanced data.
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Networkkevig
In recent years, there has been an increasing use of social media among people in Myanmar and writing
review on social media pages about the product, movie, and trip are also popular among people. Moreover,
most of the people are going to find the review pages about the product they want to buy before deciding
whether they should buy it or not. Extracting and receiving useful reviews over interesting products is very
important and time consuming for people. Sentiment analysis is one of the important processes for extracting
useful reviews of the products. In this paper, the Convolutional LSTM neural network architecture is
proposed to analyse the sentiment classification of cosmetic reviews written in Myanmar Language. The
paper also intends to build the cosmetic reviews dataset for deep learning and sentiment lexicon in Myanmar
Language.
Feature selection, optimization and clustering strategies of text documentsIJECEIAES
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, shared filtering, document management, and indexing. The research of clustering task is to be performed prior to its adaptation in the text environment. Conventional approaches typically emphasized on the quantitative information where the selected features are numbers. Efforts also have been put forward for achieving efficient clustering in the context of categorical information where the selected features can assume nominal values. This manuscript presents an in-depth analysis of challenges of clustering in the text environment. Further, this paper also details prominent models proposed for clustering along with the pros and cons of each model. In addition, it also focuses on various latest developments in the clustering task in the social network and associated environments.
AUTOMATED SHORT ANSWER GRADER USING FRIENDSHIP GRAPHScsandit
The paper proposes a method to assess short answer written by student using friendship matrix,
representation of friendship graph. The Short Answer is a type of answer which is based on
facts. These answers are quite different from long answers and Multiple Choice Question
(MCQ) type answers. The friendship graph is a graph which is based on friendship condition
i.e. the nodes have only one common neighbor. Friendship matrix is the matrix form of the
friendship graph. The student answer is stored in a friendship matrix and the teacher answer is
stored in another friendship matrix and both the matrices are compared. Based on the number
of errors encountered from student answer an error marks is calculated and that number is
subtracted from full marks to get student grade.
Review of Various Text Categorization Methodsiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORKijnlc
In recent years, there has been an increasing use of social media among people in Myanmar and writing review on social media pages about the product, movie, and trip are also popular among people. Moreover, most of the people are going to find the review pages about the product they want to buy before deciding whether they should buy it or not. Extracting and receiving useful reviews over interesting products is very important and time consuming for people. Sentiment analysis is one of the important processes for extracting useful reviews of the products. In this paper, the Convolutional LSTM neural network architecture is proposed to analyse the sentiment classification of cosmetic reviews written in Myanmar Language. The paper also intends to build the cosmetic reviews dataset for deep learning and sentiment lexicon in Myanmar Language.
SL5 OBJECT: SIMPLER LEVEL 5 OBJECT EXPERT SYSTEM LANGUAGE ijscmcjournal
This paper introduces SL5 Object, the Simpler Level 5 Object Expert System Language. SL5 Object is a
rule-based language for specifying expert systems.This paper first introduces the concept of expert systems
and production systems, as well as the typical architecture of such a system. Then it presents a thorough
outline of the SL5 Object Language: the syntax of rules and Objects, allowed constructs, the module
structure. It also presents the execution cycle of the SL5 Object engine, as well as a number of methods to
influence the default progress of this cycle.Finally, this paper introduces an example of Cars diagnoses
problems to illustrate the capabilities of SL5 Object and the concepts presented.
Generation of Question and Answer from Unstructured Document using Gaussian M...IJACEE IJACEE
Question Answering (QA) system is one of the ever growing applications in Natural Language Processing. The purpose of Automatic Question and Answer Generation system is to generate all possible questions and its relevant answers from a given unstructured document. Complex sentences are simplified to make question generation easier. The accuracy of the generated questions is measured by identifying the subtopics from the text using Gaussian Mixture Neural Topic Model (GMNTM).The similarity between generated questions and text are calculated using Extended String Subsequence Kernel (ESSK). The syntactic correctness of the questions is measured by Syntactic Tree Kernel which computes the similarity scores between each sentence in the given context and generated questions. Based on the similarity score, questions are ranked. The answers for the generated questions are extracted using Pattern Matching Approach. This system is expected to produce better accuracy when compared with the system using Latent Dirichlet Allocation (LDA) for subtopic identification.
An in-depth exploration of Bangla blog post classificationjournalBEEI
Bangla blog is increasing rapidly in the era of information, and consequently, the blog has a diverse layout and categorization. In such an aptitude, automated blog post classification is a comparatively more efficient solution in order to organize Bangla blog posts in a standard way so that users can easily find their required articles of interest. In this research, nine supervised learning models which are Support Vector Machine (SVM), multinomial naïve Bayes (MNB), multi-layer perceptron (MLP), k-nearest neighbours (k-NN), stochastic gradient descent (SGD), decision tree, perceptron, ridge classifier and random forest are utilized and compared for classification of Bangla blog post. Moreover, the performance on predicting blog posts against eight categories, three feature extraction techniques are applied, namely unigram TF-IDF (term frequency-inverse document frequency), bigram TF-IDF, and trigram TF-IDF. The majority of the classifiers show above 80% accuracy. Other performance evaluation metrics also show good results while comparing the selected classifiers.
SEMANTIC PROCESSING MECHANISM FOR LISTENING AND COMPREHENSION IN VNSCALENDAR ...ijnlc
This paper presents some generalities about the VNSCalendar system, a tool able to understand users’
voice commands, would help users with managing and querying their personal calendar by Vietnamese
speech. The main feature of this system consists in the fact that it is equipped with a mechanism of
analyzing syntax and semantics of Vietnamese commands and questions. The syntactic and semantic
processing of Vietnamese sentences is solved by using DCG (Definite Clause Grammar) and the methods of
formal semantics. This is the first system in this field of voice application, which is equipped an effective
semantic processing mechanism of Vietnamese language. Having been built and tested in PC environment,
our system proves its accuracy attaining more than 91%.
Semantic Processing Mechanism for Listening and Comprehension in VNScalendar ...kevig
This paper presents some generalities about the VNSCalendar system, a tool able to understand users’
voice commands, would help users with managing and querying their personal calendar by Vietnamese
speech. The main feature of this system consists in the fact that it is equipped with a mechanism of
analyzing syntax and semantics of Vietnamese commands and questions. The syntactic and semantic
processing of Vietnamese sentences is solved by using DCG (Definite Clause Grammar) and the methods of
formal semantics. This is the first system in this field of voice application, which is equipped an effective
semantic processing mechanism of Vietnamese language. Having been built and tested in PC environment,
our system proves its accuracy attaining more than 91%.
Testing Different Log Bases for Vector Model Weighting Techniquekevig
Information retrieval systems retrieves relevant documents based on a query submitted by the user. The documents are initially indexed and the words in the documents are assigned weights using a weighting technique called TFIDF which is the product of Term Frequency (TF) and Inverse Document Frequency (IDF). TF represents the number of occurrences of a term in a document. IDF measures whether the term is common or rare across all documents. It is computed by dividing the total number of documents in the system by the number of documents containing the term and then computing the logarithm of the quotient. By default, we use base 10 to calculate the logarithm. In this paper, we are going to test this weighting technique by using a range of log bases from 0.1 to 100.0 to calculate the IDF. Testing different log bases for vector model weighting technique is to highlight the importance of understanding the performance of the system at different weighting values. We use the documents of MED, CRAN, NPL, LISA, and CISI test collections that scientists assembled explicitly for experiments in data information retrieval systems.
Testing Different Log Bases for Vector Model Weighting Techniquekevig
Information retrieval systems retrieves relevant documents based on a query submitted by the user. The documents are initially indexed and the words in the documents are assigned weights using a weighting technique called TFIDF which is the product of Term Frequency (TF) and Inverse Document Frequency (IDF). TF represents the number of occurrences of a term in a document. IDF measures whether the term is common or rare across all documents. It is computed by dividing the total number of documents in the system by the number of documents containing the term and then computing the logarithm of the quotient. By default, we use base 10 to calculate the logarithm. In this paper, we are going to test this weighting technique by using a range of log bases from 0.1 to 100.0 to calculate the IDF. Testing different log bases for vector model weighting technique is to highlight the importance of understanding the performance of the system at different weighting values. We use the documents of MED, CRAN, NPL, LISA, and CISI test collections that scientists assembled explicitly for experiments in data information retrieval systems.
Evaluation of subjective answers using glsa enhanced with contextual synonymyijnlc
Evaluation of subjective answers submitted in an exam is an essential but one of the most resource consuming educational activity. This paper details experiments conducted under our project to build a software that evaluates the subjective answers of informative nature in a given knowledge domain. The paper first summarizes the techniques such as Generalized Latent Semantic Analysis (GLSA) and Cosine Similarity that provide basis for the proposed model. The further sections point out the areas of improvement in the previous work and describe our approach towards the solutions of the same. We then discuss the implementation details of the project followed by the findings that show the improvements achieved. Our approach focuses on comprehending the various forms of expressing same entity and thereby capturing the subjectivity of text into objective parameters. The model is tested by evaluating answers submitted by 61 students of Third Year B. Tech. CSE class of Walchand College of Engineering Sangli in a test on Database Engineering.
Apply deep learning to improve the question analysis model in the Vietnamese ...IJECEIAES
Question answering (QA) system nowadays is quite popular for automated answering purposes, the meaning analysis of the question plays an important role, directly affecting the accuracy of the system. In this article, we propose an improvement for question-answering models by adding more specific question analysis steps, including contextual characteristic analysis, pos-tag analysis, and question-type analysis built on deep learning network architecture. Weights of extracted words through question analysis steps are combined with the best matching 25 (BM25) algorithm to find the best relevant paragraph of text and incorporated into the QA model to find the best and least noisy answer. The dataset for the question analysis step consists of 19,339 labeled questions covering a variety of topics. Results of the question analysis model are combined to train the question-answering model on the data set related to the learning regulations of Industrial University of Ho Chi Minh City. It includes 17,405 pairs of questions and answers for the training set and 1,600 pairs for the test set, where the robustly optimized BERT pretraining approach (RoBERTa) model has an F1-score accuracy of 74%. The model has improved significantly. For long and complex questions, the mode has extracted weights and correctly provided answers based on the question’s contents.
EXPERT OPINION AND COHERENCE BASED TOPIC MODELINGijnlc
In this paper, we propose a novel algorithm that rearrange the topic assignment results obtained from topic
modeling algorithms, including NMF and LDA. The effectiveness of the algorithm is measured by how much
the results conform to expert opinion, which is a data structure called TDAG that we defined to represent the
probability that a pair of highly correlated words appear together. In order to make sure that the internal
structure does not get changed too much from the rearrangement, coherence, which is a well known metric
for measuring the effectiveness of topic modeling, is used to control the balance of the internal structure.
We developed two ways to systematically obtain the expert opinion from data, depending on whether the
data has relevant expert writing or not. The final algorithm which takes into account both coherence and
expert opinion is presented. Finally we compare amount of adjustments needed to be done for each topic
modeling method, NMF and LDA.
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...CSEIJJournal
Query sensitive summarization aims at providing the users with the summary of the contents of single or
multiple web pages based on the search query. This paper proposes a novel idea of generating a
comparative summary from a set of URLs from the search result. User selects a set of web page links from
the search result produced by search engine. Comparative summary of these selected web sites is
generated. This method makes use of HTML DOM tree structure of these web pages. HTML documents are
segmented into set of concept blocks. Sentence score of each concept block is computed with respect to the
query and feature keywords. The important sentences from the concept blocks of different web pages are
extracted to compose the comparative summary on the fly. This system reduces the time and effort required
for the user to browse various web sites to compare the information. The comparative summary of the
contents would help the users in quick decision making.
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...cseij
Query sensitive summarization aims at providing the users with the summary of the contents of single or multiple web pages based on the search query. This paper proposes a novel idea of generating a comparative summary from a set of URLs from the search result. User selects a set of web page links from the search result produced by search engine. Comparative summary of these selected web sites is generated. This method makes use of HTML DOM tree structure of these web pages. HTML documents are segmented into set of concept blocks. Sentence score of each concept block is computed with respect to the query and feature keywords. The important sentences from the concept blocks of different web pages are extracted to compose the comparative summary on the fly. This system reduces the time and effort required for the user to browse various web sites to compare the information. The comparative summary of the contents would help the users in quick decision making.
Architecture of an ontology based domain-specific natural language question a...IJwest
Question answering (QA) system aims at retrieving precise information from a large collection of
documents against a query. This paper describes the architecture of a Natural Language Question
Answering (NLQA) system for a specifi
c domain based on the ontological information, a step towards
semantic web question answering. The proposed architecture defines four basic modules suitable for
enhancing current QA capabilities with the ability of processing complex questions. The first m
odule was
the question processing, which analyses and classifies the question and also reformulates the user query.
The second module allows the process of retrieving the relevant documents. The next module processes the
retrieved documents, and the last m
odule performs the extraction and generation of a response. Natural
language processing techniques are used for processing the question and documents and also for answer
extraction. Ontology and domain knowledge are used for reformulating queries and ident
ifying the
relations. The aim of the system is to generate short and specific answer to the question that is asked in the
natural language in a specific domain. We have achieved 94 % accuracy of natural language question
answering in our implementation
A prior case study of natural language processing on different domain IJECEIAES
In the present state of digital world, computer machine do not understand the human’s ordinary language. This is the great barrier between humans and digital systems. Hence, researchers found an advanced technology that provides information to the users from the digital machine. However, natural language processing (i.e. NLP) is a branch of AI that has significant implication on the ways that computer machine and humans can interact. NLP has become an essential technology in bridging the communication gap between humans and digital data. Thus, this study provides the necessity of the NLP in the current computing world along with different approaches and their applications. It also, highlights the key challenges in the development of new NLP model.
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDINGkevig
Applying natural language processing-related algorithms is currently a popular project in legal
applications, for instance, document classification of legal documents, contract review and machine
translation. Using the above machine learning algorithms, all need to encode the words in the document in
the form of vectors. The word embedding model is a modern distributed word representation approach and
the most common unsupervised word encoding method. It facilitates subjecting other algorithms and
subsequently performing the downstream tasks of natural language processing vis-à-vis. The most common
and practical approach of accuracy evaluation with the word embedding model uses a benchmark set with
linguistic rules or the relationship between words to perform analogy reasoning via algebraic calculation.
This paper proposes establishing a 1,256 Legal Analogical Reasoning Questions Set (LARQS) from the
2,388 Chinese Codex corpus using five kinds of legal relations, which are then used to evaluate the
accuracy of the Chinese word embedding model. Moreover, we discovered that legal relations might be
ubiquitous in the word embedding model.
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDINGkevig
Applying natural language processing-related algorithms is currently a popular project in legal
applications, for instance, document classification of legal documents, contract review and machine
translation. Using the above machine learning algorithms, all need to encode the words in the document in
the form of vectors. The word embedding model is a modern distributed word representation approach and
the most common unsupervised word encoding method. It facilitates subjecting other algorithms and
subsequently performing the downstream tasks of natural language processing vis-à-vis. The most common
and practical approach of accuracy evaluation with the word embedding model uses a benchmark set with
linguistic rules or the relationship between words to perform analogy reasoning via algebraic calculation.
This paper proposes establishing a 1,256 Legal Analogical Reasoning Questions Set (LARQS) from the
2,388 Chinese Codex corpus using five kinds of legal relations, which are then used to evaluate the
accuracy of the Chinese word embedding model. Moreover, we discovered that legal relations might be
ubiquitous in the word embedding model.
Convolutional recurrent neural network with template based representation for...IJECEIAES
Complex Question answering system is developed to answer different types of questions accurately. Initially the question from the natural language is transformed to an internal representation which captures the semantics and intent of the question. In the proposed work, internal representation is provided with templates instead of using synonyms or keywords. Then for each internal representation, it is mapped to relevant query against the knowledge base. In present work, the Template representation based Convolutional Recurrent Neural Network (T-CRNN) is proposed for selecting answer in Complex Question Answering (CQA) framework. Recurrent neural network is used to obtain the exact correlation between answers and questions and the semantic matching among the collection of answers. Initially, the process of learning is accomplished through Convolutional Neural Network (CNN) which represents the questions and answers separately. Then the representation with fixed length is produced for each question with the help of fully connected neural network. In order to design the semantic matching between the answers, the representation of Question Answer (QA) pair is given into the Recurrent Neural Network (RNN). Finally, for the given question, the correctly correlated answers are identified with the softmax classifier.
T EXT M INING AND C LASSIFICATION OF P RODUCT R EVIEWS U SING S TRUCTURED S U...csandit
Text mining and Text classification are the two pro
minent and challenging tasks in the field of
Machine learning. Text mining refers to the process
of deriving high quality and relevant
information from text, while Text classification de
als with the categorization of text documents
into different classes. The real challenge in these
areas is to address the problems like handling
large text corpora, similarity of words in text doc
uments, and association of text documents with
a subset of class categories. The feature extractio
n and classification of such text documents
require an efficient machine learning algorithm whi
ch performs automatic text classification.
This paper describes the classification of product
review documents as a multi-label
classification scenario and addresses the problem u
sing Structured Support Vector Machine.
The work also explains the flexibility and performan
ce of the proposed approach for e
fficient text classification.
Similar to A syntactic analysis model for vietnamese questions in v dlg~tabl system (20)
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 3
A syntactic analysis model for vietnamese questions in v dlg~tabl system
1. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.1, February 2014
DOI : 10.5121/ijnlc.2014.3101 01
A SYNTACTIC ANALYSIS MODEL FOR VIETNAMESE
QUESTIONS IN V-DLG~TABL SYSTEM
An Hoai Vo and Dang Tuan Nguyen
Faculty of Computer Science, University of Information Technology, Vietnam National
University – Ho Chi Minh City
ABSTRACT
This paper introduces a syntactic analysis model that we propose to parse and process the Vietnamese
questions about tablets in V-DLG~TABL system, which is a Vietnamese Question – Answering system
working based on automatic dialog mechanism. The V-DLG~TABL system is built to support clients using
Vietnamese questions for searching tablets based on interaction between the clients and the system. We
apply the “Phrase Structure Grammar” of Noam Chomsky to develop a syntactic analysis model that is
specific and suitable for the V-DLG~TABL system. This syntactic analysis model is used to implement the
“V-DLG~TABL Syntactic Parsing and Processing” component of the system.
KEYWORDS
Syntax, Parsing, Question Answering, Automatic Dialog Mechanism, Vietnamese Language Processing.
1. INTRODUCTION
In this paper, we present a syntactic analysis model that we propose to parse and process
Vietnamese questions about tablets in our V-DLG~TABL, which is an advanced Question –
Answering system working with a dialog mechanism based on scenarios. We hope to build this
system to help clients who want to buy tablets find the information about the ones they are
interested in, based on their interaction with the system by using Vietnamese language.
To build a system with such functions, we design the architecture of V-DLG~TABL system
based on major components as follows:
The component “V-DLG~TABL Syntactic Parsing and Processing”: the Vietnamese
question about tablets that clients enter to system is automatically analyzed based on “V-
DLG~TABL_PSG” grammar which has been defined in the system. Then, based on the
syntactic structure of the question, the system determines the syntactic elements which
contain the principal information of the question corresponding to the “information
structure model” proposed in [1].
The component “V-DLG~TABL Semantic Analyzing”: the important elements of the
syntactic structure of the question, which correspond to the components of the
“information structure model” proposed in [1], will be retained and transformed into
predicates in FOL (First-Order Logic) based on the implementation techniques used in [1]
and the programming methods proposed in [2].
The component “V-DLG~TABL Facts Database Querying”: searching data in the
database of facts, based on the methods and techniques proposed in [1].
2. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.1, February 2014
2
The component “V-DLG~TABL Answer Creating”: creating Vietnamese answer by
using the analyzed syntactic structure of the question, based on the method proposed in
[1].
The component “V-DLG~TABL Dialog”: Operating the interactions between client and
system, and making suggestions relating to the information that the client is interested in
conversation process.
In this V-DLG~TABL system, structurally analyzing and processing the syntactic structure of
Vietnamese question is a fundamental task. This paper is limited to present the syntactic analysis
model that we propose to perform this task. In fact, this syntactic analysis model is used to build
the component “V-DLG~TABL Syntactic Parsing and Processing” of V-DLG~TABL system.
However, this research does not discuss in details on any implementation of the system.
Related works: Nguyễn Thành and Phạm Minh Tiến [1] built a basic Question – Answering
system that allows clients to use some simple forms of Vietnamese questions to query information
about tablets. In [1], the building of system has been based on Definite Clause Grammar (DCG)
[2], and the methods of computational semantics [3]. Based on [2] and [3], there were some other
Question – Answering systems built for Vietnamese language such as: [4], [5], [6], [7], [8].
2. SYNTACTIC ANALYSIS MODEL OF SYSTEM
In this research, we reuse the classification of Vietnamese questions about the tablets which has
been proposed in [1]. According to [1], the questions about the tablets which are distinguished
into three fundamental types:
Type 1: Questions about the features of tablets, or components of tablets.
Type 2: Yes/no questions about the components of tablets.
Type 3: The others for finding the tablets based on the query’s information.
The syntactic analysis model for processing Vietnamese questions in V-DLG~TABL system
relates to the following aspects:
The theoretic model of syntax that is applied for parsing Vietnamese questions.
The representation of information structure of Vietnamese questions.
The method for determining the information structure of Vietnamese question based on
its syntactic structure.
2.1. Defining the syntactic elements
In order to define “V-DLG~TABL_PSG” grammar of the system in Definite Clause Grammar
(DCG) [2], at first, we define the basic syntactic elements. We apply the way to name the
syntactic elements of Phan Thị Thể [10], [11] for nouns and noun phrases: a name begins with a
character (‘n’ for nouns, ‘np’ for noun phrase), and follows by an underscore symbol ‘_’, and
other description words.
- The np_tablet: this syntactic element is a noun phrase. It represents the name of tablet.
- The n_component_<name>: this syntactic element is a noun. It represents the name of
component of the tablet. The name of component consists of symbol “n_component_”
and name of component. The notation is: n_component_<name>.
For example: n_component_screen, n_component_bluetooth.
- The n_property_<name>: this syntactic element is a noun. It describes in details the tablet
or the component of tablet. Its notation is n_property_<name>.
3. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.1, February 2014
3
For example: n_property_price, n_property_color.
2.2. Building the grammar of system
The syntactic rules of the system are defined in Definite Clause Grammar [2] through the
following steps:
1) Step 1: Define the basic syntactic elements.
Nouns and noun phrases:
- np_tablet, n_tablet, pn_tablet_name: the elements describe the name of tablet.
- n_component_<name>: the element describes the components of tablet.
- n_property_<name>: the element describes the features of components or tablet.
Verbs and other syntactic elements:
- ‘verb’: the verbs.
- ‘interrog’: the interrogative words.
- ‘wh_tablet’: the questions about the tablets.
The definitions of basic syntactic elements are listed in Table 1.
Table 1: Basic syntactic elements of “V-DLG~TABL_PSG” grammar
No. Syntactic elements Description
1 np_tablet Noun phrase for naming the tablets
2 n_tablet Words describe the tablets
3 pn_tablet_name Names of tablets
4 n_component_screen Noun describes the screen
5 n_component_sim Noun describes the SIM card
6 n_component_front_camera Noun describes the front camera
7 n_component_back_camera Noun describes the back camera
….
8 n_property_size Noun describes the size
9 n_property_weight Noun describes the weight
10 n_property_color Noun describes the color
11 n_property_price Noun describes the price
….
12 verb Verbs used in the grammar
13 interrog Words describe queried parts of questions
14 wh_tablet Words ask about the tablets
2) Step 2: Define the syntactic rules for phrases and question types.
The definitions of the syntactic rules based on the syntactic structures of Vietnamese question
types are described in the section 3.
2.3. Representing information structure of questions about tablets
After analyzing the syntax of Vietnamese question, the problem is how to determine the syntactic
elements representing important information to ask. In order to solve this issue, the syntactic
4. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.1, February 2014
4
structure of the question has to be mapped on a predefined representation model of the
information structure of the question.
In this research, we reuse the “information structure model” proposed by Nguyễn Thành and
Phạm Minh Tiến [1] for representing the content of Vietnamese questions about tablets.
According to [1], the information structure of questions is analyzed by the following components
(cf. [1]):
The information about tablet.
The components of tablet.
The features about tablet or components.
The description about the features of tablet or components.
However, the way that we approach to analyze and transform the syntactic structure of
Vietnamese questions into “information structure model” has some differences from [1] as
follows:
In [1], Nguyễn Thành and Phạm Minh Tiến defined a Definite Clause Grammar (DCG)
for analyzing the predefined question types based on their own “syntactic structure
model” having the following elements [1]: “product”, “functionWord”, “properties” and
“value”. These “syntactic elements” exactly correspond to the components of the
“information structure model” of questions proposed in [1].
In our research, we analyze the syntactic structure of Vietnamese questions by using the
“Phrase Structure Grammar” theory of N. Chomsky [9]. Basing on the constituent
structure of questions, we propose an algorithm for determining the syntactic elements
corresponding to the components of the “information structure model” proposed in [1].
Briefly, Nguyễn Thành and Phạm Minh Tiến [1] analyzed Vietnamese questions about the tablets
by using their own “syntactic structure model” that is not a real syntactic model, and it directly
corresponds to the “information structure model” of the questions proposed in [1]. In [1], they did
not use any grammar theory to analyze Vietnamese questions.
Otherwise, we analyze the syntax of Vietnamese questions based on the “Phrase Structure
Grammar” theory of N. Chomsky [9] and then transform the syntactic structures into the
“information structure model” which has been proposed in [1].
3. SYNTACTIC STRUCTURES OF QUESTION TYPES USED IN SYSTEM
In this section, we present the syntactic structures of the question types. These syntactic structures
are viewed from the syntactic elements listed in Table 1, based on the “Phrase Structure
Grammar” of N. Chomsky [9].
3.1. Questions about the features of the tablets or components
According to [1], the questions about the features of the tablets or components are type 1. We
present the syntactic structures of this question type in Table 2.
5. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.1, February 2014
5
Table 2: The syntactic structure of question type 1
Syntactic structure Examples
<np_tablet> verb <n_component> verb
<n_property> interrog
“Máy tính bảng Nexus 7 có màn hình là loại
gì?”
Syntactic tree: s(np_tablet(n_tablet(‘máy tính
bảng’), pn_tablet_name('Nexus 7')), vp(verb(có),
n_component_screen('màn hình'), verb(là),
n_property_type(loại)), interrog(gì)).
<np_tablet> verb <n_property> [preposition]
<n_component> interrog
“Máy tính bảng Nexus 7 có kích thước [của]
màn hình bao nhiêu?”
Syntactic tree: s(np_tablet(n_tablet('máy tính
bảng'), pn_tablet_name('Nexus 7')), vp(verb(có),
n_property_size('kích thước'),
n_component_screen('màn hình')), interrog('bao
nhiêu'))
<n_component> [preposition] <np_tablet>
verb <n_property> interrog
“Màn hình [của] máy tính bảng Nexus 7 là loại
gì?”
Syntactic tree: s(np(n_component_screen('màn
hình'), np_tablet(n_tablet('máy tính bảng'),
pn_tablet_name('Nexus 7'))), vp(verb(là),
n_property_type(loại)), interrog(gì))
<n_property> [preposition] <n_component>
<np_tablet> interrog
“Loại [của] màn hình máy tính bảng Nexus 7 là
gì?”
Syntactic tree: s(np(n_property_type(loại),
n_component_screen('màn hình'),
np_tablet(n_tablet('máy tính bảng'),
pn_tablet_name('Nexus 7'))), vp(verb(là),
interrog(gì))
3.2. Yes/no questions about the components
According to [1], yes/no questions about components of tablets are type 2. We present the
syntactic structures of this question type in Table 3.
Table 3: The syntactic structure of question type 2
Syntactic structure Examples
<np_tablet> verb <n_component> verb
<n_property> <literal> interrog
“Máy tính bảng Nexus 7 có màn hình là loại
‘cảm ứng’ phải không?”
Syntactic tree: s(np_tablet(n_tablet('máy tính
bảng'), pn_tablet_name('Nexus 7')), vp(verb(có),
n_component_screen('màn hình'), verb(là),
n_property_type(loại), literal('cảm ứng')),
interrog('phải không'))
<np_tablet> verb <n_property> [preposition]
<n_component> verb <literal> interrog
“Máy tính bảng Nexus 7 có kích thước [của]
màn hình là ‘7 inch’ phải không?”
Syntactic tree: s(np_tablet(n_tablet('máy tính
6. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.1, February 2014
6
bảng'), pn_tablet_name('Nexus 7')), vp(verb(có),
n_property_size('kích thước'),
n_component_screen('màn hình'), verb(là),
literal('7 inch')), interrog('phải không'))
<n_component> [preposition] <np_tablet>
verb <n_property> verb <literal> interrog
“Màn hình [của] máy tính bảng Nexus 7 có loại
là ‘cảm ứng’ phải không?”
Syntactic tree: s(np(n_component_screen('màn
hình'), np_tablet(n_tablet('máy tính bảng'),
pn_tablet_name('Nexus 7')), vp(verb(có),
n_property_type(loại)), verb(là), literal('cảm
ứng')), interrog('phải không'))
<n_property> [preposition] <n_component>
[preposition] <np_tablet> verb <literal>
interrog
“Loại [của] màn hình [của] máy tính bảng
Nexus 7 là ‘cảm ứng’ phải không?”
Syntactic tree: s(np(n_property_type(loại),
n_component_screen('màn hình'),
np_tablet(n_tablet('máy tính bảng'),
pn_tablet_name('Nexus 7'))), vp(verb(là),
literal('cảm ứng')), interrog('phải không'))
3.3. Questions for finding the tablets by using query’s information
According to [1], questions for finding the tablets based on the query’s information are type 3.
We present the syntactic structures of this question type in Table 4.
Table 4: The syntactic structure of question type 3
Syntactic structure Examples
<wh_tablet> verb <n_component> verb
<n_property> verb <literal> interrog
“Máy tính bảng nào có màn hình là loại ‘cảm
ứng’?”
Syntactic tree: s(wh_tablet('máy tính bảng nào'),
vp(verb(có), n_component_screen('màn hình'),
verb(là), n_property_type(loại), literal('cảm
ứng')))
<wh_tablet> verb <n_property> [preposition]
<n_component> verb <literal> interrog
“Máy tính bảng nào có kích thước [của] màn
hình là ‘7 inch’?”
Syntactic tree: s(wh_tablet('máy tính bảng nào'),
vp(verb(có), n_property('kích thước'),
n_component_screen(‘màn hình’), verb(là),
literal('7 inch')))
<n_component> [preposition] <wh_tablet>
verb <n_property> verb <literal> interrog
“Màn hình [của] máy tính bảng nào có loại là
‘cảm ứng’?”
Syntactic tree: s(n_component_screen(‘màn
hình’), wh_tablet('máy tính bảng nào'),
vp(verb(có), n_property_type(loại), verb(là),
literal('cảm ứng')))
<n_property> [preposition] <n_component>
[preposition] <wh_tablet> verb <literal>
“Loại [của] màn hình [của] máy tính bảng nào
là ‘cảm ứng’?”
7. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.1, February 2014
7
interrog Syntactic tree: s(n_property_type(loại),
n_component_screen(‘màn hình’),
wh_tablet('máy tính bảng nào'), vp(verb(là),
literal('cảm ứng'))
<wh_tablet> verb <n_component> interrog “Máy tính bảng nào có camera trước?”
Syntactic tree: s(wh_tablet('máy tính bảng
nào'), vp(verb(có),
n_component_front_camera(‘camera trước’))
Basing on the structures of these types presented in Table 2, Table 3 and Table 4, we define the
syntactic rules for the grammar of the system. We use Definite Clause Grammar (DCG) [2] to
define the syntactic rules for these question types which are handled in V-DLG~TABL.
In example 1, we illustrate a grammar which is built for analyzing a given question.
Example 1: Give the question “Máy tính bảng Nexus 7 có màn hình rộng bao nhiêu?”
(English: “How wide does the tablet Nexus 7 screen has?”)
- The syntactic element for the word “màn hình” is represented by n_component_screen.
- The syntactic component for the word “máy tính bảng” is represented by n_tablet.
The Definite Clause Grammar (DCG) in Table 5 is defined for analyzing the sentence in example
1.
Table 5: A Definite Clause Grammar (DCG) defined for analyzing the question in example 1
sentence(s(NP, VP, INTERROG)) --> np_tablet_nexus(NP), vp_have_screen(VP),
interrog_how_many(INTERROG).
vp_have_screen(vp(V, N)) --> v_have(V), n_screen(N).
v_have(verb('có')) --> ['có'].
n_screen(n_component_screen('màn hình')) --> [màn, hình].
np_tablet_nexus(np_tablet(N, PN)) --> n_tablet(N), pn_nexus_7(PN).
n_tablet(n_tablet('máy tính bảng')) --> [máy, tính, bảng].
pn_nexus_7(pn_tablet_name('Nexus 7')) --> ['Nexus 7'].
interrog_how_many(interrog('rộng bao nhiêu')) --> [rộng, bao, nhiêu].
With the Definite Clause Grammar (DCG) in Table 5, the system can returns the syntactic tree in
Prolog when a client inputs the following question:
sentence(S, [máy, tính, bảng, 'Nexus 7', có, màn, hình, rộng, bao, nhiêu], []).
The syntactic tree of example 1 is returned by Prolog as follows:
S = s(np_tablet(n_tablet('máy tính bảng'), pn_tablet_name('Nexus 7')), vp(verb(có),
n_component_screen('màn hình')), interrog('rộng bao nhiêu')).
8. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.1, February 2014
8
Figure 1: The syntactic tree of the question in example 1
4. CONCLUSIONS
Basing on the distinction of question types and the “information structure model” of Vietnamese
questions about the tablets which are proposed in [1], we apply the “Phrase Structure Grammar”
of N. Chomsky [9] to develop a syntactic analysis model that is specific and suitable for V-
DLG~TABL system. This syntactic analysis model is used to implement “V-DLG~TABL
Syntactic Parsing and Processing” component of system.
We have tested “V-DLG~TABL Syntactic Parsing and Processing” component of the system to
evaluate the ability of answering Vietnamese questions. This component of system is able to
answer exactly 141 of 150 tested Vietnamese questions about tablets.
ACKNOWLEDGEMENTS
This research is funded by University of Information Technology, Vietnam National University –
Ho Chi Minh City (VNU-HCM), under grant number C2011CTTT-06.
REFERENCES
[1] Nguyễn Thành, Phạm Minh Tiến, "Xây dựng cơ chế hỏi đáp tiếng Việt cho hệ thống tìm kiếm sản
phẩm máy tính bảng", B.Sc. Thesis in Computer Science, University of Information Technology,
Vietnam National University – Ho Chi Minh City, 2012.
[2] Fernando C. N. Pereira, Stuart M. Shieber, Prolog and Natural-Language Analysis, Digital
Edition, Microtome Publishing, Brookline, Massachusetts, 2002.
[3] Patrick Blackburn, Johan Bos, Representation and Inference for Natural Language: A First
Course in Computational Semantics, September 3, 1999.
[4] Phạm Thế Sơn, Hồ Quốc Thịnh, "Mô hình ngữ nghĩa cho câu trần thuật và câu hỏi tiếng Việt trong
hệ thống vấn đáp kiến thức lịch sử Việt Nam", B.Sc. Thesis in Computer Science, University of
Information Technology, Vietnam National University – Ho Chi Minh City, 2012.
[5] Vũ Thế Nhân, Trần Thế Toàn, "Cơ chế phân tích nội dung câu hỏi dựa trên ngữ nghĩa hình thức
cho hệ thống hỏi đáp tiếng Việt", B.Sc. Thesis in Computer Science, University of Information
Technology, Vietnam National University – Ho Chi Minh City, 2012.
9. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.1, February 2014
9
[6] Lâm Thanh Cường, Huỳnh Ngọc Khuê, "Biểu diễn và xử lý ngữ nghĩa dựa trên FOL (First Order
Logic) cho các dạng câu đơn tiếng Việt trong hệ thống hỏi đáp kiến thức xã hội", B.Sc. Thesis in
Computer Science, University of Information Technology, Vietnam National University – Ho Chi
Minh City, 2012.
[7] Vương Đức Hiền, “Xây dựng công cụ truy vấn tiếng Việt về các phần mềm máy tính”, B.Sc.
Thesis in Computer Science, University of Information Technology, Vietnam National University
– Ho Chi Minh City, 2013.
[8] Son The Pham and Dang Tuan Nguyen, "Processing Vietnamese News Titles to Answer Relative
Questions in VNEWSQA/ICT System", International Journal on Natural Language Computing
(IJNLC), Vol. 2, No. 6, December 2013, pp. 39-51. ISSN: 2278 - 1307 [Online]; 2319 - 4111
[Print].
[9] Noam Chomsky, Syntactic Structures, The Hague: Mouton & Co., 1957.
[10] Phan Thị Thể, "Cơ chế xử lý câu hỏi tiếng Việt cho hệ thống truy vấn thông tin đào tạo hệ tín
chỉ", Master Thesis in Data Transmission and Computer Network, Posts and Telecommunications
Institute of Technology, 2012.
[11] Phan Thị Thể, "Xây dựng cơ chế truy vấn dựa trên ngữ nghĩa của các cụm từ tiếng Việt cho hệ
thống tìm kiếm việc làm", Master Thesis in Information Technology (Computer Science),
University of Information Technology, Vietnam National University – Ho Chi Minh City, 2013.