This paper introduces a Vietnamese automatic dialog mechanism which allows the V-DLG~TABL system to communicate with clients automatically. The dialog mechanism is based on the Question-Answering engine of the V-DLG~TABL system and comprises the following supplementary mechanisms: 1) a mechanism for choosing the suggested question; 2) a mechanism for managing conversations and suggesting information; 3) a mechanism for resolving questions containing anaphora; 4) dialog scenarios (at this stage, only one simple scenario is defined for the system).
OOAD - UML - Sequence and Communication Diagrams - Lab (Victer Paul)
The document discusses interaction diagrams, specifically sequence diagrams and communication diagrams. It explains that interaction diagrams show interactions between objects by depicting the messages exchanged. A sequence diagram emphasizes the time ordering of messages, showing objects arranged from left to right and messages ordered from top to bottom. A communication diagram emphasizes the structural organization of objects, showing them as vertices connected by links along which messages pass. Both diagram types are semantically equivalent but visualize information differently based on their focus. Examples of sequence and communication diagrams are provided for processes like patient admission to a hospital.
OOAD - UML - Class and Object Diagrams - Lab (Victer Paul)
The document discusses class diagrams and object diagrams. It explains that a class diagram shows the structure of a system by displaying classes, interfaces, and their relationships, while an object diagram shows specific instances of classes at a point in time. The document provides steps for constructing class diagrams, such as identifying classes and relationships. It also discusses how object diagrams are created based on class diagrams by instantiating classes and depicting their relationships.
1) The document discusses the development of theoretical bases for logical domain modeling of complex software systems. It proposes a mathematical formalization of the basic conceptual units used in logical domain models, including objects, relationships, attributes, and states.
2) Key concepts are defined, such as an "object" representing an abstraction of related domain elements, and "relationships" abstracting relationships between objects. A set of properties and characteristics are used to define objects and relationships.
3) A theorem is proposed stating that if two objects have a relationship, and one object is "less than or equal to" another object, then the relationship also exists between the less-than-or-equal objects. This formalizes the logical structure of
Resolving the Semantics of Vietnamese Questions in VNewsQA/ICT System (ijaia)
Recently we have built the VNewsQA/ICT system, which can read the titles of Vietnamese news in the domain of information and communication technology, then process and use them to answer users' Vietnamese questions. The architecture of the VNewsQA/ICT system has two main components: 1) the first component treats simple Vietnamese sentences as the natural language textual data used to answer users' questions; 2) the second component resolves the semantics of the Vietnamese questions which query the system. This paper introduces a semantic representation model and a processing model to resolve the Vietnamese questions in the VNewsQA/ICT system. These semantic representation and processing models are able to resolve the semantics of the eight Vietnamese question classes used in our system.
An in-depth exploration of Bangla blog post classification (journalBEEI)
Bangla blogging is growing rapidly in the information era, and consequently blogs have diverse layouts and categorizations. In this context, automated blog post classification is a comparatively efficient way to organize Bangla blog posts in a standard manner so that users can easily find the articles they need. In this research, nine supervised learning models - Support Vector Machine (SVM), multinomial naïve Bayes (MNB), multi-layer perceptron (MLP), k-nearest neighbours (k-NN), stochastic gradient descent (SGD), decision tree, perceptron, ridge classifier and random forest - are applied and compared for classifying Bangla blog posts. To evaluate performance in predicting blog posts across eight categories, three feature extraction techniques are applied: unigram TF-IDF (term frequency-inverse document frequency), bigram TF-IDF, and trigram TF-IDF. The majority of the classifiers achieve above 80% accuracy, and the other evaluation metrics also show good results across the selected classifiers.
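As a concrete illustration of the feature extraction the summary names, here is a minimal sketch of a unigram TF-IDF vectorizer in plain Python. The toy corpus is invented for illustration; the paper's Bangla data and its nine trained classifiers are not reproduced.

```python
# Minimal unigram TF-IDF sketch (assumed formulation: tf * log(N/df)).
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (unigram TF-IDF)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # document frequency: in how many documents each term appears
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

docs = ["football match today", "election vote result", "football result today"]
vecs = tfidf_vectors(docs)
```

Terms that occur in fewer documents (like "match") receive higher weights than terms shared across documents (like "football"); bigram and trigram variants would simply tokenize into word pairs or triples before the same computation.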
This paper compares the performance of two Binary Decision Diagram (BDD) libraries in Java - JavaBDD and the author's own implementation. It uses differential testing to check that the libraries produce functionally equivalent results. Execution times are measured for constructing BDDs of different sizes. Statistical analysis using t-tests, histograms and boxplots shows that JavaBDD has significantly faster execution times than the author's library, and the difference increases as the BDD size increases. The paper concludes JavaBDD has better performance and advocates improving BDD library implementations.
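The differential-testing idea described above can be sketched briefly: drive two independent implementations of the same boolean operation with identical inputs and compare outputs. The two "implementations" below are toy stand-ins, not JavaBDD or the author's library.

```python
# Differential testing sketch: two implementations of the same boolean
# formula are compared over every input assignment.
from itertools import product

def eval_formula(assign):            # reference implementation
    a, b, c = assign
    return (a and b) or (not c)

def eval_formula_alt(assign):        # implementation under test
    a, b, c = assign                 # De Morgan-rewritten form of the same formula
    return not ((not (a and b)) and c)

def differential_test():
    """Return all input assignments where the two implementations disagree."""
    return [bits for bits in product([False, True], repeat=3)
            if eval_formula(bits) != eval_formula_alt(bits)]
```

An empty result means the two implementations are functionally equivalent on this formula, which is the property the paper checks (at much larger scale) for the two BDD libraries.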
Processing Vietnamese News Titles to Answer Relative Questions in VNewsQA/ICT... (ijnlc)
This paper introduces two important elements of our VNewsQA/ICT system: its semantic models of simple Vietnamese sentences and its semantic processing mechanism. VNewsQA/ICT is a Vietnamese-language Question Answering system which gathers information from Vietnamese news title forms on the ICTnews website (http://www.ictnews.vn), instead of using a database or a knowledge base, to answer related Vietnamese questions in the domain of information and communications technology.
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL (ijaia)
The document presents a new model for extractive text summarization that uses BERT (Bidirectional Encoder Representations from Transformers), a pretrained deep bidirectional transformer model, as the text encoder. The model consists of a BERT encoder and a sentence classifier. Sentences are encoded using BERT and classified as to whether they should be included in the summary. Evaluation on the CNN/Daily Mail corpus shows the model achieves state-of-the-art results comparable to other top models according to automatic metrics and human evaluation, making it the first work to apply BERT to text summarization.
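The extractive pipeline described above (encode each sentence, classify/score it, keep the top-scoring ones in document order) can be sketched as follows. BERT is replaced here by a trivial word-frequency scorer so the sketch stays dependency-free; the real model scores sentences with a classifier over BERT encodings.

```python
# Extractive summarization skeleton: score sentences, keep the top k,
# emit them in their original document order.
from collections import Counter

def extractive_summary(sentences, k=1):
    # toy stand-in for a learned scorer: average document-frequency of words
    doc_counts = Counter(w for s in sentences for w in s.lower().split())
    def score(s):
        words = s.lower().split()
        return sum(doc_counts[w] for w in words) / len(words)
    ranked = sorted(sentences, key=score, reverse=True)
    keep = set(ranked[:k])
    # preserve original order, as extractive summarizers do
    return [s for s in sentences if s in keep]

sents = ["The model encodes sentences.",
         "The model classifies sentences for the summary.",
         "Unrelated aside."]
summary = extractive_summary(sents, k=2)
```

Swapping the toy scorer for per-sentence probabilities from a BERT-based classifier gives the shape of the paper's model; the selection and ordering logic stays the same.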
IRJET - Sentimental Analysis for Online Reviews using Machine Learning Algorithms (IRJET Journal)
The document discusses sentiment analysis of online product reviews using machine learning algorithms. It first provides background on sentiment analysis and its uses. It then describes preprocessing customer review data and extracting features using count and TF-IDF vectorization. Three machine learning algorithms are tested - support vector machine (SVM), random forest, and XGBoost classifier. The results show that XGBoost achieved higher accuracy than SVM and random forest for sentiment classification of the product review data.
This document discusses fuzzy logical databases and an efficient algorithm for evaluating fuzzy equi-joins. It begins with an introduction to fuzzy concepts in databases, including representing imprecise data using fuzzy sets and membership functions. It then defines a new measure for fuzzy equality that is used to define a fuzzy equi-join. The document proposes a sort-merge join algorithm that sorts relations based on a partial order of intervals to efficiently evaluate the fuzzy equi-join in two phases: sorting and joining. Experimental results are said to show a significant improvement in efficiency when using this algorithm.
Assessment of Programming Language Reliability Utilizing Soft-Computing (ijcsa)
The document discusses assessing programming language reliability using soft computing techniques like fuzzy logic and genetic algorithms. It proposes using these methods to model programming language reliability based on linguistic variables like "Reliable", "Moderately Reliable", and "Not Reliable". The key factors examined for determining a programming language's reliability include syntax consistency, semantic consistency, error handling, modularity, and documentation. A soft computing system is simulated to evaluate programming languages based on these reliability criteria.
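The fuzzy-logic side of such an assessment can be sketched with triangular membership functions over the linguistic labels the summary names. The breakpoints (on an assumed 0-10 reliability score) are invented for illustration, not taken from the paper.

```python
# Triangular membership functions for linguistic reliability labels.
def triangular(x, a, b, c):
    """Membership rising from a to a peak at b, falling back to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

labels = {
    "Not Reliable":        lambda x: triangular(x, -1, 0, 4),
    "Moderately Reliable": lambda x: triangular(x, 2, 5, 8),
    "Reliable":            lambda x: triangular(x, 6, 10, 11),
}

score = 6.5  # hypothetical aggregate of syntax/semantics/error-handling factors
memberships = {name: f(score) for name, f in labels.items()}
best = max(memberships, key=memberships.get)  # linguistic assessment
```

A full system would derive the crisp score from the reliability factors (syntax consistency, error handling, etc.) and could tune the membership breakpoints with a genetic algorithm, as the paper proposes.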
This document summarizes Jessica Hullman's project modeling word sense disambiguation using support vector machines. She used a dataset from Senseval-2 and achieved an average accuracy of 87% at assigning word senses, with a standard deviation of 15% and median of 92%. The project involved modifying an existing implementation that used part-of-speech tags of neighboring words to classify word senses, training support vector machine classifiers on the Senseval-2 data.
The document discusses fuzzy logical databases and an efficient algorithm for evaluating fuzzy equi-joins. It introduces fuzzy concepts in databases, defines a new measure for fuzzy equality, and proposes a fuzzy equi-join based on this measure. It then presents a sort-merge join algorithm called SMFEJ that uses interval ordering to efficiently evaluate the fuzzy equi-join in two phases: sorting relations on join attributes, and joining tuples with overlapping ranges.
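The two-phase SMFEJ idea, as the summary states it, can be sketched with interval-valued join keys: sort both relations on their intervals, then join tuples whose intervals overlap. The interval representation and overlap test below are illustrative assumptions, not the paper's exact fuzzy-equality measure.

```python
# Two-phase fuzzy equi-join sketch: sort on interval keys, join overlaps.
def fuzzy_equi_join(r, s):
    """r, s: lists of (payload, (lo, hi)) with interval-valued join keys."""
    r = sorted(r, key=lambda t: t[1])        # phase 1: sort on intervals
    s = sorted(s, key=lambda t: t[1])
    out = []
    for pr, (rlo, rhi) in r:                 # phase 2: join overlapping ranges
        for ps, (slo, shi) in s:
            if slo > rhi:                    # sorted order lets us stop early
                break
            if shi >= rlo:                   # intervals overlap
                out.append((pr, ps))
    return out

R = [("r1", (1.0, 2.0)), ("r2", (5.0, 6.0))]
S = [("s1", (1.5, 3.0)), ("s2", (7.0, 8.0))]
```

Here "r1" joins "s1" because their intervals overlap on [1.5, 2.0], while "r2" matches nothing; the early break is what the sorting phase buys over a naive nested-loop join.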
Extractive Summarization with Very Deep Pretrained Language Model (gerogepatton)
The document describes a study that used BERT (Bidirectional Encoder Representations from Transformers), a pretrained language model, for extractive text summarization. The researchers developed a two-phase encoder-decoder model where BERT encoded sentences from documents and classified them as included or not included in the summary. They evaluated the model on the CNN/Daily Mail corpus and found it achieved state-of-the-art results comparable to previous models based on both automatic metrics and human evaluation.
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS (ijdms)
In this paper, we describe a Convolutional Neural Network (CNN) model developed for the classification of advertisements. The method has been tested on both Arabic and Slovak texts; the advertisements are short texts taken from classified-advertisement websites. We developed a modified CNN model, implemented it, and studied how further modifications influence the performance of the proposed network. The result is a functional model of the network, its implementation in Java and Python, and an analysis of the model's results using different parameters for the network and input data. The experimental results show that the developed CNN model is useful for classifying short Arabic and Slovak texts, mainly advertisements.
The document provides information on databases, DBMS, database systems, advantages of DBMS over file processing systems, levels of data abstraction, integrity rules, extension and intension, System R, data independence, views, data models including E-R and object-oriented models, entities, entity sets, attributes, relations, relationships, keys, normalization, relational algebra, relational calculus, and other database concepts.
The document discusses various software design patterns including Strategy, State, Bridge, Composite, Flyweight, Interpreter, Decorator, Chain of Responsibility, Facade, Adapter, Proxy, Command, Memento, Iterator, Mediator, Observer, Template Method, Visitor, Factory Method, Prototype, Abstract Factory, Builder, and Singleton. For each pattern, it provides a brief definition and example use cases. It also includes links to Wikipedia pages with more detailed explanations of each design pattern.
This document summarizes an approach to segmenting search interfaces using a two-layered hidden Markov model (HMM). The first layer uses a T-HMM to tag interface components with semantic labels like attribute-name, operator, and operand. The second layer uses an S-HMM to segment the interface into logical attributes by grouping related tagged components. The approach models an artificial designer that learns to segment interfaces by training the HMMs on manually segmented examples. It was tested on 200 biology search interfaces and showed promising results for extracting the underlying database querying semantics from the interface structure. Future work aims to improve schema extraction and domain coverage.
The document defines various database concepts including database, DBMS, database system, data independence, data models, relational algebra, relational calculus, normalization, and SQL. It also describes database architecture including the RDBMS kernel, subsystems, data dictionary, and how users communicate with an RDBMS using SQL. The key differences between SQL and other languages are that SQL is non-procedural and declarative, allowing users to specify what data to retrieve rather than how to retrieve it.
The document discusses various Unified Modeling Language (UML) diagrams used to model different aspects of software systems. It describes structure diagrams like class diagrams that show system composition and deployment diagrams that map software to hardware. It also covers behavioral diagrams like use case diagrams, interaction diagrams (sequence and communication diagrams), state-chart diagrams, and activity diagrams that model dynamic system behavior through object interactions and state transitions. Specific examples are provided for how to construct and interpret sequence diagrams, state machine diagrams, and activity diagrams.
The document provides an overview of key concepts in object-oriented design, including objects, classes, inheritance, encapsulation, and polymorphism. It discusses how objects contain data and methods that operate on the data, and classes act as templates for creating objects. Inheritance allows new classes to extend existing classes, reusing and modifying their attributes and methods. Encapsulation involves objects communicating through messages and hiding their internal data within methods.
Object Oriented Programming Implementation of Scalar, Vector, and Tensor Vari... (LLGYeo)
This document summarizes and compares two object-oriented programming models - a polymorphic model and a template-based model - for implementing scalar, vector, and tensor variables for 1D to 3D calculations. The current Complex Systems Platform (CSP) library uses a template-based model with individual template classes for each variable type, which has issues with interoperability between classes. As an alternative, the author created a polymorphic model to address these interoperability issues, but found it was slower than the template model based on speed benchmarking tests. Both models showed no difference in storage requirements. While the polymorphic model solves interoperability, it comes at the cost of runtime speed.
Text Mining and Classification of Product Reviews Using Structured Su... (csandit)
Text mining and text classification are two prominent and challenging tasks in the field of machine learning. Text mining refers to the process of deriving high-quality and relevant information from text, while text classification deals with the categorization of text documents into different classes. The real challenge in these areas is to address problems like handling large text corpora, similarity of words in text documents, and association of text documents with a subset of class categories. The feature extraction and classification of such text documents require an efficient machine learning algorithm which performs automatic text classification. This paper describes the classification of product review documents as a multi-label classification scenario and addresses the problem using Structured Support Vector Machine. The work also explains the flexibility and performance of the proposed approach for efficient text classification.
Latent factor models for Collaborative Filtering (sscdotopen)
The document discusses latent factor models for collaborative filtering. It describes how latent factor models (1) map both users and items to a latent factor space to characterize them, (2) approximate ratings as the dot product of user and item vectors, and (3) can be used to predict unknown ratings. It also covers techniques like stochastic gradient descent and alternating least squares for training latent factor models on explicit and implicit feedback data.
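The core of the approach described above, approximating a rating as the dot product of user and item factor vectors and fitting the factors by stochastic gradient descent, can be sketched in a few lines. The tiny synthetic ratings and the hyperparameters are invented for illustration.

```python
# Matrix-factorization sketch: r_ui ≈ p_u · q_i, fitted by SGD on squared error.
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200):
    rng = random.Random(0)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):               # SGD step with L2 regularization
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# user 0 likes item 0 (5.0) and dislikes item 1 (1.0); user 1's rating of
# item 1 is unknown and can be predicted from the learned factors
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0)]
P, Q = train_mf(ratings, n_users=2, n_items=2)
pred_unknown = sum(P[1][f] * Q[1][f] for f in range(2))
```

Alternating least squares, also mentioned in the slides, fits the same model but solves for P with Q fixed and vice versa instead of taking per-rating gradient steps.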
This document provides an overview of object-oriented design (OOD) using the Unified Modeling Language (UML). It defines key OOD concepts like entity objects, interface objects, control objects, and persistence classes. It also describes design relationships like dependency and navigability. The document outlines the process of OOD including refining use cases, modeling object interactions, identifying object states and responsibilities, and updating class diagrams. It provides examples of UML diagrams used in design like sequence diagrams, collaboration diagrams, component diagrams, and deployment diagrams.
This document describes a verb-based sentiment analysis of Manipuri language documents using conditional random fields for part-of-speech tagging and a manually annotated verb lexicon to determine sentiment polarity. The system was tested on 550 letters to newspaper editors, achieving an average recall of 72.1%, precision of 78.14%, and F-measure of 75% for sentiment classification. The authors conclude the work is an initial effort in sentiment analysis for the highly agglutinative Manipuri language and more methods are needed to improve accuracy.
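As a quick sanity check on the figures reported above, the F-measure is the harmonic mean of precision and recall, and the reported 78.14% precision and 72.1% recall do round to the stated 75%:

```python
# F-measure (F1): harmonic mean of precision and recall.
def f_measure(precision, recall):
    return 2 * precision * recall / (precision + recall)

f = f_measure(0.7814, 0.721)   # the paper's reported precision and recall
```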
An expert system for automatic reading of a text written in Standard Arabic (ijnlc)
In this work we present our expert system for automatic reading, i.e. speech synthesis, of text written in Standard Arabic. The work is carried out in two main stages: the creation of the sound database, and the transformation of the written text into speech (Text-To-Speech, TTS). This transformation is done first by a Phonetic Orthographical Transcription (POT) of any written Standard Arabic text, with the aim of transforming it into its corresponding phonetic sequence, and second by the generation of the voice signal corresponding to the transcribed chain. We lay out the different design stages of the system, as well as the results obtained, compared with other works on TTS for Standard Arabic.
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODELijaia
The document presents a new model for extractive text summarization that uses BERT (Bidirectional Encoder Representations from Transformers), a pretrained deep bidirectional transformer model, as the text encoder. The model consists of a BERT encoder and a sentence classifier. Sentences are encoded using BERT and classified as to whether they should be included in the summary. Evaluation on the CNN/Daily Mail corpus shows the model achieves state-of-the-art results comparable to other top models according to automatic metrics and human evaluation, making it the first work to apply BERT to text summarization.
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET Journal
The document discusses sentiment analysis of online product reviews using machine learning algorithms. It first provides background on sentiment analysis and its uses. It then describes preprocessing customer review data and extracting features using count and TF-IDF vectorization. Three machine learning algorithms are tested - support vector machine (SVM), random forest, and XGBoost classifier. The results show that XGBoost achieved higher accuracy than SVM and random forest for sentiment classification of the product review data.
This document discusses fuzzy logical databases and an efficient algorithm for evaluating fuzzy equi-joins. It begins with an introduction to fuzzy concepts in databases, including representing imprecise data using fuzzy sets and membership functions. It then defines a new measure for fuzzy equality that is used to define a fuzzy equi-join. The document proposes a sort-merge join algorithm that sorts relations based on a partial order of intervals to efficiently evaluate the fuzzy equi-join in two phases: sorting and joining. Experimental results are said to show a significant improvement in efficiency when using this algorithm.
Assessment of Programming Language Reliability Utilizing Soft-Computingijcsa
The document discusses assessing programming language reliability using soft computing techniques like fuzzy logic and genetic algorithms. It proposes using these methods to model programming language reliability based on linguistic variables like "Reliable", "Moderately Reliable", and "Not Reliable". The key factors examined for determining a programming language's reliability include syntax consistency, semantic consistency, error handling, modularity, and documentation. A soft computing system is simulated to evaluate programming languages based on these reliability criteria.
This document summarizes Jessica Hullman's project modeling word sense disambiguation using support vector machines. She used a dataset from Senseval-2 and achieved an average accuracy of 87% at assigning word senses, with a standard deviation of 15% and median of 92%. The project involved modifying an existing implementation that used part-of-speech tags of neighboring words to classify word senses, training support vector machine classifiers on the Senseval-2 data.
The document discusses fuzzy logical databases and an efficient algorithm for evaluating fuzzy equi-joins. It introduces fuzzy concepts in databases, defines a new measure for fuzzy equality, and proposes a fuzzy equi-join based on this measure. It then presents a sort-merge join algorithm called SMFEJ that uses interval ordering to efficiently evaluate the fuzzy equi-join in two phases: sorting relations on join attributes, and joining tuples with overlapping ranges.
Extractive Summarization with Very Deep Pretrained Language Modelgerogepatton
The document describes a study that used BERT (Bidirectional Encoder Representations from Transformers), a pretrained language model, for extractive text summarization. The researchers developed a two-phase encoder-decoder model where BERT encoded sentences from documents and classified them as included or not included in the summary. They evaluated the model on the CNN/Daily Mail corpus and found it achieved state-of-the-art results comparable to previous models based on both automatic metrics and human evaluation.
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSijdms
In this paper, we describe the developed model of the Convolutional Neural Networks CNN to a
classification of advertisements. The developed method has been tested on both texts (Arabic and Slovak
texts).The advertisements are chosen on a classified advertisements websites as short texts. We evolved a
modified model of the CNN, we have implemented it and developed next modifications. We studied their
influence on the performing activity of the proposed network. The result is a functional model of the
network and its implementation in Java and Python. And analysis of model results using different
parameters for the network and input data. The results on experiments data show that the developed model
of CNN is useful in the domains of Arabic and Slovak short texts, mainly for some classification of
advertisements. This paper gives complete guidelines for authors submitting papers for the AIRCC
Journals.
The document provides information on databases, DBMS, database systems, advantages of DBMS over file processing systems, levels of data abstraction, integrity rules, extension and intension, System R, data independence, views, data models including E-R and object-oriented models, entities, entity sets, attributes, relations, relationships, keys, normalization, relational algebra, relational calculus, and other database concepts.
The document discusses various software design patterns including Strategy, State, Bridge, Composite, Flyweight, Interpreter, Decorator, Chain of Responsibility, Facade, Adapter, Proxy, Command, Memento, Iterator, Mediator, Observer, Template Method, Visitor, Factory Method, Prototype, Abstract Factory, Builder, and Singleton. For each pattern, it provides a brief definition and example use cases. It also includes links to Wikipedia pages with more detailed explanations of each design pattern.
This document summarizes an approach to segmenting search interfaces using a two-layered hidden Markov model (HMM). The first layer uses a T-HMM to tag interface components with semantic labels like attribute-name, operator, and operand. The second layer uses an S-HMM to segment the interface into logical attributes by grouping related tagged components. The approach models an artificial designer that learns to segment interfaces by training the HMMs on manually segmented examples. It was tested on 200 biology search interfaces and showed promising results for extracting the underlying database querying semantics from the interface structure. Future work aims to improve schema extraction and domain coverage.
The document defines various database concepts including database, DBMS, database system, data independence, data models, relational algebra, relational calculus, normalization, and SQL. It also describes database architecture including the RDBMS kernel, subsystems, data dictionary, and how users communicate with an RDBMS using SQL. The key differences between SQL and other languages are that SQL is non-procedural and declarative, allowing users to specify what data to retrieve rather than how to retrieve it.
The document discusses various Unified Modeling Language (UML) diagrams used to model different aspects of software systems. It describes structure diagrams like class diagrams that show system composition and deployment diagrams that map software to hardware. It also covers behavioral diagrams like use case diagrams, interaction diagrams (sequence and communication diagrams), state-chart diagrams, and activity diagrams that model dynamic system behavior through object interactions and state transitions. Specific examples are provided for how to construct and interpret sequence diagrams, state machine diagrams, and activity diagrams.
The document provides an overview of key concepts in object-oriented design, including objects, classes, inheritance, encapsulation, and polymorphism. It discusses how objects contain data and methods that operate on the data, and classes act as templates for creating objects. Inheritance allows new classes to extend existing classes, reusing and modifying their attributes and methods. Encapsulation involves objects communicating through messages and hiding their internal data within methods.
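The concepts summarized above (encapsulation, inheritance, polymorphic method overriding) can be illustrated with a minimal Python sketch (class names invented):

```python
class Animal:
    """Base class: encapsulates a name and exposes behavior via methods."""
    def __init__(self, name):
        self._name = name  # leading underscore: internal data by convention

    def speak(self):
        return f"{self._name} makes a sound"

class Dog(Animal):
    """Inheritance: Dog reuses Animal's attributes and overrides speak()."""
    def speak(self):
        return f"{self._name} barks"
```

Callers send the same `speak` message to either object; which implementation runs is decided by the object's class, which is polymorphism in its simplest form.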
Object Oriented Programming Implementation of Scalar, Vector, and Tensor VariablesLLGYeo
This document summarizes and compares two object-oriented programming models - a polymorphic model and a template-based model - for implementing scalar, vector, and tensor variables for 1D to 3D calculations. The current Complex Systems Platform (CSP) library uses a template-based model with individual template classes for each variable type, which has issues with interoperability between classes. As an alternative, the author created a polymorphic model to address these interoperability issues, but found it was slower than the template model based on speed benchmarking tests. Both models showed no difference in storage requirements. While the polymorphic model solves interoperability, it comes at the cost of runtime speed.
TEXT MINING AND CLASSIFICATION OF PRODUCT REVIEWS USING STRUCTURED SUPPORT VECTOR MACHINEcsandit
Text mining and text classification are two prominent and challenging tasks in the field of machine learning. Text mining refers to the process of deriving high-quality, relevant information from text, while text classification deals with the categorization of text documents into different classes. The real challenge in these areas is to address problems like handling large text corpora, similarity of words in text documents, and association of text documents with a subset of class categories. The feature extraction and classification of such text documents require an efficient machine learning algorithm that performs automatic text classification. This paper describes the classification of product review documents as a multi-label classification scenario and addresses the problem using a Structured Support Vector Machine. The work also explains the flexibility and performance of the proposed approach for efficient text classification.
Latent factor models for Collaborative Filteringsscdotopen
The document discusses latent factor models for collaborative filtering. It describes how latent factor models (1) map both users and items to a latent factor space to characterize them, (2) approximate ratings as the dot product of user and item vectors, and (3) can be used to predict unknown ratings. It also covers techniques like stochastic gradient descent and alternating least squares for training latent factor models on explicit and implicit feedback data.
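The dot-product approximation and stochastic-gradient training mentioned above can be sketched as follows (factor dimensionality, learning rate, and regularization constant are illustrative choices, not taken from the slides):

```python
def predict(p_u, q_i):
    """Approximate a rating as the dot product of user and item factors."""
    return sum(pu * qi for pu, qi in zip(p_u, q_i))

def sgd_step(p_u, q_i, rating, lr=0.01, reg=0.02):
    """One stochastic-gradient update nudging both factor vectors
    toward the observed rating, with L2 regularization."""
    err = rating - predict(p_u, q_i)
    new_p = [pu + lr * (err * qi - reg * pu) for pu, qi in zip(p_u, q_i)]
    new_q = [qi + lr * (err * pu - reg * qi) for pu, qi in zip(p_u, q_i)]
    return new_p, new_q
```

Repeating `sgd_step` over all observed (user, item, rating) triples is the training loop; alternating least squares instead solves for all user vectors with item vectors fixed, and vice versa.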
This document provides an overview of object-oriented design (OOD) using the Unified Modeling Language (UML). It defines key OOD concepts like entity objects, interface objects, control objects, and persistence classes. It also describes design relationships like dependency and navigability. The document outlines the process of OOD including refining use cases, modeling object interactions, identifying object states and responsibilities, and updating class diagrams. It provides examples of UML diagrams used in design like sequence diagrams, collaboration diagrams, component diagrams, and deployment diagrams.
This document describes a verb-based sentiment analysis of Manipuri language documents using conditional random fields for part-of-speech tagging and a manually annotated verb lexicon to determine sentiment polarity. The system was tested on 550 letters to newspaper editors, achieving an average recall of 72.1%, precision of 78.14%, and F-measure of 75% for sentiment classification. The authors conclude the work is an initial effort in sentiment analysis for the highly agglutinative Manipuri language and more methods are needed to improve accuracy.
An expert system for automatic reading of a text written in standard arabicijnlc
In this work we present our expert system for automatic reading, or speech synthesis, based on a text written in Standard Arabic. Our work is carried out in two main stages: the creation of the sound database, and the transformation of the written text into speech (Text-To-Speech, TTS). This transformation is done firstly by a Phonetic Orthographical Transcription (POT) of any written Standard Arabic text, with the aim of transforming it into its corresponding phonetic sequence, and secondly by the generation of the voice signal corresponding to the transcribed chain. We lay out the different design stages of the system, as well as the results obtained compared to other works studied to realize TTS based on Standard Arabic.
Smart grammar a dynamic spoken language understanding grammar for inflective languagesijnlc
1. The document proposes SmartGrammar, a new method for developing spoken language understanding grammars for inflectional languages like Italian.
2. SmartGrammar uses a morphological analyzer to convert user utterances into their canonical forms before parsing, allowing the grammar to contain only canonical word forms rather than all possible inflections.
3. This significantly reduces the complexity and size of grammars for inflectional languages by representing many possible inflected forms with a single canonical form entry, making grammar development and management easier.
Hybrid part of-speech tagger for non-vocalized arabic textijnlc
The document presents a hybrid part-of-speech tagging method for Arabic text that combines rule-based and statistical approaches. Rule-based tagging alone can misclassify words and leave some untagged, so the method integrates it with a Hidden Markov Model tagger. The hybrid approach is evaluated on two Arabic corpora and achieves accuracy rates of 97.6% and 98%, outperforming the individual rule-based and HMM taggers.
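The combination idea, rule-based tags with a statistical fallback, can be sketched as follows (a generic illustration of the hybrid strategy, not the paper's exact integration):

```python
def hybrid_tag(tokens, rule_tagger, hmm_tagger):
    """Hybrid tagging: prefer the rule-based tag, and fall back to the
    statistical (HMM) tagger for words the rules leave untagged (None)."""
    rule_tags = rule_tagger(tokens)
    hmm_tags = hmm_tagger(tokens)
    return [r if r is not None else h for r, h in zip(rule_tags, hmm_tags)]
```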
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONijnlc
This document presents a multi-stream HMM approach for offline handwritten Arabic word recognition. It extracts two sets of features from each word using a sliding window approach and VH2D projection approach. These features are input to separate HMM classifiers, and the outputs are combined in a multi-stream HMM to provide more reliable recognition. The system is evaluated on 200 words, achieving a recognition rate of 83.8% using the multi-stream approach compared to 78.2% and 76.6% for the individual classifiers.
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...ijnlc
The document discusses the implementation of a natural language interface (NLization) framework for Punjabi using the EUGENE system. It focuses on generating Punjabi sentences from UNL representations for verbs, pronouns, and determiners. Rules and dictionaries are added to EUGENE to analyze the UNL syntax and semantics and output corresponding Punjabi sentences without human intervention. Examples are provided to illustrate the NLization process for sentences containing different parts of speech.
CLUSTERING WEB SEARCH RESULTS FOR EFFECTIVE ARABIC LANGUAGE BROWSINGijnlc
The process of browsing Search Results is one of the major problems with traditional Web search engines
for English, European, and any other languages generally, and for Arabic Language particularly. This
process is absolutely time consuming and the browsing style seems to be unattractive. Organizing Web
search results into clusters facilitates users quick browsing through search results. Traditional clustering
techniques (data-centric clustering algorithms) are inadequate since they don't generate clusters with
highly readable names or cluster labels. To solve this problem, Description-centric algorithms such as
Suffix Tree Clustering (STC) algorithm have been introduced and used successfully and extensively with
different adapted versions for English, European, and Chinese Languages. However, till the day of writing
this paper, in our knowledge, STC algorithm has been never applied for Arabic Web Snippets Search
Results Clustering.
HANDLING UNKNOWN WORDS IN NAMED ENTITY RECOGNITION USING TRANSLITERATIONijnlc
The document discusses handling unknown words in named entity recognition using transliteration. It proposes an approach where named entities in training data are transliterated into other languages and stored in transliteration files. During testing, if an unknown entity is encountered, it is checked against the transliteration files and assigned the corresponding tag if found. The approach is shown to achieve 95.8% recall, 96.3% precision and 96.04% F-measure on a multilingual named entity recognition task handling words from English, Hindi, Marathi, Punjabi and Urdu. Performance metrics for named entity recognition systems such as precision, recall and F-measure are also discussed.
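The lookup idea behind the approach can be sketched as follows (the transliteration entries and tag names are invented for illustration):

```python
# Hypothetical transliteration store built from training data:
# surface form -> (canonical form, named-entity tag).
translit_index = {
    "dilli": ("delhi", "LOCATION"),
    "mumbai": ("mumbai", "LOCATION"),
}

def tag_unknown(token, index):
    """If an unknown token matches a transliterated entry,
    reuse the tag recorded for that entity; otherwise tag it O."""
    entry = index.get(token.lower())
    return entry[1] if entry else "O"
```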
This paper deals with the chunking of the Manipuri language, which is highly agglutinative in nature. The system works in such a way that the Manipuri text is cleaned up to the gold standard. The text is processed for Part-of-Speech (POS) tagging using Conditional Random Fields (CRF). The output file is treated as an input file for the CRF-based chunking system. The final output is a completely chunk-tagged Manipuri text. The system shows a recall of 71.30%, a precision of 77.36%, and an F-measure of 74.21%.
Development and evaluation of a web based question answering system for Arabicijnlc
Question Answering (QA) systems are gaining great importance due to the increasing amount of web content and the high demand for digital information that regular information retrieval techniques cannot satisfy. A question answering system enables users to have a natural language dialog with the machine, which is required for virtually all emerging online service systems on the Internet. The need for such systems is higher in the context of the Arabic language. This is because of the scarcity of Arabic QA systems, which can be attributed to the great challenges they present to the research community, including the particularities of Arabic, such as short vowels, absence of capital letters, complex morphology, etc. In this paper, we report the design and implementation of an Arabic web-based question answering system, which we called "JAWEB", the Arabic word for the verb "answer". Unlike other Arabic question-answering systems, JAWEB is a web-based application, so it can be accessed at any time and from anywhere. Evaluating JAWEB showed that it gives the correct answer with 100% recall and 80% precision on average. When compared to ask.com, the well-established web-based QA system, JAWEB provided 15-20% higher recall. These promising results give clear evidence that JAWEB has great potential as a QA platform and is much needed by Arabic-speaking Internet users across the world.
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODELijnlc
This document describes a tool called NERHMM for performing named entity recognition using hidden Markov models. The tool allows users to annotate raw text to create tagged corpora, train hidden Markov models on annotated data to calculate parameters, and test new text data to produce named entity tags. The tool works for multiple languages and can handle diverse tag sets. It provides a simple interface for tasks involved in named entity recognition like corpus development and parameter estimation for hidden Markov models. Evaluation on data shows the tool's performance increases with more training data.
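The parameter-estimation step the tool performs, counting transitions and emissions from an annotated corpus, can be sketched as follows (a generic maximum-likelihood HMM estimate, not NERHMM's actual code):

```python
from collections import Counter

def estimate_hmm(tagged_sentences):
    """Maximum-likelihood HMM parameters from a tagged corpus:
    transition P(tag_i | tag_{i-1}) and emission P(word | tag)."""
    trans, emit = Counter(), Counter()
    ctx = Counter()        # counts of the conditioning tag (incl. start)
    tag_counts = Counter()
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            trans[(prev, tag)] += 1
            ctx[prev] += 1
            emit[(tag, word)] += 1
            tag_counts[tag] += 1
            prev = tag
    trans_p = {pair: c / ctx[pair[0]] for pair, c in trans.items()}
    emit_p = {pair: c / tag_counts[pair[0]] for pair, c in emit.items()}
    return trans_p, emit_p
```

Tagging new text then amounts to Viterbi decoding over these two tables, which is consistent with the summary's note that performance grows with the amount of training data.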
An improved apriori algorithm for association rulesijnlc
There are several mining algorithms for association rules. One of the most popular is Apriori, which extracts frequent itemsets from a large database and derives association rules for knowledge discovery. Based on this algorithm, this paper points out the limitation of the original Apriori algorithm, namely the time wasted scanning the whole database when searching for frequent itemsets, and presents an improvement on Apriori that reduces this wasted time by scanning only some of the transactions. Experimental results with several groups of transactions and several minimum-support values, applied to both the original Apriori and our improved implementation, show that the improved Apriori reduces the time consumed by 67.38% in comparison with the original, making the algorithm more efficient and less time consuming.
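For reference, the original level-wise Apriori search that the paper improves upon can be sketched as follows (a compact illustration, not the authors' implementation):

```python
def frequent_itemsets(transactions, min_support):
    """Classic Apriori: grow candidate itemsets level by level and keep
    those whose support (fraction of transactions) meets the threshold."""
    n = len(transactions)
    singletons = {frozenset([i]) for t in transactions for i in t}
    current = {
        s for s in singletons
        if sum(s <= t for t in transactions) / n >= min_support
    }
    frequent, k = {}, 1
    while current:
        for s in current:
            frequent[s] = sum(s <= t for t in transactions) / n
        k += 1
        # Join step: unions of frequent (k-1)-itemsets, pruned by support.
        current = {
            a | b for a in current for b in current
            if len(a | b) == k
            and sum((a | b) <= t for t in transactions) / n >= min_support
        }
    return frequent
```

Each support computation scans the whole transaction list, which is exactly the cost the paper's improvement targets by restricting scans to a subset of transactions.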
A comparative analysis of particle swarm optimization and k means algorithm for clustering of text documentsijnlc
The volume of digitized text documents on the web has been increasing rapidly. As there is a huge collection of data on the web, there is a need for grouping (clustering) the documents into clusters for speedy information retrieval. Clustering of documents is the collection of documents into groups such that the documents within each group are similar to each other and not to documents of other groups. The quality of a clustering result depends greatly on the representation of text and on the clustering algorithm. This paper presents a comparative analysis of three algorithms, namely K-means, Particle Swarm Optimization (PSO), and a hybrid PSO+K-means algorithm, for clustering of text documents using WordNet. The common way of representing a text document is a bag of terms. The bag-of-terms representation is often unsatisfactory as it does not exploit the semantics. In this paper, texts are represented in terms of the synsets corresponding to each word; the bag-of-terms representation is thus enriched with synonyms from WordNet. K-means, PSO, and hybrid PSO+K-means algorithms are applied for clustering of text in the Nepali language. Experimental evaluation is performed using intra-cluster similarity and inter-cluster similarity.
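The synset enrichment step described above can be sketched as follows (the synonym map is a tiny stand-in for WordNet, and the names are invented):

```python
# Hypothetical synonym map standing in for WordNet synsets.
synsets = {
    "fast": {"fast", "quick", "rapid"},
    "car": {"car", "auto"},
}

def enrich(tokens, synsets):
    """Expand each token to its synset so that synonymous documents
    share features in the bag-of-terms representation."""
    bag = set()
    for tok in tokens:
        bag |= synsets.get(tok, {tok})
    return bag
```

Two documents saying "fast car" and "quick auto" end up with overlapping bags, so a clustering algorithm can place them in the same group.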
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-SPEECH TAGGINGijnlc
The document proposes using part-of-speech tagging and stemming to improve Gujarati to Hindi machine translation through transliteration. It presents a system that applies stemming and POS tagging to Gujarati text before transliterating to resolve ambiguities. An evaluation of the system on 500 sentences found that transliteration and translation matched for 54.48% of Gujarati words, and overall transliteration efficiency was 93.09%. The approach aims to improve over direct transliteration for the highly inflected Gujarati language.
An exhaustive font and size invariant classification scheme for ocr of devanagari charactersijnlc
The document presents a classification scheme for recognizing Devanagari characters that is invariant to font and size. It identifies the basic symbols that commonly appear in the middle zone of Devanagari text across different fonts and sizes. Through an analysis of over 465,000 words from various sources, it finds that 345 symbols account for 99.97% of text and aims to classify these into groups based on structural properties like the presence or absence of vertical bars. The proposed classification scheme is validated on 25 fonts and 3 sizes to demonstrate its font and size invariance.
Evaluation of subjective answers using glsa enhanced with contextual synonymyijnlc
Evaluation of subjective answers submitted in an exam is an essential but one of the most resource consuming educational activity. This paper details experiments conducted under our project to build a software that evaluates the subjective answers of informative nature in a given knowledge domain. The paper first summarizes the techniques such as Generalized Latent Semantic Analysis (GLSA) and Cosine Similarity that provide basis for the proposed model. The further sections point out the areas of improvement in the previous work and describe our approach towards the solutions of the same. We then discuss the implementation details of the project followed by the findings that show the improvements achieved. Our approach focuses on comprehending the various forms of expressing same entity and thereby capturing the subjectivity of text into objective parameters. The model is tested by evaluating answers submitted by 61 students of Third Year B. Tech. CSE class of Walchand College of Engineering Sangli in a test on Database Engineering.
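The cosine-similarity measure underlying the model can be sketched as follows for sparse term-weight vectors (dicts mapping term to weight):

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two term-weight vectors:
    dot product over the shared terms, divided by the norms."""
    common = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in common)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

In GLSA-style evaluation, the vectors compared are the semantic-space representations of a student answer and a model answer; a score near 1.0 means near-identical content.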
SURVEY ON MACHINE TRANSLITERATION AND MACHINE LEARNING MODELSijnlc
Globalization and the growth of Internet users truly demand that almost all Internet-based applications support local languages. Support for local languages can be given in all Internet-based applications by means of machine transliteration and machine translation. This paper provides a thorough survey of machine transliteration models and machine learning approaches used for machine transliteration over a period of more than two decades, for internationally used languages as well as Indian languages. The survey shows that the linguistic approach provides better results for closely related languages, and probability-based statistical approaches are good when one of the languages is phonetic and the other is non-phonetic. Better accuracy can be achieved only by using hybrid and combined models.
SENTIMENT ANALYSIS FOR MODERN STANDARD ARABIC AND COLLOQUIALijnlc
The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations; therefore many are now looking to the field of sentiment analysis. In this paper, we present a feature-based, sentence-level approach for Arabic sentiment analysis. Our approach uses an Arabic idioms/saying-phrases lexicon as a resource of key importance for improving the detection of sentiment polarity in Arabic sentences, as well as a number of novel and rich sets of linguistically motivated features (contextual intensifiers, contextual shifters and negation handling) and syntactic features for conflicting phrases, which enhance sentiment classification accuracy. Furthermore, we introduce an automatically expandable, wide-coverage polarity lexicon of Arabic sentiment words. The lexicon is built with gold-standard sentiment words as a seed, manually collected and annotated, and it expands and detects the sentiment orientation of new sentiment words automatically using a synset aggregation technique and free online Arabic lexicons and thesauruses. Our data focus on modern standard Arabic (MSA) and Egyptian dialectal Arabic tweets and microblogs (hotel reservations, product reviews, etc.). The experimental results using our resources and techniques with an SVM classifier indicate high performance levels, with accuracies of over 95%.
Conceptual framework for abstractive text summarizationijnlc
As the volume of information available on the Internet increases, there is a growing need for tools helping users to find, filter and manage these resources. While more and more textual information is available on-line, effective retrieval is difficult without proper indexing and summarization of the content. One of the possible solutions to this problem is abstractive text summarization. The idea is to propose a system that will accept single document as input in English and processes the input by building a rich semantic graph and then reducing this graph for generating the final summary.
The document lists different types of music, such as reggae, rap, flamenco, and house, along with representative artists of each genre. It also shows each artist's record sales and the total number of people who listen to each type of music. In total, 11,051 records were sold and 63 people listen to the different musical genres.
A syntactic analysis model for vietnamese questions in v dlg~tabl systemijnlc
This paper introduces a syntactic analysis model that we propose to parse and process the Vietnamese questions about tablets in V-DLG~TABL system, which is a Vietnamese Question – Answering system working based on automatic dialog mechanism. The V-DLG~TABL system is built to support clients using Vietnamese questions for searching tablets based on interaction between the clients and the system. We apply the “Phrase Structure Grammar” of Noam Chomsky to develop a syntactic analysis model that is specific and suitable for the V-DLG~TABL system. This syntactic analysis model is used to implement the “V-DLG~TABL Syntactic Parsing and Processing” component of the system.
Apply deep learning to improve the question analysis model in the Vietnamese ...IJECEIAES
Question answering (QA) systems are nowadays quite popular for automated answering purposes; the meaning analysis of the question plays an important role, directly affecting the accuracy of the system. In this article, we propose an improvement for question-answering models by adding more specific question analysis steps, including contextual characteristic analysis, POS-tag analysis, and question-type analysis built on a deep learning network architecture. Weights of words extracted through the question analysis steps are combined with the best matching 25 (BM25) algorithm to find the most relevant paragraph of text and incorporated into the QA model to find the best and least noisy answer. The dataset for the question analysis step consists of 19,339 labeled questions covering a variety of topics. Results of the question analysis model are combined to train the question-answering model on a data set related to the learning regulations of the Industrial University of Ho Chi Minh City. It includes 17,405 pairs of questions and answers for the training set and 1,600 pairs for the test set, where the robustly optimized BERT pretraining approach (RoBERTa) model has an F1-score of 74%. The model has improved significantly: for long and complex questions, the model has extracted weights and correctly provided answers based on the question's contents.
NLP Project: Machine Comprehension Using Attention-Based LSTM Encoder-Decoder...Eugene Nho
Machine comprehension remains a challenging open area of research. While many question answering models have been explored for existing datasets, little work has been done with the newly released MS MARCO dataset, which mirrors the reality much more closely and poses many unique challenges. We explore an end-to-end neural architecture with attention mechanisms for comprehending relevant information and generating text answers for MS MARCO.
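The attention mechanism referred to above can be sketched in its basic dot-product form (a generic illustration, not the project's architecture):

```python
import math

def attention(query, keys, values):
    """Dot-product attention: softmax over query-key scores produces
    weights, and the output is the weighted sum of the values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

In an encoder-decoder comprehension model, the query is the decoder state, and the keys/values are the encoder states of the passage, so each generated answer token can focus on the relevant span.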
SEMANTIC PROCESSING MECHANISM FOR LISTENING AND COMPREHENSION IN VNSCALENDAR SYSTEMijnlc
The document describes the VNSCalendar system, which allows users to manage their personal calendar using Vietnamese speech commands. It has three key capabilities:
1. It can recognize Vietnamese speech commands and questions, convert them to text, analyze the syntax and semantics, generate database queries, and return results to the user.
2. The system's semantic processing mechanism analyzes syntax and semantics of Vietnamese commands and questions using Definite Clause Grammar and formal semantics techniques.
3. An evaluation of the system found it could accurately recognize over 91% of Vietnamese speech commands after being built and tested in a PC environment.
Semantic Processing Mechanism for Listening and Comprehension in VNScalendar Systemkevig
This paper presents some generalities about the VNSCalendar system, a tool able to understand users' voice commands, which helps users manage and query their personal calendar through Vietnamese speech. The main feature of this system is that it is equipped with a mechanism for analyzing the syntax and semantics of Vietnamese commands and questions. The syntactic and semantic processing of Vietnamese sentences is solved by using DCG (Definite Clause Grammar) and the methods of formal semantics. This is the first system in this field of voice applications that is equipped with an effective semantic processing mechanism for the Vietnamese language. Having been built and tested in a PC environment, our system proves its accuracy, attaining more than 91%.
Generating requirements analysis models from textual requirementsfortes
This document describes a process for generating use case models from textual requirements. The process uses the EA-Miner tool to analyze textual requirements and extract information like functional concerns, RDL sentences, and a syntactically tagged document. This extracted information is used to derive initial candidate use cases, actors, and relationships. The candidate model is then refined by activities like removing undesirable use cases, completing abstraction names, adding new use cases/actors, and defining relationships between use cases. The overall goal is to reduce the time and effort required to produce requirements artifacts from textual specifications.
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATIONcsandit
The document proposes a new text categorization framework called LATTICE-CELL that is based on concepts lattice and cellular automata. It models concept structures using a Cellular Automaton for Symbolic Induction (CASI) in order to reduce the time complexity of categorization caused by concept lattices. The framework consists of a preprocessing module to create a vector representation of documents and a categorization module that generates the categorization model by representing the concept lattice structure as a cellular lattice. Experiments show the approach improves performance while reducing categorization time compared to other algorithms such as Naive Bayes and k-nearest neighbors.
IRJET- Chatbot Using Gated End-to-End Memory NetworksIRJET Journal
The document describes a proposed chatbot system that uses a gated end-to-end memory network model for hospital appointment booking. The model is trained on dialog data consisting of user utterances and bot responses related to booking appointments. It uses an attention mechanism over the dialog memory to select relevant parts of the conversation. The model is trained end-to-end to dynamically regulate interactions with the memory. Experiments show it can handle new combinations of fields when booking appointments in a simulated hospital reservation scenario.
hExarAbax makkAmasjix samayaM anni rojulu 5:00 am - 9:00 pm
(Mecca Masjid timings in Hyderabad - All days 5:00 am - 9:00 pm)
User query: makkAmasjix PIju eVMwa?
(What is the fee for Mecca Masjid?)
POS-tagger: makkAmasjix PIju/WQ eVMwa
Replace with root word: makkAmasjix PIju/WQ eMwa
Context Handler: Updates context to 'makkAmasjix'
Advanced Filter: Keywords - makkAmas
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSINGIJCI JOURNAL
This document summarizes a research paper that proposes using a combination of Natural Language Processing and statistical models to match features between different datasets. Specifically, it uses BERT (Bidirectional Encoder Representations from Transformers), a pretrained NLP model, in parallel with Jaccard similarity to measure similarity between feature lists. The hybrid approach reduces time required for manual feature matching compared to previous methods. The paper describes preprocessing data, generating embeddings with BERT, calculating similarity scores with BERT and Jaccard, and outputting top matches above a threshold. It provides example results matching house sales and movie metadata features. The hybrid approach leverages strengths of BERT's semantic understanding and Jaccard's flexibility for special characters.
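The Jaccard half of the hybrid can be sketched directly (the BERT embedding side is omitted; only the set-overlap measure is shown):

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets: |A & B| / |A | B|.
    Tolerant of special characters, since tokens are compared literally."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

In the paper's setting, the two inputs would be tokenized feature names from the two datasets; the hybrid score then combines this literal overlap with BERT's semantic similarity.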
IRJET- Implementation of Automatic Question Paper Generator SystemIRJET Journal
This document describes a proposed system for automatically generating question papers from input documents. The system performs several steps: it converts PDF/document files to text, conducts preprocessing like removing stop words, uses natural language processing and TF-IDF for key phrase extraction, checks phrases against Wikipedia for domain knowledge, generates triplets for questions, and checks the quality of generated questions using linguistic rules. The system aims to make the question paper generation process faster, more randomized and secure compared to traditional manual methods.
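The TF-IDF weighting used for key-phrase extraction can be sketched as follows (a plain unsmoothed variant, for illustration only):

```python
import math

def tf_idf(docs):
    """Per-document TF-IDF weights: term frequency within the document,
    scaled by log inverse document frequency across the corpus."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        tf = {t: doc.count(t) / len(doc) for t in set(doc)}
        weights.append({t: f * math.log(n / df[t]) for t, f in tf.items()})
    return weights
```

Terms that appear in every document get weight zero, so the highest-weighted terms in a document are its distinctive candidates for question generation.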
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORKijnlc
In recent years, there has been an increasing use of social media among people in Myanmar and writing review on social media pages about the product, movie, and trip are also popular among people. Moreover, most of the people are going to find the review pages about the product they want to buy before deciding whether they should buy it or not. Extracting and receiving useful reviews over interesting products is very important and time consuming for people. Sentiment analysis is one of the important processes for extracting useful reviews of the products. In this paper, the Convolutional LSTM neural network architecture is proposed to analyse the sentiment classification of cosmetic reviews written in Myanmar Language. The paper also intends to build the cosmetic reviews dataset for deep learning and sentiment lexicon in Myanmar Language.
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Networkkevig
In recent years, there has been an increasing use of social media among people in Myanmar and writing review on social media pages about the product, movie, and trip are also popular among people. Moreover, most of the people are going to find the review pages about the product they want to buy before deciding whether they should buy it or not. Extracting and receiving useful reviews over interesting products is very important and time consuming for people. Sentiment analysis is one of the important processes for extracting useful reviews of the products. In this paper, the Convolutional LSTM neural network architecture is proposed to analyse the sentiment classification of cosmetic reviews written in Myanmar Language. The paper also intends to build the cosmetic reviews dataset for deep learning and sentiment lexicon in Myanmar Language.
IRJET- Factoid Question and Answering SystemIRJET Journal
International Journal on Natural Language Computing (IJNLC) Vol. 3, No. 1, February 2014
DOI : 10.5121/ijnlc.2014.3104
BUILDING A VIETNAMESE DIALOG MECHANISM
FOR V-DLG~TABL SYSTEM
An Hoai Vo and Dang Tuan Nguyen
Faculty of Computer Science, University of Information Technology, Vietnam National
University – Ho Chi Minh City
ABSTRACT
This paper introduces a Vietnamese automatic dialog mechanism which allows the V-DLG~TABL system to communicate automatically with clients. This dialog mechanism is based on the Question – Answering engine of the V-DLG~TABL system and comprises the following supplementary mechanisms: 1) the mechanism for choosing the suggested question; 2) the mechanism for managing the conversations and suggesting information; 3) the mechanism for resolving questions containing anaphora; 4) the dialog scenarios (at this stage, only one simple scenario is defined for the system).
KEYWORDS
Semantics, Automatic Dialog Mechanism, Question – Answering Engine, Vietnamese Language
Processing.
1. INTRODUCTION
In this paper, we focus on introducing the Vietnamese automatic dialog mechanism of our V-DLG~TABL system. This automatic dialog mechanism allows the system to communicate effectively with clients and helps them find the information they need about the tablets they are interested in.
The V-DLG~TABL system’s automatic dialog architecture comprises the following supplementary mechanisms: 1) the mechanism for choosing the suggested question; 2) the mechanism for managing the conversations and suggesting information; 3) the mechanism for resolving questions containing anaphora; 4) the dialog scenarios (at this first stage, we have developed only one simple scenario for the system). This automatic dialog mechanism is used to build the component “V-DLG~TABL Dialog” of the system.
In addition, the automatic dialog mechanism of our system is based on an important engine for answering Vietnamese questions about tablets. This Vietnamese language based Question – Answering engine is built by applying the theoretic methods and logic programming techniques of computational semantics proposed by Patrick Blackburn and Johan Bos [1], and by reusing and referring to some basic implementations (including technical solutions and program source code) of Nguyễn Thành and Phạm Minh Tiến [2].
2. USING SIMPLE LAMBDA EXPRESSIONS FOR REPRESENTING THE SEMANTICS OF VIETNAMESE QUESTIONS
To represent and analyze the semantics of Vietnamese questions, we apply the approaches, methods and implementation techniques proposed in [1], [2]. According to [1], the semantics of sentences, including questions, can be represented by lambda expressions (cf. [1], [2]):
- Following [1], each unidentified part of a lambda expression is symbolized by a variable following an operator λ, and the operator @ is used to assign an argument to a variable in a lambda expression (cf. [1], [2]).
- Following [1], the β-transformation uses the operator @ to combine the lambda expressions of syntactic elements with each other. We use the algorithm proposed in [1], which is based on β-transformation, to combine the semantic components of the Vietnamese question (cf. [1], [2]).
Based on [1], we propose a semantic analysis algorithm for Vietnamese questions in the V-DLG~TABL system as follows:
ALGORITHM:
1) Step 1: Remove the supplementary syntactic elements from the syntactic tree of the question. The principal syntactic elements are retained: n_tablet, pn_tablet_name, n_component_<name>, n_property_<name>, and “literal”.
2) Step 2: Identify a lambda expression corresponding to each syntactic element retained after removing the supplementary syntactic elements. Based on the methods proposed in [1] and their applications in [2], the lambda expressions corresponding to the retained syntactic elements are defined as follows (cf. [1], [2]):

Syntactic element    | Lambda expression
pn_tablet_name       | λx.x@<tablet_name>
n_component_<name>   | λx.component_name(x)
n_property_<name>    | λx.λy.property_name(x, y)
literal              | λx.x@<literal>

3) Step 3: Apply β-transformation [1] to combine the lambda expressions of the syntactic elements with each other (cf. [1], [2]). The resulting complete FOL (First-Order Logic) expression represents the semantics of the Vietnamese question.
Example 1: “Máy tính bảng Nexus 7 có kích thước màn hình là bao nhiêu?”
(English: “What is the screen size of the Nexus 7 tablet?”)
The system resolves this question in the following steps:
Step 1: The syntactic tree of the question in example 1 is presented in Figure 1.
Figure 1: The syntactic tree of the Vietnamese question in example 1
Step 2 (a): Remove the supplementary syntactic elements from the syntactic tree. The resulting syntactic tree is shown in Figure 2.
Figure 2: The syntactic tree of the Vietnamese question in example 1 (after removing the
supplementary syntactic elements)
Step 2 (b): Determine the lambda expression corresponding to each retained syntactic element:
+ The lambda expression for the syntactic element n_property_size:
λx.λy.kích_thước(x, y).
+ The lambda expression for the syntactic element n_component_screen:
λx.màn_hình(x).
+ The lambda expression for the syntactic element pn_tablet: λx.x@Nexus_7.
Figure 3: The lambda expressions corresponding to the retained syntactic elements of the Vietnamese question in example 1
Step 3: The lambda expressions representing the retained syntactic elements are combined by using β-transformation, in the following steps:
- Combine the component lambda expressions to create a new lambda expression:
[λt.t@Nexus_7]@[λx.λy.kích_thước(x, y)]@[λz.màn_hình(z)]
- Apply β-transformation to the component lambda expressions to create a complete lambda expression:
[λt.t@Nexus_7]@([λx.λy.kích_thước(x, y)]@[λz.màn_hình(z)])
= [λx.x@Nexus_7]@[λy.kích_thước(λz.màn_hình(z), y)]
= [λx.x@Nexus_7]@[λz.λy.kích_thước(màn_hình(z), y)]
Figure 4: The steps of applying β-transformation to create the complete lambda expression of the Vietnamese question in example 1
- Continue applying β-transformation to the other side of the lambda expression:
= [λz.λy.kích_thước(màn_hình(z), y)]@Nexus_7
= [λy.kích_thước(màn_hình(Nexus_7), y)]
= λy.kích_thước(màn_hình(Nexus_7), y)
Figure 5: The complete lambda expression of the Vietnamese question in example 1 (after applying β-transformation)
- Finally, the complete lambda expression of the question in example 1 is: λy.kích_thước(màn_hình(Nexus_7), y). In this lambda expression, y is the unidentified variable, representing the value of the screen size asked about in the question.
3. ANSWERING VIETNAMESE QUESTIONS ABOUT TABLETS
To answer Vietnamese questions in the V-DLG~TABL system, we need to define the data describing the information about the tablets. Based on [1], we apply the methods and techniques of Nguyễn Thành and Phạm Minh Tiến [2] to define descriptive facts and organize the modules in Prolog.
When a Vietnamese input question is entered into the system, the component “V-DLG~TABL
Syntactic Parsing and Processing” will analyze it to return its syntactic tree. The retained
syntactic elements of the syntactic tree, which are determined by our algorithm mentioned above,
will correspond to the components of the “information structure model” proposed in [2]. These
retained syntactic elements are then processed by the component “V-DLG~TABL Semantic
Analyzing” to determine the lambda expression representing the semantics of the Vietnamese
question. To create a lambda expression for each Vietnamese question, we use the methods and
implementation techniques proposed in [1] (cf. [2]).
For example, the lambda expression of the question in example 1 is λy.kích_thước
(màn_hình(nexus_7), y). In this lambda expression, variable y indicates the unidentified
element in the question.
Based on the methods and implementation techniques proposed in [1], and referring to their applications presented in [2], the system uses the β-transformation to reduce the lambda expression of the Vietnamese question and obtain the complete FOL expression; this FOL expression is then used to query the database of facts in Prolog and return the found result. Following the techniques proposed in [1] (these techniques have been applied in [2], [3], [4], [5]),
the querying mechanism is implemented by using the unification between the FOL expression of
the question and the facts of the system (cf. [1], [2]).
Example 2: Based on [1], [2], the lambda expression λy.kích_thước(màn_hình(nexus_7), y) of the Vietnamese question in example 1 is unified with the fact kich_thuoc(man_hinh(nexus_7), ‘7 inch’) in the database of facts. The unification result for y is “7 inch”: the value of y is the screen size of the tablet nexus_7.
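The unification step of example 2 can be sketched as a small structural matcher in Python. The real system relies on Prolog's built-in unification; the tuple encoding and the "?"-tagged variables below are our own simplification, restricted to matching a pattern against ground facts.

```python
# Illustrative sketch of the fact-querying step: structurally match a query
# pattern (which may contain variables) against a ground Prolog-style fact.
# Fact and predicate names mirror the paper's example; the encoding is ours.

def unify(pattern, fact, bindings=None):
    """Match `pattern` against the ground term `fact`.

    Variables are encoded as ("?", name). Returns a dict of bindings on
    success, or None on mismatch. Repeated variables are not checked for
    consistency in this simplified sketch.
    """
    bindings = dict(bindings or {})
    if isinstance(pattern, tuple) and pattern[:1] == ("?",):
        bindings[pattern[1]] = fact          # bind the query variable
        return bindings
    if isinstance(pattern, tuple) and isinstance(fact, tuple):
        if len(pattern) != len(fact):
            return None
        for p, f in zip(pattern, fact):
            bindings = unify(p, f, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == fact else None

FACTS = [("kich_thuoc", ("man_hinh", "nexus_7"), "7 inch")]

# λy.kich_thuoc(man_hinh(nexus_7), y): unify the FOL query with the facts.
query = ("kich_thuoc", ("man_hinh", "nexus_7"), ("?", "y"))
print([unify(query, fact) for fact in FACTS])  # → [{'y': '7 inch'}]
```

A non-matching fact (say, a different tablet name) simply yields None, so iterating over the fact base collects exactly the bindings that answer the question.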
To create the Vietnamese answer in the V-DLG~TABL system, we apply the method developed in [2] (the original idea of this approach was proposed in [7], [9]): based on the unidentified node in the syntactic tree of the question, the system queries the database of facts to find the data and creates the answer.
Example 3: The system performs the following steps to resolve the Vietnamese question in example 1:
Step 1: The syntactic structure of the question is presented in Figure 6.
Figure 6: The syntactic tree of the Vietnamese question in example 1
Step 2: Query the fact base and obtain the result: “7 inch”.
Step 3: Find the value which answers the interrogative node in the syntactic tree: see
Figure 7.
Figure 7: The interrogative information answered in the syntactic tree of the Vietnamese question in example 1
Step 4: Create the answer by combining the information components in the syntactic tree, from left to right: “Máy tính bảng Nexus 7 có kích thước màn hình 7 inch”.
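Steps 3 and 4 amount to substituting the queried value for the interrogative node and reading the tree's leaves left to right. A minimal sketch, assuming a flattened list of leaves (the real system works on the syntactic tree itself, and the leaf segmentation here is illustrative):

```python
# Sketch of the answer-generation step: replace the interrogative leaf with
# the value returned by the fact query, then join the leaves left to right.
# The leaf list and interrogative marker are simplifications for illustration.

def generate_answer(leaves, answer_value, interrogative="bao nhiêu"):
    """Substitute `answer_value` for the interrogative leaf and join."""
    filled = [answer_value if leaf == interrogative else leaf
              for leaf in leaves]
    return " ".join(filled)

# Leaves of the question tree in example 1 (with the copula "là" dropped,
# as in the paper's answer).
leaves = ["Máy tính bảng", "Nexus 7", "có", "kích thước", "màn hình",
          "bao nhiêu"]
print(generate_answer(leaves, "7 inch"))
# → Máy tính bảng Nexus 7 có kích thước màn hình 7 inch
```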
4. BUILDING DIALOG MECHANISM OF SYSTEM
The dialog mechanism of the V-DLG~TABL system is built based on the following mechanisms:
- The mechanism for choosing the suggested question.
- The mechanism for managing the dialog and suggesting information.
- The mechanism for resolving questions containing anaphora.
- The sample scenarios.
4.1. The mechanism for choosing the suggested question
The mechanism for choosing the suggested question is based on predefined samples of suggested questions. Depending on the communication scenario between the client and the system, the system creates the appropriate suggested questions.
- Type 1: These are suggested questions about other information of the tablet in which the client has just shown interest.
For example, as soon as the client asks about the screen, size, or weight of a tablet, the system suggests a question about its color.
- Type 2: These are suggested questions about tablets similar to the one in which the client has just shown interest.
For example, after the client asks about the tablet Nexus 7 having screen size ‘7 inch’, the system can suggest other tablets having the same screen size.
- Type 3: These are suggested questions about other information, used when the above cases do not occur. These questions can be: “Bạn có muốn biết thêm thông tin nào không?” (English: “Would you like to know any more information?”), or “Bạn có muốn biết thêm thông tin về máy tính bảng nào khác không?” (English: “Would you like to know any more information about another tablet?”), ...
Table 1 presents the samples of suggested questions defined in the V-DLG~TABL system.
Table 1: The samples of suggested questions

Suggested question type | Elementary questions
Type 1 | Question 1.1: <component>
       | Question 1.2: <component> <tablet>
       | Question 1.3: <property>, <component>, <tablet>
       | Question 1.4: <component>, <tablet>
Type 2 | Question 2.1: <tablet>
       | Question 2.2: <same_tablet>
Type 3 | Question 3.1: <what_tablet>
       | Question 3.2: <what_component>
       | Question 3.3: <what_info>
Table 2 presents some examples of the suggested question samples.
Table 2: Some examples of the suggested question samples

Suggested question type | Elementary questions
Type 1 | Question 1.1: Bạn có muốn biết thông tin <thành phần> không? (English: “Would you like to know <component> information?”)
       | Question 1.2: Bạn có muốn biết thành phần <thành phần> của máy tính bảng trên không? (English: “Would you like to know the <component> of the above tablet?”)
       | Question 1.3: Bạn cần biết thông tin <thuộc tính> của <thành phần> máy tính bảng <X> không? (English: “Do you need to know the <property> of the <component> of tablet <X>?”)
       | Question 1.4: Bạn có biết thông tin về <thành phần> máy tính bảng <X> không? (English: “Would you like to know about the <component> of tablet <X>?”)
Type 2 | Question 2.1: Bạn có muốn biết thêm thông tin máy tính bảng <X> hay không? (English: “Would you like to know more information about tablet <X>?”)
       | Question 2.2: Bạn có muốn biết thêm thông tin các máy tính bảng cùng loại không? (English: “Would you like to know more information about tablets of the same type?”)
Type 3 | Question 3.1: Bạn có muốn biết thêm thông tin máy tính bảng nào không? (English: “Would you like to know more information about any tablet?”)
       | Question 3.2: Bạn có muốn biết thêm thông tin các thành phần nào khác không? (English: “Would you like to know more information about any other components?”)
       | Question 3.3: Bạn có muốn biết thêm thông tin nào không? (English: “Would you like to know any more information?”)
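Filling these templates can be sketched as a slot-substitution step. The template strings below are taken from Table 2; the selection priority (Type 1 over Type 2 over Type 3) and the function interface are our assumptions about how the choice in Section 4.1 could be realized, not the system's actual code.

```python
# Sketch of auto-generating a suggested question from the Table 2 samples.
# Template strings follow Table 2; the priority logic is an assumption.

TEMPLATES = {
    "1.2": "Bạn có muốn biết thành phần {component} của máy tính bảng trên không?",
    "2.1": "Bạn có muốn biết thêm thông tin máy tính bảng {tablet} hay không?",
    "3.3": "Bạn có muốn biết thêm thông tin nào không?",
}

def suggest(last_tablet=None, unseen_component=None):
    """Pick a suggested-question type (1 > 2 > 3) and fill its slots."""
    if unseen_component:            # Type 1: other info on the same tablet
        return TEMPLATES["1.2"].format(component=unseen_component)
    if last_tablet:                 # Type 2: similar / same-type tablets
        return TEMPLATES["2.1"].format(tablet=last_tablet)
    return TEMPLATES["3.3"]         # Type 3: generic fallback

print(suggest(unseen_component="màu sắc"))   # fills the <thành phần> slot
print(suggest(last_tablet="Nexus 7"))        # fills the <X> slot
print(suggest())                             # generic Type 3 question
```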
4.2. Managing the conversations and suggested information
The mechanism for managing the conversations and suggested information is integrated into the Question – Answering engine and is based on the following tasks:
- Management of time: operates by monitoring the downtime between each question and answer, and between the user and the system.
- Auto-generation of suggested questions: after identifying the suggested question type, the suggested question is created from the suggested question samples.
- The mechanism for managing the dialog: this mechanism is integrated into the Question – Answering engine and implements some basic tasks:
+ Saving the communication information exchanged between the client and the system.
+ Managing time with the component “Manage time” and giving suggestions with the component “Auto-generate the suggested question”.
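The tasks above (saving exchanges, tracking downtime, triggering a suggestion) can be sketched as a small dialog-manager class. The class name, the idle threshold, and the interface are assumptions for illustration; the paper does not specify these details.

```python
# Sketch of the dialog-management tasks: record each exchange, track idle
# time, and decide when to trigger an auto-suggested question.
# The 30-second threshold and class design are assumptions, not the paper's.

import time

class DialogManager:
    def __init__(self, idle_limit=30.0):
        self.history = []                 # saved client/system exchanges
        self.idle_limit = idle_limit      # seconds of downtime before suggesting
        self.last_activity = time.monotonic()

    def record(self, question, answer):
        """Save one question/answer exchange and reset the idle clock."""
        self.history.append((question, answer))
        self.last_activity = time.monotonic()

    def should_suggest(self, now=None):
        """True once the downtime since the last exchange exceeds the limit."""
        now = time.monotonic() if now is None else now
        return now - self.last_activity >= self.idle_limit

dm = DialogManager(idle_limit=30.0)
dm.record("Nexus 7 sử dụng loại màn hình gì?",
          "Nexus 7 sử dụng loại màn hình WXGA.")
print(dm.should_suggest(now=dm.last_activity + 31))  # True: downtime exceeded
```

When `should_suggest` fires, the manager would hand control to the suggested-question component of Section 4.1 to produce the next system utterance.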
4.3. Resolving the questions having anaphora
In a sequence of Vietnamese questions and answers in a dialog, anaphora usually occur. According to [8] and [9], an anaphor can be omitted, or can refer to a queried object in previous questions.
Based on [8] and [9], we propose a method for treating the anaphora problems in Vietnamese questions about tablets.
4.3.1. Anaphora in questions about tablets/components
Example 4: Consider the following dialog:
User: Nexus 7 sử dụng loại màn hình gì?
(English: What type of screen does Nexus 7 use?)
System: Nexus 7 sử dụng loại màn hình WXGA.
(English: Nexus 7 uses a WXGA screen.)
User: Máy tính bảng đó có kích thước màn hình bao nhiêu?
(English: What is the screen size of that tablet?)
In the above example, “máy tính bảng đó” (“that tablet”) in the client’s second question refers to the same object as “Nexus 7” in the previous question.
4.3.2. Anaphora referring to queried tablets/components in previous questions
Example 5: Consider the following dialog:
User: Máy tính bảng nào có chức năng đàm thoại?
(English: Which tablets have a conversation function?)
System: Nexus 7 có chức năng đàm thoại.
(English: Nexus 7 has a conversation function.)
User: Máy tính bảng đó có kích thước màn hình bao nhiêu?
(English: What is the screen size of that tablet?)
In the above example, the tablet referred to in the client’s second question can be identified via the queried information in the first question: this anaphor can only be resolved after answering the previous question.
4.3.3. Elliptical anaphora in questions about tablets/components
Example 6: Consider the following dialog:
User: Nexus 7 sử dụng loại màn hình gì?
(English: What type of screen does Nexus 7 use?)
System: Nexus 7 sử dụng loại màn hình WXGA.
(English: Nexus 7 uses WXGA screen)
User: Có kích thước màn hình bao nhiêu?
(English: What is the screen size?)
In the above example, the second question of the client does not contain any information about
the tablet, although this is mandatory information for the question: the information has been
omitted. The system can identify it based on the previous question.
4.3.4. Resolution of anaphora in V-DLG~TABL system
The anaphora appearing in syntactic tree of question is processed by an algorithm which we
propose based on the information in Table 3.
Table 3: Anaphoric nodes in V-DLG~TABL system

Anaphora types    Anaphoric nodes             Descriptive words
Type 1, Type 2    this_tablet, that_tablet    “máy tính bảng/tablet” + “này / đó / trên”
Type 3            na_tablet                   Represents the omitted information in the question
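The detection of the anaphoric node types listed in Table 3 can be sketched as a small pattern match over the question text. This is only an illustration under assumed conditions: the function name and the keyword list used to rule out Type 3 are our own, not part of the V-DLG~TABL system, and a real implementation works on the syntactic tree rather than the raw string.

```python
import re

# Types 1 and 2 share the same surface form ("máy tính bảng/tablet" +
# "này/đó/trên"); they are distinguished later by whether the antecedent
# is a named tablet or a queried one.
ANAPHORA_PATTERN = re.compile(
    r"(máy tính bảng|tablet)\s+(này|đó|trên)", re.IGNORECASE)

def find_anaphoric_node(question: str):
    """Return 'this_tablet'/'that_tablet' for an explicit anaphor, or
    'na_tablet' (Type 3) when the mandatory tablet phrase is omitted."""
    m = ANAPHORA_PATTERN.search(question)
    if m:
        return "this_tablet" if m.group(2) == "này" else "that_tablet"
    # Hypothetical check: no tablet phrase or product name at all ->
    # the mandatory tablet information was omitted (elliptical anaphora).
    if not re.search(r"(máy tính bảng|tablet|nexus|ipad)",
                     question, re.IGNORECASE):
        return "na_tablet"
    return None
```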
We propose the following algorithm for resolving the anaphora occurring in Vietnamese questions
about tablets:
Algorithm:
Input: The syntactic structure of a Vietnamese question.
Step 1: Search for anaphoric nodes based on the cases listed in Table 3.
Step 2: For each anaphoric node, search for the node corresponding to it:
- Type 1: the corresponding node of an anaphoric node in the current question is the
noun-phrase node describing a tablet or component in the previous question.
- Type 2: the corresponding node of an anaphoric node in the current question is a
queried node that needed to be answered in the previous question.
- Type 3: the corresponding node of an omitted anaphoric node in the current question is
the noun-phrase node describing a tablet or component in the previous question.
Step 3: Replace each anaphoric node with its corresponding node in the syntactic structure of the
question.
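The three steps above can be sketched over a toy tree representation. We assume here that the syntactic structure is a nested dict with a 'label' and 'children'; the node labels follow Table 3, but the tree shape, the 'tablet_np' label, and the helper names are illustrative assumptions, not the paper's actual data structures.

```python
ANAPHORIC_LABELS = {"this_tablet", "that_tablet", "na_tablet"}

def find_nodes(tree, labels):
    """Step 1: collect nodes with the given labels by walking the tree."""
    found = []
    if tree["label"] in labels:
        found.append(tree)
    for child in tree.get("children", []):
        found.extend(find_nodes(child, labels))
    return found

def resolve_anaphora(current_tree, previous_tree, previous_answer_np):
    """Steps 2-3: replace each anaphoric node in the current question
    with its antecedent: a noun-phrase node from the previous question
    (Types 1 and 3), or the queried node known only after the previous
    question was answered (Type 2)."""
    for node in find_nodes(current_tree, ANAPHORIC_LABELS):
        noun_phrases = find_nodes(previous_tree, {"tablet_np"})
        if noun_phrases:                   # Type 1 / Type 3 antecedent
            antecedent = noun_phrases[0]
        else:                              # Type 2: answered query node
            antecedent = previous_answer_np
        node.clear()
        node.update(antecedent)            # in-place substitution (Step 3)
    return current_tree
```

Replacing the node in place keeps the rest of the question's syntactic structure intact, so the resolved question can be passed straight to the Question – Answering engine.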
4.4. Building the scenario
To set up and build the dialog system, we build a scenario based on the mechanism of choosing the
suggested question, the mechanism of managing the dialog, and the mechanisms of resolving anaphora.
Example 7: A dialog following the scenario:
User: Nexus 7 sử dụng loại màn hình gì?
(English: What type of screen does Nexus 7 use?)
System: Nexus 7 sử dụng loại màn hình WXGA.
(English: Nexus 7 uses WXGA screen)
User: Có kích thước màn hình bao nhiêu?
(English: What is the screen size?)
System: Có kích thước màn hình 7 inch
(English: The screen size is 7 inches)
… (after 20 seconds of inactivity)
System: Bạn có muốn biết thông tin bộ nhớ không?
(English: Would you like to know the information about its memory?)
User: Có
(English: Yes)
System: Bộ nhớ trong là 115 MB.
(English: The internal memory is 115 MB)
… (after 20 seconds of inactivity)
System: Bạn có muốn biết thêm thông tin nào không?
(English: Would you like to know any other information?)
User: Không
(English: No)
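The control flow of this scenario can be sketched as a simple loop: answer user questions, offer a suggestion when the user stays silent, and stop when the user declines. The interfaces qa_engine, ask_user, and suggest are assumptions for illustration, not the actual V-DLG~TABL API:

```python
def run_scenario(qa_engine, ask_user, suggest):
    """Hypothetical sketch of the single dialog scenario. ask_user()
    returns the next user utterance, or None to model 20 seconds of
    silence; the dialog ends when the user answers "Không" (No)."""
    history = []
    while True:
        utterance = ask_user()
        if utterance is None:
            # Silence: the system proposes a suggested question.
            suggestion = suggest(history)
            history.append(("system", suggestion))
            continue
        if utterance.strip().lower() == "không":   # user declines: end dialog
            break
        answer = qa_engine(utterance, history)      # Question - Answering engine
        history.append(("user", utterance))
        history.append(("system", answer))
    return history
```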
5. CONCLUSIONS
This paper presents the model of the Vietnamese automatic dialog mechanism that we propose for
the V-DLG~TABL system. This dialog mechanism is based on supplementary mechanisms: choosing
the suggested questions, managing the conversations and suggesting information, resolving the
anaphora in conversation, and the Question – Answering engine. Among these supplementary
mechanisms, the engine for answering Vietnamese questions, which is built based on [1] and [2],
is an important element of our dialog mechanism.
We have implemented this dialog mechanism to build the V-DLG~TABL system, and have tested the
system according to the defined scenario. Although at the current time the system only operates
according to one simple dialog scenario, this scenario is general enough to cover some simple
communications of the user with the system. However, we see the need to develop other
scenarios for the system.
ACKNOWLEDGEMENTS
This research is funded by University of Information Technology, Vietnam National University –
Ho Chi Minh City (VNU-HCM) under grant number C2011CTTT-06.
REFERENCES
[1] Patrick Blackburn, Johan Bos, Representation and Inference for Natural Language: A First
Course in Computational Semantics, September 3, 1999.
[2] Nguyễn Thành, Phạm Minh Tiến, "Xây dựng cơ chế hỏi đáp tiếng Việt cho hệ thống tìm kiếm sản
phẩm máy tính bảng", B.Sc. Thesis in Computer Science, University of Information Technology,
Vietnam National University – Ho Chi Minh City, 2012.
[3] Vương Đức Hiền, “Xây dựng công cụ truy vấn tiếng Việt về các phần mềm máy tính”, B.Sc.
Thesis in Computer Science, University of Information Technology, Vietnam National University
– Ho Chi Minh City, 2013.
[4] Phạm Thế Sơn, Hồ Quốc Thịnh, "Mô hình ngữ nghĩa cho câu trần thuật và câu hỏi tiếng Việt trong
hệ thống vấn đáp kiến thức lịch sử Việt Nam", B.Sc. Thesis in Computer Science, University of
Information Technology, Vietnam National University – Ho Chi Minh City, 2012.
[5] Lâm Thanh Cường, Huỳnh Ngọc Khuê, "Biểu diễn và xử lý ngữ nghĩa dựa trên FOL (First Order
Logic) cho các dạng câu đơn tiếng Việt trong hệ thống hỏi đáp kiến thức xã hội", B.Sc. Thesis in
Computer Science, University of Information Technology, Vietnam National University – Ho Chi
Minh City, 2012.
[6] Son The Pham and Dang Tuan Nguyen, "Processing Vietnamese News Titles to Answer Relative
Questions in VNEWSQA/ICT System", International Journal on Natural Language Computing
(IJNLC), Vol. 2, No. 6, December 2013, pp. 39-51. ISSN: 2278 - 1307 [Online]; 2319 - 4111
[Print].
[7] Dang Tuan Nguyen, An Hoai Vo, Phuc Tri Nguyen, “A Semantic Approach to Answer
Vietnamese Questions in OpenCourseWare Retrieval System”, Proceedings of the 2011
International Conference on Software Technology and Engineering (ICSTE 2011), August 12-13,
2011, Kuala Lumpur, Malaysia, pp. 314-320.
[8] Dang Tuan Nguyen, An Hoai Vo, Phuc Tri Nguyen, "Answering a Series of Vietnamese Questions
in Library Retrieval System", Proceedings of the 2011 2nd International Conference on Future
Information Technology (ICFIT 2011), September 16-18, 2011, Singapore, pp. 445-449. ISBN:
978-981-08-9916-5. IPCSIT, Vol. 13, 2011.
[9] Võ Hoài An, Nguyễn Trí Phúc, “Mô hình xử lý cú pháp và ngữ nghĩa cho các chuỗi câu hỏi tiếng
Việt trong hệ thống tìm kiếm thư viện”, B.Sc. Thesis in Computer Science, University of
Information Technology, Vietnam National University – Ho Chi Minh City, 2011.