AUTOMATED SQL QUERY GENERATOR BY UNDERSTANDING A NATURAL LANGUAGE STATEMENT

International Journal on Natural Language Computing (IJNLC) Vol.7, No.3, June 2018
DOI: 10.5121/ijnlc.2018.7301

Amit Pagrut, Ishant Pakmode, Shambhoo Kariya, Vibhavari Kamble and Yashodhara Haribhakta
Department of Computer Engineering and Information Technology, College of Engineering Pune,
Wellesley Road, Shivajinagar, Pune, Maharashtra, India
ABSTRACT
This project aims to develop a system that converts a natural language statement into a MySQL query to retrieve information from the corresponding database. The system focuses mainly on the creation of complex queries, including nested queries more than two levels deep, queries with aggregate functions, HAVING clauses and GROUP BY clauses, and correlated queries that arise from constraints on aggregate functions. The natural language input statement taken from the user is passed through various OpenNLP natural language processing techniques, such as tokenization, part-of-speech tagging, stemming, and lemmatization, to get the statement into the desired form. The statement is further processed to extract the type of query; the basic clause, which specifies the required entities from the database; and the condition clause, which specifies constraints on the basic clause. The final query is generated by converting the basic and condition clauses to their query forms and then concatenating the condition query to the basic query. Currently, the system works only with MySQL databases.
1. INTRODUCTION
Almost all applications in today's world make use of collected data to fulfill their intended requirements and to enhance their functionality. The main objective remains the efficient storage and fast retrieval of this data. Databases are the most suitable solution addressing these objectives. Relational databases provide a structured way to store huge collections of data and offer real-time accessibility. A relational database management system represents domain entities and their respective attributes in tabular form. Many organizations and social networking sites make use of relational databases for storage and analysis. In this modern world, everyone aims to be more dynamic in terms of information gathering, retrieval and sharing using existing database storage systems, but most users are not well versed with the underlying technology. Users need to learn the underlying database language and database properties to generate queries. Natural language is used in day-to-day life, and if information sharing can be made easy with the use of natural language, it will reduce the cost of learning and understanding the technology used for it.
NLIDB (Natural Language Interface to Database Systems) is the provision under which natural language is used to interact with databases. There are many existing NLIDB systems which allow users to work with databases using their own languages. Although research on these systems started a few decades ago, there is still no perfect system that fulfills all the objectives of NLIDB.
This system is a type of NLIDB system which concentrates on formulating complex queries along with simple queries. In this project, we take the user's natural language query input in the English language through a graphical user interface. In Java, the OpenNLP library provides various natural language processing modules. The input is then processed through several natural language processing steps: tokenizing the sentence by the space delimiter, mapping each token to its parts-of-speech (POS) tag using the POS tagger, and lemmatizing each token to convert it into its basic form; these lemmas and tags are stored. The converted statement is then broken down into two parts: the basic clause and the condition clause. The basic clause identifies the attributes used to form the query, and the condition clause identifies the constraints on these attributes, which are mapped to their respective attributes. A Table Linking Algorithm (TLA) is incorporated to find the links between different tables of the query. The basic query and condition query are then joined together to form the final resultant query.
2. RELATED WORK
[1] LUNAR was a system that answered questions about rock samples brought back from the moon. Two databases were used: the chemical analyses and the literature references.
LIFER/LADDER was designed as a natural language interface to a database of information about US Navy ships. It used a semantic grammar to parse questions and query a distributed database.
EasyAsk, also known as English Wizard, is a commercial application that offers both keyword and natural language search over relational databases.
EQ stands for English Query; it was implemented by Microsoft. It creates a model, collecting database objects (tables, fields, joins) and semantic objects (entities).
(Siasar et al., May 2008) [2] gave an insight into how the machine understands natural language. They proposed an expert system taking into account the concepts of syntactic and semantic knowledge. They also suggested a selection algorithm to select the most appropriate query from the suggested possible queries.
Neelu Nihalani, Sanjay Silakari and Mahesh Motwani (March 2011) give an introduction to intelligent database systems and natural language interfaces to databases. Their work gives a brief overview of NLIDB subcomponents and also explains NLIDB architectures and various approaches for the development of NLIDB systems.
Akshay G. Satav, Archana B. Ausekar, Radhika M. Bihani and Abid Shaikh (March 2014) [3] provide results to users for any type of query accurately and efficiently; even if a user makes a spelling mistake, the system autocorrects the spelling. They used a query mapping algorithm where a query of any form in the English language is mapped according to the syntax of an SQL query, which provides the user accurate data from the database after execution of the query.
Alessandra Giordani and Alessandro Moschitti [4] used linguistic dependencies and metadata to build sets of possible SELECT and WHERE clauses and then exploited the metadata again to build a FROM clause enriched with meaningful joins. Finally, combining all the clauses, they obtained the set of all possible SQL queries. The algorithm can be applied recursively to deal with complex questions requiring nested SELECT instructions.
(Garima Singh, Arun Solanki, September 2016) [5] combined Artificial Intelligence (AI) and linguistics to develop programs that help understand and produce information in a natural language. Their NLIDBs are built to optimize the search results and produce information with more accuracy. The paper extends the existing work further by processing more complex queries along with ambiguity removal.
K. Javubar Sathick and A. Jaya (2014) [6] provide a user-friendly interface between the end user and the database for easy access to social web data from different web sources such as Facebook, Twitter and LinkedIn, by deriving a query translator which takes a natural language statement as input. The R tool is used to collect the data from social web sources.
Abhijeet Gupta and Rajeev Sangal [7] introduced an aggregation processing framework which can handle different types of aggregation operations in a natural language query, including direct quantitative as well as indirect qualitative aggregations, and those which combine quantifiers or relational operators with aggregations.
(Hu Li and Yong Shi, 2010) [8] present a framework based on ontology techniques to implement portable NLIDBs, which makes it easier to migrate from one domain to another.
(Ashish Palakurthi, Ruthu S. M., Arjun R. Akula and Radhika Mamidi, September 2015) [9] use a statistical classifier trained on manually prepared data. They report their results on three different domains and also show how the system can be used for generating a complete SQL query. They used Conditional Random Fields (CRF) for classifying the explicit attributes in a natural language query into different SQL clauses.
(Abhijeet R. Sontakke, Amit Pimpalkar, July 2015) [10] developed a rule-based system which accepts a Hindi language query and gives output in the Hindi language only.
(Nandan Sukthankar, Pranay Deshmukh, 2016) [11] focus on incorporating complex queries along with simple queries irrespective of the database. Their system accommodates aggregate functions, multiple conditions in the WHERE clause, and advanced clauses like ORDER BY, GROUP BY and HAVING. The system handles single-sentence natural language inputs with respect to a selected database.
3. OUR APPROACH
3.1 SYSTEM ARCHITECTURAL LAYOUT
Figure 1 gives the simple architecture of the system. It consists of tokenization, lemmatization of tokens to convert them into their basic forms called lemmas, and parts-of-speech tagging of each lemma, using OpenNLP APIs. The input natural language statement goes through the above-mentioned steps and forms parsed data. Each keyword in the parsed data is processed through semantic analysis and relation mapping, which make use of data dictionaries (containing database schema information such as synonyms of SQL clause words, aggregation words, database attributes and table names). The synonyms are found using the WordNet data dictionary in NLTK.
Figure 1. System Architecture
The processed data is then used to generate the final MySQL query by using semantic conceptual rules and clause templates (query templates, e.g. SELECT_FROM_WHERE).
3.2 MORPHOLOGICAL AND LEXICAL ANALYSIS
3.2.1 Tokenization
Tokenization refers to the breakdown of the natural language sentence into tokens with white space as the delimiter. This step is required to convert each token into its root form in the next step.
3.2.2 Lemmatization
Lemmatization is the process of converting a word into its root form. The tokens from the earlier step are converted into their respective root forms (e.g. 'employees' to 'employee', 'saw' to 'see') to extract their actual meaning, so that the processing algorithms can be applied to them. These converted tokens are called lemmas.
3.2.3 Parts of Speech (POS) Tagging
The lemmas are mapped to their parts-of-speech tags so as to derive their semantic meaning for further processing. For example, the parts-of-speech tags are used to find the nouns and verbs in the input sentence, which further helps to find the tables and attributes in the query, considering the fact that table and attribute names are nouns and verbs.
Ex.1. Show the names of employees with salary greater than 50000. Here,
Lemmas Parts of Speech tags
show VBZ
the DT
name NN
of IN
employee NN
with IN
salary NN
great JJ
than IN
50000 CD
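As an illustration, the following is a minimal sketch of this morphological and lexical pipeline using the OpenNLP API. The model file names (en-pos-maxent.bin, en-lemmatizer.dict) are illustrative placeholders for locally available pre-trained resources, and since OpenNLP's dictionary lemmatizer requires the POS tags as input, tagging is performed before lemmatization in this sketch.

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.lemmatizer.DictionaryLemmatizer;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.SimpleTokenizer;

public class MorphPipeline {
    public static void main(String[] args) throws Exception {
        String input = "Show the names of employees with salary greater than 50000";

        // 3.2.1 Tokenization: break the sentence into tokens on white space.
        String[] tokens = SimpleTokenizer.INSTANCE.tokenize(input);

        try (InputStream posIn = new FileInputStream("en-pos-maxent.bin");
             InputStream dictIn = new FileInputStream("en-lemmatizer.dict")) {

            // 3.2.3 POS tagging: map each token to its parts-of-speech tag.
            POSTaggerME tagger = new POSTaggerME(new POSModel(posIn));
            String[] tags = tagger.tag(tokens);

            // 3.2.2 Lemmatization: convert each token to its root form.
            // (The dictionary lemmatizer returns "O" for words it does not know.)
            DictionaryLemmatizer lemmatizer = new DictionaryLemmatizer(dictIn);
            String[] lemmas = lemmatizer.lemmatize(tokens, tags);

            // Store/print lemma-tag pairs for the later processing stages.
            for (int i = 0; i < tokens.length; i++) {
                System.out.printf("%-12s %-10s %s%n", tokens[i], lemmas[i], tags[i]);
            }
        }
    }
}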
3.3 CONDITION CLAUSE EXTRACTION
3.3.1 Determination of attribute values
The attribute values are derived from the user query with the help of the parts-of-speech tags provided by the lexical analysis stage. The parts-of-speech tags are checked for tags like NNP for string values, as attribute values in the database are usually proper nouns, and CD for integer values. Ex.2. Show the names of employees whose salaries are greater than 50000 and department is Computer Science. Here, 50000 is CD and 'Computer Science' is NNP.
3.3.2 Mapping relational operators to attribute values
The attribute values are related to their respective attributes through relational operators like equal to, is greater than, etc. These operators define the condition or the constraint on the attribute and contribute to making the query more descriptive. The operator symbol for each operator is stored in advance. The mapping is stored as => RELATIONAL OPERATOR - OPERATOR SYMBOL - ATTRIBUTE VALUE. In Ex.2, 50000 is mapped to '>' and 'Computer Science' is mapped to '='.
3.3.3 Mapping of attribute values to the respective attributes
The extracted attribute values are linked to their respective attributes using the attribute mapping algorithm, which searches for the most probable attribute to be mapped to the value. The attribute names are replaced by their original names in the database. This mapping is stored as => ATTRIBUTE TABLE - ATTRIBUTE NAME - OPERATOR SYMBOL - ATTRIBUTE VALUE.
In Ex.2, the mapping is => {salary, salary, >, 50000}, {department, dep_name, =, 'Computer Science'}.
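As a minimal sketch of how the mappings of Sections 3.3.2 and 3.3.3 could be represented for Ex.2 (the Condition record and OPERATORS map are hypothetical names, not the system's actual data structures):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ConditionExtractor {

    // One stored mapping: ATTRIBUTE TABLE - ATTRIBUTE NAME - OPERATOR SYMBOL - VALUE.
    record Condition(String table, String attribute, String operator, String value) {}

    // Relational-operator phrases and their symbols, stored in advance (Section 3.3.2).
    static final Map<String, String> OPERATORS = Map.of(
            "greater than", ">",
            "less than", "<",
            "equal to", "=",
            "is", "=");

    public static void main(String[] args) {
        // Values found via POS tags: CD -> integer literal, NNP -> string literal.
        List<Condition> conditions = new ArrayList<>();
        conditions.add(new Condition("salary", "salary",
                OPERATORS.get("greater than"), "50000"));
        conditions.add(new Condition("department", "dep_name",
                OPERATORS.get("is"), "'Computer Science'"));

        // Each stored mapping later becomes one WHERE-clause predicate.
        for (Condition c : conditions) {
            System.out.printf("%s.%s %s %s%n",
                    c.table(), c.attribute(), c.operator(), c.value());
        }
    }
}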
3.3.4 Check for aggregation function on the attribute
The query is processed to check whether the mapped attributes have an aggregation function applied to them, by extracting the synonyms of the aggregation words (MINIMUM, MAXIMUM, COUNT, AVERAGE, SUM) and then mapping the actual aggregation function word to the attribute.
3.3.5 Removing the condition clause part from the query
The condition clause part is removed from the query to obtain the basic clause required to process the remaining part of the SQL query. In Ex.2, the remaining part of the query is => show the name of employee.
3.4 BASIC CLAUSE EXTRACTION
The basic clause determines the SELECT part of the SQL query. The SELECT clause of the SQL query contains the names of the attributes which are to be extracted from the user query. After the extraction and removal of the condition clause part from the user query, the basic clause part is evaluated and the required entities are mapped and stored. The execution of the basic query part is done using the following algorithms:
3.4.1 Table Linking Algorithm (TLA)
In TLA, each table present in the database schema is represented as a data structure entity with the following properties:
tableName
allAttributes
foreignKeys
primaryKey
The algorithm proceeds in the following fashion:
• First, the data structure references of the source and destination tables are retrieved. All the foreign keys present in the source table are obtained. Now, for each foreign key, the key's referencing table is linked to the source table using the foreign key reference and the fact that the attribute is present in both tables, and this path is appended to the available paths. The referencing table now becomes the source table, and the new source and destination tables are fed to the algorithm to find a path between them, making the algorithm recursive. If the referencing table at some point matches the destination table, the path so far plus the path appended between the current source and destination table becomes the final path.
• If the source table doesn't contain any foreign key attributes, the source and destination tables are swapped and passed again to the algorithm, but with an indicator that the tables have been swapped. If the indicator shows that the tables are swapped, the path between those tables is reversed to obtain the correct path and then appended to the available paths.
• If neither the source nor the destination table contains any foreign key attributes, a common table linking both the source and destination tables is found, by considering the fact that the common table would contain at least one common attribute of both tables. The path between the source table and the common table is appended first, and the path between the common table and the destination table is appended after it; this combined path is appended to the available paths. Out of the found common tables, only one carries the actual semantics of both joining tables, and only the query generated through this common table is correct. Ex.3. Find employees in the Human Resource department. While finding the path between the employee and department tables, depart_emp and depart_manager are tables which have foreign keys of both, so two paths are generated by the system, but semantically only the path through depart_emp is correct. A sketch of the recursive core of this algorithm is given below.
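The following simplified sketch illustrates the recursive foreign-key traversal only; the Table class is a cut-down version of the entity described above, the schema in main is illustrative, and details such as cycle handling, collecting alternative paths, and the swap/common-table cases are omitted.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TableLinker {

    // Minimal table entity (a subset of the properties listed in Section 3.4.1).
    static class Table {
        final String tableName;
        final Map<String, Table> references; // foreign key -> referenced table

        Table(String tableName, Map<String, Table> references) {
            this.tableName = tableName;
            this.references = references;
        }
    }

    // Recursively find a join path from source to destination via foreign keys.
    static List<String> findPath(Table source, Table destination, List<String> path) {
        if (source.tableName.equals(destination.tableName)) {
            return path; // destination reached: the accumulated path is the final path
        }
        for (Map.Entry<String, Table> fk : source.references.entrySet()) {
            List<String> extended = new ArrayList<>(path);
            extended.add(source.tableName + " -[" + fk.getKey() + "]-> "
                    + fk.getValue().tableName);
            // The referencing table becomes the new source (the recursive step).
            List<String> result = findPath(fk.getValue(), destination, extended);
            if (result != null) {
                return result;
            }
        }
        return null; // no foreign-key path: caller may swap tables or seek a common table
    }

    public static void main(String[] args) {
        Table department = new Table("department", Map.of());
        Table employee = new Table("employee", Map.of("dep_id", department));
        System.out.println(findPath(employee, department, new ArrayList<>()));
        // -> [employee -[dep_id]-> department]
    }
}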
3.4.2 NATURAL JOIN Algorithm
NATURAL JOIN is used in the FROM clause when the SELECT clause contains attributes which are to be extracted from different tables. Ex.4. Show salary and names of employees. Here, the attribute 'salary' is from the 'salary' table and 'name' is from the 'employee' table. The tables to be joined using NATURAL JOIN are the tables of the attributes present in the SELECT clause as well as the tables of the attributes present in the WHERE clause. The WHERE clause tables are retrieved from the condition clause extraction stage, whereas the tables of the SELECT clause are found and the links between these tables are identified using the Table Linking Algorithm, in the following manner:
• The tables of the SELECT clause are linked with each other by passing them one by one to the Table Linking Algorithm and finding the intermediate tables in the paths joining them. The tables obtained are stored in a set.
• Then, each table from the SELECT clause is linked with each table of the WHERE clause in the same fashion, and the tables identified are appended to the set.
• Finally, the resulting tables in the set are used to form the FROM clause of the SQL query, using the NATURAL JOIN keyword between them, as in the sketch below.
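A minimal sketch of this final assembly step, assuming the table set has already been collected for Ex.4 (table names are illustrative):

import java.util.LinkedHashSet;
import java.util.Set;

public class FromClauseBuilder {
    public static void main(String[] args) {
        // Tables gathered from the SELECT and WHERE clauses plus TLA intermediates.
        Set<String> tables = new LinkedHashSet<>();
        tables.add("employee");
        tables.add("salary");

        // Join every table in the set with the NATURAL JOIN keyword.
        String fromClause = "FROM " + String.join(" NATURAL JOIN ", tables);
        System.out.println("SELECT employee.name, salary.salary " + fromClause);
        // -> SELECT employee.name, salary.salary FROM employee NATURAL JOIN salary
    }
}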
3.5 THE AGGREGATION FUNCTION EXTRACTION
The aggregate functions provide constraints on the result set of the SQL query. The aggregate functions are handled by an RDBMS-independent layer called the Aggregation Extraction Layer (AEL). The steps in the AEL are:
• The synonyms of the aggregation words are extracted from the user query and are mapped to their actual aggregation function.
• The query is then processed to relate the aggregate words with their respective attributes, and this mapping is stored.
• The query is further processed to find whether there is any constraint on the aggregation function. If present, the constraint is filtered and mapped to the corresponding aggregate function.
• The final mapping is stored as => [Constraint on Aggregation, Aggregation function, Attribute name].
• The GROUP BY clause is then formulated by taking the attributes other than the aggregate attribute from the SELECT clause (if any). A sketch of the synonym-mapping step follows.
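This minimal sketch uses a small hand-written synonym list; the system itself derives synonyms via the WordNet dictionary, so the word list here is purely illustrative.

import java.util.Map;

public class AggregationMapper {

    // Aggregation words and their SQL functions (illustrative subset).
    static final Map<String, String> AGGREGATION_SYNONYMS = Map.of(
            "maximum", "MAX", "highest", "MAX",
            "minimum", "MIN", "lowest", "MIN",
            "average", "AVG", "total", "SUM",
            "number", "COUNT");

    public static void main(String[] args) {
        // Suppose "maximum" is found next to the attribute "salary" in the user query:
        String function = AGGREGATION_SYNONYMS.get("maximum");

        // Final mapping: [Constraint on Aggregation, Aggregation function, Attribute name]
        System.out.println("[null, " + function + ", salary]"); // no constraint here
    }
}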
3.6 CORRELATED QUERY EXTRACTION
A correlated query is formed whenever there is a constraint on an aggregation function like MAX or MIN. The correlated query is formed by deriving the constraint value mapped to the attribute and formulating the SQL query in the form given below:
SELECT T1.attribute FROM table AS T1 WHERE (N - 1) = (SELECT COUNT(DISTINCT T2.attribute) FROM table AS T2 WHERE T1.attribute <operator> T2.attribute)
where N => constraint value of the aggregation function
<operator> => depends on the type of aggregation function (minimum or maximum)
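A minimal sketch of instantiating this template (the build helper is a hypothetical name); for the query of Section 4.1.6, N = 4 yields the constant 3 in the generated query:

public class CorrelatedQueryBuilder {

    // Fill the correlated-query template of Section 3.6 for the Nth MAX/MIN of an attribute.
    static String build(String table, String attribute, int n, boolean isMax) {
        String op = isMax ? ">" : "<"; // MAX counts strictly larger values, MIN smaller ones
        return String.format(
                "SELECT T1.%2$s FROM %1$s AS T1 WHERE %3$d = "
                + "(SELECT COUNT(DISTINCT T2.%2$s) FROM %1$s AS T2 WHERE T1.%2$s %4$s T2.%2$s)",
                table, attribute, n - 1, op);
    }

    public static void main(String[] args) {
        // "Find the 4th maximum salary" -> N = 4, so N - 1 = 3
        System.out.println(build("salary", "salary", 4, true));
    }
}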
3.7 LIMIT AND ORDER BY CLAUSES
The LIMIT clause is formed when there is a constraint on the number of records to be shown in the result set of the SQL query. The query is processed to find the LIMIT value to which the LIMIT clause word is mapped, and this value is stored.
The ORDER BY clause is formed when the records of the result set need to be sorted in ascending or descending order according to one or more attributes. The ORDER BY clause is formulated by extracting the synonyms of the clause words and then mapping them to the actual ORDER BY clause words, ASC or DESC, for ascending and descending respectively.
4. RESULTS
We have tested our system on a synthesized corpus of natural language statements related to an employee database. The employee database contains 6 tables. The system has been tested with around 50 single-sentence natural language inputs. The accuracy comes out to be 82%. Following are examples of inputs given to the system.
4.1 SYSTEM OUTPUT
4.1.1 Simple Query
Query: Find the salary of employee whose id is 4
Figure 2. Flow of execution
Figure 2 shows the implementation steps by which the natural language statement is converted into the final MySQL query.
In this query, the 'salary' and 'employee' tables are linked together using the Table Linking Algorithm. Here, the WHERE clause is formed using the integer value 4.
Output: SELECT salary.salary FROM salary WHERE emp_id IN (SELECT emp_id FROM employee WHERE employee.emp_id = 4)
Query: Show the department manager with department name Technology
In this query, the 'depart_manager' and 'department' tables are joined together using the Table Linking Algorithm. Here, the WHERE clause is formed using the literal value 'TECHNOLOGY'.
Output: SELECT * FROM depart_manager WHERE dep_id IN (SELECT dep_id FROM department WHERE department.dep_name = 'TECHNOLOGY')
Query: Show all employees
Output: SELECT * FROM employee
4.1.2 Queries with same semantics
Query 1: Show the salaries of Human Resource department employees
Query 2: Determine the salaries of employees whose department is Human Resource
Query 3: Choose the salaries of Human Resource department
Query 4: Find the salaries of employees if they are in Human Resource department
The system is able to handle queries with the same semantics up to some level. The above queries have the same semantics; therefore, they form the same result query. One of the two queries given by the system is correct, for the reason stated in the Table Linking Algorithm.
Output 1: SELECT salary.salary FROM salary WHERE emp_id IN (SELECT emp_id FROM employee WHERE emp_id IN (SELECT emp_id FROM depart_manager WHERE dep_id IN (SELECT dep_id FROM department WHERE department.dep_name = 'HUMAN RESOURCE')))
Output 2: SELECT salary.salary FROM salary WHERE emp_id IN (SELECT emp_id FROM employee WHERE emp_id IN (SELECT emp_id FROM depart_emp WHERE dep_id IN (SELECT dep_id FROM department WHERE department.dep_name = 'HUMAN RESOURCE')))
4.1.3 Queries with advanced condition clause
Query: Show the id of employee, birth date of employees with department name computer science and names are Amit and Alex
Output 1: SELECT employee.emp_id, employee.birth_date FROM employee NATURAL JOIN department NATURAL JOIN salary NATURAL JOIN depart_manager WHERE department.dep_name = 'COMPUTER SCIENCE' AND employee.name = 'AMIT' AND employee.name = 'ALEX'
Output 2: SELECT employee.emp_id, employee.birth_date FROM employee NATURAL JOIN department NATURAL JOIN salary NATURAL JOIN depart_emp WHERE department.dep_name = 'COMPUTER SCIENCE' AND employee.name = 'AMIT' AND employee.name = 'ALEX'
4.1.4 Queries with aggregation functions
Query 1: Show the name of employee whose salary is maximum
In this query, the aggregation function MAX is mapped to 'salary'. The aggregation function is among the condition clause elements and is therefore used to form the WHERE clause.
Output: SELECT employee.name, salary.salary FROM employee NATURAL JOIN salary WHERE salary.salary IN (SELECT MAX(salary.salary) FROM salary)
Query 2: Show the maximum salary and employee id, employee name, title of employee whose id is 4
In this query, the aggregation function MAX is mapped to 'salary'. But the aggregation function is among the basic clause elements and is therefore used to form the SELECT clause.
Output: SELECT MAX(salary.salary), employee.emp_id, employee.name, title.title FROM salary NATURAL JOIN employee NATURAL JOIN title GROUP BY employee.emp_id, employee.name, title.title HAVING employee.emp_id = 4
4.1.5 Queries with BETWEEN and LIKE clause
Query 1: Determine the birth date of employees having salary between 40000 and 50000
Output: SELECT employee.birth_date FROM employee WHERE emp_id IN (SELECT emp_id FROM salary WHERE salary.salary BETWEEN 40000 AND 50000)
Query 2: Show the title of employee where employee name starts with Sham
Output: SELECT title.title FROM title WHERE emp_id IN (SELECT emp_id FROM employee WHERE employee.name LIKE 'SHAM%')
4.1.6 Correlated Queries
Query: Find the 4th maximum salary
Here, the aggregation function MAX is mapped to 'salary', but the aggregation has a constraint on it, so the query is formed according to the format given in Correlated Query Extraction.
Output: SELECT T1.salary FROM salary AS T1 WHERE 3 = (SELECT COUNT(DISTINCT T2.salary) FROM salary AS T2 WHERE T1.salary > T2.salary)
4.1.7 Queries with LIMIT and ORDER BY clause
Query: Show the first 5 names of employee in ascending order of salary
Output: SELECT employee.name FROM employee NATURAL JOIN salary ORDER BY salary.salary ASC LIMIT 5
5. LIMITATIONS
• Queries which have another query embedded in them, e.g. Find the department manager of employees whose employee id is greater than Alex's employee id. Here we are unable to find the correct condition clause elements, as the condition is itself another query.
• For the query "Who is Alex's department manager", Alex will be mapped to department manager, but the query semantics demand finding the department manager of employee Alex.
• Queries with qualitative quantifiers cannot be processed by the system, e.g. Find the youngest employee. Here, 'youngest' is the qualitative quantifier and the system is not able to map it to the birth date of the employee.
6. CONCLUSION AND FUTURE SCOPE
The user of the system is able to give a natural language input statement and formulate the following types of MySQL queries:
• Generate nested queries with more than two-level depth.
• Formulate aggregate queries along with HAVING and GROUP BY clauses.
• Generate correlated queries having constraints on an aggregation function.
• Generate queries with NATURAL JOIN.
• Generate queries with LIMIT and ORDER BY clauses.
• Generate queries with BETWEEN, LIKE and RANGE clauses.
The system can be further developed by incorporating the following future work, which is yet to be implemented in the present system; development on the points mentioned below is in progress.
• The system can be developed to handle qualitative quantifiers in the user query.
• A recursive algorithm can be designed to handle advanced nested queries which have an independent SQL query in the condition clause.
• Machine learning algorithms can be incorporated to determine the most efficient query amongst all the possible SQL queries for a user query.
• There can be situations where the user query has more than one semantic. An algorithm can be designed to formulate an SQL query with respect to each derived semantic.
• The system uses only the MySQL database. It can be extended to accept other databases or unstructured database systems.
REFERENCES
[1] William Woods, Ronald Kaplan, and Bonnie Webber, (1972) "The lunar science natural language information system: Final report".
[2] F. Siasar Djahantighi, Mohammad Norouzifard, Seyed Hashem Davarpanah, and M. H. Shenassa, (2008) "Using natural language processing in order to create SQL queries".
[3] Akshay G. Satav, Archana B. Ausekar, Radhika M. Bihani, and Abid Shaikh, (2014) "A proposed natural language query processing system".
[4] Alessandra Giordani and Alessandro Moschitti, (2012) "Generating SQL queries using natural language syntactic dependencies and metadata", in NLDB.
[5] Garima Singh and Arun Solanki, (2016) "An algorithm to transform natural language into SQL queries for relational databases".
[6] K. Javubar Sathick and A. Jaya, (2015) "Natural language to SQL generation for semantic knowledge extraction in social web sources", Indian Journal of Science and Technology, Vol. 8, No. 1.
[7] Abhijeet Gupta and Rajeev Sangal, (December 2013) "A novel approach to aggregation processing in natural language interfaces to databases", in Proceedings of the 2013 International Conference on Natural Language Processing, CDAC Noida, India, International Institute of Information Technology Hyderabad.
[8] Hu Li and Yong Shi, (Feb 2010) "A WordNet-based natural language interface to relational databases", in 2010 The 2nd International Conference on Computer and Automation Engineering (ICCAE), Vol. 1, pp. 514-518.
[9] Ashish Palakurthi, Ruthu S. M., Arjun R. Akula, and Radhika Mamidi, (2015) "Classification of attributes in a natural language query into different SQL clauses".
[10] Abhijeet R. Sontakke and Amit Pimpalkar, "A rule based graphical user interface to relational database using NLP".
[11] Nandan Sukthankar, Sanket Maharnawar, Pranay Deshmukh, Yashodhara Haribhakta, and Vibhavari Kamble, (2017) "nQuery - a natural language statement to SQL query generator".