Information extraction (IE) has been an active research area that seeks techniques to uncover structured information from large collections of text. IE is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents; in most cases this involves processing human-language text by means of natural language processing (NLP). Recent work in document processing, such as automatic annotation and content extraction, can also be viewed as information extraction. Many applications call for methods that automatically extract structured information from unstructured natural language text, yet, owing to the inherent challenges of natural language processing, most existing extraction methods are domain-specific. This project proposes a new paradigm for information extraction. In this framework, the intermediate output of each text-processing component is stored, so that when one component is improved, only that component has to be re-deployed over the entire corpus. Extraction is then performed on the previously processed data from the unchanged components together with the updated data generated by the improved component. Such incremental extraction can yield a tremendous reduction in processing time. The framework also provides a mechanism to generate extraction queries from both labeled and unlabeled data; query generation is critical so that casual users can specify their information needs without learning the query language.
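The incremental idea above — store each component's intermediate output and recompute only the stage that changed — can be sketched with a simple versioned cache. This is a minimal illustration, not the paper's implementation; the stage names, cache layout, and `doc42` identifier are all hypothetical.

```python
import hashlib
import json
import pathlib
import tempfile

# Hypothetical on-disk store for intermediate pipeline outputs.
CACHE = pathlib.Path(tempfile.mkdtemp())

def cached_stage(stage_name, version, func, doc_id, text):
    """Re-run a pipeline stage only when its version (or the document) changes;
    otherwise reuse the stored intermediate output."""
    key = hashlib.sha1(f"{stage_name}:{version}:{doc_id}".encode()).hexdigest()
    path = CACHE / key
    if path.exists():                      # unchanged component: reuse old output
        return json.loads(path.read_text())
    result = func(text)                    # changed component: recompute and store
    path.write_text(json.dumps(result))
    return result

# e.g. a tokenizer stage; bumping version to 2 would force only this stage
# to be recomputed over the corpus, while other stages keep their cached output
tokens = cached_stage("tokenize", 1, lambda t: t.split(), "doc42", "incremental ie demo")
print(tokens)
```

A second call with the same `(stage_name, version, doc_id)` key returns the stored result without re-running `func`, which is the source of the claimed time savings.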
Expression of Query in XML object-oriented database (Editor IJCATR)
With the advent of object-oriented databases, the concept of behavior in databases was introduced; previously, relational databases provided only a logical model of the data and paid no attention to the operations applied to it. This paper presents a method for querying object-oriented databases. The method yields appropriate results when the user expresses restrictions in combination (disjunctive and conjunctive) and assigns each restriction a weight based on its importance. The results obtained are then sorted by their degree of membership in the response set. Queries are subsequently expressed using XML labels. The aim is to simplify queries so that the objects they return closely match the user's needs and expectations.
The enormous amount of information stored in unstructured texts cannot simply be used for further processing by computers, which typically handle text as simple sequences of character strings. Specific (pre-)processing methods and algorithms are therefore required to extract useful patterns. Text mining is the discovery of valuable, yet hidden, information in text documents. Text classification (also called text categorization), one of the important research issues in text mining, assigns a text document to one of a set of predefined classes; it is necessary for organizing large collections of documents into specific classes. This paper surveys different text classification techniques and also covers classifier architecture and text classification applications.
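To make "assigns a text document to one of a set of predefined classes" concrete, here is a minimal multinomial naive Bayes classifier with add-one smoothing, one of the standard techniques such surveys cover. The training documents and labels are invented for illustration; this is a sketch, not any specific system from the paper.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label) pairs. Returns a multinomial NB model."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    label_counts = Counter()             # label -> number of training docs
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(text, model):
    """Pick the label maximizing log prior + log likelihood (Laplace-smoothed)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) /
                              (total_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [("cheap meds buy now", "spam"),
        ("meeting agenda attached", "ham"),
        ("buy cheap watches now", "spam"),
        ("project meeting notes", "ham")]
model = train(docs)
print(classify("buy cheap now", model))
```

The smoothing term `+ 1` keeps unseen words from zeroing out a class's probability, which matters for the small vocabularies typical of short documents.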
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : ... (IJNSA Journal)
In health research, one of the major tasks is to retrieve and analyze heterogeneous databases containing a single patient's information gathered from a large volume of data over a long period of time. The main objective of this paper is to present our ontology-based information retrieval approach for a clinical information system. We performed a case study in a real-life hospital setting. The results obtained illustrate the feasibility of the proposed approach, which significantly improved the information retrieval process on a large volume of data collected over a long period, from August 2011 until January 2012.
An Efficient Approach for Keyword Selection; Improving Accessibility of Web ... (dannyijwest)
General search engines often return imprecise results even for detailed queries, so there is a vital need to elicit useful information, such as keywords, that lets search engines provide acceptable results for users' queries. Although many methods have been proposed for extracting keywords automatically, they all aim at better recall, precision, and other criteria that describe how well the method performs the author's job. This paper presents a new automatic keyword extraction method that improves the accessibility of web content to search engines. The proposed method defines coefficients determining feature effectiveness and optimizes them using a genetic algorithm; it then evaluates candidate keywords with a function that uses the results returned by search engines. Experiments demonstrate that, compared with other methods, the proposed method achieves a higher score from search engines without noticeable loss of recall or precision.
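The genetic-algorithm step described above — evolving a vector of feature coefficients against a fitness score — can be sketched as follows. The target weights and the fitness function are stand-ins (the paper's actual fitness uses search-engine results, which are not reproducible here); population size, mutation rate, and generation count are arbitrary illustrative choices.

```python
import random

random.seed(42)

TARGET = [0.7, 0.2, 0.1]          # hypothetical "ideal" feature coefficients

def fitness(weights):
    # Stand-in for the paper's search-engine-based score: closer to TARGET is better.
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def mutate(weights, rate=0.1):
    # Perturb each coefficient slightly, clamped to [0, 1].
    return [min(1.0, max(0.0, w + random.uniform(-rate, rate))) for w in weights]

def crossover(a, b):
    point = random.randrange(1, len(a))   # single-point crossover
    return a[:point] + b[point:]

def evolve(pop_size=30, generations=60):
    population = [[random.random() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]          # truncation selection (elitist)
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print([round(w, 2) for w in best])
```

Keeping the top half of each generation unchanged (elitism) guarantees the best fitness never decreases, which makes this tiny search converge reliably on such a smooth toy objective.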
Classical information systems provide an answer only after a user submits a complete query; at present, almost all relational database systems rely on queries whose syntax and semantics are fully defined before data can be accessed. Often, however, we are willing to use vague terms in a query. The main objective of a database management system is to provide an environment that is both convenient and efficient for storing and retrieving information, and the recent trend of supporting autocomplete is a first step toward coping with this problem. We can design both classical and fuzzy databases and effectively run fuzzy queries on them. Fuzzy databases are developed to manipulate incomplete, unclear, and vague data expressed by terms such as "low", "fast", "very high", and "about"; the primary focus of fuzzy logic is on natural language. This paper gives users the flexibility to query a database using natural language by implementing "interactive fuzzy search": a framework that lets users explore the data as they type, even in the presence of minor errors. The paper applies fuzzy queries to a relational database so that precise results can be obtained alongside results for the uncertain terms we commonly use, based on a membership function.
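A membership function is what turns a vague term like "low" into something a database can rank by. The sketch below uses a trapezoidal membership function for a hypothetical term "low price"; the breakpoints (100, 200) and the product table are invented for illustration, not taken from the paper.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises a->b, flat b->c (degree 1), falls c->d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def low_price(price):
    # Hypothetical fuzzy set: fully "low" up to 100, fading out by 200.
    return trapezoid(price, -1, 0, 100, 200)

products = {"pen": 20, "bag": 150, "phone": 450}

# A fuzzy query "price is low" ranks items by their degree of membership
# instead of filtering with a hard cut-off.
ranked = sorted(((low_price(p), name) for name, p in products.items()), reverse=True)
print(ranked)
```

Note how the bag, at 150, gets a partial degree (0.5) rather than being excluded outright — exactly the behavior a crisp `WHERE price < 100` predicate cannot express.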
A template based algorithm for automatic summarization and dialogue managemen... (eSAT Journals)
Abstract: This paper describes an automated approach for extracting significant and useful events from unstructured text. The goal of the research is to arrive at a methodology that extracts important events such as dates, places, and subjects of interest. It is also convenient if the methodology presents users with a shorter version of the text containing all non-trivial information. We also discuss our implementation of algorithms that perform exactly this task. Key Words: Cosine Similarity, Information, Natural Language, Summarization, Text Mining
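Since the abstract's keywords name cosine similarity, here is a minimal extractive-summarization sketch built on it: each sentence is scored by cosine similarity between its term-frequency vector and the whole document's vector, and the top-scoring sentences form the summary. The example sentences are invented; this is one common baseline, not necessarily the paper's exact algorithm.

```python
import math
import re
from collections import Counter

def vec(text):
    """Bag-of-words term-frequency vector (lowercased, punctuation stripped)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize(sentences, k=1):
    """Keep the k sentences most similar to the document's overall term vector."""
    doc = Counter()
    for s in sentences:
        doc.update(vec(s))
    return sorted(sentences, key=lambda s: cosine(vec(s), doc), reverse=True)[:k]

text = ["The meeting is on Monday in Paris.",
        "Paris hosts the Monday meeting of the committee.",
        "Unrelated trivia goes here."]
print(summarize(text, k=1))
```

The off-topic sentence scores lowest because it shares almost no vocabulary with the document vector, which is the intuition behind centroid-style extractive summarizers.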
This paper proposes a natural-language-based discourse analysis method for extracting information from news articles in different domains. The discourse analysis uses Rhetorical Structure Theory (RST) to find the coherent groups of text that are most prominent for information extraction; RST uses the nucleus-satellite concept to identify the most prominent text in a document. After discourse analysis, text analysis is performed to extract domain-related objects and relate them. A knowledge-based system consisting of a domain dictionary, which holds a bag of words for the domain, is used for the extraction. The system is evaluated against a gold-standard analysis and human judgment of the extracted information.
Architecture of an ontology based domain-specific natural language question a... (IJwest)
A question answering (QA) system aims at retrieving precise information from a large collection of documents in response to a query. This paper describes the architecture of a Natural Language Question Answering (NLQA) system for a specific domain based on ontological information, a step towards semantic-web question answering. The proposed architecture defines four basic modules suitable for enhancing current QA capabilities with the ability to process complex questions. The first module is question processing, which analyses and classifies the question and reformulates the user query. The second module retrieves the relevant documents. The next module processes the retrieved documents, and the last module extracts and generates a response. Natural language processing techniques are used for processing the question and the documents and for answer extraction, while ontology and domain knowledge are used for reformulating queries and identifying relations. The aim of the system is to generate a short, specific answer to a question asked in natural language in a specific domain. Our implementation achieved 94% accuracy in natural language question answering.
Towards a Query Rewriting Algorithm Over Proteomics XML Resources (CSCJournals)
Querying and sharing Web proteomics data is not an easy task. Given that several data sources can answer the same sub-goals in a global query, there can clearly be many candidate rewritings. The user query is formulated using concepts and properties related to proteomics research (a domain ontology), while semantic mappings describe the contents of the underlying sources. In this paper, we characterize the query rewriting problem by representing the semantic mappings as an associated hypergraph, so that generating the candidate rewritings can be formulated as discovering the minimal transversals of a hypergraph. We exploit and adapt algorithms from hypergraph theory to find all candidate rewritings for a query answering problem. In future work, relevance criteria could help determine optimal, high-quality rewritings according to user needs and source performance.
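A minimal transversal of a hypergraph is a smallest-by-inclusion set of vertices that intersects every hyperedge. The brute-force sketch below makes the abstract's framing concrete: each hyperedge lists the sources that can answer one sub-goal, and each minimal transversal picks a set of sources covering every sub-goal, i.e. one candidate rewriting. Source names `s1`..`s3` are illustrative; real enumeration algorithms (this abstract adapts ones from hypergraph theory) scale far better than this exhaustive search.

```python
from itertools import chain, combinations

def minimal_transversals(edges):
    """Enumerate all minimal transversals (hitting sets) of a hypergraph by
    brute force, smallest first. Suitable for small instances only."""
    vertices = sorted(set(chain.from_iterable(edges)))
    found = []
    for size in range(1, len(vertices) + 1):
        for cand in combinations(vertices, size):
            cset = set(cand)
            # a strict superset of an already-found transversal is not minimal
            if any(prev <= cset for prev in found):
                continue
            if all(cset & set(e) for e in edges):   # hits every hyperedge
                found.append(cset)
    return found

# sub-goal 1 answerable by s1 or s2; sub-goal 2 answerable by s2 or s3
edges = [{"s1", "s2"}, {"s2", "s3"}]
print(minimal_transversals(edges))
```

Here `{s2}` alone covers both sub-goals, and `{s1, s3}` is the only other minimal choice — two candidate rewritings for this toy query.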
A Semantic Retrieval System for Extracting Relationships from Biological Corpus (ijcsit)
The World Wide Web holds a vast amount of heterogeneous information, yet users searching it do not always obtain the kind of information they expect. Within information extraction, identifying semantic relationships between terms in documents remains a challenge. This paper proposes a system that retrieves documents based on query expansion and tackles the extraction of semantic relationships from biological documents. The system retrieves documents relevant to the input terms and then detects whether a relationship exists. It uses the Boolean model together with pattern recognition to determine the relevant documents and to locate the relationship within a biological document, and it constructs a term-relation table that accelerates the relation extraction step. The proposed method also lets researchers determine the relationship between two biological terms from the information available in the biological documents. For the retrieved documents, the system measures precision and recall.
Ontology Based Approach for Semantic Information Retrieval System (IJTET Journal)
Abstract—Information retrieval plays an important role in current search engines, which perform searches based on keywords and return an enormous amount of data from which the user cannot pick out the essential and most important information. This limitation can be overcome by the semantic web, a web architecture that replaces the keyword-based search technique with conceptual (semantic) search. Natural language processing is commonly used in QA systems to accept users' questions, which are converted in several steps into query form so that an exact answer can be retrieved. In conceptual search, the engine interprets the meaning of the user's query and the relations among the concepts a document contains with respect to a particular domain, producing specific answers instead of lists of results. In this paper, we propose an ontology-based semantic information retrieval system built on the Jena semantic web framework: the user's input query is parsed with the Stanford Parser, a triplet extraction algorithm is applied, and for each input query a SPARQL query is formed and executed against the knowledge base (ontology), which finds the matching RDF triples and retrieves the relevant information using the Jena framework.
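The "triplet to SPARQL" step in the pipeline above can be illustrated with plain string construction. The `ex:` namespace and the `taughtBy` property are hypothetical placeholders, not the paper's ontology; a real system (e.g. one using Jena, or rdflib in Python) would also escape identifiers and bind prefixes properly.

```python
def triple_to_sparql(subject, predicate, obj=None):
    """Build a SPARQL query from an extracted (subject, predicate, object) triplet.
    An unknown object becomes a SELECT variable; a full triple becomes an ASK."""
    prefix = "PREFIX ex: <http://example.org/onto#>\n"
    if obj is None:
        return prefix + f"SELECT ?o WHERE {{ ex:{subject} ex:{predicate} ?o . }}"
    return prefix + f"ASK {{ ex:{subject} ex:{predicate} ex:{obj} . }}"

# e.g. the question "Who teaches Databases?" parsed to (Databases, taughtBy, ?)
q = triple_to_sparql("Databases", "taughtBy")
print(q)
```

The point is the mapping itself: the wh-word in the question becomes the free variable `?o`, while the recognized entities become ground terms in the triple pattern.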
The Statement of Conjunctive and Disjunctive Queries in Object Oriented Datab... (Editor IJCATR)
With the entrance of object-oriented concepts into databases, relational databases have gradually been replaced by object-oriented databases in various fields. To handle the uncertain data of the real world, several methods have been presented; one approach to database modeling couples object-oriented databases with fuzzy logic. Many of the queries users pose are expressed in terms of linguistic variables, and because classical databases cannot support these variables, fuzzy approaches are considered. In this study we investigate database queries in both simple and complex forms; in the complex form we use conjunctive and disjunctive queries. We then use XML labels to express the queries in fuzzy form, since entering the XML world offers a reliable way to communicate with other parts of the software. We also correct conjunctive and disjunctive queries over a fuzzy object-oriented database using the concepts of dependency measure and weight, with weights assigned to the different phrases of a query according to user emphasis. A further aim of this research is mapping fuzzy queries to fuzzy XML, so that queries are simple to implement and the output of query execution comes much closer to users' needs and expectations. The results show that the proposed method expresses the possible conjunctive and disjunctive queries to the database in the form of fuzzy XML.
Text preprocessing is a vital stage in text classification (TC) in particular and text mining in general. Text preprocessing tools reduce multiple forms of a word to a single form, and preprocessing techniques have received considerable attention and been widely studied in machine learning. The basic phase in text classification involves preprocessing features and extracting relevant features against those in a database; preprocessing has a great impact on reducing the time and computational resources required. The effect of preprocessing tools on English text classification is an active area of research. This paper provides an evaluation study of several preprocessing tools for English text classification, considering raw text, tokenization, stop-word removal, and stemming. Two feature extraction methods, chi-square and TF-IDF with a cosine similarity score, are applied to the BBC English dataset. The experimental results show that text preprocessing affects the feature extraction methods and enhances the performance of English text classification, especially for small threshold values.
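Of the two feature extraction methods named above, chi-square is the easier to show in a few lines: it scores how strongly a term's presence is associated with a class from a 2×2 contingency table. The counts below are invented for illustration, not from the BBC dataset.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 contingency table:
    a: docs in class containing the term    b: docs outside class containing it
    c: docs in class without the term       d: docs outside class without it
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

# a term concentrated in the class scores high; a uniform term scores zero
print(chi_square(40, 5, 10, 45))   # discriminative term
print(chi_square(25, 25, 25, 25))  # uninformative term
```

Feature selection then keeps the terms whose chi-square score exceeds a threshold — which is where the abstract's observation about "small threshold values" comes in: a lower threshold retains more (weaker) features.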
Efficient multi-document summary generation using neural network (INFOGAIN PUBLICATION)
In recent years online information has grown tremendously on the World Wide Web and on users' desktops, and it has thus attracted much attention in the field of automatic text summarization. Text mining has become a significant research field, as it produces valuable data from large amounts of unstructured text. Summarization systems make it possible to pick out the important keywords of a text, so the consumer spends less time reading the whole document; the main objective of a summarization system is to generate a new form that expresses the key meaning of the text. This paper studies various existing techniques and the need for novel multi-document summarization schemes, motivated by the growing need to provide high-quality summaries in a very short time. In the proposed system, users can quickly and easily access well-formed summaries that express the key meaning of the text. The primary focus of this paper lies with the f_β-optimal merge function, a recently presented function that uses the weighted harmonic mean to strike a balance between precision and recall. The proposed system uses bisecting k-means clustering to improve running time and neural networks to improve the accuracy of the summaries generated by the NEWSUM algorithm.
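The weighted harmonic mean the abstract refers to is the standard F_β measure; β > 1 weights recall more heavily, β < 1 weights precision. A direct implementation (the merge function itself is the paper's contribution and is not reproduced here):

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.
    beta = 1 gives the balanced F1 score; beta > 1 favours recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(0.8, 0.4))          # balanced F1
print(f_beta(0.8, 0.4, beta=2))  # recall-weighted: pulled toward the lower recall
```

Because it is a harmonic mean, F_β is dragged toward whichever of precision or recall is smaller, so a summary cannot score well by maximizing one at the expense of the other.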
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science, and technology, including new teaching methods, assessment, validation, and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability, and the articles published in the journal can be accessed online.
Survey of Machine Learning Techniques in Textual Document Classification (IOSR Journals)
Classification of text documents consists of associating one or more predefined categories with a document, based on the likelihood expressed by a training set of labeled documents. Many machine learning algorithms play an important role in training the system with predefined categories, and the importance of the machine learning approach motivated this study of text document classification based on the available statistical event models. The aim of this paper is to present the important techniques and methodologies employed for text document classification while raising awareness of some of the interesting challenges that remain to be solved, focusing mainly on text representation and machine learning techniques.
I gave this talk at the EDBT 2014 conference, which took place in Athens, Greece.
I show how data examples can be used to characterize the behavior of scientific modules. I present a new method that automatically generates the data examples, and show that such examples help human users understand what a module does and can assist curators in repairing broken workflows (i.e., workflows for which one or more modules are no longer supplied by their providers).
Experimental Result Analysis of Text Categorization using Clustering and Clas... (ijtsrd)
In a world that routinely produces ever more textual data, managing that data is a critical task. Many text analysis methods are available for managing and visualizing it, but many techniques achieve low accuracy because of the ambiguity of natural language. To provide fine-grained analysis, this paper introduces efficient machine learning algorithms for categorizing text data. To improve accuracy, the proposed system uses the NLTK (Natural Language Toolkit) Python library to perform natural language processing. The main aim of the proposed system is to generalize the model for real-time text categorization applications by using efficient text classification and clustering machine learning algorithms, and to find the most efficient and accurate model for the input dataset using performance measures. Patil Kiran Sanajy | Prof. Kurhade N. V., "Experimental Result Analysis of Text Categorization using Clustering and Classification Algorithms", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-4, June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd25077.pdf
Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/25077/experimental-result-analysis-of-text-categorization-using-clustering-and-classification-algorithms/patil-kiran-sanajy
Performance Evaluation of Query Processing Techniques in Information Retrieval (idescitation)
The first element of the search process is the query. Because user queries are restricted to two or three keywords on average, they are ambiguous to the search engine. Given the user query, the goal of an Information Retrieval (IR) system is to retrieve information that might be useful or relevant to the user's information need, so query processing plays an important role in an IR system. Query processing can be divided into four categories: query expansion, query optimization, query classification, and query parsing. In this paper we evaluate the performance of query processing algorithms in each category. The evaluation is based on the dataset specified by the Forum for Information Retrieval [FIRE15], and the criteria used are precision and relative recall. The analysis is based on the importance of each step in query processing, and the experimental results show the significance of each step as well as the relevance of web semantics and spelling correction in the user query.
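The two evaluation criteria named above are straightforward to compute. Precision is the fraction of retrieved documents that are relevant; relative recall replaces the (usually unknown) full relevant set with the pool of relevant documents found by all compared systems. The document IDs below are invented to show the arithmetic.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def relative_recall(retrieved, pooled_retrieved, relevant):
    """Recall measured against the pooled relevant documents found by all
    compared systems, used when the full relevant set is unknown."""
    pooled_relevant = pooled_retrieved & relevant
    return (len(retrieved & relevant) / len(pooled_relevant)
            if pooled_relevant else 0.0)

relevant = {1, 2, 3, 4}          # judged relevant documents
system_a = {1, 2, 5}             # what one system retrieved
pool = {1, 2, 3, 5, 6}           # union of what every compared system retrieved

print(precision(system_a, relevant))           # 2 of 3 retrieved are relevant
print(relative_recall(system_a, pool, relevant))
```

Note that system A's ordinary recall would be 2/4, but its relative recall is 2/3 because document 4 was found by no system and so drops out of the pooled denominator.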
Improved Video Transmission over MANETs using MDSR with MDC (ijsrd.com)
A MANET has no fixed infrastructure, so its mobile nodes are free to move within the network, resulting in a dynamically changing topology. Real-time video transport has rigid bandwidth, delay, and loss requirements, and supporting such applications in mobile ad-hoc networks is a challenge. Because a MANET consists of mobile nodes, link failures are frequent, which causes two main problems. First, when a route breaks, all packets already in transit on that route are dropped, degrading video quality and decreasing the average packet delivery ratio (PDR). Second, data transmission is halted until a new route is discovered, increasing the average end-to-end delay. To address this, we propose Node-Disjoint Multipath Routing based on the DSR protocol with the Multiple Description Coding (MDC) technique. Node-disjoint paths share no common node, and MDC encodes a media source into two or more sub-bitstreams, also called descriptions. Experiments were run in the NS2 simulator with EvalVid to evaluate video quality. The proposed scheme improves packet delivery ratio and throughput and decreases the average end-to-end delay.
A Survey of Source Authentication Schemes for Multicast transfer in Adhoc Net... (ijsrd.com)
An ad-hoc network is a collection of autonomous nodes with a dynamically changing infrastructure. Multicast is a good mechanism for group communication and can be used in group-oriented applications such as video/audio conferencing, interactive group games, and video on demand. Security problems, however, obstruct the large-scale deployment of the multicast communication model, and multicast data origin authentication is the main component of its security architecture. Authentication schemes should be scalable and efficient in the presence of packet loss. In this article we discuss various authentication schemes for multicast data origin, along with their advantages and disadvantages.
Microcontroller Based Sign Language Glove (ijsrd.com)
People who are speech-impaired, as well as paralyzed patients, have difficulty communicating: they cannot speak or hear properly, and they struggle to communicate with people who do not understand sign language. In such cases an electronic hand glove can be used for communication: one hand forms different finger positions, which are read by flex sensors. The objective of this project is to develop an electronic device for people who suffer from speech impairment and for paralyzed patients. A flex-sensor glove is used to form the alphabet of Indian Sign Language with different positions of the fingers and thumb, and the output is shown on an LCD.
Fingerprinting Based Indoor Positioning System using RSSI Bluetooth (ijsrd.com)
Positioning is the basis for providing location information to mobile users, and with the growth of wireless and mobile communications technologies, mobile phones are now equipped with several radio-frequency technologies, such as GSM, Wi-Fi, and Bluetooth, from which positioning information can be derived. The objective of this work was to implement an indoor positioning system relying on Bluetooth Received Signal Strength (RSS), integrated with a Global Positioning Module (GPM) to provide precise information inside a building. We propose an indoor positioning system based on an RSS fingerprint-and-footprint architecture in which smartphone users obtain their position from collections of Bluetooth signals, with RSS values confined by direction and burst noise filtered out to overcome signal fluctuation inside the building. This scheme improves the accuracy of finding positions inside a building.
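RSS fingerprinting works in two phases: offline, record a signal-strength vector at each known location; online, match a live measurement against the database. The nearest-neighbour sketch below shows the matching step; the beacon count, dBm values, and location names are illustrative, not measured data, and the abstract's direction confinement and noise filtering are omitted.

```python
import math

# Offline fingerprint database: location -> RSS vector from 3 beacons (dBm).
# Values are illustrative, not measured.
fingerprints = {
    "lobby":   [-50, -70, -85],
    "hallway": [-65, -55, -80],
    "office":  [-80, -60, -55],
}

def locate(rss):
    """Online phase: 1-nearest-neighbour match in RSS signal space."""
    return min(fingerprints, key=lambda loc: math.dist(rss, fingerprints[loc]))

# a live reading close to the lobby's stored vector maps to "lobby"
print(locate([-52, -68, -84]))
```

Production systems typically use weighted k-NN over several nearest fingerprints and average repeated readings to damp the RSS fluctuation the abstract mentions; 1-NN keeps the idea visible in a few lines.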
A Review on Various Most Common Symmetric Encryptions Algorithmsijsrd.com
Security is the most challenging aspects in the internet and network application. Internet and networks applications are growing very fast, so the importance and the value of the exchanged data over the internet or other media types are increasing. Information security has been very important issue in data communication. Any loss or threat to information can prove to be great loss to the organization. Encryption technique plays a main role in information security system. This paper gives a comparison of various encryption algorithms and then finds best available one algorithm for the network security.
Extracting and Reducing the Semantic Information Content of Web Documents to ...ijsrd.com
Ranking and optimization of web service compositions represent challenging areas of research with significant implication for realization of the "Web of Services" vision. The semantic web, where the semantics information is indicated using machine-process able language such as the Web Ontology Language (OWL) "Semantic web service" use formal semantic description of web service functionality and enable automated reasoning over web service compositions. These semantic web services can then be automatically discovered, composed into more complex services, and executed. Automating web service composition through the use of semantic technologies calculating the semantic similarities between outputs and inputs of connected constituent services, and aggregate these values into a measure of semantics quality for the composition. It propose a novel and extensible model balancing the new dimensions of semantic quality ( as a functional quality metric) with QoS metric, and using them together as a ranking and optimization criteria. It also demonstrates the utility of Genetic Algorithms to allow optimization within the context of a large number of services foreseen by the "Web of Service" vision. To reduce the semantics of the web documents then to support semantic document retrieval by using Network Ontology Language (NOL) and to improve QoS as a ranking and optimization.
Architecture of an ontology based domain-specific natural language question a... (IJwest)
A question answering (QA) system aims at retrieving precise information from a large collection of documents in response to a query. This paper describes the architecture of a Natural Language Question Answering (NLQA) system for a specific domain based on ontological information, a step towards semantic-web question answering. The proposed architecture defines four basic modules suitable for enhancing current QA capabilities with the ability to process complex questions. The first module is question processing, which analyses and classifies the question and reformulates the user query. The second module retrieves the relevant documents. The next module processes the retrieved documents, and the last module performs the extraction and generation of a response. Natural language processing techniques are used for processing the question and the documents and for answer extraction, while ontology and domain knowledge are used for reformulating queries and identifying the relations. The aim of the system is to generate a short and specific answer to a question asked in natural language in a specific domain. We have achieved 94% accuracy of natural language question answering in our implementation.
Towards a Query Rewriting Algorithm Over Proteomics XML Resources (CSCJournals)
Querying and sharing Web proteomics data is not an easy task. Given that several data sources can be used to answer the same sub-goals of a global query, there can be many candidate rewritings. The user query is formulated using concepts and properties related to proteomics research (a domain ontology), while semantic mappings describe the contents of the underlying sources. In this paper, we propose a characterization of the query rewriting problem that represents the semantic mappings as an associated hypergraph. The generation of candidate rewritings can then be formulated as the discovery of the minimal transversals of this hypergraph. We exploit and adapt algorithms available in hypergraph theory to find all candidate rewritings for a query answering problem. In future work, relevance criteria could help determine optimal, high-quality rewritings according to user needs and source performance.
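The reduction to minimal transversals can be illustrated with a small brute-force sketch (the function name and the toy source sets are illustrative, not from the paper; real implementations use dedicated hypergraph algorithms rather than exhaustive enumeration):

```python
from itertools import combinations

def minimal_transversals(edges):
    """Enumerate all minimal hitting sets (transversals) of a hypergraph.

    `edges` is a list of sets; a transversal intersects every edge.
    Here, each edge lists the sources able to answer one query sub-goal,
    so each minimal transversal is one candidate rewriting.
    """
    vertices = sorted(set().union(*edges))

    def hits(cand):
        return all(cand & e for e in edges)

    found = []
    # Enumerate candidates by increasing size, so any candidate with a
    # previously found proper subset cannot be minimal and is skipped.
    for r in range(1, len(vertices) + 1):
        for combo in combinations(vertices, r):
            cand = set(combo)
            if hits(cand) and not any(t < cand for t in found):
                found.append(cand)
    return found

# Each sub-goal of the global query lists the sources that can answer it.
edges = [{"srcA", "srcB"}, {"srcB", "srcC"}]
rewritings = minimal_transversals(edges)
# Two minimal rewritings: use srcB alone, or combine srcA with srcC.
```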
A Semantic Retrieval System for Extracting Relationships from Biological Corpus (ijcsit)
The World Wide Web holds a large amount of heterogeneous information, and users searching it do not always obtain the kind of information they expect. Within information extraction, extracting semantic relationships between terms in documents remains a challenge. This paper proposes a system that retrieves documents based on query expansion and tackles the extraction of semantic relationships from biological documents. The system retrieves documents relevant to the input terms and then extracts any relationship present. It uses the Boolean model together with pattern recognition to determine the relevant documents and to locate the relationship within a biological document, and it constructs a term-relation table that accelerates the relation-extraction step. The method also lets researchers discover the relationship between two biological terms from the information available in the biological documents. For the retrieved documents, the system measures precision and recall.
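A minimal sketch of the Boolean retrieval step, with an inverted index standing in for the term-relation table described above (the document texts and function names are illustrative assumptions):

```python
def build_index(docs):
    """Inverted index: term -> set of doc ids (a simple term-relation table)."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

def boolean_and(index, terms):
    """Boolean-model AND retrieval: docs containing every query term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

docs = {
    1: "protein binds receptor",
    2: "gene regulates protein",
    3: "receptor signals pathway",
}
index = build_index(docs)
# Only doc 1 mentions both terms, so it is the candidate for
# relation extraction between "protein" and "receptor".
matches = boolean_and(index, ["protein", "receptor"])  # {1}
```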
Ontology Based Approach for Semantic Information Retrieval System (IJTET Journal)
Abstract: Information retrieval plays an important role in current search engines, which perform searches based on keywords. This yields an enormous amount of data, from which the user cannot easily pick out the essential and most important information. This limitation may be overcome by a new web architecture, the semantic web, whose conceptual (semantic) search technique overcomes the limitations of keyword-based search. Natural language processing is typically used in a QA system to accept the user's question, and several steps convert the question into a query that retrieves an exact answer. In conceptual search, the search engine interprets the meaning of the user's query and the relations among the concepts a document contains with respect to a particular domain, producing specific answers instead of lists of results. In this paper, we propose an ontology-based semantic information retrieval system built on the Jena semantic web framework: the user's input query is parsed by the Stanford Parser, and a triplet extraction algorithm is applied. For each input query, a SPARQL query is formed and fired against the knowledge base (ontology), which finds the appropriate RDF triples and retrieves the relevant information using the Jena framework.
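A simplified sketch of the triplet-to-SPARQL step (the `ex:` namespace, the example ontology terms, and the mapping rules are assumptions for illustration; the paper's triplet extraction over the Stanford Parser output is far richer):

```python
def triplet_to_sparql(subject, predicate, obj):
    """Turn a (subject, predicate, object) triple extracted from the
    parsed question into a SPARQL SELECT over the ontology.

    `None` marks the unknown slot the user is asking about; it becomes
    the ?x variable. The `ex:` prefix is a placeholder namespace.
    """
    s = "?x" if subject is None else f"ex:{subject}"
    p = f"ex:{predicate}"
    o = "?x" if obj is None else f"ex:{obj}"
    return (
        "PREFIX ex: <http://example.org/onto#>\n"
        f"SELECT ?x WHERE {{ {s} {p} {o} . }}"
    )

# "Who teaches DatabaseSystems?" -> the subject is the unknown.
query = triplet_to_sparql(None, "teaches", "DatabaseSystems")
```

In a Jena-based system, a string like this would then be executed against the ontology model to pull out the matching RDF triples.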
The Statement of Conjunctive and Disjunctive Queries in Object Oriented Datab... (Editor IJCATR)
The entrance of object-oriented concepts into databases caused relational databases gradually to be replaced with object-oriented databases in various fields. On the other hand, several methods have been presented for handling the uncertain data of the real world. One of these database modeling methods is an approach which couples object-oriented database modeling with fuzzy logic. Many queries that users pose are expressed in terms of linguistic variables, and because classical databases are not able to support these variables, fuzzy approaches are considered. In this study we investigate database queries in both simple and complex forms; in the complex form, we use conjunctive and disjunctive queries. We then use XML labels to express the queries in fuzzy form; entering the XML world, as a highly reliable medium, also lets the system communicate with other parts of the software. We further refine conjunctive and disjunctive queries over a fuzzy object-oriented database using the concepts of dependency measure and weight, where a weight is assigned to each phrase of a query based on the user's emphasis. Another aim of this research is mapping fuzzy queries to fuzzy-XML, so that queries are simple to implement and their output is much closer to the user's needs and expectations. The results show that the proposed method expresses the possible conjunctive and disjunctive queries to the database in the form of fuzzy-XML.
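One standard way to weight fuzzy conjunctions and disjunctions, a Dubois-Prade style weighted min/max, can be sketched as follows (this is an illustrative scheme; the paper's own dependency-measure weighting may differ):

```python
def weighted_conjunction(memberships, weights):
    """Weighted fuzzy AND: min of max(1 - w, mu) over the restrictions.

    A restriction with low weight can only pull the overall score down
    a little, so the user's emphasis is respected.
    """
    return min(max(1.0 - w, m) for m, w in zip(memberships, weights))

def weighted_disjunction(memberships, weights):
    """Weighted fuzzy OR: max of min(w, mu), so a low-weight
    restriction can only help the score a little."""
    return max(min(w, m) for m, w in zip(memberships, weights))

# Two linguistic restrictions: "young" matches with degree 0.9,
# "high salary" with 0.4; the user emphasises the first (weight 1.0)
# over the second (weight 0.5).
mu, w = [0.9, 0.4], [1.0, 0.5]
score_and = weighted_conjunction(mu, w)  # min(0.9, max(0.5, 0.4)) = 0.5
score_or = weighted_disjunction(mu, w)   # max(0.9, min(0.5, 0.4)) = 0.9
```

Sorting result objects by these scores gives the ranking by "belonging rate to the response set" that the abstract describes.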
Text preprocessing is a vital stage in text classification (TC) in particular and text mining in general. Text preprocessing tools reduce multiple forms of a word to a single form, and preprocessing techniques have been given a lot of significance and widely studied in machine learning. The basic phase in text classification involves preprocessing features and matching extracted features against those in a database; preprocessing has a great impact on reducing the time and resources needed. The effect of preprocessing tools on English text classification is an active area of research. This paper provides an evaluation study of several preprocessing tools for English text classification, covering raw text, tokenization, stop-word removal, and stemming. Two feature extraction methods, chi-square and TF-IDF with a cosine similarity score, are evaluated on the BBC English dataset. The experimental results show that text preprocessing affects the feature extraction methods and enhances the performance of English text classification, especially for small threshold values.
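The preprocessing and TF-IDF-with-cosine pipeline described above can be sketched in a few lines (the stop-word list and the one-character suffix "stemmer" are toy stand-ins for real tools such as a Porter stemmer):

```python
import math
from collections import Counter

STOP = {"the", "a", "of", "and", "to", "in", "is"}  # toy stop-word list

def preprocess(text):
    """Tokenize, lowercase, drop stop words, crude suffix stemming."""
    tokens = [t.lower().strip(".,") for t in text.split()]
    tokens = [t for t in tokens if t and t not in STOP]
    return [t[:-1] if t.endswith("s") else t for t in tokens]  # toy stemmer

def tfidf_vectors(docs):
    """One sparse TF-IDF vector (dict term -> weight) per document."""
    toks = [preprocess(d) for d in docs]
    n = len(docs)
    df = Counter(t for doc in toks for t in set(doc))  # document frequency
    vecs = []
    for doc in toks:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

With vectors like these, a document is assigned to the class whose known documents it is most cosine-similar to, subject to the similarity threshold the study varies.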
Efficient Multi-Document Summary Generation using Neural Network (INFOGAIN PUBLICATION)
Over the last few years, online information has been growing tremendously on the World Wide Web and on users' desktops, so it has gained much attention in the field of automatic text summarization. Text mining has become a significant research field, as it produces valuable information from large amounts of unstructured text. Summarization systems make it possible to search the important keywords of the texts, so the consumer spends less time reading the whole document. The main objective of a summarization system is to generate a new form which expresses the key meaning of the contained text. This paper studies various existing techniques and the need for novel multi-document summarization schemes, motivated by the growing need to provide a high-quality summary in a very short time. In the proposed system, users can quickly and easily access well-formed summaries that express the key meaning of the contained text. The primary focus of this paper lies with the f_β-optimal merge function, a recently presented function that uses the weighted harmonic mean to find a balance between precision and recall. The proposed system uses bisecting k-means clustering to improve running time and neural networks to improve the accuracy of the summary generated by the NEWSUM algorithm.
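The weighted harmonic mean underlying an f_β-style merge function is simply (this shows the standard f_β formula; how the paper plugs it into merging is not reproduced here):

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 weights precision;
    beta = 1 gives the familiar F1 score.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# With precision 0.5 and perfect recall, F1 is 2/3, while a
# recall-heavy beta = 2 pushes the score up toward recall.
f1 = f_beta(0.5, 1.0)            # ~0.667
f2 = f_beta(0.5, 1.0, beta=2.0)  # ~0.833
```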
Survey of Machine Learning Techniques in Textual Document Classification (IOSR Journals)
Classification of text documents means associating one or more predefined categories with a document, based on the likelihood expressed by a training set of labeled documents. Many machine learning algorithms play an important role in training the system with predefined categories. Because of the perceived importance of the machine learning approach, this study was undertaken for text document classification based on the available statistical event models. The aim of this paper is to present the important techniques and methodologies employed for text document classification, while at the same time raising awareness of some of the interesting challenges that remain to be solved, focusing mainly on text representation and machine learning techniques.
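One of the statistical event models such surveys cover, the multinomial naive Bayes classifier with Laplace smoothing, can be sketched as follows (the training data is an illustrative toy example):

```python
import math
from collections import Counter, defaultdict

def train_mnb(labeled_docs):
    """Multinomial naive Bayes: log prior per class plus smoothed
    log likelihood per word, estimated from labeled word lists."""
    class_docs = defaultdict(int)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_docs:
        class_docs[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    total = sum(class_docs.values())
    model = {}
    for c in class_docs:
        n_c = sum(word_counts[c].values())
        model[c] = (
            math.log(class_docs[c] / total),  # log prior
            {w: math.log((word_counts[c][w] + 1) / (n_c + len(vocab)))
             for w in vocab},                  # Laplace-smoothed likelihoods
        )
    return model

def classify(model, words):
    """Pick the class with the highest posterior; words outside the
    training vocabulary are simply ignored in this sketch."""
    def score(c):
        prior, likelihood = model[c]
        return prior + sum(likelihood.get(w, 0.0) for w in words)
    return max(model, key=score)

data = [("buy cheap pills now".split(), "spam"),
        ("meeting agenda attached".split(), "ham"),
        ("cheap offer buy now".split(), "spam"),
        ("project meeting tomorrow".split(), "ham")]
model = train_mnb(data)
label = classify(model, "cheap pills".split())  # "spam"
```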
I gave this talk at the EDBT 2014 conference, which took place in Athens, Greece.
I show how data examples can be used to characterize the behavior of scientific modules. I present new methods that automatically generate the data examples, and show that such data examples help the human user understand the task of the modules and can be used to assist curators in repairing broken workflows (i.e., workflows for which one or more modules are no longer supplied by their providers).
Experimental Result Analysis of Text Categorization using Clustering and Clas... (ijtsrd)
In a world that routinely produces ever more textual data, managing that data is a critical task. Many text analysis methods are available for managing and visualizing it, but many techniques give low accuracy because of the ambiguity of natural language. To provide fine-grained analysis, this paper introduces efficient machine learning algorithms to categorize text data. To improve accuracy, the proposed system uses the NLTK (Natural Language Toolkit) Python library to perform natural language processing. The main aim of the proposed system is to generalize the model for real-time text categorization applications by using efficient text classification as well as clustering machine learning algorithms, and to find the most efficient and accurate model for the input dataset using performance measures. Patil Kiran Sanajy | Prof. Kurhade N. V., "Experimental Result Analysis of Text Categorization using Clustering and Classification Algorithms", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-4, June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd25077.pdf
Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/25077/experimental-result-analysis-of-text-categorization-using-clustering-and-classification-algorithms/patil-kiran-sanajy
Performance Evaluation of Query Processing Techniques in Information Retrieval (idescitation)
The first element of the search process is the query. The user query, being on average restricted to two or three keywords, is ambiguous to the search engine. Given the user query, the goal of an Information Retrieval (IR) system is to retrieve information which might be useful or relevant to the information need of the user; hence, query processing plays an important role in an IR system. Query processing can be divided into four categories: query expansion, query optimization, query classification, and query parsing. In this paper an attempt is made to evaluate the performance of query processing algorithms in each category. The evaluation was based on the dataset specified by the Forum for Information Retrieval [FIRE15]. The criteria used for evaluation are precision and relative recall, and the analysis is based on the importance of each step in query processing. The experimental results show the significance of each step in query processing and also the relevance of web semantics and spelling correction in the user query.
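Of the four categories, query expansion is the simplest to illustrate. A thesaurus-based sketch (the synonym table is a toy stand-in for resources such as WordNet):

```python
# Toy synonym table; real systems draw on WordNet or co-occurrence data.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "film": ["movie", "cinema"],
}

def expand_query(terms, synonyms=SYNONYMS):
    """Add synonyms for each keyword so that a short, ambiguous two- or
    three-word query matches more relevant documents.

    Duplicates are removed and the original term order is preserved.
    """
    out = []
    for t in terms:
        for word in [t] + synonyms.get(t, []):
            if word not in out:
                out.append(word)
    return out

expanded = expand_query(["car", "rental"])
# ['car', 'automobile', 'vehicle', 'rental']
```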
Improved Video Transmission over MANETs using MDSR with MDC (ijsrd.com)
A MANET does not have any fixed infrastructure, so the mobile nodes are free to move within the network, which results in dynamic changes of the network topology. Real-time video transport has rigid bandwidth, delay, and loss requirements, so supporting this application in mobile ad-hoc networks is a challenge. A MANET consists of mobile nodes, which causes frequent link failures, and a link failure causes two main problems. First, when a route breaks, all packets already transmitted on that route are dropped, which degrades the video quality and decreases the average packet delivery ratio (PDR). Second, data transmission is halted until a new route is discovered, which increases the average end-to-end delay. We therefore propose Node-Disjoint Multipath Routing based on the DSR protocol with the Multiple Description Coding (MDC) technique. Node-disjoint paths share no common node, and MDC encodes a media source into two or more sub-bitstreams, also called descriptions. The experiments were run in the NS2 simulator with EvalVid to evaluate the video quality. Our proposed scheme improves packet delivery ratio and throughput and decreases average end-to-end delay.
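The MDC idea can be illustrated with the simplest description scheme, odd/even temporal splitting (an illustrative instance; the coder used in the paper may differ):

```python
def mdc_split(frames):
    """Multiple description coding by odd/even temporal splitting:
    each description is independently decodable at half the frame rate,
    so the two can travel over the two node-disjoint paths."""
    return frames[0::2], frames[1::2]

def mdc_merge(d0, d1):
    """Interleave whatever arrived. If one description is lost on its
    path, playback degrades to half frame rate instead of stalling."""
    out = []
    for i in range(max(len(d0), len(d1))):
        if i < len(d0):
            out.append(d0[i])
        if i < len(d1):
            out.append(d1[i])
    return out

frames = list(range(8))          # frame indices 0..7
d0, d1 = mdc_split(frames)       # [0, 2, 4, 6] and [1, 3, 5, 7]
assert mdc_merge(d0, d1) == frames
```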
A Survey of Source Authentication Schemes for Multicast transfer in Adhoc Net... (ijsrd.com)
An ad-hoc network is a collection of autonomous nodes with a dynamically changing infrastructure. Multicast is a good mechanism for group communication and can be used in group-oriented applications such as video/audio conferencing, interactive group games, and video on demand. Security problems obstruct the large-scale deployment of the multicast communication model, and multicast data-origin authentication is the main component of its security architecture. Authentication schemes should be scalable and efficient against packet loss. In this article we discuss various authentication schemes for multicast data origin, with their advantages and disadvantages.
Microcontroller Based Sign Language Glove (ijsrd.com)
People who are speech impaired, and paralyzed patients, have difficulty in communication: they cannot speak or hear properly and struggle to communicate with people who do not understand sign languages. An electronic hand glove can then be used for communication: the positions of the different fingers of one hand are read using flex sensors. The objective of this project is to develop an electronic device for people who suffer from speech impairment and for paralyzed patients. A flex sensor glove is used, the alphabet of Indian sign language is formed using different positions of the fingers and thumb, and the output is shown on an LCD.
Fingerprinting Based Indoor Positioning System using RSSI Bluetooth (ijsrd.com)
Positioning is the basis for providing location information to mobile users, and it has grown with wireless and mobile communications technologies. Mobile phones are equipped with several radio-frequency technologies for deriving positioning information, such as GSM, Wi-Fi, or Bluetooth. The objective of this thesis was to implement an indoor positioning system relying on Bluetooth Received Signal Strength (RSS) and to integrate it into a Global Positioning Module (GPM) to provide precise information inside a building. We propose an indoor positioning system based on an RSS fingerprint-and-footprint architecture through which smartphone users can obtain their position with the assistance of collections of Bluetooth signals, confining RSSs by direction and filtering burst noise, which overcomes the severe signal fluctuation problem inside buildings. This scheme also achieves higher accuracy in finding a position inside the building.
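The fingerprinting step can be sketched as nearest-neighbour matching in signal space (the beacon names, the radio map, and the -100 dBm default for an unheard beacon are illustrative assumptions):

```python
import math

def nearest_fingerprint(observed, radio_map):
    """Match an observed {beacon: RSSI in dBm} reading against an
    offline radio map and return the closest surveyed location
    (1-nearest-neighbour in signal space).

    A beacon missing from either reading is treated as a very weak
    -100 dBm signal, a common convention in fingerprinting sketches.
    """
    def dist(fp):
        beacons = set(observed) | set(fp)
        return math.sqrt(sum(
            (observed.get(b, -100) - fp.get(b, -100)) ** 2
            for b in beacons))
    return min(radio_map, key=lambda loc: dist(radio_map[loc]))

# Offline phase: surveyed RSSI fingerprints per location.
radio_map = {
    "room_101": {"b1": -40, "b2": -70},
    "room_102": {"b1": -75, "b2": -45},
}
# Online phase: a live reading close to room_101's fingerprint.
location = nearest_fingerprint({"b1": -42, "b2": -68}, radio_map)
```

Direction-confined fingerprints and burst-noise filtering, as in the abstract, would refine both the map and the live reading before this matching step.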
A Review on Various Most Common Symmetric Encryption Algorithms (ijsrd.com)
Security is one of the most challenging aspects of internet and network applications. Internet and network applications are growing very fast, so the importance and the value of the data exchanged over the internet or other media are increasing. Information security has been a very important issue in data communication, since any loss of or threat to information can prove to be a great loss to an organization. Encryption techniques play a main role in information security systems. This paper compares various encryption algorithms and then identifies the best available algorithm for network security.
Extracting and Reducing the Semantic Information Content of Web Documents to ... (ijsrd.com)
Ranking and optimization of web service compositions represent challenging areas of research with significant implications for the realization of the "Web of Services" vision. On the semantic web, semantic information is expressed in machine-processable languages such as the Web Ontology Language (OWL); semantic web services use formal semantic descriptions of web service functionality and enable automated reasoning over web service compositions. Such services can be automatically discovered, composed into more complex services, and executed. Automating web service composition through semantic technologies involves calculating the semantic similarities between outputs and inputs of connected constituent services and aggregating these values into a measure of semantic quality for the composition. We propose a novel and extensible model balancing this new dimension of semantic quality (as a functional quality metric) with QoS metrics, and using them together as ranking and optimization criteria. We also demonstrate the utility of genetic algorithms for optimization in the context of the large number of services foreseen by the "Web of Services" vision. Finally, the semantics of web documents are reduced to support semantic document retrieval, using the Network Ontology Language (NOL), and QoS is improved as a ranking and optimization criterion.
Simulation Of Interline Power Flow Controller in Power Transmission System (ijsrd.com)
The interline power flow controller (IPFC) is one of the latest generation of flexible AC transmission system (FACTS) controllers, used to control power flows on multiple transmission lines. The IPFC is a multifunction device providing power flow control, voltage control, and oscillation damping. This paper presents an overview, a study, and a mathematical model of the interline power flow controller. A simple 500 kV/230 kV power system is simulated in MATLAB, and the simulation results with and without the IPFC are compared in terms of voltages and active and reactive power flows to demonstrate the performance of the IPFC model.
Development of Adaptive Neuro Fuzzy Inference System for Estimation of Evapot... (ijsrd.com)
The accuracy of an adaptive neuro-fuzzy computing technique for estimating reference evapotranspiration (ETo) is investigated in this paper. The model is based on an Adaptive Neuro-Fuzzy Inference System (ANFIS) and uses commonly available daily weather data (maximum and minimum air temperature, relative humidity, wind speed, and sunshine hours) from the station at Karjan (latitude 22°03'10.95"N, longitude 73°07'24.65"E) in Vadodara (Gujarat) as inputs to the neuro-fuzzy model, to estimate the ETo obtained using the FAO-56 Penman-Monteith equation. Daily meteorological data from 2009 and 2010 at Karjan Taluka, Vadodara, are used to train the model, and the 2011 data are used to predict the ETo for that year and to validate the model. The ETo in the training period (Train-ETo) and the predicted results (Test-ETo) are compared with the ETo computed by the Penman-Monteith method (PM-ETo) using the "gDailyET" software. The results indicate that the PM-ETo values are closely and linearly correlated with Train-ETo and Test-ETo in terms of root mean squared error (RMSE), showing the high significance of Train-ETo and Test-ETo. The results indicate the feasibility of using this convenient model to resolve problems of agricultural irrigation with an intelligent algorithm, more accurate weather forecasts, appropriate membership functions, and suitable fuzzy rules.
Transient Stability of Power System using Facts Device-UPFC (ijsrd.com)
The occurrence of a fault in a power system causes transients. To stabilize the system, flexible alternating current transmission system (FACTS) devices such as the UPFC are becoming important for suppressing power system oscillations and improving system damping. The UPFC is a solid-state device which can be used to control active and reactive power. Using a UPFC, the oscillations introduced by faults, and the rotor angle and speed deviations, can be damped out more quickly than in a system without a UPFC. The effectiveness of the UPFC in suppressing power system oscillations is investigated by analyzing the oscillations in rotor angle and the changes in speed in the two-machine system considered in this work. A proportional-integral (PI) controller has been employed for the UPFC. It is also shown that a UPFC can independently control the real and reactive power flow in a transmission line. A MATLAB simulation has been carried out to demonstrate the performance of the UPFC in achieving transient stability of the two-machine, five-bus system.
Quality Evaluation Technique For Phyllanthus Emblica (Gooseberry) Using Comput... (ijsrd.com)
This paper proposes a quality assessment method that classifies Phyllanthus emblica (gooseberry) using computer vision based on surface and geometric features. India is one of the most important gooseberry producers, ahead of Germany, Poland, the U.K., Russia, and others, but fruit sorting in some areas is still done by hand, which is tedious and inaccurate. Thus, there is a need to improve the efficiency and accuracy of fruit quality assessment so that it can meet the demands of international markets. Low-cost, non-destructive technologies capable of sorting gooseberries according to their properties would help promote the gooseberry export industries. This paper proposes a method of colorization and extraction of value parameters; with these parameters, browning or affected parts are detected and uniform shape and size are identified, differentiating the quality of processed as well as fresh gooseberries. A decision tree is used for classification.
Modeling and Simulation of Base Plate of Friction Stir Welding-Advanced Weldi... (ijsrd.com)
Friction stir processing is an emerging technique based on the principles of friction stir welding (FSW). It is a solid-state joining method that is energy efficient, environmentally friendly, and versatile. It is considered by many to be the most significant development in metal joining in a decade. The basic concept of friction stir processing is remarkably simple. A rotating tool with pin and shoulder is inserted in the material to be joined, and traversed along the line of interest. The heating is localized, and is generated by friction between the tool and the work piece, with additional adiabatic heating from metal deformation. A processed zone is produced by movement of material from the front of the pin to the back of the pin.
In this paper we attempt to give a networking solution by applying VLSI architecture techniques to router design for networking systems, to provide intelligent control over the network. Networking routers today have limited input/output configurations, which we attempt to overcome by adopting bridging loops to reduce latency and security concerns. Other techniques we explore include the use of multiple protocols; we attempt to overcome the security and latency issues with a protocol switching technique embedded in the router engine itself. The approach is based on hardware coding to reduce the impact of latency, as the hardware itself is designed according to need. We provide a multipurpose networking router by means of Verilog code; we can thus maintain the same switching speed with more security, embed the packet storage buffer on chip, and generate the code as a self-contained VLSI-based router. Our main focus is the implementation of a hardware IP router. The approach enables the router to process multiple incoming IP packets with different protocol versions simultaneously, e.g. IPv4 and IPv6, and will result in increased per-packet switching speed for both current protocols, which we believe would yield considerable enhancement in networking systems.
Packed Bed Reactor for Catalytic Cracking of Plasma Pyrolyzed Gas (ijsrd.com)
Packed bed reactors play a vital role in the chemical industries for obtaining valuable products, as in steam reforming of natural gas, ammonia synthesis, sulphuric acid production, methanol synthesis, methanol oxidation, and butadiene and styrene production. They are used not only for production but also in separation processes such as adsorption, distillation, and stripping. Packed bed reactors are the workhorse of the chemical and petroleum industries; their low cost and simplicity make them the first choice for many chemical processes. In our experimental work, vacuum residue is used as the feed; it is pyrolyzed in the primary chamber with the help of plasma into hydrogen and hydrocarbon gases, which form the feed stream to a Ni-catalyst packed bed reactor called the catalytic cracker. A Ni loading of about 70% in the catalyst is used to crack or decompose lower-molecular-weight hydrocarbons into hydrogen, to maximize the energy content per mass flow of the gas stream and to minimize carbon-dioxide-equivalent gases at the reactor outlet. Since cracking is a surface phenomenon, the catalyst plays an important role in the design of the reactor shape. Parallel catalytic packed beds with regeneration and deactivation can be used for commercial production of clean fuel.
AUTOMATED SQL QUERY GENERATOR BY UNDERSTANDING A NATURAL LANGUAGE STATEMENT (ijnlc)
This project aims to develop a system which converts a natural language statement into a MySQL query to retrieve information from the corresponding database. The system mainly focuses on the creation of complex queries, including nested queries with more than two levels of depth, queries with aggregate functions, HAVING and GROUP BY clauses, and correlated queries formed due to constraints on aggregate functions. The natural language input statement taken from the user is passed through various OpenNLP natural language processing techniques (tokenization, part-of-speech tagging, stemming, and lemmatization) to get the statement into the desired form. The statement is further processed to extract the type of query, the basic clause, which specifies the required entities from the database, and the condition clause, which specifies constraints on the basic clause. The final query is generated by converting the basic and condition clauses to their query form and then concatenating the condition query to the basic query. Currently, the system works only with MySQL databases.
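A heavily simplified sketch of the overall flow for a single-table schema (the schema, the keyword rules, and the regular expression are toy assumptions; the real system runs the statement through OpenNLP stages first):

```python
import re

# Toy schema: mapping NL nouns to tables and columns is an assumption.
TABLES = {"students": ["name", "age", "grade"]}

def nl_to_sql(statement):
    """Tokenize, spot the target column and table among the tokens, and
    pull a numeric constraint into a WHERE clause (the condition clause).

    Raises StopIteration if no known table is mentioned; a real system
    would handle that and far more query shapes.
    """
    tokens = statement.lower().rstrip("?.").split()
    table = next(t for t in tokens if t in TABLES)
    column = next((t for t in tokens if t in TABLES[table]), "*")
    m = re.search(r"(age|grade)\s+(?:above|over)\s+(\d+)", " ".join(tokens))
    where = f" WHERE {m.group(1)} > {m.group(2)}" if m else ""
    return f"SELECT {column} FROM {table}{where};"

sql = nl_to_sql("Show the name of students with age above 20")
# SELECT name FROM students WHERE age > 20;
```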
Class Diagram Extraction from Textual Requirements Using NLP Techniques (iosrjce)
Feature selection, optimization and clustering strategies of text documents (IJECEIAES)
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors, including consumer segmentation, categorization, collaborative filtering, document management, and indexing. Research on the clustering task must be carried out before adapting it to the text environment. Conventional approaches typically emphasized quantitative information, where the selected features are numbers; efforts have also been made toward efficient clustering of categorical information, where the selected features can assume nominal values. This manuscript presents an in-depth analysis of the challenges of clustering in the text environment. It also details prominent models proposed for clustering, along with the pros and cons of each model, and focuses on the latest developments in the clustering task in social networks and associated environments.
Machine learning for text document classification-efficient classification ap... (IAESIJAI)
Numerous alternative methods for text classification have been created because of the increase in the amount of online text information available. The cosine similarity classifier is the most extensively utilized simple and efficient approach, and it improves text classification performance when combined with estimated values provided by conventional classifiers such as multinomial naive Bayes (MNB). Combining the similarity between a test document and a category with the estimated value for the category enhances the performance of the classifier. This approach provides a text document categorization method that is both efficient and effective. In addition, methods for determining the proper relationship between a set of words in a document and its document category are also obtained.
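The combination idea above can be sketched in a few lines: score each category by mixing cosine similarity with a precomputed class estimate. This is a minimal illustration, not the paper's method; the mixing weight `alpha`, the category profiles and the naive Bayes scores are all hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    num = sum(a[t] * b[t] for t in common)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def classify(doc_tokens, category_profiles, nb_scores, alpha=0.5):
    """Combine cosine similarity with a (precomputed) MNB-style score.
    alpha is a hypothetical mixing weight, not taken from the paper."""
    d = Counter(doc_tokens)
    best, best_score = None, -1.0
    for cat, profile in category_profiles.items():
        score = alpha * cosine(d, Counter(profile)) + (1 - alpha) * nb_scores[cat]
        if score > best_score:
            best, best_score = cat, score
    return best

profiles = {"sport": ["match", "goal", "team"], "tech": ["cpu", "chip", "code"]}
nb = {"sport": 0.2, "tech": 0.3}
print(classify(["goal", "team", "match"], profiles, nb))
```

In a real system the `nb_scores` would be per-document class posteriors from a trained MNB model rather than fixed constants.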
Automatically finding domain-specific key terms in a given set of research papers is a challenging task, and assigning research papers to a particular area of research is a concern for many people, including students, professors and researchers. A domain classification of papers facilitates that search process: given a list of domains in a research field, we try to find out to which domain(s) a given paper is most related. Besides, reading a whole paper takes a long time, and using domain knowledge normally requires much human effort, e.g., manually labeling a large corpus. In this paper, we instead use the abstract and keywords of a research paper as the seed terms to identify similar terms from a domain corpus, which are then filtered by checking their appearance in the research papers. Experiments show that the TF-IDF measure and the classification step make this method assign papers to domains more precisely. The results show that our approach can extract the terms effectively, while being domain independent.
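The TF-IDF ranking step mentioned above can be sketched as follows. This is a generic illustration of term weighting, not the authors' exact seed-term procedure, and the toy corpus is invented for the example.

```python
import math
from collections import Counter

def tfidf_terms(docs, top_k=2):
    """Rank each document's terms by TF-IDF (term frequency times inverse
    document frequency) and return the top_k terms per document."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    ranked = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf}
        ranked.append([t for t, _ in sorted(scores.items(),
                                            key=lambda kv: -kv[1])][:top_k])
    return ranked

docs = [["gene", "protein", "cell", "cell"],
        ["query", "database", "cell", "index"],
        ["gene", "disease", "cell", "protein"]]
print(tfidf_terms(docs))
```

Note how "cell", which appears in every document, gets an IDF of zero and is never selected, which is exactly why TF-IDF suppresses non-discriminative terms.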
Text document clustering and similarity detection is a major part of document management, where every document should be identified by its key terms and domain knowledge. Based on similarity, the documents are grouped into clusters. Several approaches have been proposed in existing systems for document similarity calculation, but they are either term based or pattern based, and they suffer from several problems. To make a difference in this challenging environment, the proposed system presents an innovative model for document similarity by applying a back-propagation time stamp (BPTT) algorithm. It discovers patterns in text documents as higher-level features and creates a network for fast grouping. It also detects the most appropriate patterns based on their weights, and BPTT performs the document similarity measures. Using this approach, documents can be categorized easily, and the training process problems are reduced. The BPTT framework has been implemented and evaluated on the .NET platform with different sets of datasets.
Towards Ontology Development Based on Relational Database (ijbuiiir1)
Ontology is defined as a formal explicit specification of a shared conceptualization. It has been widely used in almost all fields, especially artificial intelligence, data mining, and the semantic web. It is constructed using various sets of resources, and improving the efficiency of ontology construction has become a very important task. To improve efficiency, an automated method of building an ontology from a database resource is needed. Since manual construction is found to be erroneous and not up to expectation, automatic construction of an ontology from a database is innovated. Construction rules for ontology building from relational data sources are then put forward. Finally, an ontology for "automated building of ontology from relational data sources" has been implemented.
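Typical database-to-ontology construction rules map each table to a class, each column to a datatype property, and each foreign key to an object property. A minimal sketch of such rules is below; the rules shown are the commonly cited generic ones, not necessarily the exact rules the paper puts forward, and the schema is invented.

```python
def schema_to_ontology(tables):
    """Sketch of common DB-to-ontology mapping rules:
    table -> owl:Class, column -> owl:DatatypeProperty,
    foreign key -> owl:ObjectProperty whose range is the referenced class."""
    triples = []
    for name, spec in tables.items():
        triples.append((name, "rdf:type", "owl:Class"))
        for col in spec.get("columns", []):
            triples.append((f"{name}.{col}", "rdf:type", "owl:DatatypeProperty"))
        for col, target in spec.get("fks", {}).items():
            triples.append((f"{name}.{col}", "rdf:type", "owl:ObjectProperty"))
            triples.append((f"{name}.{col}", "rdfs:range", target))
    return triples

tables = {
    "Student": {"columns": ["name", "age"], "fks": {"dept_id": "Department"}},
    "Department": {"columns": ["title"]},
}
for t in schema_to_ontology(tables):
    print(t)
```

A production tool would also emit rdfs:domain statements and handle many-to-many join tables specially; those refinements are omitted here.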
Answer extraction and passage retrieval for Arabic QASs (Waheeb Ahmed)
Question Answering systems (QASs) perform the task of retrieving, from a collection of documents, the text portions that contain the answer to the user's question. These QASs use a variety of linguistic tools that are able to deal with small fragments of text. Therefore, to retrieve the documents that contain the answer from a large document collection, QASs employ Information Retrieval (IR) techniques to reduce the collection to a treatable amount of relevant text. In this paper, we propose a passage retrieval model that performs this task with better performance for the purpose of Arabic QASs. We first segment each of the top five ranked documents returned by the IR module into passages. Then, we compute the similarity score between the user's question terms and each passage, and the top five passages (those with the highest similarity scores) are retrieved. Finally, answer extraction techniques are applied to extract the final answer. Our method achieved an average precision of 87.25%, recall of 86.2% and F1-measure of 87%.
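The segment-score-rank pipeline described above can be sketched as follows. Since the abstract does not specify the exact similarity measure, plain term overlap is used as a stand-in, and the toy documents are invented for the example.

```python
def retrieve_passages(question, documents, passage_len=2, top_n=5):
    """Sketch of the described pipeline: split each document into passages
    (here, groups of sentences), score each passage by term overlap with
    the question, and return the highest-scoring passages."""
    q_terms = set(question.lower().split())
    scored = []
    for doc in documents:
        sentences = [s.strip() for s in doc.split(".") if s.strip()]
        for i in range(0, len(sentences), passage_len):
            passage = ". ".join(sentences[i:i + passage_len])
            overlap = len(q_terms & set(passage.lower().split()))
            scored.append((overlap / (len(q_terms) or 1), passage))
    scored.sort(key=lambda x: -x[0])
    return [p for _, p in scored[:top_n]]

docs = ["The capital of France is Paris. Paris lies on the Seine.",
        "Berlin is the capital of Germany. It has a long history."]
print(retrieve_passages("what is the capital of France", docs, top_n=1))
```

For Arabic text, tokenization and stemming would replace the simple `split()` used here, but the ranking structure stays the same.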
Survey on Existing Text Mining Frameworks and A Proposed Idealistic Framework... (ijceronline)
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
Due to the availability of the Internet and the evolution of embedded devices, the Internet of Things can contribute to the energy domain. The Internet of Things (IoT) will deliver a smarter grid, enabling more information and connectivity throughout the infrastructure and into homes. Through the IoT, consumers, manufacturers and utility providers will find new ways to manage devices and ultimately conserve resources and save money by using smart meters, home gateways, smart plugs and connected appliances. In the future smart home, various devices will be able to measure and share their energy consumption and actively participate in house-wide or building-wide energy management systems. This paper discusses the different approaches being taken worldwide to connect the smart grid. Full system solutions can be developed by combining hardware and software to address some of the challenges in building a smarter and more connected grid.
A Survey Report on: Security & Challenges in Internet of Things (ijsrd.com)
In the era of computing technology, Internet of Things (IoT) devices are now popular in each and every domain, such as e-governance, e-health, e-home, e-commerce, and e-trafficking. IoT is spreading from small to large applications in fields like smart cities, smart grids and smart transportation. On the one hand, IoT provides facilities and services for society; on the other hand, IoT security is a crucial issue. IoT security is an area concerned with securing the connected devices and networks of the IoT. IoT is a vast area, with usability, performance, security, and reliability as its major challenges. The growth of the IoT is increasing exponentially, driven by market pressures, which proportionally increases the security threats involved in IoT. The relationship between security and the billions of devices connecting to the Internet cannot be described with existing mathematical methods. In this paper, we explore the opportunities possible in the IoT along with the security threats and challenges associated with it.
In today's emerging world of the Internet, everything is supposed to be connected with the help of billions of smart devices. Connecting all the devices used in our day-to-day life makes our life easier and trouble-free. We live in a world where we are used to having smart phones, smart cars, smart gadgets, smart homes and smart cities. Different institutes and researchers are working to create a smart world for us, but the real question we need to emphasize is how to make dumb devices talk over uncommon hardware and communication technologies, and what kind of mechanism to use, with various protocols and less human interaction. The purpose is to present the key application areas of IoT and a platform on which various devices having different mechanisms and protocols can communicate within an integrated architecture.
Study on Issues in Managing and Protecting Data of IoT (ijsrd.com)
This paper discusses a variety of issues in preserving and managing data produced by the IoT. Every second, large amounts of data are added or updated in IoT databases across heterogeneous environments. While managing the data, each phase of data processing for IoT data is exigent: storing, querying, indexing, transaction management and failure handling. We also refer to the problems of data integration and protection, as data need to fit a single layout and travel securely as they arrive in the pool from diversified sources in different structures. Finally, we confer a standardized pathway to manage and defend data in a consistent manner.
Interactive Technologies for Improving Quality of Education to Build Collabor... (ijsrd.com)
Today, with advancements in Information and Communication Technology (ICT), the way education is delivered is seeing a paradigm shift from boring classroom lectures to interactive applications such as 2-D and 3-D learning content, animations, live videos, response systems, interactive panels, educational games, virtual laboratories and collaborative research (data gathering and analysis). Engineering is emerging with more innovative solutions in the field of education and bringing out innovative products to improve education delivery. The academic institutes that were once hesitant to use such technology are now looking forward to such innovations. They are adopting the new ways as they realize the vast benefits of using such methods and technology: better comprehensibility, improved learning efficiency of students, access to vast knowledge resources, geographical reach, quick feedback, accountability and quality research. This paper focuses on how engineering can leverage the latest technology to build a collaborative learning environment which can then be integrated with the national e-learning grid.
Internet of Things - Paradigm Shift of Future Internet Application for Specia... (ijsrd.com)
In the world, more than 15% of people are living with a disability, including children below the age of 10 years. Due to a lack of independent support services, specially abled (handicapped) people overly rely on other people for their basic needs, which excludes them from being financially and socially active. The Internet of Things (IoT) can give them a support system and a better quality of life, as well as participation in routine day-to-day life. For this purpose, future solutions for current problems are introduced in this paper. Daunting challenges are considered as future research, and a glimpse of the IoT for specially abled persons is given.
A Study of the Adverse Effects of IoT on Student's Life (ijsrd.com)
The Internet of Things (IoT) is a most powerful invention, and if used in a positive direction the Internet can prove to be very productive. But nowadays, due to social networking sites such as Facebook, WhatsApp, Twitter, Hike etc., the Internet is producing adverse effects on student life, especially for students studying at the college level. As is rightly said, something which has positive effects also has some negative effects on the other hand. In this article, we discuss some adverse effects of IoT on students' lives.
Pedagogy for Effective use of ICT in English Language Learning (ijsrd.com)
The use of information and communications technology (ICT) in education is a relatively new phenomenon, and it has been the focus of educational researchers' attention for more than two decades. Educators and researchers examine the challenges of using ICT and think of new ways to integrate it into the curriculum. However, there are some barriers that prevent teachers from using ICT in the classroom and from developing supporting materials with it. The purpose of this study is to examine high school English teachers' perceptions of the factors discouraging them from using ICT in the classroom.
In recent years, the usage of private vehicles has made urban traffic more and more crowded. As a result, traffic has become one of the important problems in big cities all over the world. Some of the traffic concerns are traffic jams and accidents, which cause a huge waste of time, more fuel consumption and more pollution. Time is a very important parameter in routine life, and the main problem faced by people is real-time routing. Our solution, Virtual Eye, will provide current updates on the real-time scenario of a specific route. This research paper presents a smart traffic navigation system based on the Internet of Things, featured by low cost, high compatibility and ease of upgrading, to replace traditional traffic management systems; the proposed system can improve road traffic tremendously.
Ontological Model of Educational Programs in Computer Science (Bachelor and M... (ijsrd.com)
In this work an ontological model of educational programs in computer science is illustrated, for bachelor and master degrees in Computer Science and for the master educational program "Computer science as second competence" by the Tempus project PROMIS.
Understanding IoT Management for Smart Refrigerator (ijsrd.com)
Lately the concept of the Internet of Things (IoT) has been elaborated further, and devices and databases are being proposed to meet the needs of IoT scenarios. IoT is considered an integral part of the smart house, where devices are connected to each other and react upon certain environmental inputs. This will eventually include the home refrigerator, air conditioner, lights, heater and other such home appliances. Therefore, we focus our research on the database part of such an IoT fridge, which we call the smart fridge. We describe the potential achievable through a database for an IoT refrigerator to manage the refrigerator's food and to aid the creation of a monthly household budget for a family. The paper addresses the data management issue based on a proposed design for an intelligent refrigerator leveraging sensor technology and wireless communication technology. The refrigerator, which identifies products by reading barcodes or RFID tags, is proposed to order required products by connecting to the Internet. Thus the goal of this paper is to minimize the human interaction needed to maintain daily life events.
DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT... (ijsrd.com)
Double wishbone designs allow the engineer to carefully control the motion of the wheel throughout suspension travel. A 3-D model of the lower wishbone arm is prepared using CAD software for modal and stress analysis. The forces and moments are used as the boundary conditions for the finite element model of the wishbone arm. Using these boundary conditions, static analysis is carried out; then, making the load a function of time, quasi-static analysis of the wishbone arm is carried out. A finite-element-based optimization is used to optimize the design of the lower wishbone arm, using topology optimization and material optimization techniques.
A Review: Microwave Energy for materials processing (ijsrd.com)
Microwave energy is one of the fastest-growing techniques for material processing. This paper presents a review of microwave technologies used for material processing and their use in industrial applications. Advantages of using microwave energy for processing materials include rapid heating, high heating efficiency, heating uniformity and clean energy. Microwave heating has various characteristics due to which it has become popular for applications ranging from low to high temperature. In recent years this novel technique has been successfully utilized for the processing of metallic materials, and many researchers have reported using microwave energy for sintering, joining and cladding of metallic materials. The aim of this paper is to show the use of microwave energy not only for non-metallic materials but also for metallic materials. The ability to process metals with microwaves could assist in the manufacturing of high-performance metal parts desired in many industries, for example the automotive and aeronautical industries.
Web Usage Mining: A Survey on User's Navigation Pattern from Web Logs (ijsrd.com)
With the exponential growth of the World Wide Web, there is so much information overload that it has become hard to find data according to need. Web usage mining is a part of web mining which deals with the automatic discovery of user navigation patterns from web logs. This paper presents an overview of web mining and derives navigation patterns from classification and clustering algorithms for web usage mining. Web usage mining contains three important tasks, namely data preprocessing, pattern discovery and pattern analysis based on discovered patterns. It also contains a comparative study of web mining techniques.
APPLICATION OF STATCOM TO IMPROVE DYNAMIC PERFORMANCE OF POWER SYSTEM (ijsrd.com)
Application of a FACTS controller called the Static Synchronous Compensator (STATCOM) to improve the performance of a power grid with wind farms is investigated. The essential feature of the STATCOM is its ability to rapidly absorb or inject reactive power into the power grid, so voltage regulation of the power grid is achieved with the STATCOM FACTS device. Moreover, restoring the stability of a power system having a wind farm after a severe disturbance, such as a fault or a variation in wind farm mechanical power, is obtained with the STATCOM controller. The dynamic model of the power system having a wind farm controlled by the proposed STATCOM is developed. To validate the capability of the STATCOM FACTS controller, the studied power system is simulated and subjected to different severe disturbances. The results prove the effectiveness of the proposed STATCOM controller in terms of quickly damping the power system oscillations and restoring power system stability.
Making model of dual axis solar tracking with Maximum Power Point Tracking (ijsrd.com)
Nowadays solar harvesting is increasingly popular, and as popularity grows, material quality and solar tracking methods improve. Several factors affect a solar system, with major influences being the solar cell itself, the intensity of the source radiation, and the storage technique. The materials used in solar cell manufacturing limit the efficiency of the solar cell. This makes it particularly difficult to make considerable improvements in the performance of the cell, and hence restricts the efficiency of the overall collection process. Therefore, the most attainable method of improving the performance of solar power collection is to increase the mean intensity of radiation received from the source. The proposed tracking system controls the elevation and orientation angles of the solar panels so that the panels always stay perpendicular to the sunlight. The measured variables of our automatic system were compared with those of a fixed-angle PV system. As a result of the experiment, the voltage generated by the proposed tracking system was overall about 28.11% more than that of the fixed-angle PV system. There are three major approaches for maximizing power extraction in medium- and large-scale systems: sun tracking, maximum power point (MPP) tracking, or both.
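The maximum power point tracking mentioned above is commonly done with the perturb-and-observe algorithm: nudge the operating voltage, keep the direction if power rose, reverse it if power fell. The sketch below uses a toy power-voltage curve with invented numbers; it is not the paper's controller, only an illustration of the P&O idea.

```python
def pv_power(v):
    """Toy power-voltage curve with its maximum at v = 17 V
    (illustrative numbers, not measured panel data)."""
    return max(0.0, 60.0 - 0.8 * (v - 17.0) ** 2)

def perturb_and_observe(v=10.0, step=0.5, iterations=100):
    """Classic P&O MPPT: perturb the operating voltage by one step,
    keep the direction if power increased, reverse it otherwise."""
    p_prev = pv_power(v)
    direction = 1
    for _ in range(iterations):
        v += direction * step
        p = pv_power(v)
        if p < p_prev:
            direction = -direction
        p_prev = p
    return v

v_mpp = perturb_and_observe()
print(round(v_mpp, 1))
```

The tracker ends up oscillating within one step of the true maximum power point, which is the characteristic (and well-known) steady-state behavior of P&O.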
A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI... (ijsrd.com)
In today's context, it is mandatory to manage the usage of diesel in an economical way. In the present scenario, the very low combustion efficiency of the CI engine leads to poor engine performance and produces emissions due to incomplete combustion. The studied research papers focus on improving the efficiency of the engine and reducing emissions by adding ethanol to diesel in different blends of 5%, 10%, 15%, 20%, 25% and 30% by volume. The performance and emission characteristics of the engine are observed using blended fuels, and a comparative assessment is made against the performance and emission characteristics of the engine using pure diesel.
Study and Review on Various Current Comparators (ijsrd.com)
This paper presents a study and review of various current comparators. It also describes a low-voltage current comparator using a flipped voltage follower (FVF) to operate from a single supply voltage. This circuit has a short propagation delay and occupies a small chip area compared to other current comparators. The results for this circuit were obtained using the PSpice simulator for 0.18 μm CMOS technology, and a comparison has been performed with its non-FVF counterpart to demonstrate its effectiveness, simplicity, compactness and low power consumption.
Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt... (ijsrd.com)
Power dissipation is a challenging problem for today's system-on-chip design and test. This paper presents a novel architecture which generates test patterns with reduced switching activity; it has the advantages of low test power and low hardware overhead. The proposed LP-TPG (test pattern generator) structure consists of a modified low-power linear feedback shift register (LP-LFSR), an m-bit counter, a Gray counter, a NOR-gate structure and an XOR array. The seed generated by the LP-LFSR is XORed with the data generated by the Gray code generator. The resulting sequence is a single-input-changing (SIC) sequence, which in turn reduces switching activity, so power dissipation is much lower. The proposed architecture is simulated using ModelSim and synthesized using Xilinx ISE 9.2, and the Xilinx ChipScope tool is used to test the logic running on the FPGA.
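The single-input-changing property described above follows directly from XORing a held seed with a Gray-code sequence, since consecutive Gray codewords differ in exactly one bit. The sketch below models that property in software; the 8-bit width and the fixed seed stand in for the LP-LFSR output (the real design is hardware, not Python).

```python
def gray(n):
    """n-th Gray code word: consecutive words differ in exactly one bit."""
    return n ^ (n >> 1)

def lp_tpg_patterns(seed, width=8, count=8):
    """Model of the low-power TPG idea: hold an LFSR seed and XOR it with a
    Gray-code sequence, so successive test patterns differ in one bit
    (i.e., minimal switching activity between patterns)."""
    return [(seed ^ gray(i)) & ((1 << width) - 1) for i in range(count)]

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

pats = lp_tpg_patterns(seed=0b10110100)
# check that every consecutive pair of patterns differs in exactly one bit
print(all(hamming(a, b) == 1 for a, b in zip(pats, pats[1:])))
```

Since dynamic power is roughly proportional to toggled bits per clock, a one-bit-per-pattern transition sequence is what makes the generator "low power".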
Defending Reactive Jammers in WSN using a Trigger Identification Service (ijsrd.com)
In the last decade, the greatest threat to wireless sensor networks has been the reactive jamming attack, because it is difficult to detect and defend against, and because of its mass destruction of legitimate sensor communications. A new scheme to deactivate reactive jammer nodes efficiently, by identifying all trigger nodes whose transmissions invoke the jammer nodes, has been proposed and developed. Many existing reactive-jamming defense schemes can benefit from this identification mechanism, and trigger identification can also work as an application-layer service. In this paper, on one side we formulate several optimization problems to provide a complete trigger identification service framework for unreliable wireless sensor networks, and on the other side we provide an improved algorithm with regard to two sophisticated jamming models, in order to enhance its robustness for various network scenarios.
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Final project report on grocery store management system.pdf (Kamal Acharya)
In today's fast-changing business environment, it is extremely important to be able to respond to client needs in the most effective and timely manner, and customers increasingly wish to see a business online and have instant access to its products or services.
Online Grocery Store is an e-commerce website which retails various grocery products. The project allows users to view the various products available and enables registered users to purchase desired products instantly using the Paytm and UPI payment processors (Instant Pay), or to place an order using the Cash on Delivery (Pay Later) option. It also provides easy access for administrators and managers to view orders placed using the Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of technologies must be studied and understood. These include multi-tiered architecture, server- and client-side scripting techniques, implementation technologies, programming languages (such as PHP, HTML, CSS, JavaScript) and the MySQL relational database. The objective of this project is to develop a basic shopping cart website for consumers and to learn about the technologies used to develop such a website.
This document will discuss each of the underlying technologies used to create and implement an e-commerce website.
Sachpazis: Terzaghi Bearing Capacity Estimation in simple terms with Calculati... (Dr. Costas Sachpazis)
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. The theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The calculation HTML code is included.
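For a strip footing, the ultimate bearing capacity has the familiar three-term form q_u = c·Nc + γ·D·Nq + 0.5·γ·B·Nγ. The sketch below computes it; note that Nq and Nc use the classical closed-form expressions, while Nγ uses the common Vesic approximation, since Terzaghi's own Nγ values are tabulated rather than given by a simple formula. The soil parameters in the example are invented.

```python
import math

def bearing_capacity(c, phi_deg, gamma, D, B):
    """Ultimate bearing capacity of a strip footing:
        q_u = c*Nc + gamma*D*Nq + 0.5*gamma*B*Ngamma
    c: cohesion (kPa), phi_deg: friction angle (deg),
    gamma: unit weight (kN/m3), D: depth (m), B: width (m).
    Nq, Nc: classical closed forms; Ngamma: Vesic approximation."""
    phi = math.radians(phi_deg)
    Nq = math.exp(math.pi * math.tan(phi)) * math.tan(math.pi / 4 + phi / 2) ** 2
    Nc = (Nq - 1) / math.tan(phi) if phi_deg > 0 else 5.14
    Ng = 2 * (Nq + 1) * math.tan(phi)
    return c * Nc + gamma * D * Nq + 0.5 * gamma * B * Ng

# cohesionless sand: phi = 30 deg, unit weight 18 kN/m3, 1 m deep, 2 m wide
print(round(bearing_capacity(c=0, phi_deg=30, gamma=18, D=1.0, B=2.0), 1))
```

At phi = 30 degrees these formulas give Nq about 18.4, Nc about 30.1 and Ngamma about 22.4, matching the standard bearing-capacity-factor tables.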
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two types of water scarcity: physical water scarcity and economic water scarcity.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL) (MdTanvirMahtab2)
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL), a government-owned company of Bangladesh Chemical Industries Corporation under the Ministry of Industries.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams, from the hydrologist's survey of the valley before construction through all the disciplines involved (fluid dynamics, structural engineering, generation and mains frequency regulation) to the transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co-editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
ML for identifying fraud using open blockchain data.pptx
Novel Database-Centric Framework for Incremental Information Extraction
IJSRD - International Journal for Scientific Research & Development | Vol. 1, Issue 4, 2013 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com
Abstract--Information extraction (IE) has been an active research area that seeks techniques to uncover information from a large collection of text. IE is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in document processing, like automatic annotation and content extraction, can be seen as information extraction. Many applications call for methods to enable automatic extraction of structured information from unstructured natural language text. Due to the inherent challenges of natural language processing, most existing methods for information extraction from text tend to be domain specific. In this project a new paradigm for information extraction is presented. In this extraction framework, the intermediate output of each text processing component is stored, so that only an improved component has to be deployed to the entire corpus. Extraction is then performed on both the previously processed data from the unchanged components and the updated data generated by the improved component. Performing such incremental extraction can result in a tremendous reduction of processing time. There is also a mechanism to generate extraction queries from both labeled and unlabeled data; query generation is critical so that casual users can specify their information needs without learning the query language.
I. INTRODUCTION
Information extraction (IE) is typically realized by special-
purpose programs that perform a sequence of processing
modules, including sentence splitters, tokenizers, named
entity recognizers, shallow or deep syntactic parsers, and
finally extraction based on a collection of patterns.
However, such a framework is inflexible and expensive in
face of dynamic application needs. Consider a biology-
oriented scenario when the original information extraction
goal is to extract interact- tions among proteins from a
corpus of text. Suppose later on we are interested in
finding gene-disease associations from the same corpus.
Existing approaches would have to develop a new
extraction system specifically for this new extraction goal,
and run that extraction system on the entire corpus from
scratch, which is very expensive. Consider another
application scenario where the extraction goal remains the
same, but an improved named entity recognizer becomes
available. This would also require extraction to be
performed from scratch on the entire corpus. However, we
observe that only a portion of the corpus is affected by newly recognized entities, as the majority of the entities overlap between the original and the improved recognizers.
Such expensive re-computation should be minimized. This
is particularly true for extraction in the biomedical domain,
where a full processing of all 17 million Medline abstracts took more than 36K hours of CPU time on a single-core 2-GHz CPU with 2 GB of RAM. In this
case, the Link Grammar parser [3] contributes to a large
portion of the time spent in text processing.
In this demonstration, we propose a new paradigm of information extraction in the form of database queries. We present
a general-purpose information extraction system, UniqIE, in
the context of biomedical extraction, which can efficiently
handle diverse extraction needs and keep the extracted
information up-to-date incrementally when new knowledge becomes available. The insight of UniqIE is that changes in extraction goals or deployment of improved processing modules rarely affect all sentences in the entire collection.
Thus we differentiate two phases of processing.
1) Initial Phase: we perform a one-time parse,
entity recognition and tagging (identifying individual
entries as belonging to a class of interest) on the whole
corpus based on current knowledge. The generated syntactic parse trees and semantic entity tags of the processed text are stored in a parse tree database (PTDB).
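The initial phase and PTDB storage can be sketched in a few lines. This is a minimal illustration assuming an SQLite table as a stand-in for the actual PTDB schema, with a hand-built parse tree replacing the output of the Link Grammar parser:

```python
import sqlite3

# Minimal sketch of the Initial Phase: parse once, store parse trees in a
# relational Parse Tree Database (PTDB). The schema below is illustrative,
# not the actual PTDB layout.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE parse_node (
        doc_id    INTEGER,
        node_id   INTEGER,
        parent_id INTEGER,   -- NULL for the root
        label     TEXT,      -- part-of-speech / constituent label
        value     TEXT,      -- surface word for leaf nodes
        tag       TEXT       -- entity type, e.g. 'GENE'
    )""")

# Hand-built constituent tree for "RAD53 regulates MEC1".
nodes = [
    (1, 0, None, "S",  None,        None),
    (1, 1, 0,    "NP", "RAD53",     "GENE"),
    (1, 2, 0,    "V",  "regulates", None),
    (1, 3, 0,    "NP", "MEC1",      "GENE"),
]
conn.executemany("INSERT INTO parse_node VALUES (?, ?, ?, ?, ?, ?)", nodes)

# Later extraction phases query the stored trees instead of re-parsing.
genes = [row[0] for row in conn.execute(
    "SELECT value FROM parse_node WHERE tag = 'GENE' ORDER BY node_id")]
print(genes)  # ['RAD53', 'MEC1']
```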
2) Extraction Phase: Extracting particular kinds of
relations can be done by issuing an appropriate query to
PTDB. As query languages such as XPath and XQuery
are not suitable for extracting linguistic patterns [2], we
design and implement a query language called PTQL for
pattern extraction which effectively achieves diverse IE
goals [6]. To ease the extraction tasks for users, our system
not only allows a user to issue PTQL queries for
extraction, but can also automatically generate queries for high-quality extraction based on user-input keyword-based queries and feedback.
There are several advantages of the proposed approach,
which have been demonstrated in our initial experimental
evaluation. First, by using database queries instead of writing individual special-purpose programs, information extraction becomes generic across diverse applications and easier for the user. The user can express and
analyze an extraction pattern by issuing a database query.
When a user has a new extraction goal, the user only needs
to write another query on PTDB without developing and
running new programs.
Second, upon new extraction goals, the two-phase approach avoids performing the initial phase again, an extremely expensive phase that has to be performed by existing approaches.
Fig. 1: Architecture of UniqIE. (A) Query-specified Extraction: evaluation of PTQL queries through filtering and translation to SQL queries. (B) Pseudo-relevance Feedback Query Generation: generation of PTQL queries based on the common grammatical patterns among the top-ranked sentences relevant to the user keyword-based queries.
Novel Database-Centric Framework for Incremental Information Extraction
K. Keerthana (Student, M.E-CSE), Mr. P. Sasikumar (M.Tech (Ph.D.), Project Guide, Lecturer)
Selvam College of Technology, Namakkal; S.P.B.Patel Engineering College, Mehsana, Gujarat
(IJSRD/Vol. 1/Issue 4/2013/0005)
Third, with the use of databases, UniqIE only needs to
perform extraction incrementally on the sentences that
are affected by an improved module, and thus it is much
more efficient than running the whole extraction
programs from scratch as required by existing systems.
Suppose an improved named entity recognizer that can
discover a more extensive list of protein names becomes
available. We only need to perform a delta extraction on the
database with respect to the newly recognized protein names
using queries.
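The delta-extraction idea can be sketched as follows; the recognizer outputs and sentence ids are hypothetical:

```python
# Sketch of delta extraction when an improved entity recognizer arrives.
# ner_old / ner_new map sentence ids to the sets of entities recognized;
# the gene names and corpus are hypothetical.
ner_old = {1: {"RAD53"}, 2: {"MEC1"}, 3: {"TP53"}}
ner_new = {1: {"RAD53"}, 2: {"MEC1", "RAD9"}, 3: {"TP53"}}

# Only sentences whose entity sets changed need re-extraction;
# the rest of the corpus keeps its previously extracted results.
affected = {sid for sid in ner_new
            if ner_new[sid] != ner_old.get(sid, set())}
print(sorted(affected))  # [2]
```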
Indeed, the ability of expressing information extraction
and exploiting database optimizations for the process is also
observed in [7]. While [7] proposes to use Datalog for extracting facts from “relational tables”, we focus on extracting meaningful “tables” from “parse trees” of text
documents. Due to the variety of extraction needs, the
existence of hierarchical data structure and the lack of a
relational schema, this involves a new set of technical
challenges as outlined in Section IV.
II. SYSTEM
Figure 1 illustrates the system architecture of our UniqIE
system. The Text Processor performs the Initial Phase for
corpus processing and stores the processed information in
the Parse Tree Database (PTDB). The extraction patterns
over parse trees can be expressed in our proposed parse tree
query language (PTQL). The PTQL query evaluator takes a
PTQL query and transforms it into keyword-based queries
and SQL queries, which are evaluated by the underlying
RDBMS and IR engine. The index builder creates an
inverted index for the corpus as part of the query evaluation
by the IR engine.
The user interface provides two input modes: query-specified extraction mode and pseudo-relevance-feedback extraction mode. A user can directly specify PTQL queries for extraction in query-specified extraction mode. The user interface also provides the capability for users to input a keyword-based query. When a user keyword query is issued, relevant sentences are retrieved using an existing IR keyword search engine. With the top-ranked sentences, their corresponding grammatical structures are retrieved from PTDB. The PTQL query generator then uncovers the common grammatical patterns by considering the parse trees of the top-ranked sentences to automatically augment the initial keyword-based queries and generate PTQL queries. Extracted results are presented to the users once the queries are evaluated.
Furthermore, an explainer module is available to illustrate
the provenance of query results by showing the syntactic
structures of the sentences involved in the extracted results.
This helps the users understand, and enhance their queries
accordingly.
A. Text Parsing and Parse Tree Database (PTDB)
The Text Processor parses Medline abstracts with the Link
Grammar parser [3], and identifies entities in the sentences.
Each document is represented by a hierarchical structure called the parse tree of the document. A parse
tree is composed of a constituent tree and a linkage. A
constituent tree is a syntactic tree of a sentence, with nodes represented by part-of-speech tags and leaves corresponding to words in the sentence. A linkage, on the
other hand, represents the syntactic dependencies (or links)
between pairs of words in a sentence. Each node in the
parse tree has labels and attributes capturing the document
structure (such as title, sections, sentences), part-of-speech
tags, and entity types of corresponding words. Figure 2
shows a sample parse tree of a sentence, where the solid
lines indicate parent-child relationships in the constituent
tree and the dotted lines represent the linkage. Each leaf
node in a parse tree has a value and a tag attribute. The
tag attribute indicates the entity type of a leaf node.
The Parse Tree Database is a relational database for storing
parse trees and semantic types provided by the Text
Processor.
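The parse tree representation described above (constituent tree plus linkage) can be sketched in memory like this; the class and field names are illustrative assumptions, not the actual PTDB schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Illustrative in-memory form of a parse tree: a constituent tree plus a
# linkage (typed dependencies between word positions).
@dataclass
class Node:
    label: str                      # part-of-speech or constituent label
    value: Optional[str] = None     # surface word (leaf nodes only)
    tag: Optional[str] = None       # entity type, e.g. 'GENE'
    children: List["Node"] = field(default_factory=list)

# "RAD53 regulates MEC1": the constituent tree ...
tree = Node("S", children=[
    Node("NP", children=[Node("N", "RAD53", "GENE")]),
    Node("VP", children=[
        Node("V", "regulates"),
        Node("NP", children=[Node("N", "MEC1", "GENE")]),
    ]),
])

# ... and the linkage: word-position pairs with Link Grammar link types.
linkage: List[Tuple[int, int, str]] = [
    (0, 1, "S"),  # RAD53 is the subject of 'regulates'
    (1, 2, "O"),  # MEC1 is the object of 'regulates'
]

def leaves(n: Node) -> List[Node]:
    """Left-to-right leaf nodes (the words of the sentence)."""
    return [n] if not n.children else [w for c in n.children for w in leaves(c)]

print([w.value for w in leaves(tree)])  # ['RAD53', 'regulates', 'MEC1']
```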
B. Information Extraction Using PTQL Queries
To perform information extraction, we propose a query
language, PTQL, to specify linguistic patterns on parse
trees. An example of a PTQL query is shown in Figure 3. A
PTQL query consists of four components delimited by colons:
(i) tree patterns, (ii) a link condition, (iii) a proximity
condition, and (iv) a return expression. A tree pattern
describes the hierarchical structure and the horizontal
order of the nodes in a linguistic extraction pattern. XPath axes are used for expressing node relationships. In the example in Figure 3, the tree pattern specifies that there is a node labeled S as the root of a subtree that contains three nodes represented by variables i1, v and
i2. A link condition describes the linking dependencies
between nodes.
Fig. 2: An example of a parse tree
//S{//?[tag='GENE'](i1)=>//V[value='regulates'](v)=>//?[tag='GENE'](i2)} : i1 !S v and v !O i2 :: i1.value, v.value, i2.value
Fig. 3: An example of a PTQL query
In the example in Figure 3, i1 !S v represents that the node denoted by i1 has to be connected to the node denoted by v through an S link. In other words, i1 is the subject and v is the corresponding verb. Similarly, the link term v !O i2 indicates that i2 is the object and v is the corresponding verb. A proximity condition specifies words
that are within a specified word distance in the sentence.
A return expression defines the list of elements to be
returned. In the example, i1.value, v.value, i2.value indicate that the bindings of the variables i1, v and i2 (i.e., the two interactors and the interaction verb) are returned for sentences that satisfy the query. The
parse tree in Figure 2 satisfies the query. The details of the
PTQL query language and its implementation can be
found in [6].
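As a small illustration of the colon-delimited structure, the example query from Figure 3 can be split into its four components mechanically (a tokenization sketch only, assuming no literal colons inside the components; the real PTQL parser is more involved):

```python
# A PTQL query has four colon-delimited parts:
#   tree pattern : link condition : proximity condition : return expression
query = ("//S{//?[tag='GENE'](i1)=>//V[value='regulates'](v)"
         "=>//?[tag='GENE'](i2)} : i1 !S v and v !O i2 :: "
         "i1.value, v.value, i2.value")

tree_pattern, link_cond, proximity_cond, return_expr = \
    [part.strip() for part in query.split(":")]

print(link_cond)       # i1 !S v and v !O i2
print(proximity_cond)  # empty: this query has no proximity condition
print(return_expr)     # i1.value, v.value, i2.value
```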
C. Pseudo-relevance Feedback Query Generation
To ease the learning curve in issuing PTQL queries for the
users, UniqIE allows a user to issue simple keyword-based
queries, and automatically generates PTQL queries based
on the user keyword query.
To achieve this, it first performs an initial retrieval from
the inverted index of the corpus with the user keyword
query. Among the top-k% of the retrieved sentences Sk, the parse trees of Sk are retrieved from PTDB to find the common grammatical patterns among Sk. Intuitively, a sentence that bears the common grammatical patterns of the top-ranked sentences is likely to be relevant. Second, for the parse tree of each relevant sentence, UniqIE extracts the subtree rooted at the lowest common ancestor (LCA) of the query terms. Third,
to efficiently compare and find the common patterns, UniqIE generates m-th level string encodings for each subtree [5]. When m = 0, the string encodes the exact linguistic pattern in the subtree, and thus the retrieved sentences have exactly the same pattern as the relevant sentences, potentially with high precision. As the value of m increases, the string encodes a more generalized linguistic pattern, and is likely to retrieve more sentences, leading to higher recall with a possible compromise on precision. Fourth, identical m-th level string encodings form clusters of common grammatical patterns Cm. Finally, a PTQL query is generated for each of the clusters in Cm.
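One plausible reading of the m-th level encodings can be sketched as follows; the `encode` function here is a simplified stand-in for the encodings of [5], keeping surface words at m = 0 and generalizing leaves to their entity tags or labels at m >= 1:

```python
from collections import defaultdict

def encode(tree, m):
    """String-encode a subtree; tree = (label, word, tag, children).

    At m == 0 leaves keep their surface words (exact pattern); at
    m >= 1 leaves are generalized to their entity tag or label.
    """
    label, word, tag, children = tree
    if not children:  # leaf node
        return word if m == 0 else (tag or label)
    return label + "(" + " ".join(encode(c, m) for c in children) + ")"

# Two top-ranked sentences with different words but the same structure.
t1 = ("S", None, None, [("N", "RAD53", "GENE", []),
                        ("V", "regulates", None, []),
                        ("N", "MEC1", "GENE", [])])
t2 = ("S", None, None, [("N", "RAD9", "GENE", []),
                        ("V", "regulates", None, []),
                        ("N", "TP53", "GENE", [])])

# At m = 0 their encodings differ; at m = 1 they share one common
# grammatical pattern and fall into a single cluster.
clusters = defaultdict(list)
for t in (t1, t2):
    clusters[encode(t, 1)].append(t)
print(list(clusters))  # ['S(GENE V GENE)']
```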
D. Query Evaluation and Optimization
To evaluate PTQL queries on PTDB, the Query Translator
generates SQL queries from PTQL queries. Efficiency is a
key requirement for query evaluation. One of our optimizations is that, for each PTQL query, the Filter module first generates a keyword-based query to efficiently prune irrelevant sentences; the Query Translator then generates an SQL query equivalent to the PTQL query and performs the actual extraction only on the relevant sentences.
The keyword-based query captures keywords in the PTQL
query, while the extraction query captures both the structural
patterns and keywords. The keyword-based and SQL
queries are evaluated using an IR engine and a relational
database, respectively. For efficient query processing, the
Index Builder creates an inverted index that indexes
sentences according to the words, named entities and entity
types.
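The pruning step can be sketched with a toy inverted index; the corpus and index layout are illustrative:

```python
from collections import defaultdict

# A keyword-based query over an inverted index cheaply selects candidate
# sentences before the (expensive) translated SQL query runs on them.
sentences = {
    1: "RAD53 regulates MEC1",
    2: "TP53 is a tumor suppressor",
    3: "RAD9 regulates TP53",
}

inverted = defaultdict(set)
for sid, text in sentences.items():
    for word in text.lower().split():
        inverted[word].add(sid)

def candidates(*keywords):
    """Sentence ids containing every keyword (AND semantics)."""
    sets = [inverted[k.lower()] for k in keywords]
    return set.intersection(*sets) if sets else set()

# Only these sentences are handed to the translated SQL query.
print(sorted(candidates("regulates")))  # [1, 3]
```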
III. DEMONSTRATION
Fig. 4: A screenshot for the UniqIE system showing the
query results that share the same m-th level string
encoding.
What will be shown in the demo? Our web-based demonstration, as shown in Figure 4, will illustrate how the UniqIE system enables generic extraction.
1) Query-specified Extraction. In the demonstration,
the user can input a PTQL query to express an extraction
pattern or select one of the PTQL query examples. We will
show that to perform extraction, a user no longer needs to
write specific extraction programs.
2) Pseudo-relevance Feedback Query Generation.
The user can express extraction patterns in the form of
keyword-based queries. This scenario illustrates the
feasibility of generating PTQL queries from keyword-based queries through a mechanism inspired by the
pseudo-relevance feedback approach commonly found in
IR. In addition, the user can achieve extraction results for
optimal precision or recall by adjusting the value of m.
3) Two Phase Extraction and Incremental
Evaluation. We will illustrate the efficiency of UniqIE
when new extraction goals or improved processing
components emerge. For instance, assume that NER1 is a currently deployed gene name recognizer, and NER2 is an improved version of NER1 to be adopted by UniqIE. The
user can browse the sentences that are affected, i.e.
sentences with genes that are recognized by NER2 but not
NER1, and vice versa. Then the user can see that the
extraction is incrementally performed on the affected
sentences only, and thus it is very efficient.
4) Provenance of Query Results and Query
Explanation. To help users develop and test their queries,
upon click, the provenance of the query results will be
displayed, which includes the original sentence along with
its parse tree. UniqIE also illustrates the flow of every step
of the query generation and PTQL query evaluation.
IV. DISCUSSION
Significance of Our Approach
The significance of our approach lies in three aspects.
A. Novel Database-Centric Framework for Information Extraction
Information extraction is traditionally realized by writing
special-purpose programs for each specific extraction goal.
In this demonstration, we will illustrate a new extraction
framework, where extraction is formulated as queries on a
database that stores the parsed data. The benefits of such a
framework for information extraction include: (i)
incremental evaluation is achieved in the presence of new
extraction goals and deployment of improved processing
components; (ii) database query optimization is leveraged
for efficiency.
B. Proven Success of Information Extraction in Biomedical
Domain with Promises to General Domains
The underlying framework of the UniqIE system has been tested on information extraction from biomedical literature [5], and performed among the top systems in the BioNLP’09 shared task on event extraction [4]. Our two-phase extraction framework and query generation are not specific to the biomedical domain, but can be adapted to information extraction in general domains.
C. Performing Diverse Extraction Goals without
Training Data
Typical IE systems, such as Snowball [1], adopt the
supervised learning approach, which takes annotated data to generate extraction patterns. However, training data is scarce and known to be expensive to assemble. This can limit the opportunity for a trained IE system to perform another extraction goal. Our automated query generation approach forms PTQL extraction queries by exploiting the linguistic features of the top-ranked relevant results. Without the use of training data, our approach is readily applicable to different extraction goals and thus serves diverse information needs among different users.
Database Challenges: Our general framework for two-phase information extraction opens up many new opportunities and challenges for data management research.
D. Languages for Information Extraction
The parse tree database is complex, and extraction patterns
involve traversals of paths in constituent trees, as well as
links and link types between node pairs. Without user-defined functions, existing query languages fail to specify the required extraction patterns, either because of missing axes (XPath, XQuery) or because they cannot traverse linkages as first-class citizens (XPath, XQuery, LPath [2]). The design of query languages
for information extraction on parsed documents demands
investigation.
1) Query Optimization Challenges on Large-scale Data. UniqIE handles 1.5 terabytes of parsed
text data. Thus efficiency and scalability are essential
elements of the system. During prototyping, we found that
directly evaluating SQL queries translated from PTQL
queries was very slow due to the complexity of the
extraction patterns. In UniqIE, we significantly improved the
efficiency by leveraging keyword- based queries for pruning.
However, further query optimization is essential to handle
cases when only a small number of sentences can be filtered
by keyword-based queries.
2) Automated Query Generation. Query generation is critical so that casual users can specify their information needs without learning a query language. Although our current attempts at automated query generation already show promise, many further technical challenges need to be addressed. For instance, how to strike the balance of
precision and recall when generating PTQL queries that may
generalize the linguistic tree patterns in relevant sentences?
How to estimate the “quality” of the generated PTQL queries
before execution?
UniqIE presents our attempt at providing a versatile approach to information extraction. The elegance of our approach is that, unlike typical extraction frameworks, introducing new knowledge does not require reprocessing by all modules. Simple SQL insert
statements can be issued to store the new entities in PTDB.
We believe that studying fundamental database management
issues on information extraction – a well-known important
problem – opens up a lot of new opportunities and
challenges.
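The claim that new knowledge enters PTDB through plain SQL inserts can be illustrated with SQLite; the entity table layout is an assumption for this sketch, not the actual PTDB schema:

```python
import sqlite3

# Newly recognized entities are added with plain SQL inserts; no text
# processing module needs to be re-run over the corpus.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (sentence_id INTEGER, name TEXT, type TEXT)")

new_entities = [(2, "RAD9", "GENE"), (7, "BRCA1", "GENE")]
conn.executemany("INSERT INTO entity VALUES (?, ?, ?)", new_entities)

count = conn.execute(
    "SELECT COUNT(*) FROM entity WHERE type = 'GENE'").fetchone()[0]
print(count)  # 2
```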
REFERENCES
[1] E. Agichtein and L. Gravano. Snowball: Extracting relations from large plain-text collections. In ACM Digital Libraries, 2000.
[2] S. Bird, Y. Chen, et al. Designing and Evaluating an XPath Dialect for Linguistic Queries. In ICDE, 2006.
[3] D. Grinberg, et al. A Robust Parsing Algorithm for LINK Grammars. CMU-CS-TR-95-125, Pittsburgh, PA, 1995.
[4] J. Hakenberg, et al. Molecular event extraction from Link Grammar parse trees. In Proc. of BioNLP’09, 2009.
[5] L. Tari, et al. Querying parse tree database of Medline text to synthesize user-specific biomolecular networks. In PSB’09, 2009.
[6] P. H. Tu, et al. Generalized text extraction from molecular biology text using parse tree database querying. TR-08-004, Arizona State University, 2008.
[7] S. Warren, et al. Declarative IE using datalog with embedded extraction predicates. In VLDB ’07, 2007.