The widespread adoption of the World-Wide Web (the Web) has created challenges both for society as a whole and for the technology used to build and maintain the Web. The ongoing struggle of information retrieval systems is to wade through this vast pile of data and satisfy users by presenting them with the information that most adequately fits their needs. On a societal level, the Web is expanding faster than we can comprehend its implications or develop rules for its use. The ubiquitous use of the Web has raised important social concerns in the areas of privacy, censorship, and access to information. On a technical level, the novelty of the Web and the pace of its growth have created challenges not only in the development of new applications that realize the power of the Web, but also in the technology needed to scale applications to accommodate the resulting large data sets and heavy loads. This thesis presents searching algorithms and hierarchical classification techniques for increasing a search service's understanding of web queries. Existing search services rely solely on a query's occurrence in the document collection to locate relevant documents. They typically do not perform any task- or topic-based analysis of queries using other available resources, and do not leverage changes in user query patterns over time. Provided within is a set of techniques and metrics for performing temporal analysis on query logs. Our log analyses are shown to be reasonable and informative, and can be used to detect changing trends and patterns in the query stream, thus providing valuable data to a search service.
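As a minimal illustration of the kind of temporal analysis described here (the data, window scheme, and growth threshold below are hypothetical, not the thesis's actual metrics), one can compare a query's share of the stream across two log windows and flag sharp risers:

```python
from collections import Counter

def trending_queries(old_window, new_window, min_growth=2.0):
    """Flag queries whose relative frequency grows by at least
    `min_growth` between two log windows (lists of query strings)."""
    old_freq = Counter(old_window)
    new_freq = Counter(new_window)
    old_n, new_n = len(old_window), len(new_window)
    trending = []
    for query, count in new_freq.items():
        old_share = old_freq[query] / old_n if old_n else 0.0
        new_share = count / new_n
        # A query never seen before, or one whose share of the stream
        # has grown sharply, signals a shift in user interest.
        if old_share == 0 or new_share / old_share >= min_growth:
            trending.append(query)
    return sorted(trending)

log_monday = ["weather", "news", "weather", "maps"]
log_tuesday = ["storm", "weather", "storm", "storm", "news", "storm"]
print(trending_queries(log_monday, log_tuesday))  # → ['storm']
```

Comparing normalized shares rather than raw counts keeps the detector stable when overall log volume fluctuates between windows.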
Context Driven Technique for Document Classification (IDES Editor)
In this paper we present an innovative hybrid Text Classification (TC) system that bridges the gap between statistical and context-based techniques. Our algorithm harnesses contextual information at two stages. First, it extracts a cohesive set of keywords for each category by using lexical references, implicit context derived from LSA, and word-vicinity-driven semantics. Second, each document is represented by a set of context-rich features whose values are derived by considering both lexical cohesion and the extent of coverage of salient concepts via lexical chaining. After keywords are extracted, a subset of the input documents is apportioned as a training set; its members are assigned categories based on their keyword representation. These labeled documents are used to train binary SVM classifiers, one for each category. The remaining documents are supplied to the trained classifiers in the form of their context-enhanced feature vectors, and each document is finally ascribed its appropriate category by an SVM classifier.
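The keyword-based labeling stage can be sketched as follows; the keyword sets and the plain token-overlap score are illustrative stand-ins for the paper's richer LSA- and lexical-chaining-based representation:

```python
def label_by_keywords(doc, category_keywords):
    """Assign the category whose keyword set overlaps the document most.
    A crude stand-in for the paper's context-rich keyword scoring."""
    tokens = set(doc.lower().split())
    scores = {cat: len(tokens & kws) for cat, kws in category_keywords.items()}
    return max(scores, key=scores.get)

# Hypothetical per-category keyword sets (stage-one output).
keywords = {
    "sports": {"match", "goal", "team", "league"},
    "finance": {"stock", "market", "shares", "profit"},
}
doc = "The team scored a late goal to win the league match"
print(label_by_keywords(doc, keywords))  # → sports
```

Documents labeled this way would then serve as the training set for the per-category binary SVMs described above.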
Scaling Down Dimensions and Feature Extraction in Document Repository Classif... (ijdmtaiir)
In this study a comprehensive evaluation of two supervised feature selection methods for dimensionality reduction is performed: Latent Semantic Indexing (LSI) and Principal Component Analysis (PCA). These are gauged against unsupervised techniques such as fuzzy feature clustering using hard fuzzy C-means (FCM). The main objective of the study is to estimate the relative efficiency of the two supervised techniques against unsupervised fuzzy techniques while reducing the feature space. It is found that clustering using FCM leads to better accuracy in classifying documents than techniques such as LSI and PCA. The results show that clustering of features improves the accuracy of document classification.
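A crisp approximation of the feature-clustering idea can be sketched with plain k-means over term profiles, standing in for hard fuzzy C-means; the toy document-term matrix is hypothetical:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means over feature profiles (lists of floats)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[c])))
        for c in range(k):
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def reduce_features(doc_term, k):
    """Replace each cluster of similar term columns by its mean,
    shrinking the feature space from n terms to k cluster features."""
    profiles = [list(col) for col in zip(*doc_term)]  # one profile per term
    assign = kmeans(profiles, k)
    reduced = []
    for row in doc_term:
        feats = []
        for c in range(k):
            idx = [i for i, a in enumerate(assign) if a == c]
            feats.append(sum(row[i] for i in idx) / len(idx) if idx else 0.0)
        reduced.append(feats)
    return reduced

# Four documents over four terms; terms 0-1 co-occur, as do terms 2-3.
dt = [[2, 2, 0, 0], [3, 3, 0, 0], [0, 0, 2, 2], [0, 0, 3, 3]]
print(reduce_features(dt, 2))
```

Grouping correlated terms into one feature is what lets a clustering approach compete with projection methods like LSI and PCA on this task.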
Performance Evaluation of Query Processing Techniques in Information Retrieval (idescitation)
The first element of the search process is the query. Since the user query is on average restricted to two or three keywords, it is often ambiguous to the search engine. Given the user query, the goal of an Information Retrieval (IR) system is to retrieve information which might be useful or relevant to the information need of the user. Hence, query processing plays an important role in an IR system. Query processing can be divided into four categories: query expansion, query optimization, query classification, and query parsing. In this paper an attempt is made to evaluate the performance of query processing algorithms in each category. The evaluation was based on the dataset specified by the Forum for Information Retrieval [FIRE15]. The criteria used for evaluation are precision and relative recall, and the analysis is based on the importance of each step in query processing. The experimental results show the significance of each step in query processing, as well as the relevance of web semantics and spelling correction in the user query.
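Two of the four processing steps, spelling correction and query expansion, can be sketched as follows; the vocabulary, synonym resource, and similarity cutoff are illustrative assumptions, not the paper's actual algorithms:

```python
import difflib

VOCAB = {"weather", "forecast", "rain", "temperature"}
SYNONYMS = {"rain": ["precipitation", "showers"]}  # hypothetical resource

def process_query(query):
    """Correct misspelled terms against the vocabulary, then expand
    corrected terms with synonyms (two of the four processing steps)."""
    expanded = []
    for term in query.lower().split():
        # Spelling correction: snap the term to its closest vocabulary entry.
        match = difflib.get_close_matches(term, VOCAB, n=1, cutoff=0.7)
        term = match[0] if match else term
        expanded.append(term)
        # Query expansion: add known synonyms of the corrected term.
        expanded.extend(SYNONYMS.get(term, []))
    return expanded

print(process_query("wether rian"))
# → ['weather', 'rain', 'precipitation', 'showers']
```

Even this toy pipeline shows why the two steps interact: correction must run first, or the synonym lookup misses the misspelled term entirely.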
Classifying web users in a personalised search setup is cumbersome due to the inherent dynamism of user browsing history. This fluctuating nature of user behaviour and user interest can be well interpreted within a fuzzy setting. Prior to analysing user behaviour, the nature of user interests has to be collected. This work proposes a fuzzy-based user classification model to suit a personalised web search environment. The user browsing data is collected using an established customised browser designed to suit personalisation. The data are fuzzified and fuzzy rules are generated by applying decision trees. Using the fuzzy rules, the search pages are labelled to aid the grouping of user search interests. Evaluation shows that the proposed approach performs better than a Bayesian classifier.
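The fuzzification step can be illustrated with triangular membership functions over a browsing signal such as dwell time; the set boundaries below are hypothetical, not the model's actual parameters:

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify_dwell(seconds):
    """Map a page dwell time to memberships in three fuzzy sets,
    the kind of input a decision-tree-derived fuzzy rule base consumes."""
    return {
        "low":    triangular(seconds, -1, 0, 30),
        "medium": triangular(seconds, 10, 45, 80),
        "high":   triangular(seconds, 60, 120, 181),
    }

m = fuzzify_dwell(20)
print({k: round(v, 2) for k, v in m.items()})
# a 20s dwell is partly "low" and partly "medium" -- no crisp cut-off
```

Overlapping sets are the point: a borderline dwell time contributes to two interest labels at once instead of being forced into one.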
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
NATURE: A TOOL RESULTING FROM THE UNION OF ARTIFICIAL INTELLIGENCE AND NATURA... (ijaia)
This paper presents the final results of the research project that aimed at the construction of a tool which is aided by Artificial Intelligence through an Ontology with a model trained with Machine Learning, and is aided by Natural Language Processing, to support the semantic search of research projects of the Research System of the University of Nariño. For the construction of NATURE, as this tool is called, a methodology was used that includes the following stages: appropriation of knowledge; installation and configuration of tools, libraries and technologies; collection, extraction and preparation of research projects; and design and development of the tool. The main results of the work were three: a) the complete construction of the Ontology with classes, object properties (predicates), data properties (attributes) and individuals (instances) in Protégé, SPARQL queries with Apache Jena Fuseki, and the respective coding with Owlready2 using Jupyter Notebook with Python within the Anaconda virtual environment; b) the successful training of the model, for which Machine Learning and specifically Natural Language Processing tools such as spaCy, NLTK, Word2vec and Doc2vec were used, also in Jupyter Notebook with Python within the Anaconda virtual environment and with Elasticsearch; and c) the creation of NATURE by managing and unifying the queries for the Ontology and for the Machine Learning model. The tests showed that NATURE was successful in all the searches that were performed, as its results were satisfactory.
Survey of Machine Learning Techniques in Textual Document Classification (IOSR Journals)
Classification of text documents involves associating one or more predefined categories with a document, based on the likelihood expressed by a training set of labeled documents. Many machine learning algorithms play an important role in training the system with predefined categories. The felt importance of the machine learning approach is why this study of text document classification, based on the available statistical event models, has been taken up. The aim of this paper is to present the important techniques and methodologies that are employed for text document classification, while at the same time raising awareness of some of the interesting challenges that remain to be solved, focused mainly on text representation and machine learning techniques.
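One of the statistical event models such a survey typically covers, the multinomial Naive Bayes model, can be sketched from scratch (toy training data; Laplace smoothing assumed):

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Minimal multinomial event model with Laplace smoothing."""
    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        best, best_lp = None, float("-inf")
        n = sum(self.class_counts.values())
        for label, counts in self.word_counts.items():
            lp = math.log(self.class_counts[label] / n)  # class prior
            total = sum(counts.values()) + len(self.vocab)
            for w in doc.lower().split():
                lp += math.log((counts[w] + 1) / total)  # Laplace smoothing
            if lp > best_lp:
                best, best_lp = label, lp
        return best

nb = MultinomialNB().fit(
    ["the match ended in a goal", "shares fell on the stock market"],
    ["sports", "finance"])
print(nb.predict("a dramatic goal in the match"))  # → sports
```

The multinomial model counts word occurrences rather than mere word presence, which is exactly the event-model distinction the text-representation literature debates.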
Contextual model of recommending resources on an academic networking portal (csandit)
Artificial Intelligence techniques have been instrumental in helping users handle the large amount of information on the Internet. The idea of recommendation systems, custom search engines, and intelligent software has been widely accepted among users who seek assistance in searching, sorting, classifying, filtering and sharing this vast quantity of information. In this paper, we present a contextual model of a recommendation engine which, keeping in mind the context and activities of a user, recommends resources in an academic networking portal. The proposed method uses implicit feedback and the concept relationship hierarchy to determine the similarity between a user and the resources in the portal. The proposed algorithm has been tested on an academic networking portal and the results are convincing.
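A minimal sketch of matching a user against resources (the concept vectors and mean-profile construction are illustrative assumptions, not the portal's actual concept relationship hierarchy):

```python
import math

def cosine(u, v):
    """Cosine similarity between two concept-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Concept weights over (AI, databases, networks) -- hypothetical.
resources = {
    "intro-to-ml": [0.9, 0.1, 0.0],
    "sql-tuning":  [0.0, 1.0, 0.1],
    "deep-nets":   [1.0, 0.0, 0.2],
}
# Implicit feedback: the user's profile is the mean of viewed resources.
viewed = ["intro-to-ml"]
profile = [sum(resources[r][i] for r in viewed) / len(viewed) for i in range(3)]
ranked = sorted((r for r in resources if r not in viewed),
                key=lambda r: cosine(profile, resources[r]), reverse=True)
print(ranked)  # most similar unseen resource first
```

Building the profile from viewing behaviour alone is what makes the feedback "implicit": the user never rates anything explicitly.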
Integrated expert recommendation model for online communities (st02IJwest)
Online communities have become vital places for Web 2.0 users to share knowledge and experiences. Recently, finding expert users in a community has become an important research issue. This paper proposes a novel cascaded model for expert recommendation using aggregated knowledge extracted from enormous contents and social network features. A vector space model is used to compute the relevance of published content with respect to a specific query, while the PageRank algorithm is applied to rank candidate experts. The experimental results show that the proposed model is an effective recommendation model which can guarantee that the top candidate experts are both highly relevant to the specific queries and highly influential in the corresponding areas.
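The PageRank component can be sketched with plain power iteration; the endorsement graph among candidate experts below is hypothetical:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank over a dict {node: [outlinks]}."""
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            if not outs:  # dangling node: spread its rank evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
            else:
                for m in outs:
                    new[m] += damping * rank[n] / len(outs)
        rank = new
    return rank

# Hypothetical "endorses/answers" graph among candidate experts.
graph = {"ann": ["bob"], "bob": ["carol"], "carol": ["bob"], "dave": ["bob"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # → bob
```

In the cascaded model, these influence scores would be combined with vector-space relevance scores so that a top-ranked expert is both on-topic and well-connected.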
UML MODELING AND SYSTEM ARCHITECTURE FOR AGENT BASED INFORMATION RETRIEVAL (ijcsit)
In the current technological era, there is an enormous increase in the information available on the web and in online databases. This information abundance increases the complexity of finding relevant information. To solve such challenges, there is a need for improved and intelligent systems for efficient search and retrieval. Intelligent agents can be used for better search and information retrieval in a document collection. The information required by a user is scattered across a large number of databases. In this paper, the object-oriented modeling of an agent-based information retrieval system is presented. The paper also discusses the framework of an agent architecture for obtaining the best combination of terms to serve as the input query to the information retrieval system. The communication and cooperation among the agents are also explained; each agent has a task to perform in information retrieval.
Intelligent Semantic Web Search Engines: A Brief Survey (dannyijwest)
The World Wide Web (WWW) allows people to share information from large database repositories globally. The amount of information has grown to span billions of database entries, and searching it requires specialized tools known generically as search engines. Although many search engines are available today, retrieving meaningful information with them is difficult. To overcome this problem and enable search engines to retrieve meaningful information intelligently, semantic web technologies are playing a major role. In this paper we present a survey on the generations of search engines and the role of search engines in the intelligent web and semantic search technologies.
An effective search on web log from most popular downloaded content (ijdpsjournal)
A Web page recommender system effectively predicts the best related web page for a search. While searching for a word, a search engine may display some unnecessary links and unrelated data to the user; to avoid this problem, the conceptual prediction model combines both web usage and domain knowledge. The proposed conceptual prediction model automatically generates a semantic network of the semantic Web usage knowledge, which is the integration of domain knowledge and web usage information. Web usage mining aims to discover interesting and frequent user access patterns from web browsing data. The discovered knowledge can then be used for many practical web applications such as web recommendations, adaptive web sites, and personalized web search and surfing.
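The pattern-discovery step of web usage mining can be illustrated by counting page pairs that co-occur in sessions, a simplified frequent-itemset count; the sessions and support threshold are hypothetical:

```python
from collections import Counter
from itertools import combinations

def frequent_page_pairs(sessions, min_support=2):
    """Count page pairs co-occurring within user sessions and keep
    those reaching the minimum support (a toy Apriori-style pass)."""
    counts = Counter()
    for session in sessions:
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

sessions = [
    ["/home", "/courses", "/courses/ml"],
    ["/home", "/courses", "/faq"],
    ["/courses", "/courses/ml"],
]
print(frequent_page_pairs(sessions))
```

Pairs that recur across many sessions are exactly the raw material a recommender or adaptive site would act on.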
Annotation Approach for Document with Recommendation (ijmpict)
An enormous number of organizations generate and share textual descriptions of their products, facilities, and activities. Such collections of textual data comprise a significant amount of structured information, which remains buried in the unstructured text. Whereas information extraction systems simplify the extraction of structured associations, they are frequently expensive and inaccurate, particularly when working on top of text that does not contain any examples of the targeted structured data. We propose an alternative methodology that simplifies structured metadata generation by recognizing documents that are likely to contain information of interest; this data will be beneficial for querying the database. Moreover, we design algorithms to extract attribute-value pairs, and similarly devise new mechanisms to map such pairs to manually created schemas. We apply a clustering technique to the item content information to complement the user rating information, which improves the correctness of collaborative similarity and solves the cold-start problem.
PRE-RANKING DOCUMENTS VALORIZATION IN THE INFORMATION RETRIEVAL PROCESS (csandit)
In this short paper we present three methods to valorize the relevance score of some documents based on their characteristics, in order to enhance their ranking. Our framework is an information retrieval system dedicated to children. The valorization methods aim to increase the relevance score of some documents by an additional value which is proportional to the number of multimedia objects included, the number of objects linked to the user's particulars, and the topics included. All three valorization methods use fuzzy rules to identify the valorization value.
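A minimal sketch of one valorization method, boosting the relevance score in proportion to a fuzzy "multimedia-rich" membership (the membership shape and boost weight are assumptions, not the paper's actual fuzzy rules):

```python
def many_multimedia(count):
    """Fuzzy membership of 'document has many multimedia objects':
    rises linearly and saturates at five objects (assumed shape)."""
    return min(count / 5.0, 1.0)

def valorize(base_score, multimedia_count, boost=0.3):
    """Raise the relevance score by an amount proportional to the
    fuzzy degree to which the document is multimedia-rich."""
    return base_score + boost * many_multimedia(multimedia_count)

print(round(valorize(0.50, 2), 2))   # 2 objects -> partial boost
print(round(valorize(0.50, 10), 2))  # many objects -> saturated boost
```

Because the membership saturates, a document cannot buy unbounded rank with ever more embedded media, which keeps the additional value well-behaved.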
Effective Performance of Information Retrieval on Web by Using Web Crawling (dannyijwest)
The World Wide Web consists of more than 50 billion pages online. It is highly dynamic [6], i.e., the web continuously introduces new capabilities and attracts many people. Due to this explosion in size, an effective information retrieval system or search engine is needed to access the information. In this paper we propose the EPOW (Effective Performance of WebCrawler) architecture. It is a software agent whose main objective is to minimize the overload of a user locating needed information. We have designed the web crawler by considering a parallelization policy. Since our EPOW crawler has a highly optimized system, it can download a large number of pages per second while being robust against crashes. We have also proposed using data structure concepts, a scheduler and a circular queue, in the implementation to improve the performance of our web crawler.
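The circular-queue idea for the crawl frontier can be sketched as a fixed-size ring buffer; this is a simplified stand-in for the EPOW scheduler, and the overwrite-oldest policy is an assumption:

```python
class CircularQueue:
    """Fixed-size ring buffer used as a crawl frontier: when full,
    the oldest queued URL is overwritten by the newest discovery."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0      # index of the next URL to dequeue
        self.size = 0
        self.capacity = capacity

    def enqueue(self, url):
        tail = (self.head + self.size) % self.capacity
        if self.size == self.capacity:   # full: drop the oldest URL
            self.head = (self.head + 1) % self.capacity
        else:
            self.size += 1
        self.buf[tail] = url

    def dequeue(self):
        if self.size == 0:
            return None
        url = self.buf[self.head]
        self.head = (self.head + 1) % self.capacity
        self.size -= 1
        return url

q = CircularQueue(3)
for u in ["a.html", "b.html", "c.html", "d.html"]:
    q.enqueue(u)
print(q.dequeue(), q.dequeue())  # oldest surviving URLs first
```

A bounded frontier like this keeps memory constant under a fast parallel crawl, at the cost of silently discarding the stalest discoveries.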
Abstract: In many fields, such as industry, commerce, government, and education, knowledge discovery and data
mining can be immensely valuable to Artificial Intelligence. Because of the recent increase in demand for KDD
techniques, drawing on machine learning, databases, statistics, knowledge acquisition, data visualisation, and
high-performance computing, knowledge discovery and data mining have grown in importance. By employing
standard formulas for computational correlation, we aim to create an integrated technique that can filter social
information on the web and find parallels among the tastes of diverse users in a variety of settings.
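A "standard formula for computational correlation" in this setting is Pearson's r computed over the items two users have both rated. The sketch below is illustrative only; the data and function name are not from the paper.

```python
import math

def pearson(u, v):
    """Pearson correlation between two users' ratings.
    u, v: dicts mapping item -> rating; only co-rated items count."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0  # not enough overlap to correlate
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = math.sqrt(sum((u[i] - mu) ** 2 for i in common) *
                    sum((v[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

# Bob's ratings are Alice's minus one: same taste *pattern*, so r = 1.0.
alice = {"m1": 5, "m2": 3, "m3": 4}
bob   = {"m1": 4, "m2": 2, "m3": 3}
print(round(pearson(alice, bob), 3))  # 1.0
```

Because Pearson's r compares deviations from each user's own mean, it captures similarity of taste even when one user rates everything systematically lower.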
Web personalization using clustering of web usage data (ijfcstjournal)
The exponential growth in the number and complexity of information resources and services on the Web
has made log data an indispensable resource for characterizing users in a Web-based environment. The
approach organizes related web data into a hierarchy structure through approximation. This hierarchy
structure can then serve as input for a variety of data mining tasks such as clustering, association rule
mining, and sequence mining.
In this paper, we present an approach for personalizing the web user environment dynamically, as the
user interacts with the Web, by clustering web usage data using a concept hierarchy. The system infers
information about users from the web server's access logs by means of data mining and web usage mining
techniques. The extracted knowledge is used to offer users a personalized view of the services.
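The concept-hierarchy step described above can be illustrated by lifting visited pages to their parent concepts before comparing sessions; the hierarchy, pages, and function names below are made up for illustration.

```python
# Hypothetical one-level concept hierarchy: page -> topic.
HIERARCHY = {
    "/sports/football.html": "sports",
    "/sports/cricket.html": "sports",
    "/news/politics.html": "news",
    "/news/world.html": "news",
}

def generalize(session):
    """Lift each visited page to its concept in the hierarchy."""
    return {HIERARCHY.get(page, "other") for page in session}

def jaccard(a, b):
    """Set overlap used as a simple session-similarity measure."""
    return len(a & b) / len(a | b) if a | b else 0.0

s1 = ["/sports/football.html", "/sports/cricket.html"]
s2 = ["/sports/cricket.html", "/news/world.html"]
sim = jaccard(generalize(s1), generalize(s2))
print(round(sim, 2))  # 0.5: the sessions share the 'sports' concept
```

Without the hierarchy, `s1` and `s2` share only one exact URL; after generalization they share a whole topic, which is what lets approximate clustering group users with similar interests.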
A Survey on approaches of Web Mining in Varied Areas (inventionjournals)
There has been a lot of research in recent years on efficient web searching. Several papers have proposed algorithms that evaluate the performance of inferring user search goals from user feedback sessions. When information is retrieved, the user clicks on a particular URL; based on the click rate, ranking is done automatically by clustering the feedback sessions. Web search engines have made enormous contributions to the web and society. They make finding information on the web quick and easy. However, they are far from optimal. A major deficiency of generic search engines is that they follow the "one size fits all" model and are not adaptable to individual users.
International Journal of Engineering Research and Development (IJERD) (IJERD Editor)
Finding significant data related to a particular subject is difficult on the web because of the immense volume of web information. This situation makes search engine optimization techniques indispensable for researchers, academicians, and industrialists. Query history analysis is the detailed examination of web data from various users with the goal of understanding and improving web search handling. A query log, or user search history, comprises users' previously submitted queries and the corresponding clicked documents or site URLs. Query log analysis is therefore considered the most widely used technique for improving the user search experience. The proposed method analyzes and clusters user search histories for the purpose of search engine optimization. It addresses the problem of organizing users' historical queries into groups in a dynamic and automated fashion. The automatically organized query groups can support a number of optimization tasks such as query suggestion, result re-ranking, and query reformulation. The method treats a query group as a collection of queries, together with the corresponding set of clicked URLs, that relate to a common information need. It proposes a new way of combining word-similarity measures with document-similarity measures into a single combined similarity measure, and also considers other query-relevance signals such as query reformulation and the clicked-URL concept. Evaluation results show that the proposed method outperforms existing approaches.
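The combined similarity measure described above, which blends word similarity with clicked-URL similarity, can be sketched as a weighted sum of two Jaccard overlaps. The weighting, data, and function names are illustrative assumptions, not the paper's actual formulation.

```python
def jaccard(a, b):
    """Set overlap in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def combined_similarity(q1, q2, alpha=0.5):
    """Blend word similarity and clicked-URL similarity between two
    queries. q1, q2: (set_of_words, set_of_clicked_urls).
    alpha (the mixing weight) is an illustrative choice."""
    word_sim = jaccard(q1[0], q2[0])
    click_sim = jaccard(q1[1], q2[1])
    return alpha * word_sim + (1 - alpha) * click_sim

qa = ({"cheap", "flights"}, {"travel.com"})
qb = ({"cheap", "tickets"}, {"travel.com"})
# Word overlap is 1/3, click overlap is 1.0: the shared click pulls the
# two differently-worded queries into the same group.
print(round(combined_similarity(qa, qb), 3))  # 0.667
```

The point of the blend is exactly the case shown: "cheap flights" and "cheap tickets" share few words, but the identical clicked URL reveals they serve the same information need.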
IJRET: International Journal of Research in Engineering and Technology Improv... (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI... (IAEME Publication)
In today's global business, the web has become the most important means of communication. Clients and customers can find products online, which is a benefit of doing business online. Web mining is the process of using data mining tools to analyse and extract information from web pages and applications autonomously. Many firms use web structure mining, together with data mining business strategies, to generate suitable predictions and judgments for business growth, productivity, manufacturing techniques, and more. In the online booking domain, optimal web data mining analysis of web structure is a crucial component that gives a systematic way of applying new techniques to real-time data with various levels of implications. Web structure mining focuses on the structure of the web's hyperlinks. Linkage administration that is done correctly can lead to future connections, which can in turn increase the prediction performance of learnt models. With increased interest in web mining, structural analysis research has expanded, resulting in a new research area at the crossroads of network analysis, hyperlink and web mining, structural learning, empirical software design techniques, and graph mining. Web structure mining is the process of discovering structural data from the web. The proposed WSM approach is a system for finding the structure of data stored over the Web. Web structure mining can help clients recover significant records by analyzing the link-oriented structure of web content. As the amount of data available online has increased, web structure mining has become one of the most important resources for information extraction and knowledge discovery.
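The hyperlink-structure analysis described above starts from a simple structural signal: how many pages link *to* each page. The toy graph and function below are illustrative only; link-based ranking schemes build on exactly this kind of count.

```python
# Toy hyperlink graph for an online-booking site: page -> pages it links to.
LINKS = {
    "home": ["products", "about"],
    "about": ["home"],
    "products": ["home", "booking"],
    "blog": ["products", "booking"],
}

def in_degree(links):
    """Count incoming links per page, a basic web-structure-mining signal."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for t in targets:
            counts[t] = counts.get(t, 0) + 1
    return counts

degrees = in_degree(LINKS)
print(degrees)  # 'booking' and 'home' are linked-to most; 'blog' not at all
```

Even this crude count separates hub/authority-like pages (`home`, `booking`) from orphaned ones (`blog`), which is the kind of structural evidence that feeds prediction models.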
MULTIMODAL COURSE DESIGN AND IMPLEMENTATION USING LEML AND LMS FOR INSTRUCTIO... (IJMIT JOURNAL)
Traditionally, teaching has been centered around classroom delivery. However, the onslaught of the
COVID-19 pandemic has accelerated the use of technology and new teaching and learning methodologies
for course delivery. We investigate and describe different modes of course delivery that maintain the
integrity of teaching and learning. This paper addresses the following research questions: 1) Which course
delivery methods do our academic institutions use, and why? 2) How can instructors validate the
guidelines of the institutions? 3) How should courses be taught to achieve student learning outcomes?
Using the Learning Environment Modeling Language (LEML), we investigate the design and
implementation of courses for delivery in the following environments: face-to-face, online synchronous,
asynchronous, hybrid, and hyflex. Good course design and implementation are key components of
instructional alignment. Furthermore, we demonstrate how to design, implement, and deliver courses in
synchronous, asynchronous, and hybrid modes and describe our proposed enhancements to LEML.
International Journal of Managing Information Technology (IJMIT) ** WJCI Indexed (IJMIT JOURNAL)
The International Journal of Managing Information Technology (IJMIT) is a quarterly open access peer-reviewed journal that publishes articles that contribute new results in all areas of the strategic application of information technology (IT) in organizations. The journal focuses on innovative ideas and best practices in using IT to advance organizations – for-profit, non-profit, and governmental. The goal of this journal is to bring together researchers and practitioners from academia, government, and industry to focus on understanding both how to use IT to support the strategy and goals of the organization and to employ IT in new ways to foster greater collaboration, communication, and information sharing both within the organization and with its stakeholders. The International Journal of Managing Information Technology seeks to establish new collaborations, new best practices, and new theories in these areas.
NOVEL R&D CAPABILITIES AS A RESPONSE TO ESG RISKS: LESSONS FROM AMAZON'S FU... (IJMIT JOURNAL)
Environmental, Social, and Governance (ESG) management is essential for transforming corporate
financial-performance-oriented business strategies into Finance (F) + ESG optimization strategies to
achieve the Sustainable Development Goals (SDGs).
In this trend, the rise of ESG risks has divided firms into two categories. The former incorporates a growth
mindset that creates a passion for learning and urges the firm to improve itself by pursuing research and
development (R&D)-driven challenges, while the other category, characterized by risk aversion, avoids
challenging highly uncertain R&D activities and seeks more manageable endeavors.
This duality underscores the complexity of corporate R&D strategies in addressing ESG risks and
necessitates the development of novel R&D capabilities for corporate R&D transformation strategies
towards F + ESG optimization.
Building on this premise, this paper conducts an empirical analysis, utilizing reliable firm-level data on
ESG risk and brand value, with a focus on 100 global R&D leader firms. It analyzes R&D and actions for
ESG risk mitigation, and assesses the development of new functions that fulfill F + ESG optimization
through R&D. The analysis also highlights the significance of network externality effects, with a specific
focus on Amazon, a leading R&D company, providing insights into the direction for transforming R&D
strategies towards F + ESG optimization.
The dynamics of stakeholder engagement in F + ESG optimization are illustrated with the example of
Amazon's activities. The analysis makes it evident that Amazon's capacity for growth and scalability,
specifically its ability to grow and expand, is accelerating high-level research and development by gaining
the trust of stakeholders in the "synergy through R&D-driven ESG risk mitigation."
Finally, as examples of these initiatives, the paper discusses the Climate Pledge led by Amazon and the
transformation of Japan's management system.
A REVIEW OF STOCK TREND PREDICTION WITH COMBINATION OF EFFECTIVE MULTI TECHNI... (IJMIT JOURNAL)
It is important for investors to understand stock trends and market conditions before trading stocks. Both
capabilities are essential for an investor to maximize profit and minimize losses; without them, investors
suffer losses through ignorance of stock trends and market conditions. Technical analysis helps in
understanding stock price behavior with regard to past trends, the signals given by indicators, and the
major turning points of the market price. This paper reviews stock trend prediction with a combination of
effective multi-technical-indicator strategies to increase investment performance, taking into account
global performance and the proposed combined multi-technical-indicator strategy model.
INTRUSION DETECTION SYSTEM USING CUSTOMIZED RULES FOR SNORT (IJMIT JOURNAL)
These days the security provided by computer systems is a big issue, as they constantly face the threat of
cyber-attacks such as IP address spoofing, Denial of Service (DoS), and token impersonation. The security
provided by blue-team operations tends to be costly in large firms, as a large number of systems need to
be protected against these attacks. This leads firms to turn to less costly security configurations such as
the Suricata and Snort intrusion detection systems (IDS). The main theme of this project is to improve the
services provided by Snort, a tool used to build a defense against cyber-attacks such as DDoS attacks
carried out on both the physical and network layers, which can result in the loss of extremely important
data. The rules defined in this project monitor traffic, analyze it, and take appropriate action not only to
stop an attack but also to locate its source IP address. The process uses several tools besides Snort,
namely Wireshark, Wazuh, and Splunk. The result is not only the detection of an attack but also the source
IP address of the machine on which the attack is initiated. The end product of this research is a set of
default rules for Snort that provides better security than its previous versions and also gives the user the
IP address of the attacker. The system integrates Wazuh with Snort to make it more efficient than
Suricata, another intrusion detection system capable of detecting all the attack types mentioned. Splunk
increases firewall efficiency with respect to the number of bits passed for scanning and the number of bits
scanned successfully. Wazuh is used because it is a strong choice for traffic monitoring and incident
response compared to its market alternatives. Since the system is intended for firms that handle large
amounts of data, Splunk is used for its efficiency with large data volumes. Wireshark gives the IDS
automation in its capability to capture and report malicious packets found during the network scan. All of
this gives the IDS the capability of a low-budget automated threat detection system.
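A customized Snort rule of the kind this project describes looks like the string below; the specific thresholds, SID, and message are illustrative assumptions, not the project's actual ruleset. The small parser only splits the rule into its header and options, the two parts every Snort rule has.

```python
# Illustrative Snort 2.x-style rule flagging a possible SYN flood:
# many SYN packets (flags:S) from one source within a short window.
RULE = ('alert tcp any any -> $HOME_NET 80 '
        '(msg:"Possible SYN flood"; flags:S; '
        'detection_filter:track by_src, count 100, seconds 10; '
        'sid:1000001; rev:1;)')

def parse_rule(rule):
    """Split a Snort rule into its header (action/proto/addresses)
    and its option body (the parenthesized part)."""
    header, _, options = rule.partition("(")
    return header.strip(), options.rstrip(")").strip()

header, options = parse_rule(RULE)
print(header)  # alert tcp any any -> $HOME_NET 80
print("detection_filter" in options)  # True
```

The `detection_filter` option is what turns a per-packet match into a rate-based one: the alert fires only when the per-source count crosses the threshold, which is the usual way to express DoS-style conditions in custom rules.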
MEDIATING AND MODERATING FACTORS AFFECTING READINESS TO IOT APPLICATIONS: THE... (IJMIT JOURNAL)
Although IoT seems to be the upcoming trend, it is still in its infancy, especially in the banking industry.
There is a clear gap in the literature, as only a few studies identify factors affecting readiness for IoT
applications in banks in general, and investigations of mediating and moderating factors are almost
negligible. Accordingly, this research aims to investigate the main factors that affect employees' readiness
for IoT applications, while highlighting the mediating and moderating factors in the Egyptian banking
sector. The importance of Egypt stems from its high population and the steady steps it has taken towards
technology adoption. 479 valid questionnaires were collected from HR employees in banks. The data were
statistically analysed using regression and SEM. Results showed a significant impact of 'Security',
'Networking', 'Software Development', and 'Regulations' on readiness for IoT applications; thus, the
readiness acceptance level is high. 'Security' and 'User Intention' were shown to mediate the relationship
between the research variables and readiness for IoT applications, and only a partial moderation role was
found for 'Efficiency'. The study contributes to the literature on IoT applications in general, and fills a gap
on the Egyptian banking context in particular. Finally, it provides decision makers at banks with useful
guidelines on how to optimally promote IoT applications among employees.
EFFECTIVELY CONNECT ACQUIRED TECHNOLOGY TO INNOVATION OVER A LONG PERIOD (IJMIT JOURNAL)
ICT (Information and Communication Technology) companies are facing the dilemma of decreasing
productivity despite increasing research and development efforts. M&A (Merger and Acquisition) is being
considered as a breakthrough solution, and existing research has pointed out that M&A leads to the
emergence of new innovations. The purpose of this study was to discuss efficient ways of making
acquisitions and to resolve the dilemma of productivity decline by clarifying how the technology obtained
through M&A leads to the creation of new innovations. Hypothesis 1 was that the technology acquired
through M&A is utilized for innovation creation; Hypothesis 2 was that the acquired technology is utilized
over a long period of time; and Hypothesis 3 was that long-term utilization has a positive impact on
corporate performance. The results, using sports prosthetics as a case study and patents as a proxy
variable, confirmed all three hypotheses. We reveal that long-term utilization of technology obtained
through M&A is effective for creating new innovations.
4th International Conference on Cloud, Big Data and IoT (CBIoT 2023) (IJMIT JOURNAL)
4th International Conference on Cloud, Big Data and IoT (CBIoT 2023) will act as a major forum for the presentation of innovative ideas, approaches, developments, and research projects in the areas of Cloud, Big Data and IoT. It will also serve to facilitate the exchange of information between researchers and industry professionals to discuss the latest issues and advancement in the area of Cloud, Big Data and IoT.
Authors are solicited to contribute to the conference by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in Cloud, Big Data and IoT.
TRANSFORMING SERVICE OPERATIONS WITH AI: A CASE FOR BUSINESS VALUE (IJMIT JOURNAL)
Artificial Intelligence (AI) has rapidly become a critical technology for businesses seeking to improve
efficiency and profitability. One area where AI is proving particularly impactful is service operations
management, where it is used to create AI-powered service operations (AIServiceOps) that deliver
high-value services to customers. AIServiceOps involves the use of AI to automate and optimize various
business processes, such as customer service, sales, marketing, and supply chain management. The rapid
development of AI has prompted many changes in the field of Information Technology (IT) Service
Operations: IT Service Operations driven by AI, i.e., AIServiceOps. AI has brought new vitality and
addressed many challenges in IT Service Operations. However, there is a literature gap on the business
value impact of AI-powered IT Service Operations. AIServiceOps can help IT build optimized business
resilience by creating value in complex and ever-changing environments, as product organizations move
faster than IT can handle. This research paper therefore examines how AIServiceOps creates business
value and sustainability, essentially how AIServiceOps liberates IT staff from low-level, repetitive work
and traditional IT practices in favor of a continuously optimized process. One of the research objectives is
to compare traditional IT Service Operations with AIServiceOps. This paper provides a basis for how
enterprises can evaluate AIServiceOps and consider it a digital transformation tool. The paper presents a
case study of a company that implemented AIServiceOps and analyzes the resulting business outcomes.
The study shows that AIServiceOps can significantly improve service delivery, reduce response times, and
increase customer satisfaction. Furthermore, it demonstrates how AIServiceOps can deliver substantial
cost savings, such as reducing labor costs and minimizing downtime.
DESIGNING A FRAMEWORK FOR ENHANCING THE ONLINE KNOWLEDGE-SHARING BEHAVIOR OF ... (IJMIT JOURNAL)
The main objective of this paper is to identify the factors that influence academic staff's digital
knowledge-sharing behaviors in Ethiopian higher education. A structural equation model was used to
validate the research framework using survey data from 210 respondents. The collected data were
analyzed using SmartPLS software. The results of the study show that trust, self-motivation, and altruism
are positively related to attitude. Contrary to our expectations, knowledge technology negatively affects
attitude. However, reward systems and empowerment by leaders are significantly associated with
knowledge-sharing intentions. Knowledge-sharing intention, in turn, was significantly related to digital
knowledge-sharing behavior. The contributions of this study are twofold: the framework may serve as a
roadmap for future researchers and managers considering their strategy to enhance digital knowledge
sharing in HEIs, and the findings will benefit academic staff and university administrations by helping
academic staff enhance their knowledge-sharing practices.
BUILDING RELIABLE CLOUD SYSTEMS THROUGH CHAOS ENGINEERING (IJMIT JOURNAL)
Cloud computing systems need to be reliable so that they can be accessed and used for computing at any
given point in time. The complex nature of cloud systems is the motivation to conduct research in novel
ways of ensuring that cloud systems are built with reliability in mind. In building cloud systems, it is
expected that the cloud system will be able to deal with high demands and unexpected events that affect the
reliability and performance of the system.
In this paper, chaos engineering is considered a heuristic method that can be used to build reliable cloud
systems. Chaos engineering is aimed at exposing weaknesses in systems that are in production. Chaos
engineering will help identify system weaknesses and strengths when a system is exposed to unexpected
knocks and shocks while it is in production.
Chaos engineering allows system developers and administrators to get insights into how the cloud system
will behave when it is exposed to unexpected occurrences.
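The kind of chaos experiment described above can be sketched as a fault-injection wrapper around a service call, paired with a caller whose resilience (retries plus a fallback) is observed under injected failures. Everything here, names, rates, and retry policy, is an illustrative assumption, not a specific chaos-engineering tool's API.

```python
import random

def flaky(call, failure_rate=0.3, rng=random):
    """Chaos wrapper: inject failures into a service call at a given
    rate to observe how the caller copes."""
    def wrapped(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault")
        return call(*args, **kwargs)
    return wrapped

def get_price():
    """Stand-in for a dependency the cloud system calls."""
    return 42

def resilient_get_price(call, retries=5, fallback=0):
    """System under test: retry on failure, then fall back to a default."""
    for _ in range(retries):
        try:
            return call()
        except ConnectionError:
            continue
    return fallback

random.seed(0)  # reproducible experiment
chaotic = flaky(get_price, failure_rate=0.5)
results = [resilient_get_price(chaotic) for _ in range(100)]
# Even at a 50% injected fault rate, 5 retries make total failure rare
# (about 0.5**5, roughly 3% of requests hit the fallback).
print(results.count(42) / len(results))
```

The experiment's output, how often the fallback was needed, is exactly the insight chaos engineering is after: it quantifies how the system behaves under unexpected knocks before they happen in production.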
NETWORK MEDIA ATTENTION AND GREEN TECHNOLOGY INNOVATION (IJMIT JOURNAL)
This paper will provide a novel empirical study for the relationship between network media attention and
green technology innovation and examine how network media attention can ease financing constraints. It
collected data from listed companies in China's heavy pollution industry and performed rigorous
regression analysis, in order to innovatively explore the environmental governance functions of the media.
It found that network media attention significantly promotes green technology innovation. By analyzing the
inner mechanism further, it found that network media attention can promote green innovation by easing
financing constraints. Besides, network media attention has a significant positive impact on green invention
patents while not affecting green utility model patents.
INCLUSIVE ENTREPRENEURSHIP IN HANDLING COMPETING INSTITUTIONAL LOGICS FOR DHI... (IJMIT JOURNAL)
Information System (IS) research advocates employing collaborative and loose-coupling strategies, rather than prescriptive and unilateral Information Technology (IT) governance mechanisms, to address contradictory issues among diversified actors' interests. Yet it rarely depicts how managers employ these strategies in Health Information System (HIS) implementation, particularly in resource-constrained settings where IS implementation activities rely heavily on the resources of multiple international organizations. This study explored how managers in a resource-constrained setting employ collaborative IT governance mechanisms in the adoption of District Health Information System 2 (DHIS2), using an interpretative case study approach and the concept of institutional logics. The institutional logic concept was used to identify the major actors' logics underpinning DHIS2 adoption. The study depicts the importance of high-level officials distancing themselves from the dominant systemic logic in order to consider new alternatives, and of employing inclusive IT governance mechanisms that separated resources from the system and facilitated stakeholders' collaboration in DHIS2 adoption based on their capacity and interest.
DEEP LEARNING APPROACH FOR EVENT MONITORING SYSTEM (IJMIT JOURNAL)
With an increasing number of extreme events and growing complexity, more alarms are being used to
monitor control rooms. Operators in the control rooms need to monitor and analyze these alarms and take
suitable actions to ensure the system's stability and security. Security is the biggest concern in the modern
world, and it is important to have rigid surveillance that guarantees protection from any sort of hazard.
For security, Closed-Circuit TV (CCTV) cameras are used for surveillance, but these cameras require a
person for supervision, and a human supervisor may tire at any point in time. We therefore need a system
that detects threats automatically, and we propose a solution using YOLOv5. We took a dataset and used
the Roboflow framework to augment the existing images into numerous variations, creating a grayscale
copy, a rotated copy, and a blurred copy of each image to enlarge the dataset. This work mainly focuses
on providing a secure environment by using live CCTV footage as a source for detecting weapons. The
YOLO algorithm divides an image from the video into a grid, and each grid cell detects objects within itself.
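The grid assignment at the heart of YOLO, mentioned above, is simple to state: the cell containing an object's box center is responsible for detecting it. The sketch below illustrates that mapping; the grid size S=7 and the function name are illustrative choices.

```python
def grid_cell(cx, cy, img_w, img_h, S=7):
    """Return the (row, col) of the SxS grid cell responsible for a
    detection whose box center is (cx, cy) in pixels. YOLO assigns
    each object to the cell containing its center."""
    col = min(int(cx * S / img_w), S - 1)  # clamp for points on the edge
    row = min(int(cy * S / img_h), S - 1)
    return row, col

# A weapon detected with its box centered at (320, 240) in a 640x480
# CCTV frame falls in the middle cell of a 7x7 grid.
print(grid_cell(320, 240, 640, 480))  # (3, 3)
```

Because exactly one cell owns each object center, every detection has a unique responsible predictor, which is what lets YOLO run detection in a single pass over the frame.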
MULTIMODAL COURSE DESIGN AND IMPLEMENTATION USING LEML AND LMS FOR INSTRUCTIO...IJMIT JOURNAL
Traditionally, teaching has been centered around classroom delivery. However, the onslaught of the
COVID-19 pandemic has cultivated usage of technology, teaching, and learning methodologies for course
delivery. We investigate and describe different modes of course delivery that maintain the integrity of
teaching and learning. This paper answers to the research questions: 1) What course delivery method our
academic institutions use and why? 2) How can instructors validate the guidelines of the institutions? 3)
How courses should be taught to provide student learning outcomes? Using the Learning Environment
Modeling Language (LEML), we investigate the design and implementation of courses for delivery in the
following environments: face-to-face, online synchronous, asynchronous, hybrid, and hyflex. A good
course design and implementation are key components of instructional alignment. Furthermore, we
demonstrate how to design, implement, and deliver courses in synchronous, asynchronous, and hybrid
modes and describe our proposed enhancements to LEML.
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Technical Specifications
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
Key Features
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface
• Compatible with MAFI CCR system
• Copatiable with IDM8000 CCR
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
Application
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
International Journal of Managing Information Technology (IJMIT) Vol.3, No.1, February 2011
DOI : 10.5121/ijmit.2011.3103
Classification-based Retrieval Methods to
Enhance Information Discovery on the Web
Yogendra Kumar Jain¹ and Sandeep Wadekar²
¹Head of Department, Computer Science & Engineering, Samrat Ashok Technological Institute, Vidisha, M.P., India
ykjain_p@yahoo.co.in
²Research Scholar, Computer Science & Engineering, Samrat Ashok Technological Institute, Vidisha, M.P., India
sandy12dec@gmail.com
ABSTRACT
The widespread adoption of the World-Wide Web (the Web) has created challenges both for society as a
whole and for the technology used to build and maintain the Web. The ongoing struggle of information
retrieval systems is to wade through this vast pile of data and satisfy users by presenting them with
information that most adequately fits their needs. On a societal level, the Web is expanding faster than we
can comprehend its implications or develop rules for its use. The ubiquitous use of the Web has raised
important social concerns in the areas of privacy, censorship, and access to information. On a technical
level, the novelty of the Web and the pace of its growth have created challenges not only in the
development of new applications that realize the power of the Web, but also in the technology needed to
scale applications to accommodate the resulting large data sets and heavy loads. This thesis presents
searching algorithms and hierarchical classification techniques for increasing a search service's
understanding of web queries. Existing search services rely solely on a query's occurrence in the
document collection to locate relevant documents. They typically do not perform any task or topic-based
analysis of queries using other available resources, and do not leverage changes in user query patterns
over time. Provided within is a set of techniques and metrics for performing temporal analysis on query
logs. Our log analyses are shown to be reasonable and informative, and can be used to detect changing
trends and patterns in the query stream, thus providing valuable data to a search service.
KEYWORDS
Knowledge Discovery, Classification Method, Search Engine Technique
1. INTRODUCTION
The widespread adoption of the World-Wide Web (the Web) has created challenges both for
society as a whole and for the technology used to build and maintain the Web. For many years,
information retrieval research focused mainly on the problem of ad-hoc document retrieval, a
topical search task that assumes all queries are meant to express a broad request for information
on a topic identified by the query. This task is exemplified by the early TREC conferences,
where the ad-hoc document retrieval track was prominent [2]. In recent years, particularly since
the popular advent of the World Wide Web and E-commerce, IR researchers have begun to
expand their efforts to understand the nature of the information need that users express in their
queries. The unprecedented growth of available data coupled with the vast number of available
online activities has introduced a new wrinkle to the problem of search; it is now important to
attempt to determine not only what the user is looking for, but also the task they are trying to
accomplish and the method by which they would prefer to accomplish it. In addition, all users
are not created equal; different users may use different terms to describe similar information
needs; the concept of what is "relevant" to a user has only become more and more unclear as the
web has matured and more diverse data have become available [3, 5]. Because of this, it is of
key interest to search services to discover sets of identifying features that an information
retrieval system can use to associate a specific user query with a broader information need. All
of these concerns fall into the general area of query understanding. The central idea is that there
is more information present in a user query than simply the topic of focus, and that harnessing
this information can lead to the development of more effective and efficient information
retrieval systems. Existing search engines focus mainly on basic term-based techniques for
general search, and do not attempt query understanding. This thesis addresses these
shortcomings by presenting a pair of novel techniques for improving query understanding [1], [4]
and [15].
2. BACKGROUND AND RELATED WORK
We analyze a log of billions of web queries that constitutes the total query traffic for a six-month
period of a general-purpose commercial web search service. Previously, query logs were studied from a
single, cumulative view. Understanding how queries change over time is critical to developing
effective, efficient web search services. The web is a dynamic, uncooperative environment with
several issues that make analysis of web search very difficult. These include:
1. The web is a dynamic collection: its data, users, search engines, and popular queries are
constantly changing [4, 6].
2. Typical web search engine traffic consists of many hundreds of millions of queries per
day and is highly diverse and heterogeneous, requiring a large sample of queries to
adequately represent a population of even one day's queries [7].
3. It is difficult to accurately capture the user's desired task and information need from web
queries, which are typically very short [4].
2.2. Prior Work in Query Log Analysis
Examinations of search engine evaluation indicate that performance likely varies over time due
to differences in query sets and collections [3]. Although the change in collections over time has
been studied (e.g., the growth of the web) [4], analysis of users' queries has been primarily
limited to the investigation of a small set of available query logs that provide a snapshot of their
query stream over a fixed period of time. Existing query log analysis can be partitioned into
large-scale log analysis, small-scale log analysis and some other applications of log analysis
such as categorization and query clustering. A survey covering a great deal of relevant prior
work in search studies can be found in [6]. Jansen and Pooch provide a framework for static log
analysis, but do not address analysis of changes in a query stream over time [11]. Given that
most search engines receive on the order of between tens and hundreds of millions of queries a
day [7], current and future log analysis efforts should use increasingly larger query sets to
ensure that prior assumptions still hold.
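As a concrete illustration of this kind of temporal log analysis, the sketch below computes per-day query frequencies and flags queries whose relative frequency spikes between two days. The spike rule, thresholds, and toy log are illustrative assumptions, not the metrics used in any of the studies cited above:

```python
from collections import Counter, defaultdict

def daily_frequencies(log):
    """log: iterable of (day, query) pairs -> {day: Counter of queries}."""
    per_day = defaultdict(Counter)
    for day, query in log:
        per_day[day][query.strip().lower()] += 1
    return per_day

def trending(per_day, day_a, day_b, min_ratio=3.0, min_count=2):
    """Queries whose relative frequency grew by at least min_ratio
    between day_a and day_b; a crude spike detector."""
    a, b = per_day[day_a], per_day[day_b]
    total_a = sum(a.values()) or 1
    total_b = sum(b.values()) or 1
    spikes = []
    for query, count in b.items():
        if count < min_count:
            continue  # too rare to call a trend
        rate_a = a.get(query, 0) / total_a
        rate_b = count / total_b
        # A tiny floor rate avoids division by zero for unseen queries.
        if rate_b / max(rate_a, 1e-9) >= min_ratio:
            spikes.append(query)
    return spikes

log = [("mon", "weather"), ("mon", "news"), ("mon", "weather"),
       ("tue", "weather"), ("tue", "eclipse"), ("tue", "eclipse"),
       ("tue", "eclipse")]
per_day = daily_frequencies(log)
print(trending(per_day, "mon", "tue"))  # ['eclipse']
```

A production system would normalize for weekly periodicity and sample size, but the core operation, comparing relative frequencies across time windows, is the same.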
2.3. Automatic Classification of Web Queries
Accurate topical classification of user queries allows for increased effectiveness, efficiency, and
revenue potential in general-purpose web search systems. Such classification becomes critical if
the system is to return results not just from a general web collection but from topic-specific
back-end databases as well. Successful query classification poses a challenging problem, as web
queries are very short, typically providing few features. This feature sparseness, coupled with
the dynamic nature of the query stream and the constantly changing vocabulary of the average
user, hinders traditional methods of text classification. Understanding the topical sense of user
queries is a problem at the heart of web search. Successfully mapping incoming general user
queries to topical categories, particularly those for which the search engine has domain-specific
knowledge, can bring improvements in both the efficiency and the effectiveness of general web
search. Much of the potential for these improvements exists because many of today's search
engines, both for the Web and for enterprises, often incorporate the use of topic specific back-
end databases when performing general web search. That is, in addition to traditional document
retrieval from general indices, they attempt to automatically route incoming queries to an
appropriate subset of specialized back-end databases and return a merged listing of results, often
giving preference to the topic-specific results. This strategy is similar to that used by meta-
search engines on the web [5]. The hope is that the system will be able to identify the set of
back-end databases that are most appropriate to the query, allowing for more efficient and
effective query processing. Correct routing decisions can result in reduced computational and
financial costs for the search service, since it is clearly impractical to send every query to every
backend-database in the (possibly very large) set. If a search service has a strict operational
need for low response time, erroneous routing decisions could force much larger scaling than is
necessary [6]. An accurate classification of a query to topic(s) of interest can assist in making
correct routing decisions, thereby reducing this potential cost.
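The routing decision described above can be sketched as follows. The keyword-overlap classifier, the confidence threshold, and all back-end names are hypothetical stand-ins for a real topical classifier and database set:

```python
# Toy topical classifier: keyword lookup standing in for a real model.
TOPIC_WORDS = {"finance": {"stock", "shares", "nasdaq"},
               "travel": {"flight", "hotel", "visa"}}

def classify(query):
    """Score each topic by the fraction of query terms it covers."""
    terms = set(query.lower().split())
    return {topic: len(terms & words) / len(terms)
            for topic, words in TOPIC_WORDS.items()}

def route_query(query, backends, threshold=0.4):
    """Route the query only to back-ends whose topic confidence clears
    the threshold; fall back to the general index otherwise."""
    scores = classify(query)
    chosen = [db for topic, db in backends.items()
              if scores.get(topic, 0.0) >= threshold]
    return chosen or ["general-index"]

backends = {"finance": "finance-db", "travel": "travel-db"}
print(route_query("cheap flight hotel deals", backends))  # ['travel-db']
```

The threshold embodies the cost trade-off discussed above: raising it sends fewer queries to specialized back-ends, lowering computational cost at the risk of missing relevant topic-specific results.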
2.4. Prior Work in Query Classification
Classifying texts and understanding queries are both fundamental concerns in information
retrieval, and so we can mention only a fraction of the potentially relevant work. In particular,
while work on classifying documents is relevant to classifying queries, we focus on query-
oriented work as being most relevant here. Query classification makes very different demands
on a classification system than does document classification. Queries are very short (the studies
discussed below find average web query lengths between 2.0 and 2.6 terms), and thus have few
features for a classifier to work with. Further, unlike a full text document, a query is just a
stand-in for the thing we would really like to classify: the user's interest. Sebastiani has
surveyed recent work on document classification [4]. More narrowly, our particular interest is in
queries to web search engines. Past research can be divided (with some overlap) into work
where manual or automated classification (and other techniques) are used to help understand a
body of queries, versus work on automatic classification of queries as a tool to improve system
effectiveness or provide new capabilities.
2.5. Related Topics and Applications of Query Classification Method
Web directories are subject taxonomies in which each category is described by a set of terms
that capture the concept behind the category, together with a list of documents related to that
concept. Classifying queries into directories can help discover relationships among queries and
the concepts behind each query, and there has been substantial work in this direction. For
example, previous work proposes a Bayesian approach to classifying queries into subject
taxonomies. After constructing training data, queries are classified following a breadth-first
strategy, starting from the root category and descending one level with each iteration. The main
limitation of the method is that queries are always classified into leaves, because leaves
maximize the probability of the root-to-leaf path.
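A minimal sketch of this top-down descent, with keyword overlap standing in for the Bayesian posterior (the taxonomy and scorer are invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Category:
    name: str
    keywords: set
    children: list = field(default_factory=list)

def score(query, cat):
    """Toy scorer: keyword overlap standing in for a learned posterior."""
    return len(set(query.lower().split()) & cat.keywords)

def classify(query, root):
    """Top-down descent: at each level move to the best-scoring child,
    stopping only at a leaf (the limitation noted in the text)."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: score(query, c))
    return node.name

root = Category("all", set(), [
    Category("sports", {"match", "league"}, [
        Category("football", {"goal", "league"}),
        Category("tennis", {"serve", "court"})]),
    Category("business", {"market", "stock"}),
])
print(classify("premier league goal highlights", root))  # 'football'
```

Note that the loop cannot stop at an internal node even when the query is genuinely broad, which is exactly the always-classified-into-leaves limitation described above.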
A recently organized competition focused on the classification of queries into a given subject
taxonomy. The competition organizers provided a small training data set composed of a list of
queries and their category labels. Most of the submitted papers were based on classifiers trained
with supervised techniques. The winning entry applies
a two-phase framework to classify a set of queries into the subject taxonomy. Using a machine
learning approach, the authors collected data from the web to train synonym-based classifiers
that map a query to each related category. In the second phase, the queries were submitted to a
search engine, and the labels and text of the retrieved pages were used to enrich the query
descriptions.
Finally, the queries were classified into the subject taxonomy by combining the classifiers
through a consensus function. The main limitations of the proposed method are the dependency
of the classification on the quality of the training data, the human effort involved in
constructing the training data, and the semi-automatic nature of the approach, which limits the
scale of its applications [8], [9] and [15].
Vogel et al. classify queries into subject taxonomies using a semi-automatic approach. First
they submit the query to the Google Directory, which scans the Open Directory for occurrences
of the query within the Open Directory categories. The top-100 documents are then retrieved by
formulating the query to the Google search engine. Using this document collection, an ordered
list of the categories into which the 100 retrieved documents are classified is built. Using a
semi-automatic mapping between the web categories and a subject category, the method
identifies a set of the topics closest to each query. Unfortunately, the method is limited by the
quality of the Google classifier that identifies the closest categories in the Open Directory. It is
also limited by the quality of the answer lists retrieved from Google. Finally, the semi-automatic
nature of the approach limits the scale of the method.
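The category-voting step of such an approach might look like the following sketch, where each retrieved document arrives already labeled with a directory category (the data is invented):

```python
from collections import Counter

def categories_from_results(results, k=100):
    """Rank directory categories by how many of the top-k retrieved
    documents fall into each one (a sketch of the voting step)."""
    votes = Counter(category for _, category in results[:k])
    return [category for category, _ in votes.most_common()]

# Hypothetical (doc_id, category) pairs from a search engine result list.
results = [("doc1", "Recreation/Aviation"),
           ("doc2", "Business/Transport"),
           ("doc3", "Recreation/Aviation"),
           ("doc4", "Recreation/Aviation")]
print(categories_from_results(results))
# ['Recreation/Aviation', 'Business/Transport']
```

The quality ceiling noted in the text is visible here: the ranking can only be as good as the upstream document classifier and the retrieved result list.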
Directories are hierarchies of classes which group documents covering related topics.
Directories are composed of nodes, each of which represents a category into which Web
resources are classified. Queries and documents are traditionally considered Web resources. The
main advantage of using directories is that, once we find the appropriate category, the related
resources will be useful in most cases. The structure of a directory is as follows: the root
category represents the all node, which stands for the complete corpus of human knowledge;
thus, all queries and documents are relevant to the root category. Each category shows a list of
documents related to the category's subject. Traditionally, documents are classified manually by
human editors.
The categories are organized in a child/parent relationship, which can be viewed as an "IS-A"
relationship: the child/parent relation represents a generalization/specialization relation between
the concepts represented by the categories.
It is possible to add links among categories that do not follow the hierarchical structure of the
directory. These kinds of links represent relationships among subjects of different categories
that share descriptive terms but have different meanings. For example, in the TODOCL
directory, the category sports/aerial/aviation has a link to the category business &
economy/transport/aerial because they share the term aerial, which has two meanings: in one
category the term is related to sports and in the other to transportation.
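The directory structure just described can be modeled directly. The sketch below encodes the IS-A hierarchy and the non-hierarchical cross-links, using the TODOCL aerial example from the text:

```python
class DirectoryNode:
    """A directory category: children form IS-A specializations, while
    cross_links connect related categories outside the hierarchy."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.cross_links = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Root-to-node path, e.g. 'all/sports/aerial/aviation'."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

# Reconstructing the TODOCL example from the text.
root = DirectoryNode("all")
sports = DirectoryNode("sports", root)
aviation = DirectoryNode("aviation", DirectoryNode("aerial", sports))
business = DirectoryNode("business & economy", root)
transport_aerial = DirectoryNode("aerial", DirectoryNode("transport", business))
aviation.cross_links.append(transport_aerial)  # shared term, different sense
print(aviation.path())  # all/sports/aerial/aviation
```

Keeping cross-links separate from children preserves the tree invariant of the IS-A hierarchy while still letting navigation jump between related subjects.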
Other related work in the area of query classification focuses on the potential applications of
query classification technology, and situations where the capabilities of an automatic query
classification system would be useful. One such key area is metasearch, a web-centric search
technique borne from the old problem of collection fusion [7].
A metasearch engine attempts to integrate and present data from a large number of disparate
engines, which may or may not be specialized in nature [5]. Clearly, automatic query
classification could be an important factor in such systems, as appropriately classifying a query
could assist the metasearch engine in optimally presenting the results it gathers from its source
engines, provided that sufficient metadata about their capabilities is available. Internet
mediators are a form of highly specialized metasearch engine that attempt to take a single query
and route it to multiple, disparate data sources based on its own analysis of the query [7], and
for them, an effective query classification system could be a critical part of the decision-making
process when performing effective query routing. Other areas where query classification can be
effective include query disambiguation and research into predicting how well a query might
perform on a given collection. Cronen-Townsend et al. recently introduced the "Clarity" metric
as a way of predicting whether or not a query will have good performance [12], [13] and [16].
The Clarity metric provides a way of measuring how closely the language of a given query fits a
given collection. This information can then be used to make an estimation of that query's
expected performance; if it appears low, the query can be marked for further processing. Clarity
has been used by other researchers as a means of estimating and reducing the ambiguity of
queries, but this is an expensive process that requires external resources [14]. Automatic query
classification could assist systems in estimating the clarity of a specific query, possibly by
providing similar queries from the query's category for further analysis.
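Clarity is computed as the KL divergence between a query language model and the collection language model. The sketch below assumes both are already given as unigram probability tables over a shared vocabulary; the toy numbers are invented:

```python
import math

def clarity(query_model, collection_model):
    """Clarity score: KL divergence (in bits) of the query language model
    from the collection model; higher means a less ambiguous query."""
    return sum(p * math.log2(p / collection_model[w])
               for w, p in query_model.items() if p > 0)

# Toy unigram models (probabilities over a tiny shared vocabulary).
collection = {"jaguar": 0.01, "car": 0.05, "speed": 0.04, "the": 0.90}
focused    = {"jaguar": 0.50, "car": 0.30, "speed": 0.15, "the": 0.05}
vague      = {"jaguar": 0.02, "car": 0.06, "speed": 0.05, "the": 0.87}
print(clarity(focused, collection) > clarity(vague, collection))  # True
```

A query model close to the collection-wide distribution scores near zero, signalling ambiguity, and could be flagged for the further processing mentioned above. In practice the query model is estimated from top-ranked retrieved documents rather than given directly.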
At this point, it is relevant to specify which kinds of applications will be considered as output of
the query log mining process. We work on three kinds of applications:
1. Query recommendation: focusing on the reformulation of the original query, these kinds of
applications aim at identifying relationships between the original queries and alternative queries,
such as generalization/specialization relationships.
2. Document recommendation: these kinds of applications identify documents relevant to the
original query.
3. Query classification: These kinds of applications will identify relevant queries and documents
for each node in the directory, enriching their descriptions.
There is a similar goal behind each application: to identify documents relevant to users. This is
the main objective of a search engine, and ultimately all search engine applications are oriented
toward this goal. However, they work in different manners:
1. A query recommendation application searches for a better way to formulate the original
query.
2. Document recommendation applications focus on the identification of documents that are
relevant to the users who have formulated similar queries in the past.
3. A query classification application searches for a better description of a node in a taxonomy,
adding new queries and documents to it. When the original query is classified into a directory
node, a document list is recommended to the user. It is also possible to navigate the directory,
specializing/generalizing the original query. A global view of the relationships between the data
mining techniques used in this thesis and the applications generated from applying these
techniques to the query log data is given in Figure 1 [11], [13] and [17]. Of course, the
applications will follow a set of design criteria. The desirable properties of the applications are
the following:
Figure 1: Relations between the data mining techniques used in this thesis and the applications
generated.
Maintainability: The costs associated with system maintenance must be low.
Variability: Users' preferences change over time. This variability is due to two main factors:
changes in users' interests and the creation of new documents whose contents may be
interesting to users. The applications must reflect the changes in users' interests in order to
produce quality answer lists.
Scalability: The applications designed must be able to manage huge document collections
efficiently.
User click data analysis: The analysis of users' click data is known as Web usage mining. The
Web usage mining literature focuses on techniques that can predict user behavior patterns in
order to improve the success of Web sites in general, for example by modifying their interfaces.
Some studies also apply soft computing techniques to discover interesting patterns in user click
data, handling vagueness and imprecision in the data. Other work focuses on the identification
of hidden relations in the data; typical related problems are the identification of user sessions
and robot sessions. Web usage mining studies can be classified into two categories: those based
on the server side and those based on the client side. Studies based on the client side retrieve
data from the user using cookies or other methods, such as ad-hoc logging browser plugins. For
example, recent work proposes mining techniques to process data retrieved from the client side
using panel logs. Panel logs, such as TV audience ratings, cover a broad URL space from the
client side; as a main contribution, they allow the study of global behavior in Web communities.
Web usage mining studies based on server log data present statistical analyses in order to
discover rules and non-trivial patterns in the data. The main problem is the discovery of
navigation patterns. For example, one study assumes that the next document a user visits is
determined by the last few documents visited, and concludes how well Markov models predict
users' clicks. Similar studies consider longer sequences of requests to predict user behavior, or
use hidden Markov models to introduce more complex models of user click prediction. An
analysis of the AltaVista search engine log focused on individual queries, duplicate queries and
the correlation between query terms; as a main result, the authors show that users type short
queries and select very few documents in their sessions. Similar studies analyze other
commercial search engine logs. These works address the analysis from an aggregated point of
view, i.e., they present global distributions over the data. Other approaches include novel
points of view for the analysis of the data. As a main result, the authors show that query traffic differs from one topic to another when analyzed hourly, results that are useful for query disambiguation and routing. Finally, another analysis uses geographic information about users. The main results of this study show that queries are short, search topics are broadening, and approximately 50% of the Web documents viewed are topically relevant. The literature also includes work whose purpose is to determine the need behind the query. To do that, one study analyzed query log data, identifying three types of needs: informational, navigational, and transactional. Informational queries are related to specific contents in documents. Navigational queries are used to find a specific site in order to browse its pages. Transactional queries allow users to perform a procedure using web resources, such as commercial operations or downloads. Later work refined the last two categories into ten more classes. In recent years, a few papers have tried to determine the intention behind the query, following the categorization introduced above. Simple attributes were used to predict the need behind the query, and a framework was introduced to identify the intention behind the query using user click data. Using a combination of supervised and unsupervised methods, the authors show that it is possible to identify concepts related to the query and to classify the query within a given categorization of user needs. As the works described above show, user click data has been extensively studied in the scientific literature. Despite this fact, we still have little understanding of how search engines affect the searching process and how users interact with search engine results. Our contribution on this topic is to illustrate the user-search engine interaction as a bidirectional feedback relation. To do this, we understand the searching process as a complex phenomenon constituted by multiple factors, such as the relative position of pages in answer lists, the semantic connections between query words, and the document reading time, among other variables. Finally, we propose user search behavior models that involve the factors enumerated. The main advantage of this work is the use of several variables involved in the searching process to produce models of how users search and how they use the search engine results [5], [16], [18] and [20].
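The Markov-model approach to click prediction mentioned above can be sketched as a minimal first-order model; the data layout and function names below are illustrative, not taken from the cited studies:

```python
from collections import defaultdict

def train_transitions(sessions):
    """Count first-order transitions (page -> next page) over user sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_next(counts, page):
    """Predict the most frequently observed next page, or None if unseen."""
    successors = counts.get(page)
    if not successors:
        return None
    return max(successors, key=successors.get)
```

Longer-history variants (the "longer sequences of requests" mentioned above) would simply key the counts on tuples of the last k pages instead of a single page.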
3. PROPOSED CLASSIFICATION APPROACHES
Prior efforts in classifying general web queries have included both manual and automatic
techniques. In this section, we describe the manual and automatic classification approaches that
collectively form our framework and explain our motivations for using each approach. We also
introduce a new rule-based automatic classification technique, based on an application of
computational linguistics for identifying selectional preferences, that mines a very large
unlabeled query log. We demonstrate that a combination of these approaches allows us to
develop an automatic web query classification system that covers a large portion of the query
stream with a reasonable degree of precision.
We explored this technique by using 18 lists of categorized queries produced in the above
fashion by a team of AOL editors. We examined the coverage of these categories and found
that, even after considerable time and development effort, they represented only 12% of the
general query stream. In addition to poor coverage, a key weakness of this approach is that the
query stream is in a constant state of flux: any manual classifications based on popular
elements in a constantly changing query stream will become ineffective over time. Additionally,
manual classification is very expensive. It is infeasible to label enough queries, and to repeat
this process often enough, to power an exact match classification system that covers a sufficient
amount of the query stream. Finally, one does not necessarily achieve high precision even on
queries that are covered. A natural step is to leverage the labeled queries from the exact match
system through supervised learning. The idea is to train a classifier on the manual classifications
with the goal of uncovering features that enable novel queries to be classified with respect to the
categories.
Each query is classified following a top-down approach. First, we determine the closest centroid
to the query among all centroids at the first level of the concept taxonomy. We then repeat the
process at each level of the taxonomy, descending as long as the distance between the query
and the closest centroid is less than the distance at the previous level. The top-down
approach is used to avoid the noise effects introduced by document and query terms; from our
point of view, term relevance decreases from general topics to sub-topics. Figure 2 illustrates
the classification schema.
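A minimal sketch of this top-down descent, assuming the taxonomy is a tree of nested dicts with sparse term-weight centroids (the node layout and names are hypothetical, not the paper's implementation):

```python
import math

def cosine_dist(a, b):
    """1 - cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def classify_top_down(query_vec, root):
    """Descend the taxonomy: at each level pick the closest child centroid,
    stopping when no child is closer than the node chosen at the previous level."""
    node, best = root, float("inf")
    while node["children"]:
        child = min(node["children"],
                    key=lambda c: cosine_dist(query_vec, c["centroid"]))
        d = cosine_dist(query_vec, child["centroid"])
        if d >= best:   # no improvement over the previous level: stop here
            break
        node, best = child, d
    return node
```

Stopping as soon as the distance stops shrinking is what confines noisy, overly specific sub-topic terms to the levels where they still help.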
For our experiments we use the Perceptron with Margins algorithm, which has been shown to be
competitive with state-of-the-art algorithms such as support vector machines in text
categorization, and is very computationally efficient [7].
Figure 2: The hierarchical classification schema
Our implementation normalizes the document feature vectors to unit length (Euclidean
normalization), which we have found to increase classification accuracy. To examine the
effectiveness of the supervised learning approach, we trained a perceptron for each category
from the exact match system, using all queries in a given category as positive examples and all
queries not in that category as negative examples.
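The per-category training setup with Euclidean normalization can be sketched as follows; this is a generic margin perceptron on sparse feature dicts, not the exact implementation used in the experiments, and all names are illustrative:

```python
import math

def normalize(vec):
    """Scale a sparse feature dict to unit Euclidean length."""
    n = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / n for k, v in vec.items()} if n else vec

def train_perceptron(examples, epochs=10, margin=0.1, lr=1.0):
    """Margin perceptron: update whenever the signed score fails to beat the margin.
    `examples` is a list of (feature_dict, label) with label in {+1, -1}."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for vec, label in examples:
            x = normalize(vec)
            score = sum(w.get(f, 0.0) * v for f, v in x.items()) + b
            if label * score <= margin:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + lr * label * v
                b += lr * label
    return w, b

def predict(w, b, vec):
    x = normalize(vec)
    return 1 if sum(w.get(f, 0.0) * v for f, v in x.items()) + b > 0 else -1
```

One such classifier is trained per category, with that category's queries as the +1 class and everything else as -1 (one-vs-rest).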
To realistically evaluate this approach we developed a test collection that is representative of the
general query stream. To determine the number of queries required to achieve a representative
sample of our query stream, we calculated the necessary sample size in queries:
Representative Sample Size Formula:
ss = (Z² × p × (1 − p)) / c² …………….(1)
Where:
Z = Z value (e.g., 1.96 for 95% confidence)
p = percentage picking a choice, expressed as a decimal (0.5 used here)
c = confidence interval, expressed as a decimal (e.g., 0.04 = ±4%).
By setting our confidence level to 99% and error rate to 1%, we require a sample of at least
16,638 queries.
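Formula (1) is straightforward to compute; for example, with Z ≈ 2.576 (99% confidence) and c = 0.01 it yields roughly 16.6 thousand queries, in line with the sample size above (the exact figure depends on the Z value used):

```python
import math

def sample_size(z, p, c):
    """Representative sample size: ss = z^2 * p * (1 - p) / c^2, rounded up."""
    return math.ceil(z * z * p * (1.0 - p) / (c * c))
```

Using p = 0.5 maximizes p(1 − p) and therefore gives the most conservative (largest) required sample.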
To build the test set we had a team of human editors perform a manual classification of 20,000
queries randomly sampled from the general query stream. These 20,000 queries were then used for
testing our perceptron learner trained on the classifications from the exact match system.
3.1 Selectional Preferences in Computational Linguistics
A selectional preference study begins by parsing a corpus to produce pairs, (x; y), of words that
occur in particular syntactic relationships. The verb-object relationship is most widely studied,
but others such as adjective-noun have been used [5]. The interest is in finding how x, the
context, constrains the meaning of y, its argument.
The selectional preference strength S(x) of a word x is defined as follows, where u ranges over a
set U of semantic classes. P(u) is the probability distribution of semantic classes taken on by
words in a particular syntactic position, while P(u|x) is that distribution for cases where a
contextual syntactic position is occupied by x. S(x) is simply the KL-divergence of P(u|x) from
P(u):
Selectional Preference Strength: S(x) = D(P(u|x) || P(u)) = Σ_u P(u|x) · lg(P(u|x) / P(u))
The ideal approach to estimating the necessary probabilities would be to use a set of (x, y) pairs
where each y has been tagged with its semantic class, perhaps produced by parsing a
semantically tagged corpus. The maximum likelihood estimates (MLEs) are:
P(u) = n_u / N
P(u|x) = (n_xu / N) / (n_x / N) = n_xu / n_x
where N is the total number of pairs, n_u is the number of pairs with a word of class u in the
syntactic position of interest, n_x is the number of pairs with x providing context for that position,
and n_xu is the number of pairs where both are true.
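A sketch of estimating S(x) by MLE from class-tagged pairs; each pair is assumed to be (context word x, semantic class u of its argument), and a base-2 logarithm is used, as the `lg` in the formula suggests:

```python
import math
from collections import Counter

def selectional_preference_strength(pairs, x):
    """S(x) = KL divergence of the MLE of P(u|x) from the MLE of P(u),
    where `pairs` is a list of (context_word, semantic_class) tuples."""
    N = len(pairs)
    n_u = Counter(u for _, u in pairs)              # class counts overall
    x_classes = [u for ctx, u in pairs if ctx == x]
    n_x = len(x_classes)                            # pairs with context x
    n_xu = Counter(x_classes)                       # class counts given x
    s = 0.0
    for u, count in n_xu.items():
        p_u_x = count / n_x
        p_u = n_u[u] / N
        s += p_u_x * math.log2(p_u_x / p_u)
    return s
```

The sum runs only over classes observed with x, so the zero terms of P(u|x) are implicitly skipped; S(x) is always non-negative, and larger values mean x constrains its argument's class more strongly.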
4. RESULTS
Our experiment consisted of evaluating the quality of the answer lists retrieved by the search
engine ranking method, the method based on the hierarchical classifier, and the method based
on the flat classifier. In order to do this, we considered the first 10 documents
recommended by the search engine for each of the 30 queries, the first 10 documents
recommended by the web directory when the closest category is determined using the flat
classifier, and the first 10 documents recommended by the web directory when the query is
classified using the hierarchical method. Document quality was evaluated by a group of
experts using the same relevance criteria as in the previous experiment (0-4, from lower to
higher relevance). The precision for every ranking and every query is obtained according to
position, and finally the average precision is calculated, over all queries, for each position.
Results are shown in Figure 3. In the figure we can observe that all the
evaluated methods produce good-quality rankings, especially for the first 5 recommended
documents. The recommendation methods based on hierarchical classification and flat
classification perform better than the TodoCL ranking for the first 5 positions. This
means that the classification in the taxonomy is of good quality, as also shown in the
previous experiment. However, the ranking loses precision compared to that of TodoCL if
we consider the last 5 positions. This is due to the fact that many of the evaluated queries are
classified in taxonomy nodes where fewer than 10 documents are recommended.
In these cases, since there is no recommendation, the associated precision equals 0, which
severely penalizes the methods based on classifiers. Fortunately, none of the queries is
classified in a node with fewer than 5 recommended documents. Therefore, a fair comparison of
the methods should be limited to the first 5 positions where, as we have seen, the hierarchical
method compares favorably with the original ranking and with the flat classifier.
Because directory coverage is generally low, i.e., few documents are recommended in each node
of the directory, it is necessary to design a maintenance method that classifies documents into
nodes, enriching their descriptions and improving the coverage of the directory.
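The per-position precision averaging described above can be sketched as follows, assuming binary relevance derived from the expert ratings (e.g., rating above some threshold, which is our illustrative choice); positions with no recommendation count as non-relevant, matching the evaluation:

```python
def avg_precision_at(relevance_lists, k):
    """Average precision at rank k over queries.
    `relevance_lists` holds one 0/1 relevance list per query, in rank order;
    queries with fewer than k recommendations are padded with zeros."""
    total = 0.0
    for rels in relevance_lists:
        padded = (rels + [0] * k)[:k]   # missing recommendations -> precision 0
        total += sum(padded) / k
    return total / len(relevance_lists)
```

The zero-padding is exactly why methods whose taxonomy nodes hold fewer than 10 documents lose precision at the later positions.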
Figure 3: Average precision of the retrieved documents for the methods based on classifiers and
for the search engine ranking.
5. CONCLUSIONS AND FUTURE WORK
We can conclude that our query classification method allows us to identify concepts associated
with the queries. The taxonomy permits us to specialize the query in order to provide accurate
final recommendations. The proposed method also allows us to improve the precision of the
retrieved documents over the first 5 positions of the answer lists. Compared with the search
engine ranking method and with the method based on a flat classifier, the hierarchical
classifier method provides better results regarding the precision of the answer lists. One of the
biggest constraints of the proposed method is that it depends strongly on the quality of the
taxonomy. In the third experiment, we can see that a significant proportion of the
nodes contain an insufficient number of recommended documents, which prevents a favorable
comparison to the TodoCL ranking beyond the first 5 recommendations. The classification
method proposed in this paper allows the identification of well-defined concepts for each
classified query. The proposed maintenance method allows the Web directory to be
maintained automatically without losing precision and recall. By considering the logs in the
maintenance process, we take into account information based on user click data. The
preferences are bound to query sessions, and thus we may obtain preferences sensitive to the
topics users are looking for. The importance and quality of a Web page is reflected by the
preferences registered in the Web log. This allows us to find many new relevant documents not
considered in the directory built by human editors. Another constraint is that the taxonomy does
not always allow us to identify the need behind the query. Both constraints are related to the
fact that, since the directory is manually maintained, it is limited in growth and freshness by the
frequency of the editors' evaluations and updates.
In future work, we will present a method based on the proposed classification schema that
permits automatic maintenance of the directories by adding new queries and documents to each
directory node.
REFERENCES
[1] Ali Sajedi Badashian, Seyyed Hamidreza Afzali, Morteza Ashurzad Delcheh, Mehregan Mahdavi and Mahdi Alipour, (2010) “Conceptual File Management: Revising the Structure of Classification-based Information Retrieval”, 5th IEEE International Conference on Digital Information Management (ICDIM), pp. 108-113.
[2] Ming Zhao and Shutao Li, (2009) “Sparse Representation Classification for Image Text Detection”, 2nd IEEE International Symposium on Computational Intelligence and Design, Vol. 1, pp. 76-79.
[3] Radhouane Guermazi, Mohamed Hammami, and Abdelmajid Ben Hamadou, (2009) “Violent Web images classification based on MPEG 7 color descriptors”, IEEE International Conference on Systems, Man, and Cybernetics, pp. 3106-3111.
[4] Esin Guldogan, and Moncef Gabbouj, (2010) “Adaptive Image Classification Based on Folksonomy”, 11th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), pp. 1-4.
[5] Shuang Deng, and Hong Peng, (2006) “Document Classification Based on Support Vector Machine Using A Concept Vector Model”, IEEE/WIC/ACM International Conference on Web Intelligence, pp. 473-476.
[6] Choochart Haruechaiyasak, Mei-Ling Shyu, Shu-Ching Chen, and Xiuqi Li, (2002) “Web Document Classification Based on Fuzzy Association”, 26th IEEE Annual International Conference on Computer Software and Applications, COMPSAC 2002, pp. 487-492.
[7] Shou-Bin Dong, Yi-Ming Yang, (2002) “Hierarchical Web Image Classification by Multi-Level Features”, 1st IEEE International Conference on Machine Learning and Cybernetics, Vol. 21, pp. 663-668.
[8] S. M. Beitzel, E. C. Jensen, D. D. Lewis, A. Chowdhury, A. Kolcz, and O. Frieder, (2005) “Improving automatic query classification via semi-supervised learning”, 5th IEEE International Conference on Data Mining, pp. 42-49.
[9] YANG Xiao-qin, JU Shiguang and CAO Qinghuang, (2009) “A Deep Web Complex Matching Method based on Association Mining and Semantic Clustering”, 6th IEEE Web Information Systems and Applications Conference, WISA 2009, pp. 169-172.
[10] Nizar R. Mabroukeh, and C. I. Ezeife, (2009) “Semantic-rich Markov Models for Web Prefetching”, IEEE International Conference on Data Mining Workshops, ICDMW’09, pp. 465-470.
[11] Suleyman Salin, and Pinar Senkul, (2009) “Using Semantic Information for Web Usage Mining Based Recommendation”, 24th IEEE International Symposium on Computer and Information Sciences, ISCIS 2009, pp. 236-241.
[12] Yi Feng, (2010) “Towards Knowledge Discovery in Semantic Era”, 7th IEEE International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2010, Vol. 5, pp. 2071-2075.
[13] Elaheh Momeni, (2010) “Towards (Semi-) Automatic Moderation of Social Web Annotations”, 2nd IEEE International Conference on Social Computing, pp. 1123-1128.
[14] Julia Hoxha and Sudhir Agarwal, (2010) “Semi-automatic Acquisition of Semantic Descriptions of Processes in the Web”, IEEE International Conference on Web Intelligence and Intelligent Agent Technology, Vol. 1, pp. 256-263.
[15] Taksa, Isak, Zelikovitz, Sarah, Spink, Amanda, (2007) “Using Web Search Logs to Identify Query Classification Terms”, 4th IEEE International Conference on Information Technology, ITNG '07, pp. 469-474.
[16] C. N. Ziegler, M. Skubacz, (2007) “Content Extraction from News Pages Using Particle Swarm Optimization on Linguistic and Structural Features”, IEEE/WIC/ACM International Conference on Web Intelligence, pp. 242-249.
[17] Weiming Hu, Ou Wu, Zhouyao Chen, Zhouyu Fu, S. Maybank, (2007) “Recognition of Pornographic Web Pages by Classifying Texts and Images”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 6, pp. 1019-1034.
[18] Jun Fang, Lei Guo, XiaoDong Wang, Ning Yang, (2007) “Ontology-Based Automatic Classification and Ranking for Web Documents”, 4th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD’2007, Vol. 3, pp. 627-631.
[19] Shiqun Yin, Yuhui Qiu, Chengwen Zhong, Jifu Zhou, (2007) “Study of Web Information Extraction and Classification Method”, IEEE International Conference on Wireless Communications, Networking, and Mobile Computing, WiCom’2007, pp. 5548-5552.
[20] F. Schroff, A. Criminisi, A. Zisserman, (2007) “Harvesting Image Databases from the Web”, 11th IEEE International Conference on Computer Vision, ICCV’2007, pp. 1-8.