Finding the required URL among the first few result pages of a search engine is still a challenging task. It may require a number of reformulations of the search string, adversely affecting the user's search time. Query ambiguity and polysemy are major reasons for not obtaining relevant results in the top few result pages. Efficient query composition and data organization are necessary for obtaining effective results. The context of the information need and the user intent may improve the autocomplete feature of existing search engines. This research proposes a Funnel Mesh-5 (FM5) algorithm to construct a search string that takes into account the context of the information need and the user intention, with three main steps: 1) predict user intention from user profiles and past searches via a weighted mesh structure; 2) resolve ambiguity and polysemy of search strings with context and user intention; 3) generate a personalized, disambiguated search string by query expansion encompassing the user intention and the predicted query. Experimental results for the proposed approach and a comparison with direct use of a search engine are presented, along with a comparison of the FM5 algorithm with the K-Nearest Neighbor algorithm for user intention identification. The proposed system provides better precision of search results for ambiguous search strings, with improved identification of the user intention. Results are presented for an English-language dataset as well as a Marathi (an Indian language) dataset of ambiguous search strings.
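The weighted mesh structure of FM5 is specific to the paper and not reproduced here; as a rough illustration of steps 1 and 3 only, the sketch below scores each candidate intent by the weight its profile assigns to the query terms, then expands the query with intent-specific terms. All profile names, weights, and expansion terms are hypothetical.

```python
def predict_intent(query_terms, profiles):
    """profiles: intent -> {term: weight}, assumed learned from past searches.
    Scores each intent by the total weight its profile gives the query terms."""
    scores = {
        intent: sum(weights.get(t, 0.0) for t in query_terms)
        for intent, weights in profiles.items()
    }
    return max(scores, key=scores.get)

def expand_query(query_terms, intent, expansion_terms):
    """Append intent-specific disambiguating terms (a stand-in for step 3)."""
    return query_terms + [t for t in expansion_terms.get(intent, [])
                          if t not in query_terms]

# Hypothetical example data for the ambiguous query term "apple".
profiles = {
    "fruit":   {"apple": 0.9, "recipe": 0.7, "nutrition": 0.6},
    "company": {"apple": 0.8, "iphone": 0.9, "stock": 0.7},
}
expansions = {"fruit": ["fruit", "nutrition"], "company": ["Apple Inc"]}

intent = predict_intent(["apple", "stock"], profiles)
print(intent)  # company
```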
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS... (cseij)
Query sensitive summarization aims at providing users with a summary of the contents of single or multiple web pages based on the search query. This paper proposes a novel idea of generating a comparative summary from a set of URLs in the search result. The user selects a set of web page links from the search result produced by the search engine, and a comparative summary of these selected web sites is generated. This method makes use of the HTML DOM tree structure of these web pages. HTML documents are segmented into a set of concept blocks. A sentence score for each concept block is computed with respect to the query and feature keywords. The important sentences from the concept blocks of different web pages are extracted to compose the comparative summary on the fly. This system reduces the time and effort required for the user to browse various web sites to compare the information. The comparative summary of the contents would help users in quick decision making.
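The paper's exact scoring formula is not given here; a minimal sketch of the extraction step, assuming a simple term-overlap score that weights query terms above feature keywords, might look like:

```python
def score_sentence(sentence, query_terms, feature_terms):
    words = set(sentence.lower().split())
    # Assumed weighting: query-term overlap counts double feature-keyword overlap.
    return 2 * len(words & query_terms) + len(words & feature_terms)

def comparative_summary(pages, query_terms, feature_terms, per_block=1):
    """pages: {url: [concept_block, ...]} where each concept block is a list
    of sentences segmented from the page's HTML DOM tree. Picks the top-scoring
    sentence(s) per block to compose the on-the-fly comparative summary."""
    summary = {}
    for url, blocks in pages.items():
        picked = []
        for block in blocks:
            ranked = sorted(block, key=lambda s: score_sentence(
                s, query_terms, feature_terms), reverse=True)
            picked.extend(ranked[:per_block])
        summary[url] = picked
    return summary
```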
User search goal inference and feedback session using fast generalized – fuzz... (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. It brings together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
Performance Evaluation of Query Processing Techniques in Information Retrieval (idescitation)
The first element of the search process is the query. Because the user query is on average restricted to two or three keywords, it is often ambiguous to the search engine. Given the user query, the goal of an Information Retrieval (IR) system is to retrieve information that might be useful or relevant to the user's information need; hence, query processing plays an important role in an IR system. Query processing can be divided into four categories: query expansion, query optimization, query classification, and query parsing. In this paper an attempt is made to evaluate the performance of query processing algorithms in each category. The evaluation is based on the dataset specified by the Forum for Information Retrieval [FIRE15]. The criteria used for evaluation are precision and relative recall, and the analysis is based on the importance of each step in query processing. The experimental results show the significance of each step in query processing, as well as the relevance of web semantics and spelling correction in the user query.
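The two evaluation criteria named above are straightforward to compute; a small sketch follows, with relative recall taken against a pooled relevant set (the union of relevant results found by all compared systems), since true recall is unknowable on the open web.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved results that are relevant."""
    return len(set(retrieved) & set(relevant)) / len(retrieved) if retrieved else 0.0

def relative_recall(retrieved, pooled_relevant):
    """Relevant results this system found, over the pooled relevant set
    contributed by all systems being compared."""
    found = set(retrieved) & set(pooled_relevant)
    return len(found) / len(pooled_relevant) if pooled_relevant else 0.0
```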
Improving search result via search keywords and data classification similarity (Conference Papers)
This document proposes a new method to improve search results by predicting queries based on similarity between input queries and historical data of keywords and their classifications. It involves extracting keywords from queries and classifying them using named entity recognition and lexical databases. Prediction rules are generated by analyzing relationships between keywords and classifications in historical data. These rules are then used to predict additional related keywords for new queries to enhance search criteria and provide more relevant search results. The method aims to address limitations of existing search engines that rely only on raw historical query data without enrichment to predict queries.
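The paper's rule generation also uses named entity recognition and lexical databases; the co-occurrence core alone can be sketched more simply. This is a hypothetical frequency-threshold rule miner over past keyword sets, not the paper's full method.

```python
from collections import Counter

def build_rules(history, min_count=2):
    """history: list of keyword lists from past queries.
    Returns {keyword: [keywords that co-occur with it at least min_count times]}."""
    co = {}
    for kws in history:
        for k in kws:
            co.setdefault(k, Counter()).update(w for w in kws if w != k)
    return {k: [w for w, n in c.items() if n >= min_count] for k, c in co.items()}

def predict(query_kws, rules):
    """Enrich a new query with related keywords predicted by the rules."""
    extra = []
    for k in query_kws:
        extra += [w for w in rules.get(k, []) if w not in query_kws and w not in extra]
    return query_kws + extra
```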
Research on Document Indexing in the Search Engines
The main theme of information retrieval is to send the exact response to a user for a specific query.
Information search and retrieval is a very large process; to realize it, we need to develop an effective application using techniques such as document indexing, page ranking, and clustering. Among these, the document index plays a vital role in searching: instead of scanning hundreds of thousands of documents, the system goes directly to the particular index entry and returns the output. The main focus here is indexing; the clear meaning of indexing is that an index is stored to optimize the speed and performance of finding the appropriate document for the user's query.
The conclusion is that a context-based index approach, built mainly from the source documents, is used in query retrieval. Instead of searching every page on the server, this is technically better: it saves time and reduces the burden on the server.
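The indexing idea described above is the classic inverted index; a minimal sketch:

```python
from collections import defaultdict

def build_index(docs):
    """docs: {doc_id: text}. Returns an inverted index: term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND-match: documents containing every query term, found by set
    intersection over postings rather than scanning every document."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = index.get(terms[0], set()).copy()
    for t in terms[1:]:
        result &= index.get(t, set())
    return result
```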
This document discusses techniques for personalizing search engine results using concept-based user profiles. It proposes six methods for creating user profiles that capture both positive and negative user preferences and interests based on concepts extracted from search queries and results. The methods use machine learning algorithms to learn weighted concept vectors representing user profiles. An evaluation found that profiles capturing both positive and negative preferences performed best. The goal is to resolve query ambiguity and increase result relevance by understanding each user's unique interests and preferences.
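The machine-learned weighting in those methods is not reproduced here, but the core idea of a weighted concept vector with both positive and negative preferences can be sketched as a simple re-ranker (profile weights and result concepts below are hypothetical):

```python
def personalize(results, profile):
    """results: [(url, concepts)]; profile: {concept: weight}, where negative
    weights encode disliked concepts. Re-ranks results by profile score."""
    def score(item):
        _, concepts = item
        return sum(profile.get(c, 0.0) for c in concepts)
    return [url for url, _ in sorted(results, key=score, reverse=True)]
```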
Efficient multi-document summary generation using neural network (INFOGAIN PUBLICATION)
This paper proposes a multi-document summarization system that uses bisect k-means clustering, an optimal merge function, and a neural network. The system first preprocesses input documents through stemming and removing stop words. It then applies bisect k-means clustering to group similar sentences. The clusters are merged using an optimal merge function to find important keywords. The NEWSUM algorithm is used to generate a primary summary for each keyword. A neural network trained on sentence classifications is then used to classify sentences in the primary summary as positive or negative. Only positively classified sentences are included in the final summary to improve accuracy. The system aims to generate a concise and accurate summary in a short period of time from multiple documents on a given topic.
This document discusses methods for measuring semantic similarity between words. It begins by discussing how traditional lexical similarity measurements do not consider semantics. It then discusses several existing approaches that measure semantic similarity using web search engines and text snippets. These approaches calculate word co-occurrence statistics from page counts and analyze lexical patterns extracted from snippets. Pattern clustering is used to group semantically similar patterns. The approaches are evaluated using datasets and metrics like precision and recall. Finally, the document proposes a new method that combines page count statistics, lexical pattern extraction and clustering, and support vector machines to measure semantic similarity.
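Page-count measures of this kind are typically defined over search-engine hit counts H(P), H(Q), and H(P AND Q). Two common variants are sketched below; the damping threshold c and the assumed index size n are illustrative parameters, not values from the document.

```python
import math

def web_jaccard(h_p, h_q, h_pq, c=5):
    """Jaccard-style co-occurrence from page counts; counts at or below the
    threshold c are treated as noise and scored zero."""
    return 0.0 if h_pq <= c else h_pq / (h_p + h_q - h_pq)

def web_pmi(h_p, h_q, h_pq, n=10**10, c=5):
    """Pointwise mutual information from page counts, with n an assumed
    total number of indexed pages."""
    if h_pq <= c:
        return 0.0
    return math.log2((h_pq / n) / ((h_p / n) * (h_q / n)))
```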
This document discusses approaches to resolving ambiguity in user queries to search engines. It first introduces the problem of ambiguity, where a single search term can have multiple meanings. It then overviews existing ranking algorithms used by search engines. The document proposes a dynamic page ranking algorithm to better disambiguate user intent by considering the context of the query. It evaluates the approach using mean average precision on the ambiguous term "beat". Experimental results show the dynamic approach outperforms a baseline page ranking algorithm on this task.
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES... (pharmaindexing)
This document discusses an efficient semantic data alignment approach based on fuzzy c-means (FCM) clustering to infer user search goals using feedback sessions. It aims to cluster similar pseudo-documents representing user feedback sessions to better understand user search intents for a given query. The approach first collects feedback sessions from search results and generates pseudo-documents. It then uses FCM clustering to group similar pseudo-documents while also measuring semantic similarity between terms. This is an improvement over k-means clustering which does not consider semantic similarity. The results are evaluated using metrics like classified average precision and show this FCM-based approach performs better than clustering without semantic alignment.
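The semantic-alignment term matching of the paper is omitted here, but the FCM stage itself is the standard fuzzy c-means update, which can be illustrated over plain numeric vectors:

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means. X: (n, d) data matrix; m: fuzzifier.
    Returns (centers, U) where U[i, k] is point i's membership in cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        # Weighted means give the cluster centers.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Membership update: inversely proportional to distance^(2/(m-1)).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Toy pseudo-document vectors: two well-separated groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, U = fcm(X)
```

Unlike k-means, each pseudo-document here receives a graded membership in every cluster, which is what lets the approach model overlapping search intents.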
TALASH: A SEMANTIC AND CONTEXT BASED OPTIMIZED HINDI SEARCH ENGINE (IJCSEIT Journal)
This document summarizes a research paper that proposes three models for query expansion in a Hindi search engine: 1) Using lexical resources like HindiWordNet to find synonyms and related terms, 2) Using user context information like location, interests and profession, 3) Combining lexical resources and user context. An experiment compares the precision of results from simple Google searches to searches using each model. Precision was highest using the combined Model III at 0.79, showing that integrating lexical and user context information improves search quality in Hindi.
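Model III, as described, combines lexical synonyms with user-context terms; a toy sketch of that combination follows. The synonym table and context list are hypothetical stand-ins for HindiWordNet lookups and the user profile.

```python
def expand(query, synonyms, user_context):
    """Model III-style expansion (assumed): union of the original terms,
    their lexical synonyms, and the user's context terms, without duplicates."""
    terms = query.split()
    expanded = list(terms)
    for t in terms:
        expanded += [s for s in synonyms.get(t, []) if s not in expanded]
    expanded += [c for c in user_context if c not in expanded]
    return " ".join(expanded)
```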
Scaling Down Dimensions and Feature Extraction in Document Repository Classif... (ijdmtaiir)
In this study a comprehensive evaluation of two supervised feature selection methods for dimensionality reduction is performed: Latent Semantic Indexing (LSI) and Principal Component Analysis (PCA). These are gauged against unsupervised techniques such as fuzzy feature clustering using hard fuzzy C-means (FCM). The main objective of the study is to estimate the relative efficiency of the two supervised techniques against unsupervised fuzzy techniques while reducing the feature space. It is found that clustering using FCM leads to better accuracy in classifying documents than algorithms like LSI and PCA. Results show that the clustering of features improves the accuracy of document classification.
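As a reference point for the PCA side of that comparison, projecting document vectors onto the top-k principal components takes only a few lines via SVD of the mean-centered matrix:

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of a document-feature matrix X onto the top-k principal
    components. Centering first, then SVD: the rows of Vt are the components."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```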
Semantic Based Model for Text Document Clustering with Idioms (Waqas Tariq)
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data available in various forms in online forums such as the web, social networks, and other information networks. Clustering is a very powerful data mining technique for organizing the large amount of information on the web. Traditionally, document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method to improve the semantic structure of text documents. A method has been developed that performs the following: tagging the documents for parsing, replacement of idioms with their original meaning, semantic weight calculation for document words, and application of semantic grammar. The similarity measure is obtained between the documents, and the documents are then clustered using a hierarchical clustering algorithm. The method adopted in this work is evaluated on different data sets with standard performance measures, and the effectiveness of the method in producing meaningful clusters has been demonstrated.
A Survey on Automatically Mining Facets for Queries from their Search Results (IRJET Journal)
This document summarizes research on automatically mining query facets from search results. Query facets provide useful summaries of a query by grouping related terms and phrases. The document reviews existing methods for query recommendation and facet extraction. It also proposes an unsupervised technique to mine query facets from top search results without additional domain knowledge. The technique aims to help users better understand queries and explore information through faceted search.
A New Algorithm for Inferring User Search Goals with Feedback Sessions (IJERA Editor)
Different users may have different search goals when they submit the same query to a search engine. The inference and analysis of user search goals can be very useful in improving search engine relevance and the user experience. This paper proposes a novel approach to infer user search goals by analyzing search engine query logs. Once the user enters the query, the resulting URLs are filtered and pseudo-documents are generated; the server then applies a clustering mechanism to the URLs, so that they are listed under different categories. First, feedback sessions are constructed from user click-through logs and can efficiently reflect the information needs of users. Second, a novel approach generates pseudo-documents to better represent the feedback sessions for clustering. Third, a new criterion, Classified Average Precision (CAP), is proposed to evaluate the performance of inferring user search goals. Experimental results using user click-through logs from a commercial search engine validate the effectiveness of the proposed methods. The distributions of user search goals can also be useful in applications such as re-ranking web search results that cover different user search goals.
IJCER (www.ijceronline.com) International Journal of Computational Engineerin... (ijceronline)
This document summarizes a research paper on developing user profiles from search engine queries to enable personalized search results. It discusses how current search engines generally return the same results regardless of individual user interests. The paper proposes methods to construct user profiles capturing both positive and negative preferences from search histories and click-through data. Experimental results showed profiles including both preferences performed best by improving query clustering and separating similar vs. dissimilar queries. Future work aims to use profiles for collaborative filtering and predicting new query intents.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document summarizes various approaches to web search personalization that have been proposed by researchers. It discusses 12 different approaches that use techniques such as building user profiles based on browsing history and click data, constructing personalized ontologies and taxonomies, re-ranking search results based on learned user interests, and updating user profiles based on implicit feedback during web browsing. The document concludes that personalized search is needed to better tailor search results to individual users and their contexts, and that continued research on innovative personalization techniques is important as the amount of online information grows.
Feature selection, optimization and clustering strategies of text documents (IJECEIAES)
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, shared filtering, document management, and indexing. Research on the clustering task must be performed before adapting it to the text environment. Conventional approaches typically emphasize quantitative information, where the selected features are numbers. Efforts have also been made to achieve efficient clustering in the context of categorical information, where the selected features can assume nominal values. This manuscript presents an in-depth analysis of the challenges of clustering in the text environment. Further, this paper details prominent models proposed for clustering along with the pros and cons of each model. In addition, it focuses on various recent developments in the clustering task in social networks and associated environments.
This document presents a general framework for building classifiers and clustering models using hidden topics to deal with short and sparse text data. It analyzes hidden topics from a large universal dataset using LDA. These topics are then used to enrich both the training data and new short text data by combining them with the topic distributions. This helps reduce data sparseness and improves classification and clustering accuracy for short texts like web snippets. The framework is also applied to contextual advertising by matching web pages and ads based on their hidden topic similarity.
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION (ijistjournal)
User-generated content on the web grows rapidly in this emergent information age. Evolutionary changes in technology make use of such information to capture the user's essence, so that only the useful information is exposed to information seekers. Most existing research on text information processing focuses on the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis over the text data available in each forum. This approach analyses the forum text data and computes a value for each word of text. The proposed approach combines K-means clustering and a Support Vector Machine with PSO (SVM-PSO) classification algorithm to group the forums into two clusters, hotspot and non-hotspot forums, within the current time span. The proposed system's accuracy is compared with other classification algorithms such as Naïve Bayes, decision tree, and SVM. The experiments show that K-means and SVM-PSO together achieve highly consistent results.
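The clustering half of that pipeline can be illustrated with a minimal one-dimensional 2-means over per-forum sentiment scores; the SVM-PSO classification stage is not reproduced here, and the scores are hypothetical.

```python
def kmeans_1d(scores, iters=50):
    """Split forums into two clusters (e.g. hotspot vs non-hotspot) on a
    scalar sentiment score: a bare 1-D 2-means with min/max initialization.
    Returns the two cluster centers."""
    c1, c2 = min(scores), max(scores)
    for _ in range(iters):
        g1 = [s for s in scores if abs(s - c1) <= abs(s - c2)]
        g2 = [s for s in scores if abs(s - c1) > abs(s - c2)]
        c1 = sum(g1) / len(g1) if g1 else c1
        c2 = sum(g2) / len(g2) if g2 else c2
    return c1, c2
```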
Classification-based Retrieval Methods to Enhance Information Discovery on th... (IJMIT JOURNAL)
The widespread adoption of the World-Wide Web (the Web) has created challenges both for society as a whole and for the technology used to build and maintain the Web. The ongoing struggle of information retrieval systems is to wade through this vast pile of data and satisfy users by presenting them with information that most adequately fits their needs. On a societal level, the Web is expanding faster than we can comprehend its implications or develop rules for its use. The ubiquitous use of the Web has raised important social concerns in the areas of privacy, censorship, and access to information. On a technical level, the novelty of the Web and the pace of its growth have created challenges not only in the development of new applications that realize the power of the Web, but also in the technology needed to scale applications to accommodate the resulting large data sets and heavy loads. This thesis presents searching algorithms and hierarchical classification techniques for increasing a search service's understanding of web queries. Existing search services rely solely on a query's occurrence in the document collection to locate relevant documents. They typically do not perform any task- or topic-based analysis of queries using other available resources, and do not leverage changes in user query patterns over time. Provided within is a set of techniques and metrics for performing temporal analysis on query logs. Our log analyses are shown to be reasonable and informative, and can be used to detect changing trends and patterns in the query stream, thus providing valuable data to a search service.
Single document keywords extraction in Bahasa Indonesia using phrase chunking (TELKOMNIKA JOURNAL)
Keywords help readers to understand the idea of a document quickly. Unfortunately, considerable time and effort are often needed to come up with a good set of keywords manually. This research focused on generating keywords from a document automatically using phrase chunking. Firstly, we collected part of speech patterns from a collection of documents. Secondly, we used those patterns to extract candidate keywords from the abstract and the content of a document. Finally, keywords are selected from the candidates based on the number of words in the keyword phrases and some scenarios involving candidate reduction and sorting. We evaluated the result of each scenario using precision, recall, and F-measure. The experiment results show: i) shorter-phrase keywords with string reduction extracted from the abstract and sorted by frequency provides the highest score, ii) in every proposed scenario, extracting keywords using the abstract always presents a better result, iii) using shorter-phrase patterns in keywords extraction gives better score in comparison to using all phrase patterns, iv) sorting scenarios based on the multiplication of candidate frequencies and the weight of the phrase patterns offer better results.
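The candidate-extraction step described above — matching corpus-derived part-of-speech patterns against a tagged document — can be sketched as a simple sliding-window matcher. The tags and patterns below are hypothetical examples, not the paper's collected pattern set.

```python
def chunk_candidates(tagged, patterns):
    """tagged: [(word, pos_tag)] for a document; patterns: list of POS-tag
    tuples collected from a corpus. Returns every phrase whose tag sequence
    matches one of the patterns."""
    out = []
    for i in range(len(tagged)):
        for pat in patterns:
            j = i + len(pat)
            if j <= len(tagged) and tuple(t for _, t in tagged[i:j]) == pat:
                out.append(" ".join(w for w, _ in tagged[i:j]))
    return out
```

Candidate reduction and sorting by frequency or pattern weight, as the scenarios describe, would then operate on this candidate list.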
IRJET - Review on Information Retrieval for Desktop Search Engine (IRJET Journal)
This document summarizes techniques for desktop search engines, including feature extraction using entity recognition, query understanding using part-of-speech tagging and segmentation, and similarity measures for scoring and ranking documents. It discusses using ontologies, concept graphs, semantic networks, and vector space models to represent knowledge in documents. Feature extraction identifies entities that can be mapped to knowledge bases to infer meanings. Query understanding aims to determine intent regardless of technique used. Similarity is measured using approaches like comparing maximum common subgraphs between a document and query graphs.
This document summarizes a research paper that proposes a framework for personalized web search using query log and clickthrough data. The framework implements a re-ranking approach that combines user search context and browsing behavior to generate personalized search results with high relevance. The framework consists of five components: a request handler, query processor, result handler, event handler, and response handler. The result handler applies a re-ranking approach using query log and clickthrough data to personalize search results before returning them to the user. An evaluation found the framework and re-ranking approach to be effective for personalized web search and information retrieval.
This document summarizes a research paper that proposes a novel framework for personalized web search using query log and clickthrough data. The framework implements a re-ranking approach to generate personalized search results with high relevance. It derives an extended set of user preferences and concepts based on extracted data from query logs and clickthrough information. An evaluation found the framework and re-ranking approach to be highly effective for personalized search and information retrieval.
Personalized web search using browsing history and domain knowledge - Rishikesh Pathak
This document proposes a framework for improving personalized web search by constructing an enhanced user profile using both the user's browsing history and domain knowledge. The enhanced user profile is used to better suggest relevant web pages to the user based on their search query. An experiment found that suggestions made using the enhanced user profile performed better than using a standard user profile alone. The framework involves modeling the user, re-ranking search results, and displaying personalized results based on the enhanced user profile.
Multi Similarity Measure based Result Merging Strategies in Meta Search Engine - IDES Editor
In a Meta Search Engine, result merging is the key component. Meta Search Engines provide a uniform query interface for Internet users to search for information. Depending on users' needs, they select relevant sources and map user queries into the target search engines, subsequently merging the results. The effectiveness of a Meta Search Engine is closely related to the result merging algorithm it employs. In this paper, we propose a Meta Search Engine with two distinct steps: (1) searching through surface and deep search engines, and (2) ranking the results through the designed ranking algorithm. Initially, the query given by the user is input to the deep and surface search engines. The proposed method uses two distinct algorithms for ranking the search results: a concept-similarity-based method and a cosine-similarity-based method. Once the results from the various search engines are ranked, the proposed Meta Search Engine merges them into a single ranked list. Finally, experimentation demonstrates the efficiency of the proposed visible and invisible web-based Meta Search Engine in merging the relevant pages. TSAP is used as the evaluation criterion and the algorithms are evaluated against it.
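The cosine-similarity ranking and single-list merging described in this abstract can be sketched as below; the bag-of-words scoring and the keep-the-best handling of duplicate URLs are assumptions for illustration (the paper's concept-similarity method and TSAP evaluation are not reproduced):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def merge_results(query, engine_results):
    """Score every (url, text) result against the query and merge the
    per-engine lists into one list ranked by cosine similarity."""
    q = Counter(query.lower().split())
    scored = {}
    for results in engine_results:          # one list per search engine
        for url, text in results:
            s = cosine(q, Counter(text.lower().split()))
            scored[url] = max(scored.get(url, 0.0), s)  # best score for dup URLs
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical surface-web and deep-web result lists.
surface = [("u1", "meta search engine merging"), ("u2", "cooking recipes")]
deep = [("u3", "result merging in meta search"), ("u1", "meta search engine merging")]
print(merge_results("meta search merging", [surface, deep]))
```

The duplicate URL `u1` appears in both engines' lists but only once in the merged ranking.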
This document discusses approaches to resolving ambiguity in user queries to search engines. It first introduces the problem of ambiguity, where a single search term can have multiple meanings. It then overviews existing ranking algorithms used by search engines. The document proposes a dynamic page ranking algorithm to better disambiguate user intent by considering the context of the query. It evaluates the approach using mean average precision on the ambiguous term "beat". Experimental results show the dynamic approach outperforms a baseline page ranking algorithm on this task.
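The mean average precision measure used in this evaluation can be computed as follows; the relevance judgments below are made-up examples, not the paper's data for the query "beat":

```python
def average_precision(ranked_relevance):
    """AP over one ranked result list; ranked_relevance[i] is True if the
    result at rank i+1 is relevant."""
    hits, total = 0, 0.0
    for i, rel in enumerate(ranked_relevance):
        if rel:
            hits += 1
            total += hits / (i + 1)          # precision at this rank
    return total / hits if hits else 0.0

def mean_average_precision(runs):
    """MAP: the mean of AP over several ranked lists (e.g. several queries)."""
    return sum(average_precision(r) for r in runs) / len(runs)

# Two hypothetical result lists: a baseline ranking vs. a re-ranked one.
baseline = [False, True, False, True]   # AP = (1/2 + 2/4) / 2 = 0.5
reranked = [True, True, False, False]   # AP = (1/1 + 2/2) / 2 = 1.0
print(mean_average_precision([baseline, reranked]))
```

Moving the relevant results to the top lifts AP from 0.5 to 1.0, which is exactly the kind of gain a disambiguating re-ranker is meant to show.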
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES... - pharmaindexing
This document discusses an efficient semantic data alignment approach based on fuzzy c-means (FCM) clustering to infer user search goals using feedback sessions. It aims to cluster similar pseudo-documents representing user feedback sessions to better understand user search intents for a given query. The approach first collects feedback sessions from search results and generates pseudo-documents. It then uses FCM clustering to group similar pseudo-documents while also measuring semantic similarity between terms. This is an improvement over k-means clustering which does not consider semantic similarity. The results are evaluated using metrics like classified average precision and show this FCM-based approach performs better than clustering without semantic alignment.
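A minimal sketch of fuzzy c-means in one dimension, illustrating why the soft memberships used here differ from hard k-means; the data, the spread initialization, and the omission of the paper's semantic-similarity term are all simplifying assumptions:

```python
def fcm(points, c=2, m=2.0, iters=30):
    """Minimal 1-D fuzzy c-means: soft memberships let a pseudo-document
    belong to several intent clusters at once, unlike hard k-means."""
    lo, hi = min(points), max(points)
    centers = [lo + i * (hi - lo) / (c - 1) for i in range(c)]  # spread init
    u = []
    for _ in range(iters):
        # membership u[i][j] of point i in cluster j
        u = []
        for x in points:
            d = [max(abs(x - ck), 1e-9) for ck in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c))
                      for j in range(c)])
        # move each center to the membership-weighted mean of the points
        centers = [sum(u[i][j] ** m * points[i] for i in range(len(points)))
                   / sum(u[i][j] ** m for i in range(len(points)))
                   for j in range(c)]
    return centers, u

# Hypothetical 1-D scores for six pseudo-documents, two intent groups.
pts = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]
centers, memberships = fcm(pts)
print(sorted(round(ck, 2) for ck in centers))
```

Each membership row sums to 1, so a borderline pseudo-document contributes partially to both intent clusters instead of being forced into one.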
TALASH: A SEMANTIC AND CONTEXT BASED OPTIMIZED HINDI SEARCH ENGINE - IJCSEIT Journal
This document summarizes a research paper that proposes three models for query expansion in a Hindi search engine: 1) Using lexical resources like HindiWordNet to find synonyms and related terms, 2) Using user context information like location, interests and profession, 3) Combining lexical resources and user context. An experiment compares the precision of results from simple Google searches to searches using each model. Precision was highest using the combined Model III at 0.79, showing that integrating lexical and user context information improves search quality in Hindi.
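The Model III combination of lexical and user-context expansion can be sketched as below; the synonym table and context store are hypothetical stand-ins, shown with English tokens rather than HindiWordNet entries:

```python
# Hypothetical stand-ins for a lexical resource and a user-context store.
SYNONYMS = {"doctor": ["physician"], "bank": ["riverbank", "lender"]}
USER_CONTEXT = {"profession": "finance", "location": "pune"}

def expand_query(query, use_lexical=True, use_context=True):
    """Model III style expansion: original terms + synonyms + context terms.
    Disabling both flags reduces to the plain (Model-free) query."""
    terms = query.lower().split()
    expanded = list(terms)
    if use_lexical:
        for t in terms:
            expanded += SYNONYMS.get(t, [])
    if use_context:
        expanded += USER_CONTEXT.values()
    return " ".join(dict.fromkeys(expanded))  # dedupe, keep order

print(expand_query("bank loan"))
```

The flags let the same function emulate Model I (lexical only), Model II (context only), or Model III (both), matching the three models compared in the experiment.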
Scaling Down Dimensions and Feature Extraction in Document Repository Classif... - ijdmtaiir
In this study, a comprehensive evaluation of two supervised feature selection methods for dimensionality reduction is performed: Latent Semantic Indexing (LSI) and Principal Component Analysis (PCA). These are gauged against unsupervised techniques such as fuzzy feature clustering using hard fuzzy C-means (FCM). The main objective of the study is to estimate the relative efficiency of the two supervised techniques against unsupervised fuzzy techniques while reducing the feature space. It is found that clustering using FCM leads to better accuracy in classifying documents than algorithms like LSI and PCA. Results show that the clustering of features improves the accuracy of document classification.
Semantic Based Model for Text Document Clustering with Idioms - Waqas Tariq
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data which is available in various forms in online forums such as the web, social networks, and other information networks. Clustering is a very powerful data mining technique to organize the large amount of information on the web. Traditionally, document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method to improve the semantic structure of the text documents. A method has been developed that performs the following: tag the documents for parsing, replacement of idioms with their original meaning, semantic weights calculation for document words and apply semantic grammar. The similarity measure is obtained between the documents and then the documents are clustered using Hierarchical clustering algorithm. The method adopted in this work is evaluated on different data sets with standard performance measures and the effectiveness of the method to develop in meaningful clusters has been proved.
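The hierarchical clustering step this abstract mentions can be sketched with single-link agglomerative merging over one-dimensional similarity values; the linkage choice and the stop-at-k criterion are illustrative assumptions:

```python
def agglomerative(points, k):
    """Single-link agglomerative clustering over 1-D feature values:
    repeatedly merge the two closest clusters until only k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-link distance: closest pair across the two clusters
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)   # merge the closest pair
    return [sorted(c) for c in clusters]

# Hypothetical document similarity scores; two tight groups plus an outlier.
print(agglomerative([0.1, 0.12, 0.9, 0.95, 0.5], k=3))
```

In practice the pairwise distance would come from the semantic similarity measure the paper computes between documents, not from raw 1-D values.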
A Survey on Automatically Mining Facets for Queries from their Search Results - IRJET Journal
This document summarizes research on automatically mining query facets from search results. Query facets provide useful summaries of a query by grouping related terms and phrases. The document reviews existing methods for query recommendation and facet extraction. It also proposes an unsupervised technique to mine query facets from top search results without additional domain knowledge. The technique aims to help users better understand queries and explore information through faceted search.
A New Algorithm for Inferring User Search Goals with Feedback Sessions - IJERA Editor
Different users may have different search goals when they submit the same query to a search engine. The inference and analysis of user search goals can be very useful in improving search engine relevance and user experience. This paper presents a novel approach to infer user search goals by analyzing search engine query logs. Once the user enters the query, the resultant URLs are filtered and pseudo-documents are generated. The server then applies a clustering mechanism to the URLs so that they are listed in different categories. First, feedback sessions are constructed from user click-through logs and can efficiently reflect the information needs of users. Second, we propose a novel approach to generate pseudo-documents to better represent the feedback sessions for clustering. Finally, we propose a new criterion, "Classified Average Precision (CAP)", to evaluate the performance of inferring user search goals. Experimental results are presented using user click-through logs from a commercial search engine to validate the effectiveness of the proposed methods. The distributions of user search goals can also be useful in applications such as re-ranking web search results that contain different user search goals.
IJCER (www.ijceronline.com) International Journal of computational Engineerin... - ijceronline
This document summarizes a research paper on developing user profiles from search engine queries to enable personalized search results. It discusses how current search engines generally return the same results regardless of individual user interests. The paper proposes methods to construct user profiles capturing both positive and negative preferences from search histories and click-through data. Experimental results showed profiles including both preferences performed best by improving query clustering and separating similar vs. dissimilar queries. Future work aims to use profiles for collaborative filtering and predicting new query intents.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document summarizes various approaches to web search personalization that have been proposed by researchers. It discusses 12 different approaches that use techniques such as building user profiles based on browsing history and click data, constructing personalized ontologies and taxonomies, re-ranking search results based on learned user interests, and updating user profiles based on implicit feedback during web browsing. The document concludes that personalized search is needed to better tailor search results to individual users and their contexts, and that continued research on innovative personalization techniques is important as the amount of online information grows.
Feature selection, optimization and clustering strategies of text documents - IJECEIAES
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, shared filtering, document management, and indexing. Research on the clustering task must be performed prior to its adaptation in the text environment. Conventional approaches typically emphasized quantitative information, where the selected features are numbers. Efforts have also been put forward for achieving efficient clustering in the context of categorical information, where the selected features can assume nominal values. This manuscript presents an in-depth analysis of the challenges of clustering in the text environment. Further, this paper also details prominent models proposed for clustering, along with the pros and cons of each model. In addition, it focuses on various recent developments in the clustering task in social networks and associated environments.
This document presents a general framework for building classifiers and clustering models using hidden topics to deal with short and sparse text data. It analyzes hidden topics from a large universal dataset using LDA. These topics are then used to enrich both the training data and new short text data by combining them with the topic distributions. This helps reduce data sparseness and improves classification and clustering accuracy for short texts like web snippets. The framework is also applied to contextual advertising by matching web pages and ads based on their hidden topic similarity.
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION - ijistjournal
The user generated content on the web grows rapidly in this emergent information age. The evolutionary changes in technology make use of such information to capture only the user’s essence and finally the useful information are exposed to information seekers. Most of the existing research on text information processing, focuses in the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis for text data available in each forum. This approach analyses the forum text data and computes value for each word of text. The proposed approach combines K-means clustering and Support Vector Machine with PSO (SVM-PSO) classification algorithm that can be used to group the forums into two clusters forming hotspot forums and non-hotspot forums within the current time span. The proposed system accuracy is compared with the other classification algorithms such as Naïve Bayes, Decision tree and SVM. The experiment helps to identify that K-means and SVM-PSO together achieve highly consistent results.
Classification-based Retrieval Methods to Enhance Information Discovery on th... - IJMIT JOURNAL
The widespread adoption of the World-Wide Web (the Web) has created challenges both for society as a whole and for the technology used to build and maintain the Web. The ongoing struggle of information retrieval systems is to wade through this vast pile of data and satisfy users by presenting them with information that most adequately fits their needs. On a societal level, the Web is expanding faster than we can comprehend its implications or develop rules for its use. The ubiquitous use of the Web has raised important social concerns in the areas of privacy, censorship, and access to information. On a technical level, the novelty of the Web and the pace of its growth have created challenges not only in the development of new applications that realize the power of the Web, but also in the technology needed to scale applications to accommodate the resulting large data sets and heavy loads. This thesis presents searching algorithms and hierarchical classification techniques for increasing a search service's understanding of web queries. Existing search services rely solely on a query's occurrence in the document collection to locate relevant documents. They typically do not perform any task or topic-based analysis of queries using other available resources, and do not leverage changes in user query patterns over time. Provided within are a set of techniques and metrics for performing temporal analysis on query logs. Our log analyses are shown to be reasonable and informative, and can be used to detect changing trends and patterns in the query stream, thus providing valuable data to a search service.
Quest Trail: An Effective Approach for Construction of Personalized Search En... - Editor IJCATR
This document discusses developing a personalized search engine for software development organizations. It proposes using semantic analysis and genetic algorithms to personalize search results. Semantic analysis resolves ambiguity in queries by understanding their meaning, while genetic algorithms use machine learning to better understand user preferences over time. Quest analysis is also used to identify the goal or task behind a user's search by analyzing search logs at the quest level rather than query or session levels. Together these approaches aim to increase search relevance for users in software organizations by creating group profiles based on domain or project rather than individual user profiles.
As the Internet grows, users of search providers continually demand search results that match their wishes. Personalized search is one option available to users: results are sculpted based on the personal data they supply to the search provider. This raises privacy fears, however, as users are typically anxious about revealing personal information to an often faceless service provider on the Internet. This work addresses the privacy issues surrounding personalized search and discusses ways privacy can be improved, so that users become more comfortable releasing their personal information in order to obtain more precise search results.
professional fuzzy type-ahead rummage around in xml type-ahead search techni... - Kumar Goud
Abstract – This is a research venture on the new information-access paradigm called type-ahead search, in which systems find answers to a keyword query on the fly as users type it in. In this paper we study how to support fuzzy type-ahead search in XML. Supporting fuzzy search is important when users have limited knowledge about the exact representation of the entities they are looking for, such as people records in an online directory. We have developed and deployed several such systems, some of which have been used by many people on a daily basis. The systems received overwhelmingly positive feedback from users due to their friendly interfaces with the fuzzy-search feature. We describe the design and implementation of the systems, and demonstrate several of them. We show that our efficient techniques can indeed allow this search paradigm to scale to large amounts of data.
Index Terms - type-ahead, large data set, server side, online directory, search technique.
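Fuzzy type-ahead matching of the kind described can be sketched with a Levenshtein edit-distance threshold on the typed prefix; the directory records and the prefix-only comparison are simplifying assumptions, not the paper's XML technique:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def type_ahead(prefix, records, max_edits=1):
    """Return records whose name starts within max_edits of the partially
    typed query, tolerating typos as the user types."""
    out = []
    for name in records:
        head = name[:len(prefix)].lower()
        if edit_distance(prefix.lower(), head) <= max_edits:
            out.append(name)
    return out

# Hypothetical online-directory records.
people = ["Christos Faloutsos", "Kristen Smith", "Alan Turing"]
print(type_ahead("chri", people))
```

Typing the misspelled prefix `"cris"` still retrieves "Kristen Smith", since its prefix is one edit away, which is the friendliness users credited these systems with.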
Query Recommendation by using Collaborative Filtering Approach - IRJET Journal
This document proposes a system called QDMiner to mine query facets from the top search results for a query. It uses collaborative filtering techniques to recommend the top-k results that are most relevant to a user's interests.
QDMiner first retrieves the top search results from a search engine. It then mines frequent lists from the HTML tags and free text within the results to identify query facets. It groups common lists and ranks the facets and items based on their appearances. QDMiner represents the search results in two models: the Unique Website Model and Context Similarity Model, to order the query facets.
To recommend results, QDMiner uses collaborative filtering techniques, including item-based and user-based approaches.
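Item-based collaborative filtering of the kind mentioned here can be sketched as below; the rating matrix and the cosine item similarity are illustrative assumptions, not QDMiner's actual data model:

```python
import math

# Hypothetical click/rating matrix: user -> {item: rating}.
RATINGS = {
    "u1": {"facets": 5, "ranking": 4, "privacy": 1, "summaries": 4},
    "u2": {"facets": 4, "ranking": 5},
    "u3": {"privacy": 5, "facets": 1},
}

def item_similarity(i, j):
    """Cosine similarity between two items' rating vectors across the
    users who rated both items."""
    common = [u for u in RATINGS if i in RATINGS[u] and j in RATINGS[u]]
    if not common:
        return 0.0
    dot = sum(RATINGS[u][i] * RATINGS[u][j] for u in common)
    ni = math.sqrt(sum(RATINGS[u][i] ** 2 for u in common))
    nj = math.sqrt(sum(RATINGS[u][j] ** 2 for u in common))
    return dot / (ni * nj)

def recommend(user, k=1):
    """Item-based CF: score each unseen item by its similarity to the
    items the user has already rated, weighted by those ratings."""
    seen = RATINGS[user]
    items = {i for r in RATINGS.values() for i in r} - set(seen)
    scores = {i: sum(item_similarity(i, j) * r for j, r in seen.items())
              for i in items}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("u2"))
```

`u2` rates the same items highly as `u1`, so the items `u1` liked ("summaries") outrank the items `u1` disliked ("privacy").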
Study and Implementation of a Personalized Mobile Search Engine for Secure Se... - IRJET Journal
This document describes a proposed personalized mobile search engine (PMSE) that learns users' content and location preferences from clickthrough data and GPS locations to provide personalized search results. The PMSE uses an ontology-based user profile learned from clicks and locations, without requiring extra user effort. It has a client-server architecture where the client handles interactions and stores privacy-sensitive click data, while the server performs computationally intensive tasks like concept extraction, training, and result reranking. The PMSE aims to improve search personalization by separately considering content and location concepts based on user interests.
IRJET- A Novel Technique for Inferring User Search using Feedback Sessions - IRJET Journal
This document proposes a novel technique to infer user search goals using feedback sessions. It aims to address limitations in existing approaches like noisy search results, small numbers of clicked URLs, and lack of consideration of user feedback. The proposed approach generates feedback sessions from user click logs, pre-processes the data, extracts keywords from restructured results, re-ranks the results based on keywords and user history, and categorizes the re-ranked results using predefined categories. The technique is evaluated using Average Precision, which compares it to other clustering and classification algorithms. The goal is to improve information retrieval by better representing user search interests and needs.
A Survey on approaches of Web Mining in Varied Areas - inventionjournals
There has been a lot of research in recent years on efficient web searching. Several papers have proposed algorithms for user feedback sessions, to evaluate the performance of inferring user search goals. When information is retrieved, the user clicks on a particular URL. Based on the click rate, ranking is done automatically by clustering the feedback sessions. Web search engines have made enormous contributions to the web and society. They make finding information on the web quick and easy. However, they are far from optimal. A major deficiency of generic search engines is that they follow the ‘‘one size fits all’’ model and are not adaptable to individual users.
A novel method to search information through multi agent search and retrieval - IAEME Publication
The document proposes a novel method for searching information through multi-agent search and retrieval that uses both content and context-based search. It describes a system that accepts two text inputs, processes them according to whether desktop or internet search is selected, and provides relevant results through indexing and multi-agents, with content search performed on Hadoop for increased performance. The system aims to provide faster and more accurate search results by filtering out irrelevant results.
Design of optimal search engine using text summarization through artificial i... - TELKOMNIKA JOURNAL
Natural language processing is a trending research topic that allows developers to bring human-computer interactions into existence. Natural language processing is an integration of artificial intelligence, computer science and computational linguistics. Research in natural language processing is focused on creating innovations toward devices or machines that operate on a single human command. It allows various bot creations that carry instructions from mobile devices to control physical devices through speech tagging. In our paper, we design a search engine which not only displays data according to the user query but also gives a detailed display of the content or topic the user is interested in, using a summarization concept. We find that the designed search engine has optimal response time for user queries by analyzing numbers of transactions as inputs. The performance analysis also shows that the text summarization method is an efficient way of improving response time in search engine optimization.
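The query-focused extractive summarization idea can be sketched by scoring sentences on term overlap with the query; the regex sentence splitter and the overlap scoring rule are simplifying assumptions:

```python
import re

def summarize(text, query, n=1):
    """Query-focused extractive summary: pick the n sentences sharing the
    most terms with the query, preserving document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    q = set(query.lower().split())
    # rank sentence indices by overlap with the query terms
    scored = sorted(
        range(len(sentences)),
        key=lambda i: -len(q & set(sentences[i].lower().split())),
    )
    keep = sorted(scored[:n])  # restore original order of the chosen sentences
    return " ".join(sentences[i] for i in keep)

doc = ("Search engines index the web. Text summarization shortens long "
       "content. Summarization improves search response quality.")
print(summarize(doc, "text summarization", n=1))
```

The middle sentence wins because it shares both query terms, so only it is returned as the one-sentence summary.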
Web search engines help users find useful information on the WWW. However, when the same query is submitted by different users, typical search engines return the same result regardless of who submitted the query. Generally, each user has different information needs for his/her query. Therefore, search results should be adapted to users with different information needs, without any user effort. Such search systems that adapt to each user's preferences can be achieved by constructing user profiles based on modified collaborative filtering with detailed analysis of the user's browsing history.
There are three possible types of web search system that can provide personalized information: (1) systems using relevance feedback, (2) systems in which users register their interests, and (3) systems that recommend information based on the user's history. In the first technique, users have to provide feedback on relevance judgments, which is time consuming, and the second needs registration of users with their static interests, which requires extra effort from the user. The third technique is therefore best: users do not have to give explicit ratings; relevancy is tracked automatically from user behavior with search results and the history of data usage. It does not require registration of interests, and it captures the user's changing interests dynamically. The results section shows that the user's browsing history allows each user to perform a more fine-grained search by capturing changes in each user's preferences without any user effort. Users need less time to find the relevant snippet in personalized search results compared to the original results.
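Re-ranking results against a profile built from browsing history can be sketched as below; the term-weight profile and the snippet-overlap score are illustrative assumptions, not the paper's modified collaborative filtering:

```python
def rerank(results, profile):
    """Re-rank (url, snippet) results by overlap between snippet terms and
    a user-profile term-weight dict built from browsing history."""
    def score(snippet):
        return sum(profile.get(t, 0.0) for t in set(snippet.lower().split()))
    return sorted(results, key=lambda r: score(r[1]), reverse=True)

# Hypothetical profile learned from history: term -> weight.
profile = {"python": 2.0, "programming": 1.5, "snake": 0.1}
results = [
    ("u1", "python the snake species"),
    ("u2", "python programming tutorial"),
    ("u3", "monty python comedy"),
]
print([u for u, _ in rerank(results, profile)])
```

For the ambiguous query "python", a programming-heavy history pushes the tutorial above the reptile and comedy pages without any explicit user rating, which is the point of the implicit (third) technique.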
IRJET- Text-based Domain and Image Categorization of Google Search Engine usi... - IRJET Journal
This document discusses a proposed system for categorizing search engine results using conceptual clustering. The system analyzes the content of search results to extract relevant concepts, then uses a personalized conceptual clustering algorithm to generate a decision tree of query clusters. This tree can be used to identify categories for web pages and provide topically relevant results to users. The system aims to improve on traditional ranked search results by categorizing results based on the conceptual preferences and interests of individual users.
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...CSEIJJournal
Query-sensitive summarization aims at providing users with a summary of the contents of single or
multiple web pages based on the search query. This paper proposes a novel idea of generating a
comparative summary from a set of URLs in the search result. The user selects a set of web page links from
the result list produced by the search engine, and a comparative summary of these selected web sites is
generated. The method makes use of the HTML DOM tree structure of the pages: HTML documents are
segmented into a set of concept blocks, and a sentence score is computed for each concept block with respect
to the query and feature keywords. The important sentences from the concept blocks of the different web
pages are extracted to compose the comparative summary on the fly. This system reduces the time and effort
required for the user to browse various web sites to compare information, and the comparative summary of
the contents helps users make quick decisions.
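The scoring-and-extraction step lends itself to a small sketch: score each sentence in a concept block by its overlap with the query and feature keywords, then keep the top sentences. The weights and helper names below are illustrative assumptions; the paper's actual scoring formula is not reproduced here.

```python
def sentence_score(sentence, query_terms, feature_keywords):
    """Score a sentence by weighted overlap with query and feature terms."""
    words = set(sentence.lower().split())
    q = len(words & query_terms)        # query-term matches weigh more
    f = len(words & feature_keywords)
    return 2.0 * q + 1.0 * f

def summarize_block(concept_block, query_terms, feature_keywords, top_n=2):
    """Pick the top-n highest-scoring sentences from one concept block."""
    sentences = [s.strip() for s in concept_block.split(".") if s.strip()]
    ranked = sorted(sentences,
                    key=lambda s: sentence_score(s, query_terms, feature_keywords),
                    reverse=True)
    return ranked[:top_n]

block = ("The camera has a 20 MP sensor. It ships with a zoom lens. "
         "Battery life is average.")
print(summarize_block(block, {"camera", "sensor"}, {"zoom", "battery"}, top_n=1))
```

Running the same scoring over blocks from several selected pages and interleaving the winners yields the on-the-fly comparative summary the abstract describes.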
This document is a seminar report submitted for a master's degree that focuses on search engine optimization (SEO). It includes an abstract, introduction, sections on search engines and how they work, SEO basics, techniques for on-page and off-page optimization, challenges of SEO, and applications of SEO. The introduction discusses how search engines and SEO have become important for online marketing. Key sections explain the SEO process, on-page optimization including keywords and content, and off-page optimization such as links and social media. Challenges of measuring SEO success are also noted.
Similar to Context Sensitive Search String Composition Algorithm using User Intention to Handle Ambiguous Keywords
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par...IJECEIAES
Wide application of the proportional-integral-differential (PID) regulator in industry requires constant improvement of the methods for adjusting its parameters. The paper deals with the optimization of PID-regulator parameters using neural network methods. A methodology for choosing the architecture (structure) of the neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, and the form and type of activation function. Algorithms for training the neural network are developed based on minimizing the mismatch between the regulated value and the target value. The method of back-propagation of gradients is proposed to select the optimal training rate of the neurons. The neural network optimizer, which is a superstructure over the linear PID controller, reduces the regulation error from 0.23 to 0.09, thus reducing power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
An improved modulation technique suitable for a three level flying capacitor ...IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
A review on features and methods of potential fishing zoneIJECEIAES
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It explores features such as sea surface temperature (SST) and sea surface height (SSH), along with the classification methods used to classify the data. The study underscores the importance of examining potential fishing zones using advanced analytical techniques and thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms to identify potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of the classification algorithms. Previous research has assessed the performance of models such as support vector machines (SVM), naïve Bayes, and artificial neural networks (ANN); in one reported result, an SVM was 97.6% accurate compared with naïve Bayes's 94.2% in classifying test data for fisheries classification. By considering the recent works in this area, several recommendations for future work are presented to further improve the performance of potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f...IJECEIAES
As demand for smaller, quicker, and more powerful devices rises, Moore's law is strictly followed. The industry has worked hard to make small devices that boost productivity, with the goal of optimizing device density. Scientists are reducing connection delays to improve circuit performance. This led to three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and shorten interconnects. Electrical interference is a major concern with 3D integrated circuits. Researchers have developed and tested through-silicon vias (TSVs) and substrates to decrease electrical wave interference. This study illustrates a novel noise-coupling reduction method using several electrical interference models. A 22% drop in electrical interference from wave-carrying to victim TSVs introduces this new paradigm and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployment of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types; the advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces a hybrid system combining PV and EV to support industrial and commercial plants. The paper covers the theoretical framework of the proposed hybrid system, including the equations required to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram, which sets the priorities and requirements of the system, is presented. The proposed approach allows setups to improve their power stability, especially during power outages. The presented information supports researchers and plant owners in completing the necessary analysis while promoting the deployment of clean energy. The results of a case study representing a dairy milk farmer support the theoretical work and highlight its benefits for existing plants. The short return on investment supports the paper's claim of a novel approach for a sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line, which enhances the safety of the electrical network.
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption has increased quickly, contributing to climate change
that is evident in unusual flooding, droughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing climate change. The analysis's
findings are discussed in relation to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency, and offer recommendations on
how to increase the role of women in addressing climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from grid-connected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
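A linear-in-parameters NARX model can be identified by least squares once lagged regressors are formed. The sketch below uses synthetic data; the lag structure, the single quadratic term, and all names are illustrative assumptions, not the battery model from the paper:

```python
import numpy as np

def narx_features(y, u, lag=2):
    """Regressors: lagged outputs, lagged inputs, and one nonlinear term."""
    rows = []
    for k in range(lag, len(y)):
        rows.append([y[k-1], y[k-2], u[k-1], u[k-2], y[k-1]**2, 1.0])
    return np.array(rows)

# Toy nonlinear system the model should recover (almost) exactly.
rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 200)
y = np.zeros(200)
for k in range(2, 200):
    y[k] = 0.5*y[k-1] - 0.1*y[k-2] + 0.3*u[k-1] + 0.05*y[k-1]**2

X = narx_features(y, u)                     # regressor matrix
theta, *_ = np.linalg.lstsq(X, y[2:], rcond=None)  # fitted NARX parameters
pred = X @ theta                            # one-step-ahead predictions
print("max one-step error:", np.max(np.abs(pred - y[2:])))
```

Because the quadratic regressor captures the nonlinearity, the fit here is essentially exact; for real battery data the same structure would outperform a purely linear ARX model in the way the abstract describes.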
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis yields overall weights of
24.7% for passive methods, 7.8% for active methods, 5.6% for hybrid methods,
14.5% for remote methods, 26.6% for signal processing-based methods, and
20.8% for computational intelligence-based methods when all criteria are
compared together. Thus, it can be seen from the total weights
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
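For reference, AHP derives such weights from a pairwise comparison matrix, typically via its principal eigenvector. A minimal sketch follows; the 3-criterion matrix values are illustrative, not taken from the paper:

```python
import numpy as np

def ahp_weights(pairwise):
    """Priority weights from an AHP comparison matrix (principal eigenvector)."""
    vals, vecs = np.linalg.eig(np.asarray(pairwise, dtype=float))
    principal = np.real(vecs[:, np.argmax(np.real(vals))])
    w = np.abs(principal)          # Perron vector has uniform sign
    return w / w.sum()             # normalize so weights sum to 1

# Illustrative matrix: cost vs. non-detected zone vs. response time.
# A[i][j] = how much more important criterion i is than criterion j.
A = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
w = ahp_weights(A)
print(w)  # first criterion dominates for this example matrix
```

Applying the same computation to the paper's criteria and method comparisons is what produces the 24.7%/7.8%/5.6%/14.5%/26.6%/20.8% weights quoted above.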
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m², the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
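The classic P&O update that such work modifies fits in a few lines: perturb the operating voltage, and keep going in the same direction only if power rose. The fixed step size and toy power curve below are illustrative assumptions, not the modified algorithm from the paper:

```python
def perturb_and_observe(v, p, v_prev, p_prev, step=0.5):
    """One P&O iteration: return the next reference voltage."""
    dp = p - p_prev
    dv = v - v_prev
    if dp == 0:
        return v                  # at (or near) the maximum power point
    if (dp > 0) == (dv > 0):
        return v + step           # last move raised power: continue
    return v - step               # power fell: reverse the perturbation

# Toy PV curve with its maximum power point at 17 V.
def pv_power(v):
    return -(v - 17.0)**2 + 100.0

v_prev, v = 10.0, 10.5
for _ in range(40):
    v_next = perturb_and_observe(v, pv_power(v), v_prev, pv_power(v_prev))
    v_prev, v = v, v_next
print(round(v, 1))  # climbs toward 17 V, then oscillates around it
```

The steady-state oscillation around the maximum visible here is exactly the limitation the abstract says the modified P&O and FLC step sizes are designed to reduce.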
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed, the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students' learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embedded-system design approach targeting FPGA technologies that is fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that users can seamlessly incorporate into their FPGA designs. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the status parameters of solar cell systems, namely solar irradiance, temperature, and humidity, are critical issues in enhancing their efficiency. Hence, in the present article an improved smart prototype of an internet of things (IoT) technique based on an embedded system using the NodeMCU ESP8266 (ESP-12E) was implemented experimentally. Three different regions in Egypt, the cities of Luxor, Cairo, and El-Beheira, were chosen to study their solar irradiance profile, temperature, and humidity with the proposed IoT system. The monitored solar irradiance, temperature, and humidity data were visualized live in Ubidots over the hypertext transfer protocol (HTTP). The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m², respectively, during the solar day. The accuracy and rapidity of the monitoring results obtained with the proposed IoT system make it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results for the three regions strongly recommend Luxor and Cairo, rather than El-Beheira, as suitable places to build a solar cell power station.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threats to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to misclassification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data for training. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with related works, the experimental outcome shows that the model performs well on a benchmark dataset, accomplishing an improved detection accuracy of approximately 99.21%.
Software Engineering and Project Management - Introduction, Modeling Concepts...Prakhyath Rai
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
ISSN: 2088-8708
IJECE Vol. 7, No. 1, February 2017 : 432 – 450
inadequate [11], [12]. Although different users may use the same query keyword, their intent and context
may be different. Current search engines provide the same results to all users using the same keywords at a
given point in time. Personalization is desirable to better satisfy the needs of the user [13-15].
The following experiment illustrates this further. If a user searches for 'Michael Jackson', search engines return results for the famous singer Michael Jackson in the majority of result pages. These results would be treated as irrelevant and incorrect if the user's intent was to search for professor Michael Jackson.
Table 1. Example search query done on Google [16] on 29th May 2015

Result rows                            | 'Michael Jackson'          | 'Michael Jackson professor'
Total results                          | About 39,00,00,000 results | About 7,89,00,000 results
Search results as singer               | First 13 pages and after   | Page 3, 5th result
Search results as professor            | Page 17, 8th result        | First page
Search results as software development | Page 13, last result       | Page 2, 2nd result
Search results as VP                   | Page 16, 4th result        | Not present in the first 20 pages
As shown in Table 1, when one searches for the query string 'Michael Jackson', results for the singer 'Michael Jackson' are returned in the first 13 pages, whereas no result for the professor 'Michael Jackson' appears until page 17. With each page containing 10 results, the relevant results start appearing only after 130 result rows. However, when the word 'professor' is added to the query string 'Michael Jackson', results for professor Michael Jackson appear on the first result page itself. This demonstrates that if keywords based on user intention are used, better hits can be obtained in the first few search result pages. Query expansion based on user intention has been shown to give better search results over large data sets like the Web [7], [17].
Thus user intention can be used to disambiguate a query [18]. User context can include parameters such as 'gender', 'age', 'topic', 'location' etc. It can be short-term [11] or long-term [18].
In the proposed method, user intention is identified with the help of a user profile containing parameters like 'gender', 'profession', 'interests', 'location' and past searches. The user intention identified with the FM5 algorithm is used to reformulate the query. This paper brings together different IR (Information Retrieval) areas: QAC (query autocompletion), query personalization and automatic query expansion.
Our contributions are:
1) A novel user intention identification algorithm is proposed to predict user intention.
2) Query expansion is done using the identified user intention to obtain improved precision for ambiguous search strings.
3) Experimental evaluation of the method is conducted with datasets collected from users. The results reflect improvement in user intention identification and in the precision of search results.
4) Results of query expansion using the identified user intention are compared with the results of the Google search engine [16] used directly as the first baseline, and with the results obtained for ambiguous queries by Chirita et al. [7] as the second baseline.
In this paper, Section 2 describes the related work. Section 3 describes the data and how it is used by the proposed system, while Section 4 describes the FM5 user intention identification algorithm. Results and discussion are presented in Section 5, and the conclusion in Section 6.
2. RELATED WORK
2.1. Autocompletions and Personalization
Bhatia et al. [19] present work where phrases and n-grams are mined from text collections and used for generating autocompletions. Most popular completion, i.e. autocompletions based on the past popularity of queries in query logs, is modeled in Bar-Yossef and Kraus's work [20], [21]. Commercial search engines use MPC (most popular completion) for query autocompletion [20]. Other query autocompletion methods include personalized autocompletion, context based autocompletion using previous queries by the user [20], time based autocompletion [22], and time and context based autocompletion [21]. Homologous queries and semantically related terms are used to generate autocompletions by Cai et al. [12].
Personalization of query results using the interests of users has been done by many researchers [23-26]. User preferences are collected by either implicit or explicit methods. Gender and age are used for personalizing the results by Kharitonov and Serdyukov [27]. User context based on recent queries is generated and used to rank the query results in a session by Xiang et al. [28]. Most of the research conducted personalizes query results by reranking them using the user profile, rather than addressing query autocompletion.
Context Sensitive Search String Composition Algorithm using User Intention to … (Uma Gajendragadkar)
This paper proposes an algorithm that uses personalization for query completion or autocompletions in
search. An improvement in autocompletion ranking is claimed by personalization in Shokouhi's work [29].
Shokouhi et al. also presented ranking of autocompletions with a time-sensitive approach as per their
expected popularity [30]. Ambiguous queries are handled by Shokouhi et al. by providing user context in terms of session context. Query suggestion is achieved by using click information along with previous queries in a session as context and then mining query log sessions for query reformulations [31]. This work is similar to ours, but it does not consider long-term user context; instead it focuses on session based user context in terms of click information and previous queries.
2.2. User Intention
Many studies have tried to identify user intention in different ways. Most of them try to categorize queries as informational, navigational and transactional, as proposed by Jansen et al. [32]. Given a query suggestion, efforts have been made to understand the user intention using different means like web search logs [26], [33-36], previous users' search logs for the same query [37], clicked pages [38], the user's search session history [39], Wikipedia [40], and Wordnet and Google n-grams [41]. Using search query logs of existing users to identify intention cannot guarantee the correctness of search results [37]. Search intent prediction along with query autocompletion is a less explored area. According to Cheng et al., many searches are triggered by browsed web pages [42]. Kong et al. tried to predict search intent using a recently browsed news article before the search [43]. A large number of queries are triggered by news articles daily [43], but predicting search intent using browsed pages alone is inadequate [43]. Our proposed method uses a live RSS news feed for query prediction and makes use of user profiles to predict the search intent.
2.3. Query Expansion
Query expansion is used to reformulate the original user query so as to improve retrieval of search results and better satisfy user needs. One common method is relevance feedback, which uses the returned results and adds new terms related to the original query and selected documents [44]. Other methods include adding relevant terms based on term frequency and document frequency from top ranked documents [45], [46], co-occurrence based techniques [47], thesaurus based techniques [48]-[51], desktop specific techniques [7], and the probability of terms over search logs [52]. Our approach uses user intention based keyword addition to expand the original query to handle ambiguous query terms.
3. DATA DESCRIPTION
3.1. Data collection Methodology and Data Resources
The system uses different types of data sources. For the temporal contextual corpus, two elements are considered. One is static contextual data based on the current month, and the second is dynamic contextual data based on daily current events. Based on the parameter 'period', a month-wise list of occasions from the Hindu and Christian calendars is taken and their associated keyword list is built. Secondly, based on daily current events, the RSS news feed from Reuters [54] is processed and a dataset of keywords is built [1]. The temporal data is refreshed every day and also at restart of the server. This contextual data is generated for both English and Marathi, an Indian language popularly used in the state of Maharashtra by more than 70 million people. A Marathi n-gram dataset is also created by crawling Marathi websites for about four months and processing the web pages, and is available [55].
The proposed algorithm also uses data from various sources like Google n-grams [56] and Wordnet [57] for English, and Marathi Wordnet data [58]. How the above-described contextual data is used to mine possible query autocompletions is discussed by Uma Gajendragadkar et al. [1]. Autocompletions for all sample test queries are collected from popular search engines for comparison. This is done for each character key press of all the test queries.
3.2. User Intention Based Query Expansion
'K' user profiles returned by the KNN (K Nearest Neighbor) algorithm are used as input to the FM5 algorithm.
Let
    Z = Z' ∪ Z̄    (1)
be a set of profiles such that
    ∑ P(Z') + ∑ P(Z̄) = 1    (2)
where P(Z') is the probability of the known user profiles and P(Z̄) is the probability of the unknown user profiles.
Let
    A = { a_i | 1 ≤ i ≤ n }    (3)
be the set of n query words and let
    B = { b_j | 1 ≤ j ≤ m }    (4)
be the set of m intentions identified. A trial is conducted by collecting random samples <a_i, b_j>. For each sample query keyword a_i, there could be multiple user intentions stored in the intention matrix. If a keyword has two possible intentions then they are indicated by 'T', and all other intentions that are not applicable are indicated by 'F'. In total, 34 different user intentions are considered. For example, consider the keyword 'Jaguar': it can have two possible user intentions, 'Automobile' and 'Wildlife'. Initially n = 0 and m = 0 when no query keyword is typed or predicted, and hence no user intentions are present.
3.3. Learning Method and Knowledge Generation
Association rule based learning method is used for user intention identification. Support and
confidence for an intention are computed for the predicted keyword. Association rule learning is used to find
interesting relations between different parameters in the data [59]. It finds strong rules in data based on
support and confidence measures. If a rule such as is found in the data, it
indicates that a customer is likely to buy coffee if the customer has bought both milk and sugar. Association
rule is used in many applications like market analysis, bioinformatics, web usage mining etc.
Minimum threshold values on support and confidence are used to find the interesting rules out of all possible
rules. If } is a set of items and is a set of transactions in
database , then a rule can be defined like where I and .
Support can be calculated as proportion of transactions containing the item set U. Say for
illustration, the item set {milk,sugar} has a support of 6/10 = 0.6 since it occurs in 60% of all transactions.
Confidence of rule can be calculated as the proportion of the transactions that contain both U and V.
(5)
For illustration, the rule has a confidence of 0.9 means in 90% of the
transactions that contain milk and sugar the rule holds true. Other user intention identification methods have
few drawbacks. Using web search logs for intent identification lacks in correct outcomes as the same query
responses have been provided to the users. Using click pages [38] is not very effective as user clicks do not
always translate to the result being relevant to search intent. User search session history [39] works only for a
session. No user intention prediction was done for ambiguous query in case of intent identification with
Wikipedia [40]. Table 2 shows few records from sample data considered for learning intent of a user for a
given keyword.
Table 2. Example Data for Association Rule Mining
Word Intent Gender Location Profession Interest
bond legal M India Lawyer Cooking
court legal M India Engineer Gardening
judge legal M USA Lawyer Books
law legal F UK Farmer TV
notary legal F India Doctor Painting
notice legal M India Lawyer Poems
search legal F India Engineer Gardening
java political F USA Doctor Poems
jaguar Automobile M USA Engineer Art
diabetes health M UK Doctor Theatre
interest social M India Farmer Photography
apple technological F India Engineer Wildlife
java technological M India Engineer Sports
bond movie M India Engineer Movies
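The support and confidence computations described above can be sketched as follows. This is a minimal illustration over records shaped like Table 2; the field names, the sample data and the rule are assumptions made for the example, not the paper's actual dataset.

```python
# Sample records shaped like Table 2 (only the fields needed here).
records = [
    {"word": "bond", "intent": "legal", "profession": "Lawyer", "location": "India"},
    {"word": "bond", "intent": "movie", "profession": "Engineer", "location": "India"},
    {"word": "jaguar", "intent": "Automobile", "profession": "Engineer", "location": "USA"},
]

def support(records, predicate):
    """Proportion of records satisfying the predicate."""
    return sum(1 for r in records if predicate(r)) / len(records)

def confidence(records, antecedent, consequent):
    """conf(U => V) = supp(U and V) / supp(U)."""
    supp_u = support(records, antecedent)
    if supp_u == 0:
        return 0.0
    return support(records, lambda r: antecedent(r) and consequent(r)) / supp_u

# Hypothetical rule: {profession=Engineer, word=bond} => intent=movie
ante = lambda r: r["profession"] == "Engineer" and r["word"] == "bond"
cons = lambda r: r["intent"] == "movie"
print(support(records, lambda r: ante(r) and cons(r)))  # 1/3
print(confidence(records, ante, cons))                  # 1.0
```

Rules whose support and confidence exceed the chosen thresholds would then be retained, as the section describes.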
5. IJECE ISSN: 2088-8708
Context Sensitive Search String Composition Algorithm using User Intention to … (Uma Gajendragadkar)
436
From the sample data, all rules having a support and confidence more than the threshold value are
considered. These rules are used to learn about the possible intention of a user about a keyword.
Let
    B' = { b_j | 1 ≤ j ≤ p }    (6)
be the set of returned intentions, with B' ⊆ B from Equation (4); how one of these is selected is explained in Section 4 of the paper. For the purpose of experimentation, two types of users are considered for the system: registered users and unregistered users. The first case is when a user logs in to the system (a registered user with a known user profile in Z'), and the second case is a user who does not log in to the system (an unregistered user with an unknown user profile in Z̄), as in Equation (1). In the second case, since the user does not log in, no user profile is available, hence no personalization can be done and no learning happens. In the first case, a user profile is created during registration to the search system. This profile is created by obtaining user preferences for a set of questions. The values are filled in through an explicit questionnaire asking questions like 'What is your profession?'. User preferences are stored in the user profile, and the past searches done by the user are also stored in this profile. A bit vector representing the user profile is stored in the system for every registered user. This vector of different parameters forms a key for each user.
Let
    X1 = { x_i | 1 ≤ i ≤ 5 }    (7)
be the set of user profile parameters considered when deciding the next probable alphabetic/numeric/symbol character. The composition of the search string is done using elements of set X1.
The system personalizes search strings based on these user preferences and learns from past searches. After a key character is pressed in the search box, the system tries to predict the next character, initially using the past searches of the registered user and later using the pool of searches done by other users having profiles similar to that of the current user.
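The prefix-based next-character prediction just described can be sketched as below. The pooling of similar-profile users' searches is simplified here to a flat list of past search strings; the names and sample data are illustrative.

```python
from collections import Counter

def predict_next_char(prefix, past_searches):
    """Rank candidate next characters by how often past searches
    extend the typed prefix with them."""
    counts = Counter(
        s[len(prefix)] for s in past_searches
        if s.startswith(prefix) and len(s) > len(prefix)
    )
    return [c for c, _ in counts.most_common()]

# Hypothetical pool of past searches from the user and similar profiles.
searches = ["jaguar price", "jaguar car", "java tutorial", "judge"]
print(predict_next_char("ja", searches))  # ['g', 'v']
```

As more characters are typed, the pool of matching past searches shrinks and the prediction becomes more certain, which mirrors the probability argument made later in Section 4.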
A comparison of this method is done with the KNN (K Nearest Neighbor) algorithm. The graphs in Figure 1 show the performance of KNN for user intention identification with different K values on sample data. As seen in Figure 1, KNN performs better with smaller K values for identifying the user intention, but the accuracy of the identification is still low (about 39%). To better predict the user intention, we propose the FM5 algorithm.
Figure 1. Performance of KNN with Sample Data
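The KNN baseline used for this comparison can be sketched roughly as follows. The distance function (a count of mismatched profile attributes), the feature set and the training examples are assumptions for illustration; the paper does not specify these details.

```python
from collections import Counter

FEATURES = ("gender", "location", "profession", "interest")

def knn_intent(query_profile, labeled_profiles, k=5):
    """Predict intent by majority vote among the k nearest profiles,
    using the number of mismatched attributes as the distance."""
    def distance(p, q):
        return sum(p.get(f) != q.get(f) for f in FEATURES)
    nearest = sorted(labeled_profiles, key=lambda item: distance(query_profile, item[0]))[:k]
    return Counter(intent for _, intent in nearest).most_common(1)[0][0]

# Hypothetical training data shaped like Table 2.
train = [
    ({"gender": "M", "location": "USA", "profession": "Engineer", "interest": "Art"}, "Automobile"),
    ({"gender": "M", "location": "India", "profession": "Engineer", "interest": "Sports"}, "technological"),
    ({"gender": "F", "location": "USA", "profession": "Doctor", "interest": "Poems"}, "political"),
]
user = {"gender": "M", "location": "India", "profession": "Engineer", "interest": "Sports"}
print(knn_intent(user, train, k=1))  # technological
```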
4. PROPOSED METHOD
The objective is to find the appropriate user intention for a search string being entered by a user in the search box. In the FM5 algorithm, user profiles are used to identify the user intention for the search string being entered. For each user, a user profile is created during registration to the search system, as discussed in Section 3. FM5 implements a funnel filter consisting of different meshes mapped to the user profile parameters. Weights are applied to these meshes to disambiguate the different user intentions of a query word, as given in Equation (22). The user can select one or more parameters to be used. If the user's current search intention is related to his {'Interest'} rather than his {'Profession'}, then only parameters like {'Interest', 'Gender', 'Location'} could be selected and parameters like {'Profession'} omitted.
Let
    X1 = {'Profession', 'Interests', 'Gender', 'Location', 'Past searches'}    (8)
be the set of parameters considered for the experiment; the user profile thus consists of 5 parameters. For illustration purposes, a higher weight is assigned to the {'Profession'} parameter, followed by {'Gender', 'Interests', 'Location', 'Past searches'} respectively. This is configurable, and more parameters can be added to the funnel shown in Figure 2.
Figure 2. User Intention Identification Funnel
Let
    W = { w_i | 1 ≤ i ≤ 5 }    (9)
be the set of weights such that
    0 ≤ w_i ≤ 1, 1 ≤ i ≤ 5    (10)
The computation of these weights is explained in Equation (22).
Let Q be the prefix query input string, which is progressively extended with an alphanumeric character to complete the search string:
    Q = { q_i | 1 ≤ i ≤ n }    (11)
where q_1, q_2, ..., q_n are characters composing the search string, assuming n character key presses. The initial state of this set is empty. q_i can be any character ranging from 'a' to 'z' and '0' to '9', or characters like ':', ';', '~' etc. Q is a partial search string. The composition of the search string and the related selection of q_i are done using elements of set X1, as per Equation (7).
4.1. Proposed Circular Structure for User Profile Parameters
The user profiles are organized in a circular linked list, as shown in Figure 3. A circular linked list is used because parameters can be added to or removed from the list easily, and it is easy to traverse to reach an object. Learning adds or removes parameters from the circular linked list, and the size of the list increases or decreases accordingly. Let 'r' be the radius of the circle on which the various user profiles are arranged. The pointer in the circular linked list is positioned as per the weight calculated by the association rule in terms of support, as shown in Equation (20).
Let C be the cost
    C = { c_i | 1 ≤ i ≤ n }    (12)
associated with the elements of Q such that
    c_i ∈ {0, 1}    (13)
C = 1 when the partial or complete search string does not exist in the search set or the algorithm fails to predict the search string. C = 0 when the search string is distinctly known, in which case no search string prediction and composition is required. The sorting of search strings is done such that it always has the central tendency.
Figure 3. Circular Structure used for FM5 - user Intention Identification Algorithm
Since the search string is computed and mapped in the range [0, 1], the central tendency predicts the most likely search string. Computation of the logical search string is done with the various parameters of the user profile. A greedy algorithm is used for intention selection using cost (weight): each time, the mesh with the largest cost greater than or equal to the threshold value is selected, as explained in Equation (21).
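The greedy mesh selection just described can be sketched as follows. The threshold of 0.5 matches the support threshold used later in the paper; the parameter names and sample costs are illustrative.

```python
def select_meshes(mesh_costs, threshold=0.5):
    """Greedily order meshes by descending cost (support),
    keeping only those at or above the threshold."""
    eligible = [(cost, name) for name, cost in mesh_costs.items() if cost >= threshold]
    return [name for cost, name in sorted(eligible, reverse=True)]

# Hypothetical per-parameter supports for one keyword.
costs = {"Profession": 0.6, "Gender": 0.3, "Location": 0.7, "Interests": 0.45}
print(select_meshes(costs))  # ['Location', 'Profession']
```

Only the surviving meshes are then applied as filters in the funnel, which is what produces the reduced user set illustrated later in Table 4.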
4.2. Mathematical Model for Funnel Mesh 5 (FM5) Algorithm
The algorithm is presented in pseudo code form as Algorithm 1. The rest of this section explains it in detail.
To illustrate further, if x_1 = 'Profession' then
    x_1 ∈ {Engineer, Doctor, Lawyer, Architect, ...}    (14)
or a combination of these. The system assumes that one user has one profession. If 6 bits are used to store the profession parameter then 2^6 combinations are possible. For example, let the bit sequence '000001' indicate profession = Engineer. The probability of choosing the correct profession is
    P = 1 / 2^t    (15)
where t = number of bits used to store the parameter. Let
    X3 = { X3_i | 1 ≤ i ≤ N }    (16)
be the past searches associated with the user profile vectors in the circular linked list. Let
    S = { S_i | 1 ≤ i ≤ N }    (17)
be the past search strings associated with X3_i. Then
    W_i = supp(X ⇒ S_i)    (18)
where the weight W_i is the support value calculated using the association rule for search string S_i, as shown in Equation (20). For S_i and X3_i, a list of parameters is given by
    X5_i = { x | x ∈ X1 }    (19)
Weight is computed using an association rule of the form X ⇒ S_i with threshold value 0.5, as the central tendency is being calculated in the circular linked list.
Algorithm 1. Funnel Mesh 5 (FM5) Algorithm
Here U = X and V = S_i, and the support value is given by
    supp(X ⇒ S_i) = count(X ∪ S_i) / N    (20)
where N = total number of records in the circular linked list and X is a combination of parameters from X1 whose support ≥ 0.5.
Then, for the current user starting with the prefix Q, the mesh selected at each step is
    x* = argmax { w_i | x_i ∈ X1, w_i ≥ 0.5 }    (21)
where each weight is taken as the support of the corresponding association rule,
    w_i = supp(x_i ⇒ Q)    (22)
Let
    M = { m_i | 1 ≤ i ≤ h }    (23)
be the set of possible user intentions. A total of 34 intentions are considered for the experimental setup. Table 3 shows the user intention taxonomy and example keywords for each of the intentions. For example, the keyword 'Seed' falls under the intention 'Gardening' whereas 'Stanza' belongs to 'Poems'.
For a_i as in Equation (3), all search strings starting with prefix Q are considered. Let the set of search strings starting with prefix Q be
    S_Q = { s ∈ S | Q is a prefix of s }    (24)
The initial probability of choosing the next character is
    P(q_{i+1}) = 1 / |S_Q|    (25)
With each character key press q_{i+1}, this probability increases as |S_Q| (the count of possible search strings) keeps reducing, as shown in Figure 4. As in Figure 4, with every parameter having weight (support) greater than the threshold, a mesh filtering is done; the user intention list keeps reducing and the algorithm makes use of the central tendency to identify matching intentions. From Equation (23), there are initially 'h' intentions. Hence the probability of choosing the matching intention is
    P(MI) = 1 / h    (26)
where MI = matching intention.
Figure 4. User Intention Identification
For the purpose of the experimental trial, a total of 34 intentions are considered, as shown in the user intention taxonomy in Table 3 together with example keywords for each intention; for example, the keyword 'seed' can fall under the 'Agriculture' intention and 'song' under 'Music'.
Table 3. User intention taxonomy and examples
Intention Example Intention Example Intention Example
Social diwali Agriculture seed Movies actress
Technical java Bad meaning words shit Theater cast
Research survey Music song Art painting
Political parliament Cooking chef Craft weaving
Philosophy thought Sports cricket Painting color
Medical liver Gardening plants Travel destination
Military arsenal Health nutrition Hiking harness
Religious scripture Books author Social work society
Scientific experiment Writing article Sculpting idol
Legal battle Poems poet Photography light
New Generation twitter TV show Literature novel
Wildlife leopard
Based on the weights calculated in Equation (22), parameters are chosen from set X5_i for each past search string and a further filtering function φ : Q → M' is applied. φ gives a reduced set of matching intentions from the initial 'h' intentions, as shown in Figure 4. Let the returned multiset be given as
    M' = { m_i | 1 ≤ i ≤ g, g ≤ h }    (27)
So with recursive application of φ, using elements of X1, the probability becomes
    P(MI) = 1 / |M'|    (28)
After the last application of φ, from whatever multiset of matching intentions is obtained, the final user intention is chosen: the multiplicity of each member of the multiset is found, and the member with the highest multiplicity is chosen as the final user intention,
    MI = argmax_{m_i ∈ M'} f(m_i)    (29)
where f(m_i) = frequency (multiplicity) of intention m_i in M'.
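The highest-multiplicity selection over the intention multiset can be sketched as below (the sample multiset mirrors the 'Jaguar' example discussed with Table 4; names are illustrative):

```python
from collections import Counter

def final_intention(multiset):
    """Return the member of the intention multiset with the highest multiplicity."""
    return Counter(multiset).most_common(1)[0][0]

print(final_intention(["Automobile", "Wildlife", "Automobile", "Automobile", "Automobile"]))
# Automobile
```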
4.3. Query Expansion
Let Q be the original query selected by the user. Let A be the set of ambiguous queries such that Q ∈ A. Let C be the set of context based words for each ambiguous word,
    C = { c_j | 1 ≤ j ≤ m }    (30)
where m is the maximum number of meanings associated with the word. Query expansion patterns are used to expand the query selected by the user, based on user intention. Let E be the set of query expansion patterns available, given by
    E = { e_k | 1 ≤ k ≤ p }    (31)
Let
    g : M → E    (32)
be the function which maps the identified user intention to an expansion pattern from E.
The FM5 algorithm identifies the user intention for a query using association rules and user profiles. The original query is modified as
    Q' = Q + g(MI)    (33)
and given to the search engine.
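A minimal sketch of this intention-to-pattern mapping and the final expansion follows. The pattern table here is an assumption for illustration, not the paper's actual pattern set.

```python
# Hypothetical expansion patterns keyed by identified intention.
EXPANSION_PATTERNS = {
    "Automobile": "{q} car",
    "Wildlife": "{q} animal",
    "Movie": "{q} film",
}

def expand_query(query, intention):
    """Rewrite the original query with the pattern mapped to the intention;
    fall back to the unmodified query when no pattern is known."""
    pattern = EXPANSION_PATTERNS.get(intention, "{q}")
    return pattern.format(q=query)

print(expand_query("jaguar", "Automobile"))  # jaguar car
```

The expanded string, rather than the raw ambiguous query, is what would be submitted to the search engine.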
Table 4. FM5 Intention Identification Examples

Keyword (user profile)                     | Cost – Mesh selected                                                  | Intentions of final user set after filtering             | Matching Intention
Jaguar {Female, Engineer, Music, India}    | 0.6 - Profession → Jaguar; 0.7 - Location → Jaguar                    | Automobile, Wildlife, Automobile, Automobile, Automobile | Automobile
Java {Female, Engineer, Sports, India}     | 0.63 - Profession → Java; 0.6 - Gender → Java; 0.7 - Location → Java  | Research, Technology, Technology                         | Technology
Bond {Male, Engineer, Movies, India}       | 0.76 - Location → Bond                                                | Movie, Legal, Movie, Movie, Legal, Movie                 | Movie
Table 4 lists 3 sample test cases for the FM5 algorithm. The first column gives the test keyword and the profile of the user entering it. The second column gives the association rule (mesh parameter) selected and its cost. The third column lists the possible intentions obtained after filtering through the chosen meshes. The fourth column gives the matching intention generated as output for the keyword. The first keyword, Jaguar, is entered by a user who is female, is an engineer, has music as her interest and is located in India. After computing the support (cost) for all possible association rules with user profile parameters on the LHS and the keyword on the RHS, only two rules are found having support greater than or equal to the threshold of 0.5. Hence two meshes, profession and location, are applied. After filtering, a set of 5 users is obtained. From the past searches of these 5 users, intentions for the keyword 'Jaguar' are selected and displayed in the third column. Out of these 5, the most frequently occurring intention, 'Automobile', is returned as the matching intention.
5. RESULTS AND DISCUSSION
The authors developed a questionnaire to collect the user profiles and the desired intents for the search strings, as shown in Figure 5. For English dataset-1, 25 users and 15 ambiguous queries per user, with the desired intent for each query, were collected; thus, 375 queries and intentions were evaluated for the first dataset. For English dataset-2, 100 users and 20 ambiguous queries per user, with the desired intent for each query, were collected, giving overall 2000 queries and intentions. For the Marathi dataset, 20 users and 18 ambiguous queries per user were evaluated; thus, the Marathi dataset contained overall 360 queries and intentions. The survey was designed as a paper-and-pencil-based field survey to approach a large number of users, and a digital survey was also designed on the same lines. The paper-based questionnaire was designed in two languages, English and Marathi. To validate the proposed model, the questionnaire was distributed to collect user profile information and the desired intention while searching for various ambiguous queries. The population of the study comprises engineers, doctors, farmers and lawyers. A sample of 170 users was selected randomly; after scrutiny of the filled questionnaires, 150 were found fit for the analysis. The users comprised third-year engineering students from different streams of College of Engineering Pune [60] as well as engineers, doctors, farmers and lawyers from different locations in India.
Figure 5. User Survey Questionnaire designed to collect user profile and intentions
The system is evaluated for 25 users for English dataset-1 with about 15 ambiguous queries each. The training data consists of about 55 user profiles and their past searches. Table 6 shows user intention identification results for the FM5 algorithm and the KNN algorithm with different 'k' values: 5 (KNN5), 10 (KNN10) and 15 (KNN15). Matched intentions indicate the total number of ambiguous test queries, over all test users, for which the algorithm's identified intention matched the desired intention of the user. Unmatched intentions indicate the number of cases where the algorithm failed to identify the desired intention. The results for user intention identification obtained with FM5 are encouraging. For English ambiguous dataset-1, an accuracy of about 75% is observed with FM5, whereas with KNN an accuracy of about 38.4% is observed for KNN5, 29.9% for KNN10 and 27.2% for KNN15.
Table 6. English Ambiguous Dataset Results
Intention/Method FM5 KNN15 KNN10 KNN5
Matched 282 102 112 144
Unmatched 93 273 263 231
Total 375 375 375 375
Accuracy 0.752 0.272 0.298667 0.384
The first graph in Figure 6 shows the matched intentions obtained for FM5 and KNN for different users with various queries on the English dataset. 'Matched' is the legend used for cases where the appropriate user intention is obtained, and 'Unmatched' is the legend for cases where the algorithm failed to identify the appropriate user intention. The second graph in Figure 6 shows a comparison of FM5 with the KNN algorithm for different values of 'k'. FM5 gives better results than KNN.
Figure 6. Results for Ambiguous English Dataset-1
For the Marathi ambiguous dataset evaluation, the system is evaluated with 20 users with 18 ambiguous queries each, as shown in Table 7. Thus, 360 queries and intentions are evaluated for the Marathi dataset. An accuracy of about 77.5% is observed with the FM5 algorithm, whereas with KNN an accuracy of about 36.6% is observed for KNN5, 28.3% with KNN10 and 24.2% with KNN15.
Table 7. Marathi Ambiguous Dataset Results
Intention/Method FM5 KNN15 KNN10 KNN5
Matched 279 87 102 132
Unmatched 81 273 258 228
Total 360 360 360 360
Accuracy 0.775 0.242 0.283 0.366
User intention identification for the search string with the FM5 algorithm gives encouraging results.
Figure 7 depicts the results for the Marathi dataset: the first graph shows the total number of matched
intentions obtained with FM5 and KNN for each user, and the second graph shows the total number of
matched and unmatched intentions for all users with FM5 and KNN.
Figure 7. Results for Ambiguous Marathi Dataset
The accuracy observed with the FM5 and KNN algorithms is plotted in Figure 8 for the English
dataset and in Figure 9 for the Marathi dataset.
Figure 8. Accuracy observed with FM5 User Intention Identification for English
Figure 9. Accuracy observed with FM5 User Intention Identification for Marathi
Further testing of the FM5 algorithm was done using English ambiguous dataset-2. The system is
further evaluated for 100 users on English dataset-2, with 20 ambiguous queries each, as discussed earlier.
Table 8 shows user intention identification results for the FM5 algorithm and for the KNN algorithm with
k values of 5 (KNN5), 10 (KNN10), and 15 (KNN15); matched and unmatched intentions are counted as for
dataset-1. The results obtained for English dataset-2 with FM5 show an improvement of 0.7% compared to
English dataset-1, possibly because more past searches were available when computing the parameters to
build the mesh. For English ambiguous dataset-2, an accuracy of about 75.9% is observed with FM5,
whereas KNN gives about 39.3% for KNN5, 29.05% for KNN10, and 28.3% for KNN15.
Table 8. English Ambiguous Dataset-2 Results
Intention/Method FM5 KNN15 KNN10 KNN5
Matched 1518 566 581 786
Unmatched 482 1434 1419 1214
Total 2000 2000 2000 2000
Accuracy 0.759 0.283 0.2905 0.393
The first graph in Figure 10 shows matched intentions obtained for FM5 and KNN for different
users with various queries on English dataset-2; as before, 'Matched' marks cases where the appropriate user
intention is obtained and 'Unmatched' marks cases where the algorithm failed to identify the appropriate
user intention. The second graph in Figure 10 compares FM5 with the KNN algorithm for different values
of k on English dataset-2.
Figure 10. Results for Ambiguous English Dataset-2
Table 9 shows the results of query expansion after user intention identification. The top 50
URLs returned by the Google API [16] were collected for each expanded query, and the corresponding
web pages were judged as either relevant or not relevant to the query under consideration. The table reports
the average precision values obtained for the test queries after query expansion based on the identified user
intention.
Table 9. Average Precision after Query Expansion
Query Results Average Precision
Top 5 1
Top 10 0.986
Top 15 0.982
Top 20 0.984
Top 25 0.983
Top 30 0.98
Top 35 0.97
Top 40 0.964
Top 45 0.96
Top 50 0.955
The metric used to evaluate the search results returned after query expansion is precision at N:

P@N = (relevant results in top N) / N (34)

It indicates how many relevant results are present in the top N search results.
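The P@N computation of Eq. (34) can be sketched as follows; the relevance judgments in the example are hypothetical, not the paper's data.

```python
# Hedged sketch of precision at N (Eq. 34): the fraction of the top-N
# search results judged relevant. The judgments below are hypothetical.

def precision_at_n(relevance, n):
    """P@N = (relevant results in top N) / N, given boolean relevance judgments."""
    if n <= 0 or n > len(relevance):
        raise ValueError("n must be between 1 and the number of judged results")
    return sum(relevance[:n]) / n

# Hypothetical judgments for ten results (True = relevant).
judged = [True] * 9 + [False]
print(precision_at_n(judged, 5))   # 1.0
print(precision_at_n(judged, 10))  # 0.9
```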
5.1. Precision Improvement with FM5 and Query Expansion
Top 50 URLs returned by Google were collected for each query, and each result was judged as either
relevant or not relevant. Precision is given as

P = (relevant results) / (returned results) (35)
The graph in Figure 11 shows precision values for the top 5, top 10, up to the top 50 results obtained with our
system and with direct use of the Google search engine. On the X-axis, 1, 2, 3, ... indicate the average
precision values, over all queries given by the user, for the top 5 search results, top 10 search results,
top 15 search results, and so on.
From the observed values, the system shows significant improvement in the average precision
values. The results are compared with those obtained using the Google search engine [16] and with the
results for ambiguous queries reported by Chirita et al. [7]. About 60.47% improvement in average precision
is seen with FM5 and query expansion compared to the results obtained using Google [16]; this is our first
baseline comparison. The second baseline comparison is with the results of Chirita et al. [7], against which an
improvement of 40% is observed with the proposed method.
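The baseline comparisons above use relative improvement, (P_system − P_baseline) / P_baseline × 100. A minimal sketch, with hypothetical precision values chosen only to illustrate the formula:

```python
# Hedged sketch: relative improvement of average precision over a baseline.
# The example precision values are hypothetical, not taken from the paper.

def relative_improvement(p_system: float, p_baseline: float) -> float:
    """Percentage improvement of p_system over p_baseline."""
    return (p_system - p_baseline) / p_baseline * 100.0

# Example: a system precision of 0.70 against a baseline precision of 0.50.
print(f"{relative_improvement(0.70, 0.50):.1f}%")  # 40.0%
```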
Figure 11. Improvement in Precision after Query Expansion
6. CONCLUSION
Composing the search string by providing better autocompletions to the user, so as to obtain more
relevant and less redundant results, is the goal of this research. In this algorithm, personalization is used to
add context and user intention to the search string composition. The algorithm selects the most appropriate
intention out of the possible intentions for a keyword by using support as weight. The FM5 algorithm of
intent identification via funneling builds upon the advantages of the simple k-nearest neighbor (KNN)
algorithm.
The approach consists of identifying the user intention with FM5 and then expanding the original
query based on this intention to obtain more relevant search results in the first few pages. The FM5 user
intention identification algorithm uses association rule mining with user profiles and shows improved
accuracy compared to KNN. When extended with query expansion patterns, FM5 improves average
precision for ambiguous queries, giving better search results. The system does not use explicit feedback or
other strategies such as clicked pages or session history for determining user intention or for query
expansion.
The proposed query expansion approach, using the user intention identified with FM5, showed
improvement in average precision for ambiguous queries, giving better search results within the top 50
pages. Experimental results for the proposed approach and a comparison with direct use of a search engine
showed that performance improved significantly. The proposed system provides better precision for
ambiguous search strings, with improved identification of user intention, for both the English language
dataset and the Marathi (an Indian language) dataset of ambiguous search strings.
REFERENCES
[1] U. Gajendragadkar and S. Joshi, "User intended context sensitive mining algorithm for search string composition", in IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2015, Singapore, December 6-9, 2015 - Volume I, 2015, pp. 233–236. [Online]. Available: http://dx.doi.org/10.1109/WI-IAT.2015.212
[2] M. H. Mamoon, H. M. El-Bakry, and A. A. Salama, "Visualization for information retrieval based on fast search technology", Indonesian Journal of Electrical Engineering and Informatics (IJEEI), vol. 1, no. 1, pp. 27–42, Mar. 2013.
[3] L. Bing, W. Lam, T. L. Wong, and S. Jameel, "Web query reformulation via joint modeling of latent topic dependency and term context", ACM Trans. Inf. Syst., vol. 33, no. 2, pp. 6:1–6:38, Feb. 2015. [Online]. Available: http://doi.acm.org/10.1145/2699666
[4] G. Salton and C. Buckley, "Improving retrieval performance by relevance feedback", Journal of the American Society for Information Science, 1990, pp. 288–297.
[5] V. Lavrenko and W. B. Croft, "Relevance-based language models", in Proceedings of SIGIR '01. New York, NY, USA: ACM, 2001.
[6] M. Harvey, C. Hauff, and D. Elsweiler, "Learning by example: Training users with high-quality query suggestions", in Proceedings of SIGIR '15. New York, NY, USA: ACM, 2015, pp. 133–142. [Online]. Available: http://doi.acm.org/10.1145/2766462.2767731
[7] P. A. Chirita, C. S. Firan, and W. Nejdl, "Personalized query expansion for the web", in Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR '07. New York, NY, USA: ACM, 2007, pp. 7–14. [Online]. Available: http://doi.acm.org/10.1145/1277741.1277746
[8] A. Spink, H. Greisdorf, and J. Bateman, "From highly relevant to not relevant: Examining different regions of relevance", Inf. Process. Manage., vol. 34, no. 5, pp. 599–621, Sep. 1998. [Online]. Available: http://dx.doi.org/10.1016/S0306-4573(98)00025-9
[9] S. D. Torres and I. Weber, "What and how children search on the web", in Proceedings of CIKM '11. New York, NY, USA: ACM, 2011, pp. 393–402.
[10] S. Chelaru, I. S. Altingovde, S. Siersdorfer, and W. Nejdl, "Analyzing, detecting, and exploiting sentiment in web queries", ACM Trans. Web, vol. 8, no. 1, pp. 6:1–6:28, Dec. 2013. [Online]. Available: http://doi.acm.org/10.1145/2535525
[11] H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, and H. Li, "Context-aware query suggestion by mining click-through and session data", in KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2008, pp. 875–883.
[12] F. Cai and M. de Rijke, "Learning from homologous queries and semantically related terms for query auto-completion", Information Processing & Management, 2016 (online January 12, 2016).
[13] D. Sullivan, "The older you are, the more you want personalized search", 2004. [Online]. Available: http://searchenginewatch.com/searchday/article.php/3385131
[14] M. R. Ghorab, D. Zhou, A. O'Connor, and V. Wade, "Personalised information retrieval: Survey and classification", User Modeling and User-Adapted Interaction, vol. 23, no. 4, pp. 381–443, Sep. 2013. [Online]. Available: http://dx.doi.org/10.1007/s11257-012-9124-1
[15] S. H. Ali, A. I. El Desouky, and A. I. Saleh, "A new profile learning model for recommendation system based on machine learning technique", Indonesian Journal of Electrical Engineering and Informatics (IJEEI), vol. 4, no. 1, pp. 81–92, Mar. 2016.
[16] Google, "Google", 2016, http://www.google.com.
[17] Z. Liu, S. Natarajan, and Y. Chen, "Query expansion based on clustered results", Proc. VLDB Endow., vol. 4, no. 6, pp. 350–361, Mar. 2011. [Online]. Available: http://dx.doi.org/10.14778/1978665.1978667
[18] F. Liu, C. Yu, and W. Meng, "Personalized web search by mapping user queries to categories", in Proceedings of the Eleventh International Conference on Information and Knowledge Management, ser. CIKM '02. New York, NY, USA: ACM, 2002, pp. 558–565. [Online]. Available: http://doi.acm.org/10.1145/584792.584884
[19] S. Bhatia, D. Majumdar, and P. Mitra, "Query suggestions in the absence of query logs", in Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR '11. New York, NY, USA: ACM, 2011, pp. 795–804. [Online]. Available: http://doi.acm.org/10.1145/2009916.2010023
[20] Z. Bar-Yossef and N. Kraus, "Context-sensitive query auto-completion", in Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, 2011.
[21] F. Cai, S. Liang, and M. de Rijke, "Time-sensitive personalized query auto-completion", in Proceedings of CIKM '14. New York, NY, USA: ACM, 2014, pp. 1599–1608. [Online]. Available: http://doi.acm.org/10.1145/2661829.2661921
[22] S. Whiting and J. M. Jose, "Recent and robust query auto-completion", in Proceedings of the 23rd International Conference on World Wide Web, ser. WWW '14. New York, NY, USA: ACM, 2014, pp. 971–982. [Online]. Available: http://doi.acm.org/10.1145/2566486.2568009
[23] P. A. Chirita, W. Nejdl, R. Paiu, and C. Kohlschütter, "Using ODP metadata to personalize search", in Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR '05. New York, NY, USA: ACM, 2005, pp. 178–185. [Online]. Available: http://doi.acm.org/10.1145/1076034.1076067
[24] X. Shen, B. Tan, and C. Zhai, "Implicit user modeling for personalized search", in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, ser. CIKM '05. New York, NY, USA: ACM, 2005, pp. 824–831. [Online]. Available: http://doi.acm.org/10.1145/1099554.1099747
[25] Z. Dou, R. Song, and J.-R. Wen, "A large-scale evaluation and analysis of personalized search strategies", in Proceedings of the 16th International Conference on World Wide Web, ser. WWW '07. New York, NY, USA: ACM, 2007, pp. 581–590. [Online]. Available: http://doi.acm.org/10.1145/1242572.1242651
[26] J. Teevan, S. T. Dumais, and D. J. Liebling, "To personalize or not to personalize: Modeling queries with variation in user intent", in Proceedings of SIGIR '08. New York, NY, USA: ACM, 2008, pp. 163–170. [Online]. Available: http://doi.acm.org/10.1145/1390334.1390364
[27] E. Kharitonov and P. Serdyukov, "Demographic context in web search re-ranking", in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, ser. CIKM '12. New York, NY, USA: ACM, 2012, pp. 2555–2558. [Online]. Available: http://doi.acm.org/10.1145/2396761.2398690
[28] B. Xiang, D. Jiang, J. Pei, X. Sun, E. Chen, and H. Li, "Context-aware ranking in web search", in Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR '10. New York, NY, USA: ACM, 2010, pp. 451–458. [Online]. Available: http://doi.acm.org/10.1145/1835449.1835525
[29] M. Shokouhi, "Learning to personalize query auto-completion", in Proceedings of SIGIR '13. New York, NY, USA: ACM, 2013, pp. 103–112.
[30] M. Shokouhi and K. Radinsky, "Time-sensitive query auto-completion", in Proceedings of SIGIR '12. New York, NY, USA: ACM, 2012, pp. 601–610. [Online]. Available: http://doi.acm.org/10.1145/2348283.2348364
[31] M. Shokouhi, M. Sloan, P. N. Bennett, K. Collins-Thompson, and S. Sarkizova, "Query suggestion and data fusion in contextual disambiguation", in Proceedings of the 24th International Conference on World Wide Web, ser. WWW '15. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee, 2015, pp. 971–980. [Online]. Available: http://dl.acm.org/citation.cfm?id=2736277.2741646
[32] B. J. Jansen and D. Booth, "Classifying web queries by topic and user intent", in CHI '10 Extended Abstracts on Human Factors in Computing Systems, ser. CHI EA '10. New York, NY, USA: ACM, 2010, pp. 4285–4290. [Online]. Available: http://doi.acm.org/10.1145/1753846.1754140
[33] R. Baeza-Yates, L. Calderón-Benavides, and C. González-Caro, "The intention behind web queries", in Proceedings of the 13th International Conference on String Processing and Information Retrieval, ser. SPIRE '06. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 98–109.
[34] M. Strohmaier and M. Kröll, "Acquiring knowledge about human goals from search query logs", Inf. Process. Manage., vol. 48, no. 1, pp. 63–82, Jan. 2012. [Online]. Available: http://dx.doi.org/10.1016/j.ipm.2011.03.010
[35] D. Jiang, J. Pei, and H. Li, "Mining search and browse logs for web search: A survey", ACM Trans. Intell. Syst. Technol., vol. 4, no. 4, pp. 57:1–57:37, Oct. 2013. [Online]. Available: http://doi.acm.org/10.1145/2508037.2508038
[36] A. Kathuria, B. J. Jansen, C. Hafernik, and A. Spink, "Classifying the user intent of web queries using k-means clustering", Internet Research, vol. 20, no. 5, pp. 563–581, 2010.
[37] K. Park, H. Jee, T. Lee, S. Jung, and H. Lim, "Automatic extraction of user's search intention from web search logs", Multimedia Tools Appl., vol. 61, no. 1, pp. 145–162, 2012. [Online]. Available: http://dx.doi.org/10.1007/s11042-010-0723-8
[38] A. Das, C. Mandal, and C. Reade, "Determining the user intent behind web search queries by learning from past user interactions with search results", in Proceedings of the 19th International Conference on Management of Data, ser. COMAD '13. Mumbai, India: Computer Society of India, 2013, pp. 135–138. [Online]. Available: http://dl.acm.org/citation.cfm?id=2694476.2694508
[39] J. M. Fernández-Luna, J. F. Huete, and J. C. Rodríguez-Cano, "User intent transition for explicit collaborative search through groups recommendation", in Proceedings of the 3rd International Workshop on Collaborative Information Retrieval, ser. CIR '11. New York, NY, USA: ACM, 2011, pp. 23–28. [Online]. Available: http://doi.acm.org/10.1145/2064075.2064083
[40] J. Hu, G. Wang, F. Lochovsky, J.-T. Sun, and Z. Chen, "Understanding user's query intent with Wikipedia", in Proceedings of the 18th International Conference on World Wide Web, ser. WWW '09. New York, NY, USA: ACM, 2009, pp. 471–480. [Online]. Available: http://doi.acm.org/10.1145/1526709.1526773
[41] M. Hwang, D. H. Jeong, J. Kim, S.-K. Song, and H. Jung, "Activity inference for constructing user intention model", Comput. Sci. Inf. Syst., vol. 10, pp. 767–778, 2013.
[42] Z. Cheng, B. Gao, and T. Y. Liu, "Actively predicting diverse search intent from user browsing behaviors", in Proceedings of the 19th International Conference on World Wide Web, ser. WWW '10. New York, NY, USA: ACM, 2010, pp. 221–230. [Online]. Available: http://doi.acm.org/10.1145/1772690.1772714
[43] W. Kong, R. Li, J. Luo, A. Zhang, Y. Chang, and J. Allan, "Predicting search intent based on pre-search context", in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR '15. New York, NY, USA: ACM, 2015, pp. 503–512. [Online]. Available: http://doi.acm.org/10.1145/2766462.2767757
[44] J. J. Rocchio, "Relevance feedback in information retrieval", in The SMART Retrieval System - Experiments in Automatic Document Processing, G. Salton, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1971, pp. 313–323.
[45] A. M. Lam-Adesina and G. J. F. Jones, "Applying summarization techniques for term selection in relevance feedback", in Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR '01. New York, NY, USA: ACM, 2001, pp. 1–9. [Online]. Available: http://doi.acm.org/10.1145/383952.383953
[46] J. Xu and W. B. Croft, "Query expansion using local and global document analysis", in Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR '96. New York, NY, USA: ACM, 1996, pp. 4–11. [Online]. Available: http://doi.acm.org/10.1145/243199.243202
[47] M. C. Kim and K. S. Choi, "A comparison of collocation-based similarity measures in query expansion", Inf. Process. Manage., vol. 35, no. 1, pp. 19–30, Jan. 1999. [Online]. Available: http://dx.doi.org/10.1016/S0306-4573(98)00040-5
[48] G. Miller, "WordNet: An electronic lexical database", Communications of the ACM, vol. 38, no. 11, 1995, pp. 39–41.
[49] C. Shah and W. B. Croft, "Evaluating high accuracy retrieval techniques", in Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR '04. New York, NY, USA: ACM, 2004, pp. 2–9. [Online]. Available: http://doi.acm.org/10.1145/1008992.1008996
[50] S. Kim, H. Seo, and H. Rim, "Information retrieval using word senses: root sense tagging approach", in SIGIR 2004: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, UK, July 25-29, 2004, pp. 258–265. [Online]. Available: http://doi.acm.org/10.1145/1008992.1009038
[51] D. Fogaras and B. Rácz, "Scaling link-based similarity search", in Proceedings of the 14th International Conference on World Wide Web, ser. WWW '05. New York, NY, USA: ACM, 2005, pp. 641–650. [Online]. Available: http://doi.acm.org/10.1145/1060745.1060839
[52] H. Cui, J. R. Wen, J. Y. Nie, and W. Y. Ma, "Probabilistic query expansion using query logs", in Proceedings of the 11th International Conference on World Wide Web, ser. WWW '02. New York, NY, USA: ACM, 2002, pp. 325–332. [Online]. Available: http://doi.acm.org/10.1145/511446.511489
[53] E. Di Buccio, M. Melucci, and F. Moro, "Detecting verbose queries and improving information retrieval", Inf. Process. Manage., vol. 50, no. 2, pp. 342–360, Mar. 2014. [Online]. Available: http://dx.doi.org/10.1016/j.ipm.2013.09.003
[54] Reuters, "Reuters news feed", 2014, http://feeds.reuters.com/reuters/.
[55] U. Gajendragadkar and S. Joshi, "Marathi language n-gram dataset", Indian Language Technology Proliferation and Deployment Centre, 2015. [Online]. Available: http://www.tdildc.in/index.php?option=com downloadtask¯showresourceDetailstoolid¯1644lang¯en
[56] P. Norvig, "Natural language corpus data", in Beautiful Data, T. Segaran and J. Hammerbacher, Eds., 2009.
[57] Princeton University, "WordNet", March 2006, http://wordnet.princeton.edu/.
[58] IIT Bombay, "Marathi WordNet", 2014, http://www.cfilt.iitb.ac.in/wordnet/webmwn/.
[59] R. Agrawal, T. Imieliński, and A. Swami, "Mining association rules between sets of items in large databases", in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD '93. New York, NY, USA: ACM, 1993, pp. 207–216. [Online]. Available: http://doi.acm.org/10.1145/170035.170072
[60] College of Engineering Pune, "COEP", 2016, http://www.coep.org.in.
BIOGRAPHIES OF AUTHORS
Uma Gajendragadkar is a PhD scholar at COEP, Savitribai Phule Pune University, Pune,
Maharashtra, India. She completed a Masters in Computer Engineering from Mumbai
University, India, in 2004 and a Bachelors in Electronics Engineering in 1993 from Shivaji
University, Kolhapur, India. She has worked as a TEQIP Research Fellow for the past 4 years at
COEP, SPPU, Pune, Maharashtra, India. She has 8 years of experience in the software industry
and 13 years of experience in academics, where she has taught undergraduate and postgraduate
Engineering students. She has been a member of IEEE and ACM for the past eight years.
Dr. Sarang Joshi is a Professor at PICT, Savitribai Phule Pune University, Pune, Maharashtra,
India. He completed his PhD in Computer Science and Engineering from Bharati Vidyapeeth,
Pune, India, and his Masters and Bachelors in Computer Engineering from the University of
Pune, India. He has worked as a Professor in Computer Engineering at PICT, SPPU, Pune,
Maharashtra, India, for the last 27 years. He was the Chairman of the Board of Studies of
Computer Engineering at Savitribai Phule Pune University for the past 3 years. He has written a
book, Big Data Mining - Application Perspective (ISBN: 978-81-203-5116-5).