The document discusses dynamic user profiling for search personalization. It begins by describing classical search systems that return the same results for a given query regardless of individual user preferences. The research aims to enrich user profiles through dynamic group formation and temporal awareness to improve search personalization. For dynamic group formation, it constructs query-dependent user groups using latent Dirichlet allocation to model topic distributions. It then enriches individual user profiles by averaging over group profiles. For temporal profiles, it builds profiles that decay older document relevance over time to better reflect current interests. Evaluation on search logs shows the approaches outperform baselines by improving metrics like mean average precision.
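The temporal-profile idea can be sketched as follows: each clicked document contributes its terms to the profile with a weight that decays with the document's age, so recent interests dominate. The exponential schedule and the `decay` rate are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import Counter

def temporal_profile(clicked_docs, decay=0.1):
    """Build a normalized term-weight profile from (terms, age_in_days)
    pairs, discounting older documents exponentially."""
    profile = Counter()
    for terms, age in clicked_docs:
        weight = math.exp(-decay * age)  # assumed decay schedule
        for term in terms:
            profile[term] += weight
    total = sum(profile.values())
    return {term: w / total for term, w in profile.items()}

# A recent click should dominate a months-old one.
history = [(["python", "pandas"], 1), (["gardening"], 300)]
profile = temporal_profile(history)
```

With this weighting, the 300-day-old "gardening" click contributes almost nothing to the profile compared with yesterday's clicks.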
Context Sensitive Search String Composition Algorithm using User Intention to... - IJECEIAES
Finding the required URL among the first few result pages of a search engine is still a challenging task. It may require a number of reformulations of the search string, adversely affecting the user's search time. Query ambiguity and polysemy are major reasons for not obtaining relevant results in the top few result pages, and efficient query composition and data organization are necessary for effective results. The context of the information need and the user's intent may improve the autocomplete feature of existing search engines. This research proposes a Funnel Mesh-5 (FM5) algorithm to construct a search string that takes into account the context of the information need and the user's intention, in three main steps: 1) predict user intention from user profiles and past searches via a weighted mesh structure; 2) resolve ambiguity and polysemy of the search string using context and user intention; 3) generate a personalized, disambiguated search string by query expansion encompassing the user intention and the predicted query. Experimental results for the proposed approach and a comparison with direct use of a search engine are presented, along with a comparison of the FM5 algorithm against the K-Nearest-Neighbor algorithm for user-intention identification. The proposed system provides better precision for ambiguous search strings, with improved identification of the user's intention. Results are presented for an English-language dataset as well as a Marathi (an Indian language) dataset of ambiguous search strings.
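Steps 1 and 3 can be illustrated with a highly simplified sketch: the paper's weighted-mesh prediction is replaced here by a plain frequency count over past intention labels, and `predict_intention`, `expand_query`, and the toy expansion table are all hypothetical names invented for illustration.

```python
from collections import Counter

def predict_intention(past_intentions):
    """Pick the most frequent intention label from past searches
    (a plain-count stand-in for the weighted-mesh prediction)."""
    return Counter(past_intentions).most_common(1)[0][0]

def expand_query(query, intention_terms):
    """Disambiguate a query by appending terms tied to the predicted intention."""
    extra = [t for t in intention_terms if t not in query.split()]
    return " ".join([query] + extra)

# "jaguar" is ambiguous; the user's history suggests a finance intent.
intent = predict_intention(["finance", "finance", "travel"])
expansions = {"finance": ["stock"], "travel": ["tour"]}
expanded = expand_query("jaguar price", expansions[intent])
```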
Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con... - Aleksi Aaltonen
Presentation at the University of Miami on 3 December 2021 on how Stack Overflow improved the retention of new contributors whose initial question is rejected (closed) as substandard. The presentation is based on a paper coauthored with Sunil Wattal.
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS... - cseij
Query-sensitive summarization aims at providing users with a summary of the contents of single or multiple web pages based on the search query. This paper proposes generating a comparative summary from a set of URLs in the search result: the user selects a set of web-page links from the results produced by the search engine, and a comparative summary of the selected web sites is generated. The method uses the HTML DOM tree structure of these pages. HTML documents are segmented into sets of concept blocks, and a sentence score for each concept block is computed with respect to the query and feature keywords. The important sentences from the concept blocks of different web pages are extracted to compose the comparative summary on the fly. The system reduces the time and effort required to browse various web sites to compare information, and the resulting comparative summary helps users make quick decisions.
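The sentence-scoring step can be illustrated with a simple query-term-overlap score that picks the best sentence from each page; the paper's actual scoring also uses feature keywords and the concept-block structure, which this sketch omits.

```python
def sentence_score(sentence, query_terms):
    """Score a sentence by its word overlap with the query terms."""
    return len(set(sentence.lower().split()) & query_terms)

def comparative_summary(pages, query, per_page=1):
    """Pick the top-scoring sentence(s) from each page's blocks."""
    q = set(query.lower().split())
    return [
        (name, sorted(blocks, key=lambda s: sentence_score(s, q),
                      reverse=True)[:per_page])
        for name, blocks in pages.items()
    ]

# Invented toy pages, each already split into sentence blocks.
pages = {
    "camera-shop.example": ["Cheap camera with zoom", "Contact us"],
    "reviews.example": ["Camera battery life is long", "Home page"],
}
summary = comparative_summary(pages, "camera zoom battery")
```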
Finding significant data related to a particular subject is difficult on the web because of the immensity of web information. This situation makes search-history analysis techniques indispensable to researchers, academics, and industry. Search-history analysis is the detailed examination of web data from various users, with the goal of understanding and improving web search. A query log, or user search history, contains users' previously submitted queries together with their corresponding clicked documents or site URLs; query-log analysis is therefore considered the most widely used technique for improving users' search experience. The proposed method analyzes and clusters user search histories, addressing the problem of organizing users' historical queries into groups in a dynamic and automated fashion. The automatically organized query groups help in various tasks such as query suggestion, result re-ranking, query alteration, and so on. The method treats a query group as a collection of queries, together with the corresponding sets of clicked URLs, that relate to each other around a common information need. It proposes a new way of combining word-similarity measures with document-similarity measures to form a combined similarity measure, and also considers other query-relevance signals such as query reformulation and clicked-URL overlap. Evaluation results show how the proposed method outperforms existing methods.
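The combined query-similarity measure can be sketched by mixing a keyword-overlap score with a clicked-URL-overlap score; the Jaccard coefficient and the mixing weight `alpha` are illustrative choices, not necessarily those of the paper.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def query_similarity(q1, q2, alpha=0.5):
    """Combined measure: alpha * keyword similarity
    plus (1 - alpha) * clicked-URL similarity."""
    return (alpha * jaccard(q1["terms"], q2["terms"])
            + (1 - alpha) * jaccard(q1["urls"], q2["urls"]))

q1 = {"terms": ["cheap", "flights"], "urls": ["airfare.example"]}
q2 = {"terms": ["flights", "deals"], "urls": ["airfare.example", "deals.example"]}
sim = query_similarity(q1, q2)
```

Queries whose combined similarity exceeds a threshold would then be placed in the same query group.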
Search Interface Feature Evaluation in Biosciences - Zanda Mark
Read more here: http://pingar.com/
This paper reports findings on desirable interface features for different search tasks in the biomedical domain. We conducted a user study in which we asked bioscientists to evaluate the usefulness of autocomplete, query-expansion, faceted-refinement, related-searches, and results-preview implementations in new pilot interfaces and publicly available systems, using both baseline queries and their own. Our evaluation reveals a preference for certain features depending on the search task. In addition, we touch on the current pain point of faceted search: the acquisition of faceted subject metadata for unstructured documents. We found a strong preference for prototypes displaying just a few facets generated from either the query or the matching documents.
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION - ijistjournal
User-generated content on the web grows rapidly in this emergent information age. Evolving technology makes use of such information to capture the user's essence, so that only useful information is exposed to information seekers. Most existing research on text information processing focuses on the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment scores for the text data in each forum. The approach analyses the forum text and computes a value for each word. It combines K-means clustering with a Support Vector Machine tuned by Particle Swarm Optimization (SVM-PSO) to group the forums into two clusters, hotspot and non-hotspot, within the current time span. The accuracy of the proposed system is compared with other classification algorithms such as Naïve Bayes, decision trees, and plain SVM. The experiments show that K-means and SVM-PSO together achieve highly consistent results.
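The two-cluster grouping step can be illustrated with a minimal 1-D K-means (k = 2) over per-forum sentiment scores; the SVM-PSO classification stage and the word-level scoring are omitted, and the scores below are invented.

```python
def kmeans_two(values, iters=20):
    """Minimal 1-D K-means with k=2: split forum sentiment scores
    into a low (non-hotspot) and a high (hotspot) group."""
    centers = [min(values), max(values)]
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for v in values:
            # index is 1 (True) when v lies closer to the second center
            groups[abs(v - centers[0]) > abs(v - centers[1])].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups

scores = [-0.9, -0.8, 0.7, 0.9]  # invented per-forum sentiment scores
non_hot, hot = kmeans_two(scores)
```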
Semantic Based Model for Text Document Clustering with Idioms - Waqas Tariq
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data available in various forms on the web, in social networks, in online forums, and in other information networks. Clustering is a powerful data mining technique for organizing the large amount of information on the web, yet traditional document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method to improve the semantic structure of text documents. The developed method tags the documents for parsing, replaces idioms with their literal meaning, calculates semantic weights for document words, and applies a semantic grammar. A similarity measure is then computed between documents, and the documents are clustered using a hierarchical clustering algorithm. The method is evaluated on different data sets with standard performance measures, and its effectiveness in producing meaningful clusters is demonstrated.
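The final clustering step can be sketched with a small agglomerative (hierarchical) procedure over document vectors; the cosine measure, single-link merging, and the stopping threshold are illustrative assumptions rather than the paper's exact configuration.

```python
def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm if norm else 0.0

def agglomerate(vectors, threshold=0.9):
    """Single-link agglomerative clustering: keep merging the two most
    similar clusters while their similarity exceeds the threshold."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(cosine(vectors[a], vectors[b])
                          for a in clusters[i] for b in clusters[j])
                if sim > best:
                    best, pair = sim, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy semantic-weight vectors
clusters = agglomerate(docs)
```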
A Survey on Sentiment Categorization of Movie Reviews - Editor IJMTER
Sentiment categorization is the process of mining user-generated text content and determining the sentiment of users towards a particular thing; it is the approach of detecting the sentiment of the author with regard to some topic. It is also known as sentiment detection, sentiment analysis, or opinion mining. It is very useful for movie production companies interested in knowing how users feel about their movies; for example, the word "excellent" indicates that a review expresses positive emotion about a particular movie. The same applies to songs, cars, holiday destinations, political parties, social network sites, web blogs, discussion forums, and so on. Sentiment categorization can be carried out using three approaches: first, supervised machine-learning text classifiers based on Naïve Bayes, Maximum Entropy, SVM, kNN, or hidden Markov models; second, the unsupervised Semantic Orientation scheme of extracting relevant N-grams of the text and then labelling them; third, the publicly available SentiWordNet library.
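The first (supervised) approach can be illustrated with a minimal multinomial Naïve Bayes classifier with add-one smoothing; the toy training reviews are invented for illustration.

```python
import math
from collections import Counter

def train_nb(docs):
    """Train a multinomial Naive Bayes model from (text, label) pairs."""
    counts, priors, vocab = {}, Counter(), set()
    for text, label in docs:
        priors[label] += 1
        c = counts.setdefault(label, Counter())
        for w in text.lower().split():
            c[w] += 1
            vocab.add(w)
    return counts, priors, vocab

def classify(text, model):
    """Pick the label maximizing log P(label) + sum of log P(word|label),
    with add-one (Laplace) smoothing for unseen words."""
    counts, priors, vocab = model
    total = sum(priors.values())

    def log_score(label):
        c = counts[label]
        n = sum(c.values())
        s = math.log(priors[label] / total)
        for w in text.lower().split():
            s += math.log((c[w] + 1) / (n + len(vocab)))
        return s

    return max(priors, key=log_score)

# Invented toy training reviews.
model = train_nb([("excellent movie", "pos"), ("great acting", "pos"),
                  ("terrible plot", "neg"), ("boring film", "neg")])
```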
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION - cscpconf
Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text classification. In this paper, Fast Fuzzy Feature clustering for text classification is proposed, based on the framework introduced by Jung-Yi Jiang, Ren-Jia Liou and Shie-Jue Lee in 2011. Each word in the feature vector of a document is grouped into a cluster in fewer iterations: the number of iterations required to obtain cluster centers is reduced by transforming the cluster-center dimension from n dimensions to 2, using Principal Component Analysis with a slight modification for the dimension reduction. Experimental results show that this method improves performance by significantly reducing the number of iterations required to obtain the cluster centers, which is verified on three benchmark datasets.
Feature selection, optimization and clustering strategies of text documents - IJECEIAES
Clustering is one of the most researched areas of data-mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, collaborative filtering, document management, and indexing. Research on the clustering task must be performed prior to its adaptation to the text environment. Conventional approaches typically emphasized quantitative information, where the selected features are numbers; efforts have also been put forward toward efficient clustering of categorical information, where the selected features can assume nominal values. This manuscript presents an in-depth analysis of the challenges of clustering in the text environment. Further, the paper details prominent models proposed for clustering along with the pros and cons of each, and focuses on the latest developments in the clustering task in social networks and associated environments.
International Journal of Engineering and Science Invention (IJESI) - inventionjournals
The International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science and technology, including new teaching methods, assessment, validation, and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in the journal can be accessed online.
An Efficient Approach for Keyword Selection; Improving Accessibility of Web ... - dannyijwest
General search engines often provide imprecise results even for detailed queries, so there is a vital need to elicit useful information, such as keywords, that lets search engines provide acceptable results for users' search queries. Although many methods have been proposed for extracting keywords automatically, they all aim at better recall, precision, and other criteria that measure how well the method performs in the author's role. This paper presents a new automatic keyword-extraction method that improves the accessibility of web content to search engines. The proposed method defines coefficients determining feature efficiency and optimizes them using a genetic algorithm. Furthermore, it evaluates candidate keywords with a function that utilizes the results of search engines. Compared to other methods, experiments demonstrate that the proposed method achieves a higher score from search engines without a noticeable loss of recall or precision.
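The keyword-scoring function can be sketched as a weighted combination of per-candidate features. In the paper the coefficients are tuned by a genetic algorithm; here they are fixed by hand, and the feature names (`tf`, `in_title`) are invented for illustration.

```python
def keyword_score(features, weights):
    """Weighted combination of a candidate keyword's feature values."""
    return sum(weights[name] * value for name, value in features.items())

def select_keywords(candidates, weights, top_k=2):
    """Rank (word, features) candidates by score and keep the best top_k."""
    ranked = sorted(candidates, key=lambda c: keyword_score(c[1], weights),
                    reverse=True)
    return [word for word, _ in ranked[:top_k]]

candidates = [
    ("search", {"tf": 5, "in_title": 1}),
    ("the",    {"tf": 9, "in_title": 0}),
    ("engine", {"tf": 4, "in_title": 1}),
]
weights = {"tf": 0.2, "in_title": 2.0}  # hand-picked stand-ins for GA output
best = select_keywords(candidates, weights)
```

A genetic algorithm would search over the `weights` vector, using the search-engine-based evaluation function as fitness.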
Text preprocessing is a vital stage in text classification (TC) particularly, and in text mining generally. Text preprocessing tools reduce multiple forms of a word to a single form, and preprocessing techniques are widely studied and given great significance in machine learning. The basic phase in text classification involves preprocessing features and extracting relevant features against the features in a database, which has a great impact on reducing the time and processing resources needed. The effect of preprocessing tools on English text classification is an active area of research. This paper provides an evaluation study of several preprocessing tools for English text classification, covering raw text, tokenization, stop-word removal, and stemming. Two feature-extraction methods, chi-square and TF-IDF with cosine similarity, are evaluated on the BBC English dataset. The experimental results show that text preprocessing affects the feature-extraction methods and enhances the performance of English text classification, especially for small threshold values.
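A minimal version of such a preprocessing pipeline (tokenization, stop-word removal, and a crude suffix-stripping stemmer standing in for a real stemmer such as Porter's; the stop-word list is illustrative):

```python
STOPWORDS = {"the", "a", "an", "is", "are", "of", "in", "and", "to"}

def strip_suffix(word):
    """Crude suffix stripping; a real system would use a proper stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text, stem=True):
    """Tokenize, drop stop words, and optionally stem."""
    tokens = [t for t in text.lower().split() if t.isalpha()]
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [strip_suffix(t) for t in tokens] if stem else tokens

tokens = preprocess("The players are playing in the stadium")
```

Feature extraction (chi-square or TF-IDF) would then run over the resulting token streams rather than the raw text.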
An Advanced IR System of Relational Keyword Search Technique - paperpublications3
Abstract: Nowadays, keyword search over relational data sets has become an area of research within databases and Information Retrieval. There is no standard process of information retrieval that clearly shows the accurate result together with ranked keyword search, and the execution time for retrieving data is high in existing systems. We propose a system for increasing the performance of relational keyword search. The proposed system combines schema-based and graph-based approaches into a Relational Keyword Search System that overcomes the mentioned disadvantages of existing systems, manages the information, and lets users access it efficiently. Keyword search with ranking requires very low execution time, and the execution time and file length during information retrieval can be displayed using a chart.
Keywords: Keyword Search, Datasets, Information Retrieval Query Workloads, Schema-based Systems, Graph-based Systems, Ranking, Relational Databases.
Title: An Advanced IR System of Relational Keyword Search Technique
Author: Dhananjay A. Gholap, Gumaste S. V
ISSN 2350-1022
International Journal of Recent Research in Mathematics Computer Science and Information Technology
Paper Publications
Web search engines help users find useful information on the WWW. However, when the same query is submitted by different users, typical search engines return the same results regardless of who submitted the query. In general, each user has different information needs behind the same query, so search results should be adapted to users with different information needs. Approaches are therefore needed that adapt search results to each user's need for relevant information without any user effort. Such adaptive search systems can be achieved by constructing user profiles based on modified collaborative filtering with a detailed analysis of the user's browsing history.
There are three possible types of web search system that can provide personalised information: (1) systems using relevance feedback, (2) systems in which users register their interests, and (3) systems that recommend information based on the user's history. In the first technique, users have to provide relevance judgments, which is time consuming; the second requires users to register their static interests, which demands extra effort. The third technique is therefore preferable: users do not give explicit ratings, relevance is tracked automatically from user behaviour on search results and usage history, no registration of interests is required, and changing interests are captured dynamically. The results section shows that browsing history allows each user to perform a more fine-grained search by capturing changes in preferences without any user effort, and users need less time to find the relevant snippet in personalised search results than in the original results.
User behavior model & recommendation on basis of social networks Shah Alam Sabuj
At present, social networks play an important role in expressing people's sentiment and interests in particular fields. Extracting a user's public social-network data (what the user shares with friends and relatives, and how the user reacts to others' thoughts) amounts to extracting the user's behaviour. Under a set of stated hypotheses, if a machine can be made to understand human sentiment and interest, it becomes possible to recommend items matching a user's personal interests based on machine-analysed sentiment. Our main approach is to make suggestions regarding a user's specific interests, anticipated by analysing the user's public data. This can be extended to business analysis, suggesting products or services of different companies based on the consumer's personal choices. The automation can also help choose suitable candidates for a questionnaire, and the system can help anyone understand themselves and how their behaviour may influence others. It is possible to identify different types of people, such as dependable people, people with leadership skills, people with a supportive mentality, and people with a negative mentality.
SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ...GUANGYUAN PIAO
In this paper, we propose user modeling strategies which use Concept Frequency - Inverse Document Frequency (CF-IDF) as a weighting scheme and incorporate either or both of the dynamics and semantics of user interests. To this end, we first provide a comparative study of user modeling strategies from previous literature that consider the dynamics of user interests, to present their comparative performance. In addition, we investigate different types of information (i.e., categories, classes, and entities connected via various properties) for entities from DBpedia, and combinations of them, for extending user interest profiles. Finally, we build our user modeling strategies incorporating either or both of the best-performing methods in each dimension. Results show that our strategies significantly outperform two baseline strategies in the context of link recommendations on Twitter.
User profiling is a fundamental component of any personalization application. Most existing user profiling strategies are based on objects that users are interested in (i.e., positive preferences) but not on objects that users dislike (i.e., negative preferences). In this paper, we focus on search engine personalization and develop several concept-based user profiling methods based on both positive and negative preferences.
SUPPORTING PRIVACY PROTECTION IN PERSONALIZED WEB SEARCHnikhil421080
Personalized web search (PWS) has demonstrated its effectiveness in improving the quality of various search services on the Internet. However, evidence shows that users’ reluctance to disclose their private information during search has become a major barrier for the wide proliferation of PWS.
We study privacy protection in PWS applications that model user preferences as hierarchical user profiles. We propose a PWS framework called UPS that can adaptively generalize profiles by queries while respecting user-specified privacy requirements. Our runtime generalization aims at striking a balance between two predictive metrics that evaluate the utility of personalization and the privacy risk of exposing the generalized profile.
We present two greedy algorithms, namely GreedyDP and GreedyIL, for runtime generalization. We also provide an online prediction mechanism for deciding whether personalizing a query is beneficial. Extensive experiments demonstrate the effectiveness of our framework. The experimental results also reveal that GreedyIL significantly outperforms GreedyDP in terms of efficiency.
Web search engines (e.g., Google, Yahoo, Microsoft Live Search) are widely used to find data within a huge amount of information in minimal time. However, these useful tools also pose a privacy threat: web search engines profile their users on the basis of the past searches they submit. In the proposed system, we implement a String Similarity Match algorithm (SSM algorithm) to improve the quality of search results. Current solutions to this privacy threat propose mechanisms that introduce a high cost in computation and communication. Personalized search is a promising way to improve the accuracy of web searches, but effective personalization requires collecting and aggregating user information, which raises serious concerns of privacy infringement for many users. Indeed, these concerns have become one of the main barriers to deploying personalized search applications, and privacy-preserving personalization remains a great challenge. We propose to resist adversaries with broader background knowledge, such as richer relationships among topics: user-profile results are generalized using background knowledge stored in the history, so the user's search results can be hidden. Through this mechanism, privacy can be achieved.
User Studies for APG: How to support system development with user feedback?Joni Salminen
Presentation at QCRI's Science Monday of the Social Computing group. January 14, 2019. Doha, Qatar. Access the Automatic Persona Generation system: https://persona.qcri.org
Re: Topic 2 DQ 1 - Qualitative research produces a v.docx (4 posts) meghanivkwserie
Qualitative research produces a variety of data from a variety of sources. Data sources may be personal interviews (written or recorded), surveys, questionnaires, official documents, or observation notes. To complicate matters, more often than not there are numerous respondents or participants and multiple researchers. Extricating and coding data from multiple sources can be difficult, but it is made much easier if the data is organized appropriately. (Katherine B., 2017)
The vast majority of qualitative data is "Unstructured Data," which includes documents, photographs, audio, and video.
The simplest things we can do to improve the usability of unstructured data for analysis are:
Convert it to a structured schema that can be evaluated with quantitative methods.
Make it simple to find.
On the first point, we can feed documents to full-text search engines such as Lucene, which make data retrieval simple. We can also design full-text search engines to execute faceted searches, allowing us to attach metadata facets (e.g., Author, Media Type, Creation Date, etc.) to enhance our quantitative research with the same search engine. (Bensal P and others…. 2010)
On the second point, there are a variety of methods for converting qualitative Unstructured Data into Structured Data (which may be quantitatively examined). But it all relies on what you want to do with the Structured Data and how you get it. You can, for example, create n-grams (continuous sequences of words) and then analyze those n-grams to identify what the most common terms are within a subset of texts.
You might wish to have someone manually transcribe all consumer references of a product when evaluating footage. There are already Machine Learning algorithms that can transcribe and recognize speech.
Machine Learning and Deep Learning programs that can extract usable and reliable quantitative data from qualitative data will be extremely important in the future of analytics. However, manual methods such as employing Amazon Mechanical Turk, or a combination of both, are equally viable options for extracting Quantitative Structured Data from Qualitative Unstructured Data.
Using 200-300 APA FORMAT with references to support this discussion,
Qualitative data has been described as voluminous and sometimes overwhelming to the researcher. Discuss two strategies that would help a researcher manage and organize the data.
UXPA 2023: Learn how to get over personas by swiping right on user rolesUXPA International
This session walks through the concept of user roles as an alternative to personas as a means to generate and disseminate user insights for product development teams. We will describe the tools and methods used to create a research database organized by user roles, along with examples and short exercises to help attendees think through user roles within their own context.
By the end of the session, attendees should be aware of tools and approaches for:
Organizing user research information in a database
Disseminating user role information to product and design teams
Managing a user roles database as part of a long term UX Research program
If you’re ready to ditch personas but don’t know how, this session is for you!
Supporting Exploratory People Search: A Study of Factor Transparency and User...Shuguang Han
People search has been an active research topic in recent years. Related work includes expert finding, collaborator recommendation, link prediction, and social matching. However, the diverse objectives and exploratory nature of those tasks make it difficult to develop a flexible people-search method that works for every task. In this project, we developed PeopleExplorer, an interactive people search system to support exploratory search tasks when looking for people. In the system, users can specify their task objectives by selecting and adjusting key criteria. Three criteria were considered: content relevance, candidate authoritativeness, and the social similarity between the user and the candidates. This project represents a first attempt to add transparency to exploratory people search and to give users full control over the search process. The system was evaluated through an experiment with 24 participants undertaking four different tasks. The results show that, with comparable time and effort, users of our system performed significantly better in their people search tasks than those using the baseline system. Users of our system also exhibited many unique behaviors in query reformulation and candidate selection. We found that users' general perceptions of the three criteria varied across tasks, which confirms our assumptions regarding modeling task difference and user variance in people search systems.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) reduces duplicate computation and can likewise reduce iteration time. Road networks often have chains that can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes are easy to calculate; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex: Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Dynamic User Profiling for Search Personalisation
1. Thanh Vu
Computing and Communications Department
The Open University
Dynamic User Profiling for Search Personalisation
2. Classical Search Systems
AOL and AltaVista return search results based on the user's input query, regardless of the user's search preferences.
Different users who submit the same input query will get the same returned result list.
Queries are usually short and ambiguous, e.g., Michael Jordan, Java, etc.
Different users have different information needs with the same input query.
3. Search Personalisation
Return search results based on the input query and the user's search interests.
Different users who submit the same input query will probably get different search result lists.
Even an individual user will get different search results at different search times (e.g., Open US).
5. The performance of search personalisation depends on the richness of a user profile.
J. Teevan, M. R. Morris, and S. Bush. Discovering and using groups to improve personalized search. In WSDM'2009
6. Topic-based user profiles
Use a human-generated ontology (ODP - dmoz.org) to extract topics from all clicked/relevant documents of a specific user to build her profile.
1. R. W. White, et al., Enhancing Personalized Search by Mining and Modeling Task Behavior. In WWW'2013
2. P. N. Bennett, et al., Modeling the impact of short- and long-term behavior on search personalization. In SIGIR'2012
7. Challenges for Human-Generated Ontologies
New topics that are not covered by the ontology will emerge over time.
Classifying and maintaining each document in the correct categories requires expensive human effort.
8. Enriching a user profile
Use information from the group of users who share common interests.
R. W. White, W. Chu, A. Hassan, X. He, Y. Song, and H. Wang. Enhancing personalized search by mining and modeling task behavior. WWW '13, pages 1411-1420, Switzerland, 2013. ACM
9. Challenges for grouping methods
Groups are constructed statically using predetermined criteria such as common clicked documents.
Users in a group may have different interests in different topics w.r.t. the input query.
Z. Dou, R. Song, and J.-R. Wen. A large-scale evaluation and analysis of personalized search strategies. WWW '07, pages 581-590, NY, USA, 2007. ACM.
10. Research Question
How can we enrich user profiles with dynamic group formation?
1. How can we dynamically group users who share common interests?
2. How can we enrich user profiles with group information?
3. Can enriched user profiles help to improve search performance?
11. Dynamic group formation
The groups should be dynamically constructed in response to the user's input query.
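The deck does not show the grouping procedure itself; one way such a query-dependent group could be sketched is below. The function name, the representation of users as LDA topic vectors, the query-weighted cosine similarity, and the 0.5 threshold are all illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def dynamic_group(query_topics, user_profiles, active_user, threshold=0.5):
    """Form a query-dependent group: users whose topic profiles are
    similar to the active user's once weighted by the query's topic mix.
    query_topics: 1-D topic distribution of the query (e.g., from LDA).
    user_profiles: dict mapping user id -> 1-D topic distribution."""
    base = user_profiles[active_user] * query_topics  # query-weighted profile
    group = []
    for uid, profile in user_profiles.items():
        if uid == active_user:
            continue
        other = profile * query_topics
        denom = np.linalg.norm(base) * np.linalg.norm(other)
        sim = float(base @ other) / denom if denom > 0 else 0.0
        if sim >= threshold:
            group.append(uid)
    return group
```

Because the profiles are multiplied by the query's topic distribution before comparison, two users are grouped together only when they agree on the topics this query is actually about, which is the dynamic aspect the slide emphasises.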
18. Enriching a user profile
Average the profiles of all users in the group over topics.
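The averaging step can be sketched as follows; the interpolation weight between the individual profile and the group mean, and the function name, are illustrative assumptions.

```python
import numpy as np

def enrich_profile(user_profile, group_profiles, weight=0.5):
    """Enrich an individual topic profile with the mean profile of the
    user's (dynamically formed) group; `weight` balances the two parts."""
    if not group_profiles:
        return user_profile
    group_mean = np.mean(group_profiles, axis=0)   # average over topics
    enriched = weight * user_profile + (1.0 - weight) * group_mean
    return enriched / enriched.sum()               # renormalise to a distribution
```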
19. Re-ranking search results
For each input query q:
Download the top n ranked search results from the search engine.
Compute a personalised score for each web page d given the current user u: p(d|u).
Combine the personalised score p(d|u) and the original rank r(q, d) to get a final score:
f(d, u | q) = p(d|u) / r(q, d)
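A minimal sketch of this re-ranking step, assuming the final score is the personalised score p(d|u) divided by the original rank position r(q, d), as on the slide:

```python
def rerank(results, personalised_score):
    """Re-rank results by f(d, u | q) = p(d | u) / r(q, d).
    results: documents in original rank order (rank 1 first).
    personalised_score: callable returning p(d | u) for a document."""
    scored = []
    for rank, doc in enumerate(results, start=1):
        scored.append((personalised_score(doc) / rank, doc))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [doc for _, doc in scored]
```

Dividing by the rank position lets the original engine's ordering temper the personalised score, so a highly ranked but weakly personalised document can still compete.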
21. Dataset
Query logs from the Bing search engine for 15 days, from 1st to 15th July 2012, with 106 anonymous users.
A relevant document is a click with a dwell time of at least 30 seconds, or the last click in a session (a SAT click).
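The SAT-click rule can be expressed directly in code; representing each click as a (url, dwell_seconds) pair is an illustrative assumption.

```python
def sat_clicks(session_clicks, min_dwell=30):
    """Identify SAT clicks in one session: a click with dwell time of at
    least `min_dwell` seconds, or the last click of the session.
    session_clicks: list of (url, dwell_seconds) pairs, in click order."""
    sat = set()
    for i, (url, dwell) in enumerate(session_clicks):
        is_last = (i == len(session_clicks) - 1)
        if dwell >= min_dwell or is_last:
            sat.add(url)
    return sat
```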
23. Baseline and Personalisation Strategies
Baseline: the original ranked results from Bing
S_Profile: use only the current user profile
S_Group: enrich the profile with a static group
D_Group: enrich the profile with a dynamic group
26. Challenges for Time-awareness
Previous methods use all the clicked/relevant documents of a user to build her search profile. The documents are treated equally, without considering temporal features (i.e., the time at which documents were clicked and viewed). The resulting profile is too broad and cannot fully express the current interests of the user.
1. T. T. Vu, et al., Improving search personalisation with dynamic group formation. In SIGIR'2014
2. K. Raman, et al., Toward whole-session relevance: Exploring intrinsic diversity in web search. In SIGIR'2013
27. Research Question
How can we build user profiles with time-awareness?
1. How can we build temporal user profiles?
2. Can the time-aware profiles help improve search performance?
28. Building temporal user profiles (1)
Non-temporal method: take the topic distributions of the user's clicked documents (1st to 4th) and average them, treating all documents equally, to obtain the topic-based user profile.
Clicked documents (distributions over topics):
Football 0.51, Law 0.33, Health 0.11, OS 0.05
Football 0.55, Law 0.27, OS 0.10, Health 0.08
Law 0.41, OS 0.37, Health 0.12, Football 0.10
OS 0.65, Law 0.21, Football 0.10, Health 0.04
Mean over topics (the topic-based user profile): Football 0.32, Law 0.30, OS 0.29, Health 0.09
29. Building temporal user profiles (2)
Our method: weight each clicked document's topic distribution by a decay factor before combining. The figure shows one clicked document (Football 0.51, Law 0.33, Health 0.11, OS 0.05) weighted by a decay factor of 0.90 in the temporal topic user profile.
32. Building temporal user profiles (2)
The same four clicked documents, now weighted by decay factors (0.93, 0.92, 0.91, 0.90), yield the temporal topic profile, contrasted with the non-temporal one.
Clicked documents (distributions over topics):
Football 0.51, Law 0.33, Health 0.11, OS 0.05
Football 0.55, Health 0.27, OS 0.10, Law 0.08
Law 0.41, OS 0.37, Health 0.12, Football 0.10
OS 0.65, Law 0.21, Football 0.10, Health 0.04
Temporal topic profile: OS 0.32, Law 0.30, Football 0.29, Health 0.09
Non-temporal topic profile: Football 0.32, Law 0.30, OS 0.29, Health 0.09
With decay, the profile shifts towards the topics of the more recent clicks (OS first instead of Football).
33. Building temporal user profiles (3)
Du = {d1, d2, ..., dn} is the relevant document set of the user u.
The user profile of u is a distribution over the topics Z (extracted by LDA).
tdi = n indicates that di is the nth most relevant/clicked document of u.
α is the decay parameter; K is the normalisation factor.
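The slide lists the ingredients of the temporal profile without the formula itself. One plausible form, consistent with these definitions (decay parameter α, position t_{d_i} of document d_i, normalisation factor K), is a decayed mixture of the per-document topic distributions:

```latex
p(z \mid u) \;=\; \frac{1}{K} \sum_{i=1}^{n} \alpha^{\,t_{d_i}}\, p(z \mid d_i),
\qquad
K \;=\; \sum_{i=1}^{n} \alpha^{\,t_{d_i}}
```

With α < 1, more recent documents contribute more to p(z | u), which is exactly the shift illustrated on slide 32.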
34. Building temporal user profiles (4)
Long-term user profile: uses relevant documents extracted from the user's whole search history.
Daily user profile: uses relevant documents extracted from the user's search history in the current search day.
Session user profile: uses relevant documents extracted from the user's search history in the current search session.
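Selecting the relevant documents behind each of the three profiles can be sketched as follows; the tuple layout and the scope names are illustrative assumptions.

```python
from datetime import datetime

def profile_documents(relevant_docs, now, scope):
    """Select the relevant documents feeding each profile type.
    relevant_docs: time-ordered list of (doc_id, clicked_at, session_id).
    scope: "long_term", "daily", or "session" (names are illustrative)."""
    if scope == "long_term":
        return list(relevant_docs)                                  # whole history
    if scope == "daily":
        return [d for d in relevant_docs if d[1].date() == now.date()]
    if scope == "session":
        current_session = relevant_docs[-1][2]                      # latest session id
        return [d for d in relevant_docs if d[2] == current_session]
    raise ValueError(scope)
```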
35. Re-ranking search results (1)
The figure shows three returned documents and the user profile. In the original ranking the documents appear in the order returned by the search engine; after re-ranking, documents whose topic distributions are closer to the user profile move up.
Returned documents (distributions over topics):
Health 0.51, Law 0.33, Football 0.11, OS 0.05
Football 0.55, Law 0.27, Health 0.13, OS 0.05
Football 0.41, OS 0.37, Health 0.12, Law 0.10
The user profile (p): Football 0.47, Law 0.24, OS 0.16, Health 0.12
36. Re-ranking search results (2)
Personalised scores: use the Jensen-Shannon divergence (DJS[d||p]) between each returned document's topic distribution d and the user profile p. The figure shows the same three returned documents (d) as on the previous slide, together with the user profile (p): Football 0.47, Law 0.24, OS 0.16, Health 0.12.
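The Jensen-Shannon score can be computed as below; this is a standard base-2 implementation, and the smoothing epsilon is an illustrative detail rather than part of the deck.

```python
import numpy as np

def js_divergence(d, p, eps=1e-12):
    """Jensen-Shannon divergence D_JS[d || p] between a document's topic
    distribution d and the user profile p (both 1-D probability vectors)."""
    d, p = np.asarray(d, float), np.asarray(p, float)
    m = 0.5 * (d + p)                    # midpoint distribution
    kl = lambda a, b: float(np.sum(a * np.log2((a + eps) / (b + eps))))
    return 0.5 * kl(d, m) + 0.5 * kl(p, m)
```

With base-2 logarithms the divergence lies in [0, 1]: identical distributions score 0, disjoint ones score 1, so a lower value means the document matches the profile better.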
37. Re-ranking search results (3)
Re-ranking features; re-ranking algorithm: LambdaMART [1]
1. C. J. Burges, et al., Learning to rank with non-smooth cost functions. In NIPS'2007.
Personalised features:
LongTermScore: personalised score between the document and the long-term profile
DailyScore: personalised score between the document and the daily profile
SessionScore: personalised score between the document and the session profile
Non-personalised features:
DocRank: rank of the document in the original returned list
QuerySim: cosine similarity score between the current and previous queries
QueryNo: total number of queries submitted in the current search session (including the current query)
38. Evaluation
Dataset: the query logs of 1166 anonymous users over four weeks, from 1st to 28th July 2012.
A log entry consists of an anonymous user identifier, a query, the top-10 returned URLs, and the clicked documents along with the user's dwell time.
All the URLs' contents were downloaded for learning topics.
A search session is demarcated by 30 minutes of user inactivity.
A relevant document is a click with a dwell time of at least 30 seconds, or the last click in a session (a SAT click).
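The 30-minute session demarcation can be sketched as follows; representing each log entry as a (timestamp, query) pair is an illustrative assumption.

```python
from datetime import datetime, timedelta

def sessionize(events, gap_minutes=30):
    """Split a user's time-ordered log entries into search sessions,
    starting a new session after `gap_minutes` of inactivity.
    events: list of (timestamp, query) pairs, oldest first."""
    sessions = []
    for event in events:
        ts = event[0]
        if not sessions or ts - sessions[-1][-1][0] > timedelta(minutes=gap_minutes):
            sessions.append([event])        # gap too large: open a new session
        else:
            sessions[-1].append(event)      # continue the current session
    return sessions
```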
39. Evaluation methodology
Assign a positive (relevant) label to a returned URL if:
it is a SAT click for the current query, or
it is a SAT click for one of the repeated queries in the same search session.
Assign negative (irrelevant) labels to the rest of the URLs.
40. Personalisation Methods and Baselines
Personalisation methods:
LON uses only the LongTermScore from the long-term profile
DAI uses only the DailyScore from the daily profile
SES uses the SessionScore from the session profile
ALL uses all personalised scores from the three profiles
Baselines:
Default is the default ranking returned by the search engine
Static uses the LongTermScore from the long-term profile without time-awareness (i.e., without the decay function)
41. Results
Evaluation metrics: Mean Average Precision (MAP), Precision (P@k), Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (nDCG@k). For each metric, a higher value indicates a better ranking.
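Of these metrics, MAP can be computed from the binary relevance labels as follows (a standard implementation sketch, not code from the deck):

```python
def average_precision(ranked_labels):
    """Average precision for one query: ranked_labels holds the relevance
    (1 = relevant, 0 = irrelevant) of each returned URL in ranked order."""
    hits, total = 0, 0.0
    for i, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            total += hits / i      # precision at each relevant position
    return total / hits if hits else 0.0

def mean_average_precision(queries):
    """MAP: mean of the per-query average precision values."""
    return sum(average_precision(q) for q in queries) / len(queries)
```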
42. Overall Performance
All the improvements over the baselines are significant under a paired t-test with p < 0.001.
47. Takeaways
Dynamic grouping: grouping improves search performance, and dynamic grouping outperforms static grouping.
Temporal profiles: the three temporal profiles improve search performance over the default ranking and over the non-temporal profile. Using all features (ALL) achieves the highest performance, and the short-term profile achieves better performance than the longer-term profile.
51. Click Entropies
P(d|q) is the percentage of clicks on document d among all the clicks for q. A smaller query click entropy indicates more agreement between users on clicking a small number of web pages.
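The click entropy can be computed as the standard Shannon entropy over the per-document click fractions p(d|q); this sketch assumes the query's clicks are given as a flat list of clicked document ids.

```python
import math
from collections import Counter

def click_entropy(clicks):
    """Click entropy of a query: H(q) = -sum_d p(d|q) * log2 p(d|q),
    where p(d|q) is the fraction of the query's clicks landing on d."""
    counts = Counter(clicks)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

All clicks on one document give entropy 0 (full agreement); clicks spread evenly over many documents give a high entropy, which is where the slides report the largest re-ranking gains.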
53. Query Positions in Search Session
The aim is to study whether the position of a query in a session has any effect on the performance of the temporal latent topic profiles. Queries are labelled by their positions during the search.
56. Pre-processing
Remove queries whose positive label set is empty from the dataset.
Discard domain-related queries (e.g., Facebook, Youtube).
The rank positions of the positively labelled URLs are used as the ground truth to evaluate search performance before and after re-ranking.
The session profile (SES) achieves better performance than the daily profile (DAI), and the daily profile (DAI) in turn gains an advantage over the long-term profile (LON). This indicates that the short-term profiles capture more detail of user interests than the longer-term ones. All three temporal profiles improve search performance over the default ranking and over the non-temporal profile, and the combination of all features (ALL) achieves the highest performance. Without time-awareness, the long-term profile shows no improvement over the default ranking.
We show the improvement of the temporal profiles over the Default ranking of the search engine in terms of the MAP metric for different magnitudes of click entropy. Here, too, statistical significance is guaranteed by a paired t-test (p < 0.001). For smaller click entropy values, the re-ranking performance is only slightly improved; for example, with click entropy between 0 and 0.5, the MAP improvement from the long-term profile is only 0.39% compared with the original search engine. The effectiveness of the temporal profiles increases with the value of click entropy, and the highest improvements are achieved when click entropies are >= 2.
A query usually has a broader influence in a search session than merely returning a list of URLs. The position of a query in a search session is also important, because a query may be fine-tuned by a user after unsatisfactory results from previous queries. In this experiment, we study whether the position of a query has any effect on the performance of the temporal latent topic profiles. For each session, we label the queries by their positions during the search.