This document describes research into understanding user goals in web search. The researchers developed a framework that categorizes search goals into navigational, informational, and resource-seeking categories. They then manually classified queries from a search engine log according to this framework. Their analysis suggests that navigational searches are less common than believed, while resource-seeking goals may account for many searches. Understanding search goals could help improve search engines by tailoring results and algorithms to the user's purpose.
This is a high-level summary of three important ways to help people find information. The slides were presented at Vera Rhoades' information architecture class at the University of Maryland.
Is search always the right solution? There are many things you can do with a hammer, but it’s not so great if you need to turn a screw.
Text Classification is an alternative to search that may be more appropriate for social media data analysis. Text classification is the task of assigning predefined categories to free-text documents. It can provide conceptual views of document collections and has important applications in the real world. Using text classification as the foundation for analysis – i.e., teaching a machine to categorize posts the way humans do – can dramatically improve your ability to gather the right data and, ultimately, increase the chances that you’ll uncover what you need to know.
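As a rough illustration of "teaching a machine to categorize posts the way humans do", here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. The category names, the sample posts, and the function names (`train`, `classify`) are assumptions made for this example; the summarized document does not say which algorithm it uses.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Simplistic whitespace tokenizer; real systems normalize far more carefully.
    return text.lower().split()

def train(labeled_posts):
    """labeled_posts: list of (text, category) pairs labeled by a human."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of posts
    vocab = set()
    for text, category in labeled_posts:
        doc_counts[category] += 1
        for word in tokenize(text):
            word_counts[category][word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the category with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)  # class prior
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[category][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical hand-labeled social media posts.
training_posts = [
    ("battery died after one day", "complaint"),
    ("screen cracked and support ignored me", "complaint"),
    ("love the new camera", "praise"),
    ("great battery life and great screen", "praise"),
]
model = train(training_posts)
print(classify("love the great camera", model))
```

With only four training posts this is a toy; the point is that the categories are predefined and the labeled examples, rather than a keyword query, decide which posts are gathered.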
This document discusses semantic search and how it can improve traditional information retrieval systems. It provides examples of how semantic search uses structured data and schemas to better understand user intent and content meaning. This allows semantic search to enhance various stages of the information retrieval process from query interpretation to result presentation. The document also outlines the growing adoption of semantic web standards like RDFa and schema.org to expose structured data on webpages.
The document discusses semantic search capabilities at Yahoo. It describes how Yahoo has developed techniques to extract structured data and metadata from webpages to power enhanced search results. This includes information extraction, data fusion, and curating knowledge in a graph. Yahoo uses this knowledge to better understand search queries and present relevant entities and attributes in results. Semantic search remains an active area of research.
Here are some options for completing your query:
- Freddie Mercury was the lead singer of Queen
- Brian May was the guitarist for Queen
- Queen was a British rock band formed in 1970
- Freddie Mercury died in 1991 from complications due to AIDS
Liquid Query: Multi-domain Exploratory Search on the Web - Alessandro Bozzon
The document summarizes the Liquid Queries approach for multi-domain exploratory search across the web. Liquid Queries allows users to formulate queries across multiple semantic domains through an iterative process of querying services, exploring results, and refining queries. It aggregates results from different search services and domains, highlights the contribution of each, and allows joining results based on structural information. Key aspects include the Liquid Query template and life cycle, backend implementation using YQL, and demonstrations of the prototype. Future work aims to improve user evaluation, interactions, and personalization.
What IA, UX and SEO Can Learn from Each Other - Ian Lurie
Google has become the arbiter of how users experience a website. Its data-driven determinants of what constitutes good UX directly influence how a site is found. This is wrong because people, not machines, should determine experience; Google does not tell the SEO or UX community what data is used to measure experience, and many elements of experience cannot be measured. This presentation reveals why Google uses UX signals to determine placement in search results and how to create a customer-pleasing and highly visible user experience for your website.
This document provides guidance on how to evaluate the reliability of websites for research purposes. It identifies appropriate websites like academic journals, government publications, and encyclopedias. Inappropriate websites include personal blogs, forums, wikis, and commercial sites. To evaluate reliability, check the web address extension, background of the author and organization, and references cited. Examples demonstrate how to apply these criteria to determine if specific websites are reliable sources for research. Wikipedia can only be used to consult the references of topics as a last resort.
The document provides guidance on initial steps for developing a search application, including validating the need for full-text search, identifying ideal search results, considering clustering results, and producing requirements and choosing a technology. Some key recommendations include sketching out ideal results for sample queries, determining how results should be ordered and presented, and considering if and how results could be clustered. Determining ideal results and clustering options can help drive specific requirements and the selection of an appropriate technology.
Faceted Navigation of User-Generated Metadata (Calit2 Rescue Seminar Series 2... - Bradley Allen
Faceted navigation relies on metadata to organize and navigate large collections of information. Users are becoming an important source of metadata in the form of user-generated tags and annotations. By combining user-generated metadata with traditional subject indexing, new applications of faceted navigation can be created that bridge folksonomies and taxonomies to provide more compelling ways to explore and discover online information.
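The idea of combining subject indexing with user tags can be sketched as follows. This is a minimal example under assumptions: the record layout (a `subject` facet from a taxonomy and a `tags` facet from users), the sample items, and the helper names `facet_counts` and `filter_by` are all hypothetical and not taken from the presentation.

```python
from collections import Counter

# Hypothetical collection: "subject" comes from traditional indexing,
# "tags" from user-generated annotations.
items = [
    {"title": "Intro to IR", "subject": ["information retrieval"],
     "tags": ["search", "tutorial"]},
    {"title": "Tagging study", "subject": ["information retrieval"],
     "tags": ["folksonomy", "search"]},
    {"title": "Ontology primer", "subject": ["knowledge representation"],
     "tags": ["tutorial"]},
]

def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    counts = Counter()
    for record in records:
        counts.update(record.get(facet, []))
    return counts

def filter_by(records, **selected):
    """Keep records that carry every selected facet value."""
    return [
        record for record in records
        if all(value in record.get(facet, []) for facet, value in selected.items())
    ]

# Narrow by a taxonomy term and a user tag at the same time.
narrowed = filter_by(items, subject="information retrieval", tags="tutorial")
print([record["title"] for record in narrowed])
print(facet_counts(items, "tags"))
```

Treating both metadata sources as facets of the same records is one simple way to let a user move between the controlled vocabulary and the folksonomy while narrowing a result set.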
This document provides information on advanced Google searching techniques. It discusses how search engines work and user expectations. Various search operators and strategies are described, such as phrase searches, Boolean operators, title searches, URL searches, and site-limited searches. The document recommends beginning with a title field search using Boolean expressions that is limited to a top-level domain or specific website to find the most relevant information.
The document discusses search engines and how they have evolved over time. It explains that early search engines ranked results based mainly on content, while modern engines also consider factors like page structure, popularity, and reputation. The document provides definitions of key search-related terms and outlines some of the main components and processes involved in how search engines work, such as crawling websites, indexing pages, and ranking results. It also discusses different types of search tools and how to choose the best one depending on your information needs.
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ... - inventionjournals
This document discusses an enhanced web usage mining system using fuzzy clustering and collaborative filtering recommendation algorithms. It aims to address challenges with existing recommender systems like producing low quality recommendations for large datasets. The system architecture uses fuzzy clustering to predict future user access based on browsing behavior. Collaborative filtering is then used to produce expected results by combining fuzzy clustering outputs with a web database. This approach aims to provide users with more relevant recommendations in a shorter time compared to other systems.
The impact of domain-specific stop-word lists on ecommerce website search per... - kaufmanmpbbjegmwn
This study aimed to improve search performance on ecommerce websites like eBay by developing a domain-specific stop word list for the furniture category. The researchers created a corpus of over 36,000 eBay furniture listings and used linguistic analysis and frequency counts to identify high-frequency, low-value words to add to a standard stop word list. Search times using the domain-dependent list were compared to a standard list and no list (control group) using furniture search queries. As expected, search times were fastest when using the domain-dependent list, demonstrating the potential for improved ecommerce search performance through specialized stop word filtering.
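A minimal sketch of the frequency-count part of that approach follows, assuming that a word appearing in most listings of a category carries little search value. The threshold, the sample listings, and the function names are assumptions for illustration; the study also used linguistic analysis, which is not reproduced here.

```python
from collections import Counter

def domain_stop_words(listings, base_stop_words, min_doc_fraction=0.75):
    """Extend a standard stop-word list with words found in most listings."""
    doc_freq = Counter()
    for listing in listings:
        # Count each word once per listing (document frequency).
        doc_freq.update(set(listing.lower().split()))
    threshold = min_doc_fraction * len(listings)
    frequent = {word for word, n in doc_freq.items() if n >= threshold}
    return set(base_stop_words) | frequent

def strip_stop_words(query, stop_words):
    """Drop stop words from a query before it is sent to the search index."""
    return [word for word in query.lower().split() if word not in stop_words]

# Hypothetical furniture listings standing in for the eBay corpus.
listings = [
    "oak dining table furniture",
    "leather sofa furniture",
    "pine bookshelf furniture",
    "oak bed frame furniture",
]
stop_words = domain_stop_words(listings, {"the", "a", "for"})
print(strip_stop_words("the oak furniture table", stop_words))
```

In this toy corpus "furniture" appears in every listing, so it is filtered from queries, while "oak" appears in only half of them and is kept as a discriminating term.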
The document discusses the evolution of search engines from basic keyword search to semantic search using knowledge graphs and structured data. It provides examples of how search engines like Google are now able to provide direct answers to queries by searching structured data rather than just documents. It emphasizes the importance of representing web content as structured data using schemas like schema.org to be discoverable in semantic search and knowledge graphs.
This presentation teaches effective internet research skills. It discusses different types of search engines such as regular engines like Google and Yahoo compared to metasearch engines that search multiple engines. It recommends advanced searches to narrow results and Boolean operators like AND and OR. Finally, it stresses the importance of properly citing internet sources in a bibliography with the author, title, URL, copyright date and date accessed.
Semantic search uses language processing to analyze the meaning of content and search queries to return more relevant results. It involves classifying content using taxonomies, identifying named entities, extracting relationships between entities, and matching these based on meaning. Implementing semantic search requires preparing content through classification, metadata, and information architecture, as well as technologies for semantic tagging, entity extraction, triple stores, and integrating these capabilities with existing search and content management systems.
This document provides guidance on conducting effective online research. It explains that online research involves using internet resources, especially information on the world wide web, to systematically investigate and study materials to establish facts and reach new conclusions. It recommends starting with a focused question and keywords, then using advanced search techniques like Boolean operators and quotation marks to filter results. It also advises evaluating sources based on criteria like authority, affiliation, audience level, currency, and reliability to find the most credible information from sources like scientific journals and established news sites.
The document discusses evaluating online sources and provides examples of search techniques using Google and Bing to find information on topics like Martin Luther King Jr. and conversions between measurements. It also covers evaluating the credibility of websites and using subject specific search engines or limiting searches to particular domains or file types.
The document provides an overview of how to effectively search the internet. It discusses what the internet is, how it works, and the history and terminology associated with searching online. It then gives guidelines for developing successful search strategies, such as being specific, using keywords and phrases, trying different search engines and refining searches based on results. It emphasizes evaluating websites for credibility by examining aspects like the domain, author, date updated and external links.
This document provides information about researching topics online and evaluating sources. It discusses how to find useful information through search engines and remember the information found. It compares printed and internet sources, describing the publication and review process for printed materials versus the lack of oversight for internet sources. It also outlines how to use search engines and boolean operators effectively to search for topics and filter results.
This is a presentation delivered on December 1, 2020 by the UC Berkeley Library's Office of Scholarly Communication Services and the Research Data Management Program.
Are you unsure about how you can use or reuse other people’s data in your teaching or research, and what the terms and conditions are? Do you want to share your data with other researchers or license it for reuse but are wondering how and if that’s allowed? Do you have questions about university or granting agency data ownership and sharing policies, rights, and obligations? We will provide clear guidance on all of these questions and more in this interactive webinar on the ins-and-outs of data sharing and publishing.
- Explore venues and platforms for sharing and publishing data
- Unpack the terms of contracts and licenses affecting data reuse, sharing, and publishing
- Help you understand how copyright does (and does not) affect what you can do with the data you create or wish to use from other people
- Consider how to license your data for maximum downstream impact and reuse
- Demystify data ownership and publishing rights and obligations under university and grant policies
Search Analytics: Conversations with Your Customers - richwig
1. The document discusses analyzing search logs to understand how users interact with search engines and how to improve search and site organization based on these insights.
2. Key insights that can be gained from search log analysis include popular search terms, queries that return no results, frequently clicked search results, and patterns in search behavior over time and between user groups.
3. Information from search log analysis can be used to improve search features, results presentation, site navigation, metadata, and content.
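The kinds of insight listed above can be sketched with a few counters. This is a minimal example assuming a hypothetical log format in which each entry records the query, the number of results returned, and the clicked URL (if any); the field names and sample entries are not from the presentation.

```python
from collections import Counter

def analyze_search_log(entries, top_n=3):
    """Summarize popular queries, zero-result queries, and clicked results."""
    query_counts = Counter()
    zero_result = Counter()
    clicks = Counter()
    for entry in entries:
        # Normalize case and whitespace so variants count as one query.
        query = entry["query"].strip().lower()
        query_counts[query] += 1
        if entry["result_count"] == 0:
            zero_result[query] += 1
        if entry.get("clicked_url"):
            clicks[(query, entry["clicked_url"])] += 1
    return {
        "popular": query_counts.most_common(top_n),
        "zero_result": zero_result.most_common(top_n),
        "top_clicks": clicks.most_common(top_n),
    }

# Hypothetical site-search log entries.
log = [
    {"query": "Vacation policy", "result_count": 12, "clicked_url": "/hr/vacation"},
    {"query": "vacation policy", "result_count": 12, "clicked_url": "/hr/vacation"},
    {"query": "expense form", "result_count": 0, "clicked_url": None},
    {"query": "vacation policy ", "result_count": 12, "clicked_url": None},
]
report = analyze_search_log(log)
print(report)
```

Zero-result queries such as "expense form" point at missing content or metadata, while frequently clicked results suggest candidates for prominent navigation links.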
This document provides instructions and examples for using different search operators and features in Google Search to refine search queries and find specific types of information. It explains the following operators: exclusion (-), inclusion (+), similar words (~), multiple words (OR), number range (..), fill-in-the-blank (*), and exact phrase (" "). It also demonstrates how to use Google Search to find one-box answers for weather, time, sports scores, stock prices, businesses, movies, zip codes, calculations, conversions, spell check, definitions, flights, earthquakes, and public data. The document encourages exploring these search operators and one-box features to more effectively find information online.
This document describes the steps in the production chain for Club Social filled crackers, including sourcing the wheat and flour, preparing the dough, filling and baking the crackers, and selling them packaged in Argentina, Uruguay, Paraguay, and Bolivia.
The document discusses the history and workings of search engines. It describes how search engines like Archie were some of the earliest examples from 1990. It explains the process search engines use, including crawling the web with bots, indexing pages, and then processing queries to return relevant results. The document also discusses the importance of search engine algorithms and why they are kept secret by companies. Finally, it briefly describes Google's new Caffeine search engine and Yahoo's BOSS platform that allows developers to build custom search tools using Yahoo's index and infrastructure.
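The crawl, index, and query steps can be sketched with a tiny inverted index. This is a simplified illustration under assumptions: the `pages` dictionary stands in for pages already fetched by a crawler, the URLs are made up, and the ranking is a plain term-frequency count rather than any real engine's (secret) algorithm.

```python
from collections import defaultdict

# Stand-in for crawled pages: URL -> page text (hypothetical data).
pages = {
    "a.example/1": "archie was an early search engine",
    "a.example/2": "a search engine crawls pages then a search engine ranks pages",
    "a.example/3": "cooking recipes",
}

def build_index(crawled_pages):
    """Indexing step: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in crawled_pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, crawled_pages, query):
    """Query step: find pages containing every term, most occurrences first."""
    terms = query.lower().split()
    if not terms:
        return []
    matching = set.intersection(*(index.get(term, set()) for term in terms))

    def score(url):
        words = crawled_pages[url].lower().split()
        return sum(words.count(term) for term in terms)

    # Sort by descending score, then by URL for a stable order.
    return sorted(matching, key=lambda url: (-score(url), url))

index = build_index(pages)
print(search(index, pages, "search engine"))
```

Building the index once and answering queries from it, instead of scanning every page per query, is the reason crawling and indexing are separate stages from query processing.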
This document appears to be a title and description for two artworks - "Crushed beer tin (detail)" and "War requiem (detail side panel)". The document provides brief identifying information for two separate art pieces in a concise manner.
This document discusses pre-processing of server log files to improve security in distributed database systems. It proposes storing the MAC address in server log files along with the IP address to better identify unauthorized users. It provides background on distributed databases and security issues when data is accessed over a network. The document also reviews related work on pre-processing log files and using data mining techniques like the Apriori algorithm and decision trees. The proposed method applies these techniques to optimize log file data, recognize patterns, and display clean log files with MAC addresses to help verify unauthorized clients.
Building Search Systems for the Enterprise - Yunyao Li
This is a high-level summary of Gumshoe, the enterprise search engine built by our group, which currently powers IBM intranet search. It was presented as a SIGIR 2011 Industrial Track keynote talk.
The document discusses the results of an international congress on happiness, which concluded that happiness is only attained after the age of 40. It contrasts the difficulties of adolescence with the freedoms and maturity of life after 40, when people have more autonomy and wisdom to appreciate life.
Companies should lighten up and not take themselves too seriously. Allowing workers more breaks for rest and socializing with coworkers matters because an overly serious workplace with little break time leads to exhaustion, even when productivity is high, and social interaction is important for every person.
This document provides a three-step tutorial for creating a blog on Blogger. It explains how to create an account, publish the blog's first post, and configure the blog's appearance and basic options.
The document discusses three new techniques for helping users interact with and gather web content more easily: 1) allowing users to specify relationships between websites to automatically collect data across multiple sites, 2) an interface for organizing content from multiple websites visually, and 3) a novel search paradigm using search templates to collect content from the web. It then provides an overview of the Summaries framework that the techniques are built on and describes an example usage scenario.
The document provides a summary of three books that review ethics in technology and cyberspace. It discusses key topics in each book such as the evolution of information technology, the need for computer ethics to ensure fairness, and special characteristics of online communication like scope, anonymity and reproducibility. The reviews highlight how information technology has transformed tasks worldwide, the importance of ethics for proper conduct, and how characteristics of online communication can impact society. Lessons learned are that computer use is widespread and ethics help establish fairness and equality when using technology.
Understanding User Goals in Web Search

Daniel E. Rose
Yahoo! Inc.
701 First Avenue, MS B201
Sunnyvale, CA 94089 USA
+1 408 349 7992
drose@yahoo-inc.com

Danny Levinson
Yahoo! Inc.
144 Fourth Avenue SW, Suite 2600
Calgary AB T2P 3N4 Canada
+1 403 303 4590
dlevinso@yahoo-inc.com

Copyright is held by the author/owner(s).
WWW 2004, May 17–22, 2004, New York, New York, USA.
ACM 1-58113-844-X/04/0005.

ABSTRACT
Previous work on understanding user web search behavior has
focused on how people search and what they are searching for,
but not why they are searching. In this paper, we describe a
framework for understanding the underlying goals of user
searches, and our experience in using the framework to manually
classify queries from a web search engine. Our analysis suggests
that so-called “navigational” searches are less prevalent than
generally believed, while a previously unexplored “resource-seeking”
goal may account for a large fraction of web searches.
We also illustrate how this knowledge of user search goals might
be used to improve future web search engines.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search
and Retrieval – search process; H.4.m [Information Systems
Applications]: Miscellaneous.

General Terms
Measurement, Experimentation, Human Factors

Keywords
Web search, information retrieval, user behavior, user goals,
query classification.

1. INTRODUCTION
If we imagine seeing the world from the perspective of a search
engine, our only view of user behavior would be the stream of
queries users produce. Search engine designers often adopt this
perspective, studying these query streams and trying to optimize
the engines based on such factors as the length of a typical query.
Yet this same perspective has prevented us from looking beyond
the query, at why the users are performing their searches in the
first place.

The “why” of user search behavior is actually essential to
satisfying the user’s information need. After all, users don’t sit
down at their computer and say to themselves, “I think I’ll do
some searches.” Searching is merely a means to an end – a way to
satisfy an underlying goal that the user is trying to achieve. (By
“underlying goal,” we mean how the user might answer the
question “why are you performing that search?”) That goal may
be choosing a suitable wedding present for a friend, learning
which local colleges offer adult education courses in pottery,
seeing if a favorite author’s new book has been released, or any
number of other possibilities. In fact, in some cases the same
query might be used to convey different goals – for example, the
query “ceramics” might have been used in any of the three
situations above (assuming it is also the title of the book in
question).

What difference would it make if the search engine knew the
user’s goal? At the very least, the engine might provide a user
experience tailored toward that goal. For example, the display of
relevant advertising might be welcome in a shopping context, but
unwelcome in a research context. In fact, we have argued
elsewhere [10] that goal-sensitivity will be one of the crucial
factors in future search user interfaces. But the potential to
capitalize on this goal sensitivity goes beyond the user interface.
The underlying relevance-ranking algorithms that determine
which results are presented to users might differ depending on the
search goal. For example, queries that express a need for advice
might rely more on usage- or connectivity-based relevance
factors, while those involving open-ended research might weight
traditional information retrieval measures (such as term
frequency) more highly.

Our premise is that web searches reflect a diverse set of
underlying user goals, and that knowledge of those goals offers
the prospect of future improvements to web search engines.
Achieving these improvements is an ambitious project involving
three primary tasks. First, we need to create a conceptual
framework for user goals. Second, we need a way for search
engines to associate user goals with queries. Third, we need to
modify the engines in order to exploit the goal information.

In this paper we focus on the first task, and the initial parts of the
second: characterizing user search goals and examining the
problem of inferring goals from query behavior. We begin in
section 2 by looking at previous work on understanding
information-seeking behavior. Next, in section 3, we describe our
model of search goals. In section 4, we review the methodology
used to classify queries using our model, and we provide some
results from this analysis. We conclude with some final thoughts
about the applicability of this work.

2. RELATED WORK
Studies of user search behavior have a long history in Information
and Library Science. These include studies of the reference
interview process, long before most users had access to
computer-assisted search tools. When search engines first became available
for use by researchers, many studies were conducted that
attempted to understand user search behavior in an online context.
For example, Bates [4] looked at the different ways in which
people performed searches, and later proposed ways to
characterize the overall search process [5]. Belkin’s Anomalous
States of Knowledge (ASK) framework was an early attempt to
model the cognitive state of the user and then translate this
understanding into a practical design for an information retrieval
system [6]. Included in the ASK study was an analysis of some of
the different types of information needs of different users. For
example, one type of ASK was summarized as “Well-defined
topic and problem,” while another was “Information needed to
produce directions for research.”

Once web search engines became available and popular, studies of
web search behavior followed quickly. For example, Silverstein
et al. conducted an analysis of query logs from the AltaVista
search engine, confirming some of the original findings of web
search use, such as the predominance of very short queries [11].
A summary of many of the early studies may be found in Jansen
and Pooch’s 2000 review [9].

One of the most comprehensive attempts to understand web
search behavior has been the ongoing research of Spink and her
colleagues, who analyzed query logs of the Excite search engine
from 1997, 1999, and 2001 [13]. Although there have been some
changes in user behavior during this period (such as a decrease in
willingness to look at more than one page of search results), Spink
et al. found that general search strategies have remained fairly
constant.

Prior to the advent of the worldwide web, search engine designers
could safely assume that users had an “informational” goal in
mind. That is, users’ reason for searching was generally to “find
out about” their search topic. This was due both to the nature of
the population with access to full-text search engines (students,
researchers, lawyers, intelligence analysts, etc.) and to the nature
of the databases that could be searched (with services such as
Westlaw, Dialog, Medline, Lexis/Nexis, etc.)

But in the web era, search engines are used for more than just
research. Even the most cursory look at the query logs of any
major search engine makes it clear that the goals underlying web
searches are many and varied. And while the vast body of work
described above has helped us to understand what users are
searching for and how their information-seeking process works,
there have been few attempts to look at why users are searching.

One of the few exceptions is Broder’s “Taxonomy of Web
Search” [7]. Motivated by the idea that the traditional notion of
an “information need” might not adequately describe web
searching, Broder came up with a trichotomy of web search
“types”: navigational, informational, and transactional.
Navigational searches are those which are intended to find a
specific web site that the user has in mind; informational searches
are intended to find information about a topic; transactional
searches are intended to “perform some web-mediated activity.”

3. A FRAMEWORK FOR SEARCH GOALS
Our first task was to understand the space of user goals. In
particular, we needed to come up with a framework that could
identify and organize a manageable set of canonical goal
categories. These goal categories, in turn, must encompass the
majority of actual goals users have in mind when searching.

To develop the goal framework, we looked at a sample of queries
from the AltaVista search engine [1]. We brainstormed a variety
of goal possibilities, based on our own experiences, some
previous internal query analysis at AltaVista, and a preliminary
examination of the query set. This resulted in a flat list of goals.
This list served as a basis for an initial goal classification
framework, which we then used to categorize a sample of 100-200
queries. Next, we revised the framework to accommodate the
results of the classification test. Categories were modified, or new
categories added, when queries did not fit the existing framework.
Some goal categories proposed early on, such as “finding a place
in the world” (e.g. a map request), were dropped as
unrepresentative. Some categories were merged, some were split
more finely, and some entirely new ones arose. This
propose-classify-refine cycle was repeated three times, each with a new set
of queries.

One of our early findings was that there were many cases where
the goal of the search was neither to find a web site nor to get
information, but simply to get access to an online resource. For
example, a query such as beatles lyrics suggests not a
desire to learn about lyrics to Beatles songs, but simply a desire to
view the lyrics themselves. This led to the creation of a broad
new goal category that we call resource searches. We believe
these resource searches are a relatively neglected category in the
search engine world.

As we repeatedly revised the set of goal categories, we gradually
reached the conclusion that the goals naturally fell into a
hierarchical structure. In fact, the top level of the hierarchy
resembles Broder’s trichotomy, but our more general “resource”
category replaces his notion of “transactional” queries. Our
resulting goal framework is shown in Table 1.

We define the navigational goal as demonstrating a desire by the
user to be taken to the home page of the institution or
organization in question. To be considered navigational, the
query must have a single authoritative web site that the user
already has in mind. For this reason, most queries consisting of
names of companies, universities, or well-known organizations
are considered navigational. Also for this reason, most queries for
people – including celebrities – are not. A search for celebrities
such as Cameron Diaz or Ben Affleck typically results in a variety
of fan sites, media sites, and so on; it’s unlikely that a user
entering the celebrity name as a query had the goal of visiting a
specific site.

Informational queries are all focused on the user goal of
obtaining information about the query topic. This category
includes goals for answering questions (both open- and
closed-ended) that the user has in mind, requests for advice, and
“undirected” requests to simply learn more about a topic.
Undirected queries may be viewed as requests to “find out about”
or “tell me about” a topic; most queries consisting of topics in
science, medicine, history, or news qualify as undirected, as do
the celebrity queries mentioned above. Note that the two
question-goal categories do not require that the user explicitly express the
query in the form of a question; the query “last czar of russia” is
reasonably interpreted as a closed-class question “who was the
last czar of Russia?” Similarly, queries in the “advice” category
may take many forms.

The informational goal class also includes the desire to locate
something in the real world, or simply get a list of suggestions for
further research. Most product or shopping queries have the
“locate” goal – I’m searching the web for X because I want to
know where I can buy X. Plural query terms are a highly reliable
indicator of the list goal.

Table 1: The Search Goal Hierarchy. Queries are only assigned to leaf nodes.
All examples are taken from actual AltaVista queries.

1. Navigational
   Description: My goal is to go to specific known website that I already have in mind. The only reason I'm searching is that it's more convenient than typing the URL, or perhaps I don't know the URL.
   Examples: aloha airlines; duke university hospital; kelly blue book

2. Informational
   Description: My goal is to learn something by reading or viewing web pages

   2.1 Directed
       Description: I want to learn something in particular about my topic

       2.1.1 Closed
             Description: I want to get an answer to a question that has a single, unambiguous answer.
             Examples: what is a supercharger; 2004 election dates

       2.1.2 Open
             Description: I want to get an answer to an open-ended question, or one with unconstrained depth.
             Examples: baseball death and injury; why are metals shiny

   2.2 Undirected
       Description: I want to learn anything/everything about my topic. A query for topic X might be interpreted as "tell me about X."
       Examples: color blindness; jfk jr

   2.3 Advice
       Description: I want to get advice, ideas, suggestions, or instructions.
       Examples: help quitting smoking; walking with weights

   2.4 Locate
       Description: My goal is to find out whether/where some real world service or product can be obtained
       Examples: pella windows; phone card

   2.5 List
       Description: My goal is to get a list of plausible suggested web sites (I.e. the search result list itself), each of which might be candidates for helping me achieve some underlying, unspecified goal
       Examples: travel; amsterdam universities; florida newspapers

3. Resource
   Description: My goal is to obtain a resource (not information) available on web pages

   3.1 Download
       Description: My goal is to download a resource that must be on my computer or other device to be useful
       Examples: kazaa lite; mame roms

   3.2 Entertainment
       Description: My goal is to be entertained simply by viewing items available on the result page
       Examples: xxx porno movie free; live camera in l.a.

   3.3 Interact
       Description: My goal is to interact with a resource using another program/service available on the web site I find
       Examples: weather; measure converter

   3.4 Obtain
       Description: My goal is to obtain a resource that does not require a computer to use. I may print it out, but I can also just look at it on the screen. I'm not obtaining it to learn some information, but because I want to use the resource itself.
       Examples: free jack o lantern patterns; ellis island lesson plans; house document no. 587

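The hierarchy in Table 1 can be written down as a small tree, which makes the "queries are only assigned to leaf nodes" rule easy to check. The following is only an illustrative sketch, not something from the paper: the `Goal` class, the `ASSIGNABLE` lookup, and the `assign` helper are hypothetical names introduced here, and the numeric codes simply reuse the numbering shown in the table.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """One node of the search goal hierarchy (hypothetical representation)."""
    code: str
    name: str
    children: list["Goal"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def leaves(self) -> list["Goal"]:
        """Return every leaf goal at or beneath this node."""
        if self.is_leaf():
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Goal names and numbering copied from Table 1.
HIERARCHY = [
    Goal("1", "Navigational"),
    Goal("2", "Informational", [
        Goal("2.1", "Directed", [
            Goal("2.1.1", "Closed"),
            Goal("2.1.2", "Open"),
        ]),
        Goal("2.2", "Undirected"),
        Goal("2.3", "Advice"),
        Goal("2.4", "Locate"),
        Goal("2.5", "List"),
    ]),
    Goal("3", "Resource", [
        Goal("3.1", "Download"),
        Goal("3.2", "Entertainment"),
        Goal("3.3", "Interact"),
        Goal("3.4", "Obtain"),
    ]),
]

# Only leaf goals may be attached to a query.
ASSIGNABLE = {leaf.code: leaf.name for top in HIERARCHY for leaf in top.leaves()}


def assign(query: str, code: str) -> tuple[str, str]:
    """Label a query with a goal, rejecting non-leaf categories."""
    if code not in ASSIGNABLE:
        raise ValueError(f"{code!r} is not a leaf goal")
    return query, ASSIGNABLE[code]
```

For instance, `assign("kazaa lite", "3.1")` pairs that query with the Download goal, while `assign("kazaa lite", "3")` raises an error because Resource is an interior node.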
Resource queries all represent a goal of obtaining something
(other than information). If the resource is something I plan to
use in the offline world, such as song lyrics, recipes, sewing
patterns, etc., we call it an “obtain” goal. If the resource is
something that needs to be installed on my computer or other
electronic device to be useful, the goal is “download.” If my goal
is simply to experience (typically view or read) the resource for
my enjoyment, the goal is “entertain.” The most common
example of queries with an entertainment goal were those that
dealt with pornography. Finally, the “interact” goal occurs when
the intended result of the search is a dynamic web service (such as
a stock quote server or a map service) that requires further
interaction to achieve the user’s task.

The search goal framework described above proved to be both
stable (requiring no major revisions as new queries were
encountered) and comprehensive (encompassing the goals of all
the queries we had seen). We were therefore able to move on to
the second major task, associating goals with queries.

4. ASSOCIATING GOALS WITH QUERIES
There are two ways a search engine might associate goals with
queries at runtime: either the user can identify the goal explicitly
through the user interface, or the system can attempt to infer the
goal automatically. Google’s “I’m feeling lucky” feature [8], in
which users implicitly identify their goal as “navigate to a specific
web site,” may be thought of as an early example of the first
4. Figure 1: A screenshot of the tool used to assist manual query classification.
We believe that in many cases, user goals can be deduced from
looking at user behavior available to the search engine. Included
in this behavior are the following:
approach.
The second approach would involve automatic
classification using statistical or machine learning methods; these
methods in turn will require hundreds or thousands of examples of
classified queries (and their associated features) as training
examples.
the query itself
the results returned by the search engine
In either case, we need to know the relative prevalence of various
goals. And if we hope to infer goals automatically in the future,
we need to know that it is possible to do so manually. This section
describes our work on these initial aspects of the problem; the
remaining parts of the task will be the focus of future work.
results clicked on by the user
further searches or other actions by the user.
We wanted to determine whether this was sufficient information
for a human to consistently classify queries according to our goal
framework.
Once we could successfully classify queries
manually, we would be able to provide training data for a future
automatic classification system.
4.1 Manual Query Classification
In order to definitively know the underlying goal of every user
query, we would need to be able to ask the user about his or her
intentions. Clearly, this is not feasible in most cases. But can the
goal be determined simply by looking at the query itself, or is
more information required?
To facilitate the task of manual classification, we created a
research tool that provided these four types of information for sets
of queries taken from the AltaVista query logs. A screen shot of
the classification tool interface is shown in Figure 1.
The query (kelly blue book in this example) appears in the
gray-highlighted box at upper left. To the right of the query are
links which lead to the search results that appear when the query
is executed on two major search engines. Beneath this is a table
of search engine events (clicks and queries) that this user
performed following the initial query. In this case, we see that six
seconds after issuing the query, the user entered a new, more
specific query on the same topic. (The syntax suggests that this
query resulted from the user clicking on a suggested query
refinement term using AltaVista’s Prisma [2, 3] assisted search
tool.) Eight seconds later, he or she clicked on the first result,
www.kbb.com, which is the home page of the Kelly Blue Book
(a publication that gives guidelines for new and used car prices).
Thus a human classifier using the tool (namely, one of the
authors) concluded that the underlying user goal for this query
was “navigational,” and selected the corresponding radio button.
When the “Submit classification” button is pressed, a new query
is displayed, together with its corresponding information. In the
example shown, a human classifier could probably have guessed
the goal simply by viewing the initial query. Yet there are cases
where each of the sources of information played a role in
assessing the user’s goal.

Table 2: Events following the query final fantasy.
Time (s)   Delta t (s)   Event          Details
36         36            result click   pg 1, pos 1  http://www.ffonline.com
113        77            query          pg 1  final fantasy
118        5             result click   pg 1, pos 8  http://www.eyesonff.com
147        29            result click   pg 1, pos 8  http://www.eyesonff.com
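
The "Delta t" column of Table 2 is simply the gap between consecutive event times. The Python sketch below recomputes it from the "Time" column, assuming both columns are in seconds and that the first delta is measured from the initial query; the Event class and the deltas function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: int      # seconds elapsed since the initial query (assumed unit)
    kind: str      # "result click" or "query"
    details: str


# Rows transcribed from Table 2.
EVENTS = [
    Event(36, "result click", "pg 1, pos 1 http://www.ffonline.com"),
    Event(113, "query", "pg 1 final fantasy"),
    Event(118, "result click", "pg 1, pos 8 http://www.eyesonff.com"),
    Event(147, "result click", "pg 1, pos 8 http://www.eyesonff.com"),
]


def deltas(events):
    """Seconds between each event and the previous one (first: since the query)."""
    out, previous = [], 0
    for event in events:
        out.append(event.time - previous)
        previous = event.time
    return out


print(deltas(EVENTS))  # [36, 77, 5, 29]
```

These gaps are the values a human classifier reads as "examined the result list for 36 seconds" or "about a minute later."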
Consider the query final fantasy. This is the name of a
series of popular computer games. Did the user want to find a
place to buy one of the games (a “locate” goal)? Did he or she
intend to go to some official Final Fantasy web site (a
“navigational” goal)? A look at the search results on AltaVista
and Google shows that there isn’t an authoritative web site for the
game. The game’s manufacturer has a web site, but it covers
many games, has no specific page for the entire Final Fantasy
series, and is ranked #3 on both AltaVista and Google. This casts
some doubt on the likelihood of a navigational goal. The result list
contains many sites with information about the games, and many
sites where one can buy the games. The user’s event history,
shown in Table 2, provides further information.

The user examined the result list for 36 seconds, then visited the
web site www.ffonline.com, described as “an unofficial
guide to Final Fantasy.” About a minute later, s/he returned to
the original query, and then chose a different web site, “Eyes on
Final Fantasy,” (www.eyesonff.com), a site containing news
and information about the games. This pattern indicates that the
user was not interested in buying the game, but simply wanted
some sort of information about it – perhaps the latest news about
future releases. In this case, we’d conclude that the underlying
goal was “undirected” information.

4.2 Results
Three sets of approximately 500 U.S. English queries¹ each were
randomly selected from the AltaVista query logs on different days
and at different times of the day. These were manually classified,
one set using the classification tool as described above, and two
sets using an earlier version that did not contain the user’s event
history. Results are shown in Table 3. (Note that the “open” and
“closed” categories have been collapsed into a single “directed”
category, due to the low numbers of results.)

It is interesting to note that nearly 40% of queries were
non-informational in every case, and a large fraction of the
informational queries appeared to be attempts to locate a product
or service rather than to learn about it. In fact, only about 35% of
all queries appeared to have the kind of general research goal
(questions, undirected requests for information, and
advice-seeking) for which traditional information retrieval
systems were designed.

It is also interesting that the relative distributions of goal
categories are quite similar across the different query sets, despite
the fact that they represented different dates during the year and
different times of day. Perhaps more importantly, the additional
information about user click behavior used in the Set 3 results did
not result in a substantially different mix of goals. Although this
requires further study, it suggests the surprising result that goals
can be inferred with almost no information about the user’s
behavior.

Because the top level of our goal classification framework is
similar to Broder’s web search taxonomy [7], we also examined
how the distribution of our queries into the top-level goal
categories compared with his. Broder used two methods to
classify queries, a user survey and manual classification of log
entries. The survey had one question intended to identify
navigational queries, and one that allowed users to choose any of
several tasks (shopping, downloading, etc.) that he considered
“transactional.” If none of these tasks was chosen, the query was
assumed to be informational. The log analysis followed a similar
decision procedure. Broder also eliminated sexually oriented
queries, which accounted for about 10% of the data.
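
The decision procedure attributed to Broder's survey (one navigational question, one multiple-choice question about transactional tasks, informational as the default) can be sketched in Python as follows. This is only an illustration of the description given in this paper: the function name, the argument names, and the assumption that the navigational question takes precedence are ours, and the task list is limited to the two tasks the text names.

```python
# Only "shopping" and "downloading" are named in the text; the real survey
# listed further tasks, so this set is deliberately incomplete.
TRANSACTIONAL_TASKS = {"shopping", "downloading"}


def survey_category(seeks_specific_site: bool, chosen_tasks: set) -> str:
    """Map one survey response to a top-level category."""
    # Assumption: a positive answer to the navigational question wins.
    if seeks_specific_site:
        return "navigational"
    # Any chosen transactional task marks the query as transactional.
    if chosen_tasks & TRANSACTIONAL_TASKS:
        return "transactional"
    # No task chosen: the query is assumed to be informational.
    return "informational"


print(survey_category(False, {"shopping"}))  # transactional
print(survey_category(False, set()))         # informational
```

Because "informational" is the fall-through case, any query whose respondent selects nothing is counted as informational, which is one way the category definitions differ from the goal framework used here.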
Figure 2 compares our top-level goal classification with results
reported by Broder. (We are simplifying somewhat by equating
Broder’s “transactional” category with our more general
“resource” goal.) We consistently found a greater proportion of
informational queries, and a smaller proportion of navigational
and resource/transactional queries than the earlier study. While the
differences in informational and resource/transactional queries
may be accounted for by our different definitions of those
categories, this does not account for the large difference in the
fraction of navigational queries.

[Figure 2: bar chart of the percentage of queries (0 to 100) in the
Navigational, Informational, and Resource / Transactional categories
for the Broder user survey, the Broder log analysis, and current
study sets 1, 2, and 3.]
Figure 2: Comparison of Broder’s search taxonomy to our top-level goals. Resource and informational
results in the first column are Broder’s estimates. Results do not total 100% due to rounding error.

In fact, since Broder sampled all queries, while we sampled only
session-initial queries, the actual difference in navigational query
rates may be even higher. This is because navigational query
sessions are likely to be shorter and thus overrepresented in our
session-initial measure. However, it is not clear that this had any
more impact than the other methodological differences used to
obtain our respective data sets.

If our findings about the relatively small number of navigational
queries are accurate, they suggest that much of the attention in the
commercial search engine world may be misdirected. Tests such
as the “Perfect Page Test” organized by one search engine
newsletter [12] encourage search engine providers to focus on
performance on navigational queries, even though this does not
appear to reflect the majority of user needs.

Table 3: Results of Classifying Queries by Search Goals
GOAL                   SET 1     SET 2     SET 3
directed               2.70%     3.30%     7.30%
undirected            31.30%    26.50%    22.70%
advice                 2.00%     2.70%     5.00%
locate                24.30%    25.90%    24.40%
list                   2.70%     2.90%     2.10%
informational total   63.00%    61.30%    61.50%
download               4.30%     4.30%     5.60%
entertain              4.00%     8.20%     5.80%
interact               4.30%     6.00%     5.70%
obtain                 7.70%    10.30%     7.70%
resource total        21.70%    27.00%    25.00%
navigational          15.30%    11.70%    13.50%

¹ The number was not exact because we started with a larger set
and then discarded those that were either not English or used
non-standard search operators such as “link:”.
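
The percentages in Table 3 can be cross-checked with a few lines of Python. The sketch below transcribes the informational subgoals and the three top-level totals from the table, then sums each column; it also adds up the three "general research" subgoals (directed, undirected, advice) discussed in Section 4.2. The variable and function names are hypothetical, and the resource subgoal rows are left out.

```python
# Values transcribed from Table 3, in the order (set 1, set 2, set 3).
INFORMATIONAL = {
    "directed":   (2.7, 3.3, 7.3),
    "undirected": (31.3, 26.5, 22.7),
    "advice":     (2.0, 2.7, 5.0),
    "locate":     (24.3, 25.9, 24.4),
    "list":       (2.7, 2.9, 2.1),
}
TOP_LEVEL = {
    "informational": (63.0, 61.3, 61.5),
    "resource":      (21.7, 27.0, 25.0),
    "navigational":  (15.3, 11.7, 13.5),
}


def column_sums(rows):
    """Sum each query set's column, rounded to one decimal place."""
    return [round(sum(column), 1) for column in zip(*rows.values())]


def research_share():
    """Share of queries with a directed, undirected, or advice goal."""
    subset = {k: INFORMATIONAL[k] for k in ("directed", "undirected", "advice")}
    return column_sums(subset)


print(column_sums(INFORMATIONAL))  # [63.0, 61.3, 61.5]
print(column_sums(TOP_LEVEL))      # [100.0, 100.0, 100.0]
print(research_share())            # [36.0, 32.5, 35.0]
```

The informational subgoals reproduce the reported informational totals, the three top-level categories account for every query in each set, and the general research share stays between roughly 32% and 36% across the three sets.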
5. FUTURE WORK
In analyzing our results, we are aware of certain limitations that
may restrict the generalizability of our conclusions. One issue is
that we have no way of knowing conclusively whether the goal we
inferred for a query is in fact the user’s actual goal. In the future,
we would like to combine our work with user studies, including
qualitative data such as diary reports of user goals. In order to do
this, we first need to make sure that our goal framework and
classification methodology can be used by judges other than the
authors.

A second issue is that the AltaVista user population may not be
representative of search engine users in general. In particular,
AltaVista’s reputation for providing more powerful query tools,
combined with its relatively small market share, may make it the
engine of choice for users with difficult informational queries, but
not a first choice for typical users issuing common queries. It is
possible that this may account for some of the user behavior we
saw, despite the fact that we already excluded queries with
explicit Boolean syntax or other advanced search operators. In
order to investigate this issue, we hope to extend our research to
Yahoo! search users.

6. CONCLUSIONS
If web search engines are to continue to improve in the future,
they will need to take into account more knowledge of user
behavior – not just how people search, but why. We have created
a framework for understanding the underlying goals of search, and
have demonstrated that the framework can be used to associate
goals with queries given limited information.

This analysis of user goals has already yielded two unexpected
patterns in web search. First, so-called “navigational” queries
appear to be much less prevalent than generally believed. Second,
many queries appear to be motivated by a previously unexplored
goal involving the need to obtain online and offline resources.

More importantly, an understanding of search goals provides a
foundation for tackling the larger problems of conveying user
goals to a search engine (either explicitly or by inference), and
modifying the engines’ algorithms and interfaces to exploit this
knowledge.

7. REFERENCES
[1] AltaVista, http://www.altavista.com.
[2] AltaVista, description of Prisma query refinement tool,
http://www.altavista.com/help/search/pp.
[3] Anick, P. Using Terminological Feedback for Web Search
Refinement: A Log-Based Study. Proceedings of SIGIR
2003, 88-95.
[4] Bates, M.J. Information Search Tactics. Journal of the
American Society for Information Science, 30, July 1979,
205-214.
[5] Bates, M.J. The Design of Browsing and Berrypicking
Techniques for the Online Search Interface. Online Review
13, October 1989, 407-424.
[6] Belkin, N.J., Oddy, R.N., and Brooks, H.M. ASK for
Information Retrieval: Part II. Results of a Design Study.
Journal of Documentation, 38(3), Sep. 1982, 145-164.
[7] Broder, A. A Taxonomy of Web Search. SIGIR Forum 36(2),
2002.
[8] Google, Description of “I’m Feeling Lucky” feature,
http://www.google.com/help/features.html#lucky.
[9] Jansen, B.J. and Pooch, U. A Review of Web Searching
Studies and a Framework for Future Research. Journal of
the American Society of Information Science and
Technology, 52(3), 235-246, 2000.
[10] Rose, D.E. Reconciling Information-Seeking Behavior with
Search User Interfaces for the Web. Journal of the American
Society of Information Science and Technology, to appear.
[11] Silverstein, C., Henzinger, M., Marais, H., and Moricz, M.
Analysis of a Very Large Web Search Engine Query Log.
SIGIR Forum, 33(3), 1999. Originally published as DEC
Systems Research Center Technical Note, 1998.
[12] Sherman, C. and Sullivan, D. The Search Engine ‘Perfect
Page’ Test. Search Day 391 (Nov. 4, 2002),
http://www.searchenginewatch.com/searchday/02/sd1104pptest.html.
[13] Spink, A., Jansen, B.J., Wolfram, D., and Saracevic, T. From
E-Sex to E-Commerce: Web Search Changes. IEEE
Computer, 35(3), 107-109, 2002.