Searching and retrieval performance of search engines: a comparative study
Sangeeta Narang
Librarian, All India Institute of Medical Sciences, New Delhi
firstname.lastname@example.org
ABSTRACT
Objective: This study investigated the number of hits and the types of information sources retrieved when sample queries, drawn from real reference questions, were run on Internet search engines.
Study design: The Google, Yahoo!, and Bing (previously MSN) search engines were tested with single-term queries, multiple-term queries, Boolean logic, and phrase searches. The first forty results retrieved for the “health literacy” search term were examined, and the organizations or individuals sponsoring the websites were then compared on the basis of their domains.
Results: The number of hits was high, and sponsorship differed significantly among the search engines. Many of the websites belonged to commercial and non-government agencies; few came from government organizations and educational institutes.
Conclusion: As the Internet has become a frequently used source of information, librarians should consider the best ways to disseminate high-quality educational information to users amid an overwhelming number of websites.
INTRODUCTION
A search engine is defined as a remotely accessible program that lets you do keyword searches for information on the Internet. There are several types of search engines; a search may cover document titles, URLs, headers, or the full text (www.ameris.co.uk/glossary_of_terms.cfm).
COMPONENTS OF SEARCH ENGINE
A search engine relies on web crawlers, also called spiders or robots. They visit remote sites and download their contents for indexing. During indexing, the program creates an outline of each document by stripping out all the headers and then takes the first 20% or 20 lines, whichever is smaller, as an excerpt or abstract. The statistically more salient terms in the document are taken as keywords. In this way a highly efficient data structure (a tree or index) is generated and associated with the specific web page. Whenever a user submits a query, it is this inverted index that is searched. Most search engines accept a query as simple text and break the user's text into a sequence of search terms.
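The indexing and lookup steps described above can be illustrated with a minimal sketch. It assumes a tiny in-memory collection of pages; the function names (`tokenize`, `excerpt`, `build_inverted_index`, `search`) and the example URLs are hypothetical, and real search engines use far more elaborate crawling, ranking, and storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric search terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def excerpt(lines, percent=20, max_lines=20):
    """Return the first 20% of the lines or the first 20 lines, whichever is smaller."""
    count = min(max_lines, len(lines) * percent // 100)
    return lines[:count]


def build_inverted_index(pages):
    """Map each term to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the pages that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical pages standing in for crawled documents.
pages = {
    "a.example/health": "Health information for patients",
    "b.example/literacy": "Adult literacy programmes",
    "c.example/both": "Health literacy and the role of librarians",
}
index = build_inverted_index(pages)
hits = search(index, "Health Literacy")
```

Because the query is answered from the term-to-pages mapping, the page texts themselves are not rescanned at search time.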
TYPES OF SEARCH ENGINE
There are various types of search engines:
General search engines: They have their own index of documents and web pages, which is generated by their web crawlers, e.g. Yahoo, Google, Bing, Ask.com, Lycos, Altavista.
Metasearch engines: They combine the searches of multiple search engines and then deliver the results, e.g. Mamma, Dogpile, Clusty, Kartoo.
Specialty search engines: They have indexes on a specific field and provide in-depth coverage, e.g. PubMed, Scirus.
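How a metasearch engine might combine results can be sketched as follows. This is only one possible merging strategy (round-robin interleaving with duplicate removal), chosen for illustration; the source does not say how any of the named metasearch engines merge results, and `merge_results` and the example lists are hypothetical.

```python
from itertools import zip_longest


def merge_results(result_lists):
    """Interleave ranked lists round-robin, keeping the first copy of each URL."""
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for url in tier:
            # zip_longest pads shorter lists with None; skip those slots.
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical ranked results from two underlying engines.
engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "w.example"]
merged = merge_results([engine_a, engine_b])
```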
MATERIALS AND METHODS
The following seven terms were chosen for the study:
1. Health, as a single term
2. Literacy, as a single term
3. Health literacy, as a multiple-term query
4. Health AND Literacy, using the Boolean operator
5. Role of health sciences librarian, as a phrase
6. Role of librarians in health literacy, as a phrase
7. “Health Literacy” in quotes, as a combined search
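The difference between these query forms can be sketched on a toy collection. The four documents below are invented for illustration and the matching rules are simplified assumptions (all terms required for a multiple-term or Boolean AND query, consecutive terms required for a quoted phrase); the actual engines in the study apply their own, undisclosed rules.

```python
import re


def tokens(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_all(doc, query):
    """True when every query term occurs somewhere in the document (AND)."""
    doc_terms = set(tokens(doc))
    return all(term in doc_terms for term in tokens(query))


def contains_phrase(doc, phrase):
    """True when the query terms occur consecutively and in order."""
    d, p = tokens(doc), tokens(phrase)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))


def count_hits(docs, query, phrase=False):
    """Count the documents matching the query under the chosen rule."""
    match = contains_phrase if phrase else contains_all
    return sum(1 for doc in docs if match(doc, query))


# Invented documents, not data from the study.
docs = [
    "Health services for rural communities",
    "Adult literacy and numeracy programmes",
    "Improving literacy about health insurance",
    "Health literacy and the role of librarians",
]
hits = {
    "health": count_hits(docs, "health"),
    "literacy": count_hits(docs, "literacy"),
    "health AND literacy": count_hits(docs, "health literacy"),
    '"health literacy"': count_hits(docs, "health literacy", phrase=True),
}
```

Under these assumed rules each added constraint can only keep or reduce the set of matching documents, which is one plausible reason multiple-term queries return fewer hits than single terms.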
FINDINGS
Number of hits retrieved for each search term
List of the first ten websites retrieved for “health literacy” by the three search engines
Distribution of sponsorship of the websites retrieved by the search engines
There is wide variation in Internet use and search strategy. Web users spend a lot of their time using search engines to locate material on the vast and unorganized web. According to a Visualization and Usability Center user survey, about 85% of users use a search engine to locate information. There is strong competition among search engines, each striving to outperform the others either by expanding its coverage or by adding more features. Since search engines came into existence, mergers, break-ups, and takeovers have been witnessed. For example, Ask.com was formed by the merging of Ask Jeeves and Teoma.
In the study it was observed that each search engine gives different results for the same search terms. For the single-word searches Health and Literacy, Yahoo returned the most hits, followed by Google and then Bing. For the multiword term Health Literacy, Yahoo again returned the most results, but the number of hits was lower than for the single-word searches. When “Health Literacy” was searched in quotes, Bing showed the most results and Google the fewest. For the Boolean query Health AND Literacy, Yahoo again returned the most results, and for the phrase search Role of Health Sciences Librarian, Yahoo was also ahead. When Role of librarians was combined with Literacy using the + operator, Google led in search results, followed by Yahoo and Bing. The number of query terms was thus observed to affect the outcome.
When the websites of the first forty hits for health literacy were studied, they were found to include sponsored sites, commercial sites, non-governmental sites, and educational sites. The percentage of non-government and commercial organizations was high compared with the others. Further, it was found that the hits and the number of returned documents change over time. Searches on Yahoo India give different results from searches on Yahoo; moreover, the reported number of hits is only an estimate based on the index terms.
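The domain-based comparison of sponsorship can be sketched as follows. The mapping from top-level domain to sponsorship category is an assumption made for illustration (a .org address, for instance, is not a guarantee of a non-government sponsor), and the URLs are placeholders, not the websites retrieved in the study.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed mapping from top-level domain to sponsorship category.
SPONSOR_BY_TLD = {
    "com": "commercial",
    "org": "non-government",
    "gov": "government",
    "edu": "educational",
}


def sponsorship(url):
    """Classify a URL by the last label of its host name."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return SPONSOR_BY_TLD.get(tld, "other")


def sponsorship_distribution(urls):
    """Return the percentage of URLs falling into each category."""
    counts = Counter(sponsorship(u) for u in urls)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


# Placeholder URLs standing in for retrieved results.
urls = [
    "https://shop.example.com/health",
    "https://clinic.example.com/literacy",
    "https://charity.example.org/",
    "https://agency.example.gov/",
]
distribution = sponsorship_distribution(urls)
```

In this sketch, country-code domains such as those ending in .in fall into "other"; a fuller classification would need to inspect the second-level label as well.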
The study has limitations: it was not possible to examine all of the related websites, and this is only a pilot study of the searching and retrieval behaviour of search engines.
Single-term queries return a very high number of hits in each search engine, while multiple-term queries return fewer hits. Secondly, Health is a more sought-after term on the search engines than Literacy. Finally, librarians will play a very important role in refining search strategies to find the most appropriate sources of information among abundant resources, where the sponsorship of websites has to be taken into account.