The dot-com era saw the creation of many search engines. Many of the original search engines still exist today but are often powered by one of the top four search providers. For instance, the Go.com search engine still exists but is powered by Yahoo Search. Some search engines are still around but have undergone name changes and facelifts, as is the case with Ask.com, formerly Ask Jeeves.
Spiders collect data for search engines. They are programmed based on algorithms and collect data, including hyperlinks, text, and tags, for indexing. There are limits to what can be searched: a spider cannot search a page that cannot be found or linked to, and it cannot search all links if there are too many.
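To make the idea of a spider collecting hyperlinks, text, and tags more concrete, here is a minimal sketch in Python. It is not how any real search engine's crawler is built. It only parses one hard-coded sample page with the standard library's HTMLParser, and the names PageCollector, collect, and SAMPLE_PAGE are made up for this illustration.

```python
from html.parser import HTMLParser


class PageCollector(HTMLParser):
    """Collects hyperlinks, the <title> text, and other text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []        # href values found in <a> tags
        self.title = ""        # text inside the <title> tag
        self.text = []         # all other non-empty text fragments
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        stripped = data.strip()
        if not stripped:
            return
        if self._in_title:
            self.title += stripped
        else:
            self.text.append(stripped)


def collect(html):
    """Return the data a spider might hand to an indexer for a single page."""
    parser = PageCollector()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "links": parser.links,
        "text": " ".join(parser.text),
    }


# Hypothetical sample page, used instead of fetching anything from the web.
SAMPLE_PAGE = """<html><head><title>Gardening Tips</title></head>
<body><h1>Gardening</h1><p>Start with good soil.</p>
<a href="/tools.html">Tools</a> <a href="/seeds.html">Seeds</a></body></html>"""

record = collect(SAMPLE_PAGE)
print(record["title"])
print(record["links"])
```

A real spider would then follow each collected link to find more pages, which is why a page that nothing links to is never found.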
Each search engine uses an algorithm to determine what information will be added to its index.
Most search results are returned in order of relevancy, but not every search tool ranks relevancy the same way! Most search services offer a general idea about how they rank results, but the specifics are typically kept secret.
Term Frequency: High-ranking: a document in which the search term “gardening” occurs 12 times. Not-as-high: a document in which the term “gardening” occurs 4 times.
Term Location: High-ranking: the search term “gardening” appearing at the top of the page, in the <title> tag, or in an <h#> tag. Not-as-high: “gardening” appearing near the bottom of the page.
Keep in mind when evaluating the results pages: look for other things included on the results pages and at some of the details provided with the links. Sometimes you can find news articles, images, directories, audio, etc.
Introduction to Relevance Ranking: I want to tell you a little about the concept of relevance ranking. Relevance ranking explains why you can perform a search using one tool and get one set of results, and then perform the same search using a different tool and get a completely different set of results. Relevance ranking determines the order in which search results are presented. How this relevance is determined is typically a proprietary function that is unique to each search tool. Most search tools keep the details of their relevance ranking algorithm secret. That’s what makes them competitive.
Term frequency as a ratio to total document length: a document with your query term 50 times is considered more relevant than one with your query term only once.
Location of query terms: near the top of the document, in the HTML <TITLE> tag, or in heading tags.
Boolean AND vs. OR: documents with ALL the terms should rank higher than documents with only some of the terms.
Proximity of query terms to one another: the closer together your terms appear, the more relevant the document is assumed to be.
Relative commonness of query terms: the less common a term is in the language at large, the MORE subject significance it has when it DOES appear, e.g., “heart attack” vs. “myocardial infarction.”
Paid Rankings: Pay attention to your search results, and look for listings labeled “Sponsored Sites,” “Featured Sites,” or “Partner Sites.” These sites have paid the search tools to put their sites at the top of the list. This means the content might be okay, but it might be completely irrelevant to your search. Google uses AdWords: site owners bid on search terms. When those terms are searched, the ad of the highest bidder displays at the top of the “sponsored sites” list in the search results. Advertisers pay Google when someone clicks through the ad.
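The relevance factors described above (term frequency relative to document length, term location, Boolean AND, and relative commonness of terms) can be combined into a toy scoring function. This is only a sketch under stated assumptions: real search tools keep their algorithms secret, so the weights here (a 0.5 title bonus, a doubling when all terms match) and the three-document collection DOCS are invented purely for illustration. Proximity of terms is left out to keep the example short.

```python
import math

# Hypothetical mini-collection: each document has a title and body text.
DOCS = {
    "a": {"title": "Gardening basics",
          "body": "gardening tips for soil and gardening tools"},
    "b": {"title": "Home projects",
          "body": "painting walls and a little gardening"},
    "c": {"title": "Cooking at home",
          "body": "recipes for soup and bread"},
}


def words(text):
    """Lowercase a string and split it into words."""
    return text.lower().split()


def rarity(term, docs):
    """Weight a term higher the fewer documents contain it (IDF-style)."""
    containing = sum(
        1 for d in docs.values() if term in words(d["title"] + " " + d["body"])
    )
    return math.log((1 + len(docs)) / (1 + containing)) + 1


def score(query, doc, docs):
    """Score one document for a query using made-up illustrative weights."""
    terms = words(query)
    body = words(doc["body"])
    title = words(doc["title"])
    total = 0.0
    for term in terms:
        # Term frequency as a ratio to total document length.
        tf = body.count(term) / len(body) if body else 0.0
        # Term location: a term in the title earns an (assumed) bonus.
        title_bonus = 0.5 if term in title else 0.0
        total += (tf + title_bonus) * rarity(term, docs)
    # Boolean AND: documents containing ALL query terms rank higher.
    if all(t in body or t in title for t in terms):
        total *= 2
    return total


def rank(query, docs):
    """Return document names ordered from most to least relevant."""
    return sorted(docs, key=lambda name: score(query, docs[name], docs),
                  reverse=True)


ranking = rank("gardening", DOCS)
print(ranking)
```

Changing the assumed weights changes the order of the results, which is one way to see why two search tools can return different rankings for the same query.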
BlindSearch, the search engine taste test, is a way to see how three different search engines display results for the same query. Enter your search terms, then vote on the results list you like best. The search engine names will then be revealed. Notice how the links are similar. Notice how the links differ. Algorithms determine how results will be ranked and displayed. Which search engine did you vote for? Why?
Google has developed several tools to help searchers better sort their results. Try a basic search in Google, then use the tools on the left of the results page to sort the results. Limit by local, date posted, and historical/history. If time permits, demonstrate pages visited.
Demonstrate these three options. Focus on Directory and Patents. Cover Product if time permits.
Understanding how search engines operate: each search engine is different. Search engines provide access to information available on the World Wide Web. Search engines do not search the entire Internet.
What's included in the invisible web? Articles and information in password-protected databases; data; statistics; government documents; web pages not “crawled” by spiders or bots; license- and password-protected materials. The invisible web is up to 500 times larger than the surface web (static HTML pages and indexed materials).
This image shows the invisible web as the ocean. As you can see, more of the information (the “fish”) is at the bottom of the image. Think of the ships at the top as search engines. They get most of their information from the surface web.
Demonstrate Google Scholar indexing PubMed. Show the advanced search in Google Scholar. In WolframAlpha, show the basic search options.
What if you are looking for information that may not exist anymore? You might recall reading something on a website, but the site has been updated and no longer has the information you are looking for.
Launched in 2002. 1 800 Goog-411 provides a free 411 service based on a Google Search. It uses speech recognition search. Results can be sent to the user via text. Map directions can also be sent.
SUPER SEARCHER: ENHANCING YOUR ONLINE SEARCH SUPER POWERS Max C Anderson Technology Coordinator National Network of Libraries of Medicine Greater Midwest Region S
“The Pew Internet Project has found that search engines are the most popular way to locate a variety of types of information online, including health information, government information, and religious information.” (Pew Internet and American Life Project. Memo: Search Engines. 2002)