  1. 1. Search Engines: Searching More Effectively
  2. 2. What is a search engine? Multiple servers that run a program called a spider or a crawler. Crawlers build an index of web sites: they follow the links on a website and crawl those pages. You can search the index by keyword matching. Search engines don’t search the web, they search the index, so current events may not be indexed yet.
  3. 3. Surfing the Index Search engines don’t search the web, they search the index, so current events may not be indexed yet
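The crawl-then-search idea can be sketched in a few lines of Python. This is only a toy illustration, not how any real engine is built: the in-memory `WEB` dictionary, the `.example` addresses, and the `crawl` and `search` functions are all hypothetical names made up for this sketch.

```python
from collections import deque

# Hypothetical miniature "web": address -> (page text, outgoing links).
WEB = {
    "a.example": ("apple pie recipes", ["b.example"]),
    "b.example": ("pie crust tips", ["a.example"]),
}


def crawl(start, web):
    """Follow links breadth-first and build a keyword -> set-of-addresses index."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        # Record every word on the page in the index.
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        # Queue each linked page that has not been visited yet.
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, keyword):
    """Look the keyword up in the index; the live web is never consulted."""
    return sorted(index.get(keyword.lower(), set()))


index = crawl("a.example", WEB)

# A page published after the crawl is on the web but not in the index.
WEB["c.example"] = ("breaking news about pie", [])

print(search(index, "pie"))
print(search(index, "news"))
```

The page added after the crawl never shows up in a search until the crawler runs again, which is why current events may be missing from results.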
  4. 4. Directories can often be more fruitful than search engines. Because directories are created by people and not programs: they are smaller but reflect evaluated material instead of all of the web; they are even less likely to be up to date; they usually take you to the front door of a website; they are organized in a thoughtful manner so you can browse.
  5. 5. Directory Example Open Directory Project It helps you to think logically about the information you need—the structure is already there, you just have to follow it
  6. 6. Popular Search Engines  Google  AltaVista  Yahoo  Alltheweb  MSN  DogPile
  7. 7. Number of pages indexed The more pages indexed, the more likely you are to find what you need. Hard to find a needle in a haystack if the haystack is dumped on your head. Google: 4.3 billion. Yahoo: 3.2 billion. Teoma: 1 billion. AltaVista and AllTheWeb: acquired by Yahoo and no longer available. Source:
  8. 8. Google’s Index Half the searchable web, so perhaps 8 billion searchable pages are out there
  9. 9. What do search engines not search? Private Databases • Not fixed URLs • Professional, academic • Example at the end of presentation (MERLOT) Ask Jeeves only lists customers that pay to have their site indexed
  10. 10. Market Share comScore Media Metrix Search Engine Ratings
  11. 11. Which sites use what engine? Google uses Google, owned by Google. Yahoo uses Yahoo, owned by Yahoo, but they used to use Google, and they did recently acquire Inktomi. AOL uses Google & Open Directory, owned by AOL. AltaVista uses Open Directory and Yahoo, owned by Yahoo. AlltheWeb uses and is owned by Yahoo. HotBot uses Google, owned by Lycos.
  12. 12. Why do we care? If two sites use the same engine, you’ll get the same results comScore Media Metrix Search Engine Ratings
  13. 13. Two sites, same results AltaVista—apple pie AltaVista found 2,410,000 results 1 - 10 of about 2,410,000 for apple pie
  14. 14. What does that mean? Because there are basically two forces now in the search engine world, based on market share, index size and unique searching technologies, general searches are best done at either Yahoo or Google
  15. 15. How does Google match websites? Page Rank • Google interprets a link from page A to page B as a vote by page A for page B. It also analyzes the page that casts the vote. If it’s an important page (many links to it), its vote counts more heavily. Text Matching • A page has to be both important (Page Rank) and relevant (text-matching) to be at the top of the list
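The "links as votes" idea can be illustrated with a simplified power-iteration sketch. This is not Google's production algorithm. The damping factor of 0.85, the iteration count, and the tiny link graph are assumptions chosen for illustration, and `page_rank` is a hypothetical function name.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its vote over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: C is linked from A and B, so C's vote for D
# should count more than E's vote for F (nobody links to E).
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "E": ["F"],
    "D": [],
    "F": [],
}
ranks = page_rank(links)
print(ranks["D"] > ranks["F"])
```

D and F each receive exactly one link, but D ends up ranked higher because its single vote comes from a page that is itself well linked.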
  16. 16. Matching (continued) When engines rank results by text matching, the location and frequency of the text string are taken into account. Pages with the phrase apple pie rank higher than pages that mention both terms separately. Pages that mention apple pie repeatedly rank higher than pages with fewer occurrences. Pages with apple pie in the title of the page rank higher.
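A toy scoring function shows how phrase matches, frequency, and title placement might combine. The weights (5 per phrase occurrence, 1 per separate term, 10 for a title match) are arbitrary assumptions for this sketch, and `text_match_score` is a hypothetical name. Real engines do not publish their formulas.

```python
def text_match_score(title, body, query):
    """Toy relevance score; the weights are illustrative assumptions only."""
    phrase = query.lower()
    terms = phrase.split()
    title_l = title.lower()
    body_l = body.lower()
    words = body_l.split()

    score = 0
    # Each occurrence of the exact phrase earns the most credit.
    score += 5 * body_l.count(phrase)
    # Terms that appear anywhere in the body earn a little credit each.
    score += sum(words.count(term) for term in terms)
    # The phrase appearing in the page title earns a large bonus.
    if phrase in title_l:
        score += 10
    return score


separate = text_match_score("Desserts", "an apple and a pie", "apple pie")
phrase_once = text_match_score("Desserts", "apple pie is great", "apple pie")
phrase_twice = text_match_score("Desserts", "apple pie apple pie", "apple pie")
in_title = text_match_score("Apple Pie", "apple pie is great", "apple pie")
print(separate, phrase_once, phrase_twice, in_title)
```

The scores rise in the same order as the rules on the slide: separate terms, then the phrase, then repeated phrases, with the title match adding a bonus on top.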
  17. 17. Title Tag When constructing a web page, the title tag is important: search engines look at it
  18. 18. Example of Title Tag Code Source code: <title>Orientation Programs--Washington and Lee University</title> In FrontPage, File/Save As/File Name/Title Google Search for New Page 1, 17,600,000
  19. 19. Meta Tag Searching Google does not search Meta Tags, too much “meta tag spam” Inktomi was the last major search engine that used it, now they have been bought by Yahoo Teoma might use meta tags
  20. 20. Meta Tag <head> <TITLE>Revisiting Meta Tags</title> <META NAME="authors" CONTENT=" Danny Sullivan"> <META NAME="date" CONTENT="20021205"> <META NAME="channel" CONTENT="internet technology"> <META NAME="description" CONTENT="Follow up to October 2002 article about the demise of the meta keywords tag."> </head>
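To see what a program finds in the title and meta tags, the sample head section above can be read with Python's standard `html.parser` module. This is a minimal sketch: the `HeadReader` class is a hypothetical name, and it says nothing about how any particular search engine processes these tags.

```python
from html.parser import HTMLParser


class HeadReader(HTMLParser):
    """Collects the page title and any <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attributes:
            content = attributes.get("content") or ""
            self.meta[attributes["name"]] = content.strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


SAMPLE = """<head>
<TITLE>Revisiting Meta Tags</title>
<META NAME="authors" CONTENT=" Danny Sullivan">
<META NAME="date" CONTENT="20021205">
<META NAME="channel" CONTENT="internet technology">
</head>"""

reader = HeadReader()
reader.feed(SAMPLE)
reader.close()
print(reader.title)
print(reader.meta)
```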
  21. 21. Keyword Searching Be as specific as you can. Don’t use “car” if you can use “Toyota”. Search engines have a hard time differentiating between differences in meaning, e.g., hard exam, hard cider, hard times, hard drive. It can’t think for you: if you put in “heart attack”, it won’t show pages with “cardiac arrest”
  22. 22. Boolean Searching George Boole, English mathematician (died 1864), devised a logical combinatorial system. Operators: AND, OR, NOT. Used to get more targeted results. Default operator is AND at all major search engines, so if you type in apple pie, sites assume “apple AND pie”
  23. 23. Using Boolean Operators at Google Default Operator is AND apple pie— 1,710,000 apple AND pie (+pie)—1,690,000, default operator message, but it does take into account word order Fewer results, perhaps a little more useful
  24. 24. Boolean Operator OR apple OR pie—7,140,000 Use this if you don’t want to rule out too much Asthma, acute OR chronic
  25. 25. Boolean Operator NOT What will NOT do to the search results? apple NOT pie (–pie): 816,000, lessened results by half. How could you use NOT to search for information about bass fishing? bass NOT guitar (when you want the fish)
  26. 26. Be Careful with “NOT” A search for apple pie NOT cobbler may remove useful results such as "Aunt Sarah's Better Than Cobbler Apple Pie"
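The three Boolean operators behave like set operations, which a short sketch can make concrete. The four page texts below are made up for illustration, and `pages_with` is a hypothetical helper. Real engines match against a prebuilt index rather than scanning pages.

```python
# Hypothetical pages: page id -> page text.
PAGES = {
    1: "classic apple pie recipe",
    2: "apple orchard tours",
    3: "pumpkin pie for thanksgiving",
    4: "aunt sarahs better than cobbler apple pie",
}


def pages_with(word):
    """Return the ids of all pages that contain the given word."""
    return {page_id for page_id, text in PAGES.items() if word in text.split()}


apple = pages_with("apple")
pie = pages_with("pie")
cobbler = pages_with("cobbler")

and_results = apple & pie              # apple AND pie: both words required
or_results = apple | pie               # apple OR pie: either word is enough
not_results = (apple & pie) - cobbler  # apple pie NOT cobbler

print(sorted(and_results), sorted(or_results), sorted(not_results))
```

Page 4 is an apple pie page, yet NOT cobbler removes it, which is the risk this slide warns about.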
  27. 27. Synonyms ~apple ~pie (synonyms)—4,520,000
  28. 28. Domain Restrict apple pie: 733. More appropriate example: admissions information (3,730,000) admissions information, search
  29. 29. Exact Search How do you get results that match exactly? Use quotation marks, i.e., “apple pie” 696,000 on Google
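The difference between the default AND match and a quoted exact match can be sketched as follows. Both function names and the sample texts are hypothetical, and the matching is deliberately simple (whole words, case-insensitive).

```python
def matches_all_terms(text, query):
    """Default AND behaviour: every query word appears somewhere in the text."""
    words = text.lower().split()
    return all(term in words for term in query.lower().split())


def matches_phrase(text, phrase):
    """Quoted behaviour: the query words appear together, in order."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    # Slide a window of n words across the text and compare.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


adjacent = "warm apple pie"
scattered = "pie made with apple"

print(matches_all_terms(adjacent, "apple pie"), matches_phrase(adjacent, "apple pie"))
print(matches_all_terms(scattered, "apple pie"), matches_phrase(scattered, "apple pie"))
```

Both texts pass the AND test, but only the first contains the words side by side, so quotation marks return fewer, tighter results.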
  30. 30. AltaVista & Google Cool Feature Link—find out how many indexed pages link to your page AltaVista—92 (searches Yahoo) Google—33
  31. 31. Cached Items Google “takes a picture” (indexes a site) As web sites often do, the site goes away You can still look at the old site through the cache
  32. 32. Meta Search Engine What is a Meta Search Engine? Search engines that display results from several sites at once. Dogpile: Google · Yahoo · Ask Jeeves · About · LookSmart · Overture · FindWhat. Hmmm…Dogpile inserts sites that have paid for placement, without telling you, into results from various search engines
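A meta search engine's core job, combining result lists from several engines, might look like the sketch below. The engine names, addresses, and the rule of ranking pages found by more engines first are all assumptions for illustration. This is not a description of how Dogpile merges results.

```python
def meta_search(results_by_engine):
    """Merge result lists and note which engines returned each address."""
    merged = {}
    for engine, urls in results_by_engine.items():
        for url in urls:
            merged.setdefault(url, []).append(engine)
    # Addresses found by more engines come first; ties keep first-seen order
    # because Python's sort is stable.
    return sorted(merged.items(), key=lambda item: -len(item[1]))


# Hypothetical result lists from two made-up engines.
results = {
    "engine_a": ["x.example", "y.example"],
    "engine_b": ["y.example", "z.example"],
}
combined = meta_search(results)
for url, engines in combined:
    print(url, engines)
```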
  33. 33. Safe Search Google—SafeSearch Filter—preferences Yahoo—SafeSearch Filter—preferences AltaVista—Settings, Family Filter, can set a password
  34. 34. Advanced Settings Most search sites have a link for advanced settings, so you don’t have to remember the particular syntax for a particular type of search