Session5

Presentation for the Feb. 27 class on the Internet and searching

Published in: Technology, Design

  1. 1. Search Engines and Online Research February 27, 2008 IST 523 Denise A. Garofalo
  2. 2. What are search engines? designed to make surfing the web simple, fast and rewarding for Internet users designed to search out Web pages one at a time and collect the results
  3. 3. What do search engines do? gather together information store it in a database allow access to a list of individual pages based on: a word, or, set of words that you submit in the form of a query
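The gather, store, and look-up steps on this slide can be illustrated with a small inverted index. This is a minimal sketch, not how any particular engine is built: the page names and texts in `PAGES` are hypothetical, and `lookup` assumes a page must contain every word of the query.

```python
from collections import defaultdict

# Hypothetical page texts; a real engine stores text gathered by its spiders.
PAGES = {
    "a.html": "search engines index web pages",
    "b.html": "spiders crawl web pages",
    "c.html": "library catalogs index books",
}

def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def lookup(index, query):
    """Return the pages that contain every word of the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)

INDEX = build_index(PAGES)
print(lookup(INDEX, "web pages"))
```

Storing words rather than whole pages as the keys is what lets a query be answered without rereading every page.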
  4. 4. How do search engines work? they send out computer programs known as “spiders” or “robots” to search the web interested in reading and storing the actual text that is shown on a web page, not graphics, etc.
  5. 5. How, continued…. spider begins by visiting a single Web page it saves the text that it finds there after it has collected the information on that page, it looks for a link that will take it to another page when it reaches the next page, it starts the process all over again by following these steps over and over again, search engines are able to find and index far more web pages than a human being
  6. 6. More how….. search engines set up spiders to begin their searches at web sites known as directories large web sites that contain lists of links that have been collected by human beings no way for spiders to find every page listed on the World Wide Web millions of web pages do not have any links to them from other sites without these links, spiders can’t find and index those pages
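The visit, save, follow-a-link loop from the two slides above can be sketched over a toy web held in memory. Everything in `TOY_WEB` is hypothetical, and a real spider would fetch pages over the network. The page named `orphan` has no inbound links, so the crawl never reaches it, which is the limitation the slide describes.

```python
from collections import deque

# Hypothetical toy web: each page has some text and a list of outgoing links.
TOY_WEB = {
    "home": {"text": "welcome", "links": ["about", "news"]},
    "about": {"text": "about us", "links": ["home"]},
    "news": {"text": "latest news", "links": ["about", "archive"]},
    "archive": {"text": "old news", "links": []},
    "orphan": {"text": "no page links here", "links": []},
}

def crawl(web, start):
    """Visit pages reachable from `start`, saving the text of each one once."""
    saved = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in saved or url not in web:
            continue  # already stored, or a link that leads nowhere
        saved[url] = web[url]["text"]    # save the text found on the page
        queue.extend(web[url]["links"])  # then follow its links

    return saved

STORED = crawl(TOY_WEB, "home")
print(sorted(STORED))
```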
  7. 7. How do search engines show the results? Sites are ranked based on the textual content of a web page A special set of criteria, or algorithm, is used to decide which pages to display Algorithms consider things like the title of the page, the text of the page, how many other web sites link to the page, and even what text web sites that link to a page use to describe it
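A ranking algorithm of the kind described here can be sketched as a weighted sum of the four signals the slide names. The weights and sample pages below are assumptions made up for illustration; real engines do not publish their criteria.

```python
# Hypothetical weights; chosen only to make the example readable.
WEIGHTS = {"title": 3.0, "text": 1.0, "inbound": 0.5, "anchor": 2.0}

def score(page, term):
    """Combine title, body text, inbound link count, and anchor text."""
    term = term.lower()
    title_hits = page["title"].lower().split().count(term)
    text_hits = page["text"].lower().split().count(term)
    anchor_hits = sum(a.lower().split().count(term) for a in page["anchors"])
    return (WEIGHTS["title"] * title_hits
            + WEIGHTS["text"] * text_hits
            + WEIGHTS["inbound"] * page["inbound_links"]
            + WEIGHTS["anchor"] * anchor_hits)

def rank(pages, term):
    """Return page names ordered from highest to lowest score."""
    return sorted(pages, key=lambda url: score(pages[url], term), reverse=True)

# Hypothetical pages: "anchors" holds the link text other sites use for the page.
PAGES = {
    "p1": {"title": "Cooking basics", "text": "cooking tips for cooking at home",
           "inbound_links": 2, "anchors": ["cooking guide"]},
    "p2": {"title": "Home page", "text": "we mention cooking once",
           "inbound_links": 10, "anchors": ["home"]},
    "p3": {"title": "Gardening", "text": "plants and soil",
           "inbound_links": 1, "anchors": []},
}

print(rank(PAGES, "cooking"))
```

In this sketch a page with many inbound links can still rank below a page whose title and anchor text match the query, because those two signals carry the larger weights.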
  8. 8. Search engines--review a series of computer programs that find and save files at a very fast rate when combined with algorithms designed to sort content based on text queries search engines become a useful tool to find a little bit of information in that vast collection of files known as the World Wide Web
  9. 9. Which search engine is best? Need to understand how each search engine works Check out the Bruce Clay, Inc. search engine relationship chart: http://www.bruceclay.com/searchenginerelationshipchart.htm
  10. 10. Invisible Web (or Deep Web) Some pages and links are excluded from most search engines by policy Others are excluded because search engine spiders cannot access them. Pages that are excluded are referred to as the “Invisible Web” (or “Deep Web”) you don't see these pages in search engine results estimated to be two to three or more times larger than the “visible web”
  11. 11. Why invisible pages? If a search engine doesn’t locate a Web page it’s because: Technical barriers prohibit access Choices or decisions made by the search engine (policy) exclude the page
  12. 12. Technical barriers Typing or judgment is required Searchable specialized databases Logins and/or passwords required
  13. 13. Policy issues Page format Non-HTML pages Script-based programs (those URLs with a “?”)
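The “?” test mentioned on this slide can be checked mechanically. The sketch below only illustrates the slide's rule of thumb; the example URLs are hypothetical, and it makes no claim about which pages any given engine actually skips.

```python
from urllib.parse import urlsplit

def is_script_based(url):
    """True when the URL carries a query string (the part after '?')."""
    return urlsplit(url).query != ""

# Hypothetical URLs for illustration.
print(is_script_based("http://example.com/catalog?item=42"))
print(is_script_based("http://example.com/about.html"))
```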
  14. 14. Research issues Different search tools give different results Failure to retrieve does not mean that there is nothing available Develop a search strategy Learn the search engine’s search tips Evaluation
  15. 15. A selection of engines •Google www.google.com •Vivisimo www.vivisimo.com •Ask.com www.ask.com •Yahoo! www.yahoo.com •Open Directory www.dmoz.org •Ixquick www.ixquick.com/ •Mamma www.mamma.com/ •Gigablast www.gigablast.com/ •Search.com www.search.com/
  16. 16. Failure to retrieve crawling Web pages and locating sites for search engines is based on using links from one page to reach other pages to crawl documents with few links tend to be overlooked if pages are never discovered, they are not available to researchers Failure to retrieve can also be linked to the search query used, or search strategy
  17. 17. Search strategy three main considerations in the search process Relevance Precision Recall
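The slide names precision and recall without defining them. In their standard sense, precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A minimal sketch, with made-up document ids:

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

# Hypothetical search: four results came back, three documents were relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d1", "d2", "d5"]

print(precision(RETRIEVED, RELEVANT))
print(recall(RETRIEVED, RELEVANT))
```

A broad query tends to raise recall at the cost of precision, and a narrow query tends to do the reverse.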
  18. 18. Successful search strategy ability to create an exact match between search statement and documents sought size and content of the search engine selected search engine’s search tools
  19. 19. Process involves consultation of definition tools subject dictionaries thesauri, etc. subject familiarization i.e. if searching on medical topics, become familiar with basic terminology same goes for research in any other subject area
  20. 20. Formulating a strategy be logical spend time on search term selection and combining to reduce the time spent eliminating irrelevant search results search engines are good for searching on unusual or unique keywords, and for combining keywords be creative and flexible look for subtle connections be prepared to make intuitive leaps
  21. 21. Simplified search strategy Formulation of the research question and its scope Identification of concepts within the question Identification of search terms to describe those concepts Consideration of synonyms and variations of those terms Preparation of the search logic Readiness to revise and redo a search
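The middle steps of this strategy (concepts, search terms, synonyms, search logic) can be sketched as a small query builder. The research question and terms below are hypothetical, and the output assumes an engine that accepts uppercase AND and OR with parentheses, which varies from engine to engine.

```python
def build_query(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for terms in concepts:
        # Multi-word terms are quoted so they are treated as phrases.
        quoted = ['"%s"' % t if " " in t else t for t in terms]
        group = " OR ".join(quoted)
        if len(quoted) > 1:
            group = "(" + group + ")"
        groups.append(group)
    return " AND ".join(groups)

# Hypothetical question: how do teenagers use the Internet in public libraries?
QUERY = build_query([
    ["teenagers", "adolescents"],
    ["public library", "libraries"],
    ["internet"],
])
print(QUERY)
```

Keeping each concept as its own list makes the final step, revising and redoing the search, a matter of adding or removing a synonym.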
  22. 22. Boolean logic describes certain logical operations that are used to combine search terms basic Boolean operators are AND, OR and NOT
  23. 23. AND limits results to those items that contain both, or all, of the search terms in the query search query with the AND operator will retrieve only those items containing all of the search terms
  24. 24. OR helpful in the first phases of a search especially if the searcher is unsure of what information is available on the topic or what words are used to categorize it when used between two words, it instructs the search tools to retrieve any record containing either of the words
  25. 25. NOT The third of the most common Boolean operators used to eliminate records containing a particular word or combination of words from the search results
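The three operators map directly onto set operations: AND is intersection, OR is union, and NOT is difference. A minimal sketch over a hypothetical index of record ids:

```python
# Hypothetical index: each word maps to the ids of the records containing it.
INDEX = {
    "cats": {1, 2, 3},
    "dogs": {2, 3, 4},
    "birds": {3, 5},
}

def op_and(a, b):
    """Records containing both words."""
    return INDEX.get(a, set()) & INDEX.get(b, set())

def op_or(a, b):
    """Records containing either word."""
    return INDEX.get(a, set()) | INDEX.get(b, set())

def op_not(a, b):
    """Records containing the first word but not the second."""
    return INDEX.get(a, set()) - INDEX.get(b, set())

print(sorted(op_and("cats", "dogs")))
print(sorted(op_or("cats", "dogs")))
print(sorted(op_not("cats", "dogs")))
```

AND narrows the result, OR widens it, and NOT removes records, which matches the advice to use OR early in a search and the others to refine it.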
  26. 26. Search engine search tips Check the Help files of a search engine Some search engines allow you to apply date restrictions to a search Word order in natural language searching can greatly influence the search A question phrased in different ways can produce different results An added influence is the weight some search engines place on words located earlier in the search query
  27. 27. + sign ensures that a search engine finds pages that have all the words you enter, not just some of them
  28. 28. - sign a search engine will find pages that have one word on them but not another word
  29. 29. Phrase searching ensures that terms appear in the order they are entered placing the phrase within quotation marks tells the search engine to retrieve pages where the terms appear exactly in the order specified
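The +, -, and quotation-mark conventions from the last three slides can be sketched as a small matcher. This is an illustration under stated assumptions, not any engine's actual parser: terms with + and quoted phrases are treated as required, terms with - as excluded, and plain terms as optional (at least one must appear if any are given).

```python
import re

# An optional sign followed by either a quoted phrase or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')

def parse(query):
    """Split a query into required, excluded, and optional terms."""
    must, must_not, any_of = [], [], []
    for sign, phrase, word in TOKEN.findall(query.lower()):
        term = phrase or word
        if sign == "-":
            must_not.append(term)
        elif sign == "+" or phrase:
            must.append(term)
        else:
            any_of.append(term)
    return must, must_not, any_of

def matches(text, query):
    """True when the text satisfies the query under the assumptions above."""
    must, must_not, any_of = parse(query)
    # Normalize to space-separated words so phrases match whole words in order.
    padded = " " + " ".join(re.findall(r"\w+", text.lower())) + " "

    def has(term):
        return " " + term + " " in padded

    if any(has(t) for t in must_not):
        return False
    if not all(has(t) for t in must):
        return False
    return not any_of or any(has(t) for t in any_of)

print(matches("The big cat is a jaguar", '+jaguar -car "big cat"'))
```

Because the phrase is compared as one string, a page containing "cat big" does not match "big cat", which is the word-order guarantee the phrase-searching slide describes.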
  30. 30. Web page evaluation Before you leave the list of search results -- before you click and get interested in anything written on the page -- glean all you can from the URLs of each page. choose pages most likely to be reliable and authentic
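Some of the clues a URL offers can be pulled out programmatically. The sketch below is only an aid to the judgment the slide asks for: the choice of clues is an assumption (the tilde as a hint of a personal page is a common rule of thumb, not a guarantee), and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

def url_clues(url):
    """Collect a few hints about a page from its URL alone."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    return {
        "host": host,
        "top_level_domain": labels[-1] if host else "",
        "personal_page": "~" in parts.path,  # heuristic only
        "path_depth": len([p for p in parts.path.split("/") if p]),
    }

# Hypothetical URLs for illustration.
print(url_clues("http://www.example.edu/library/guides/search.html"))
print(url_clues("http://example.com/~jsmith/notes.html"))
```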
  31. 31. Main evaluation points Accuracy Authority Objectivity Currency Coverage
  32. 32. Terminology Concept search Full-text index Fuzzy search Index Keyword search Precision
  33. 33. Terminology, cont. Proximity search Query-by-example Recall Relevancy Stemming Stop words Thesaurus
  34. 34. Resources and sources Final tip—links on Web pages may lead to other relevant sites, but be careful of going off on tangents
  35. 35. Resources 20 great Google secrets http://www.pcmag.com/article2/0,4149,1306756,00.asp WWW Virtual Library http://vlib.org SearchEngineShowdown http://www.searchengineshowdown.com/
  36. 36. Resources Best Search Tools Chart http://www.infopeople.org/search/chart.html Searching the Internet Effectively http://www2.vuw.ac.nz/staff/alastair_smith/searching/ Finding Images Online http://www.tasi.ac.uk/resources/searchingresources.html FindSounds http://www.findsounds.com/
  37. 37. Resources SwitchBoard http://www.switchboard.com/ AnyWho http://www.anywho.com/ Yahoo! PeopleSearch http://people.yahoo.com/ Web 2.0 http://www.go2web20.net/ Library 2.0 http://instructionwiki.org/Library_2.0_in_15_minutes_a_day
  38. 38. Overall search engine info Best General Search Engines http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
  39. 39. Questions?
