Session5

Presentation for the Feb. 27 class on the Internet and searching

© All Rights Reserved

Session5 Presentation Transcript

  • Search Engines and Online Research. February 27, 2008. IST 523. Denise A. Garofalo
  • What are search engines? Designed to make surfing the web simple, fast and rewarding for Internet users; designed to search out Web pages one at a time and collect the results.
  • What do search engines do? Gather together information; store it in a database; allow access to a list of individual pages based on a word, or set of words, that you submit in the form of a query.
  • How do search engines work? They send out computer programs known as “spiders” or “robots” to search the web. These programs are interested in reading and storing the actual text that is shown on a web page, not graphics, etc.
  • How, continued: a spider begins by visiting a single Web page; it saves the text that it finds there; after it has collected the information on that page, it looks for a link that will take it to another page; when it reaches the next page, it starts the process all over again. By following these steps over and over again, search engines are able to find and index far more web pages than a human being could.
  • More how: search engines set up spiders to begin their searches at web sites known as directories, large web sites that contain lists of links that have been collected by human beings. There is no way for spiders to find every page on the World Wide Web: millions of web pages do not have any links to them from other sites, and without these links, spiders can’t find and index those pages.
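
The spider steps described above can be sketched in a few lines of Python. This is a minimal illustration, not how any real search engine is built: the `TOY_WEB` dictionary, its page names, and the `crawl` function are all hypothetical stand-ins for real Web pages and a real crawler. It shows why a page with no incoming links is never found.

```python
from collections import deque

# Hypothetical in-memory "web": page name -> (page text, outgoing links).
TOY_WEB = {
    "directory": ("list of sites", ["news", "recipes"]),
    "news": ("daily news stories", ["recipes", "directory"]),
    "recipes": ("soup and bread recipes", []),
    "orphan": ("nobody links here", ["news"]),
}

def crawl(web, seed):
    """Follow links outward from the seed page, saving the text of each page."""
    saved = {}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        if page in saved or page not in web:
            continue  # already visited, or a link that leads nowhere
        text, links = web[page]
        saved[page] = text   # store the text found on the page
        queue.extend(links)  # then follow its links to other pages
    return saved

found = crawl(TOY_WEB, "directory")
print(sorted(found))  # ['directory', 'news', 'recipes']
```

The "orphan" page links out to other pages, but nothing links to it, so a crawl that starts at the directory never reaches it.
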
  • How do search engines show the results? Sites are ranked based on the textual content of a web page. A special set of criteria, or algorithm, is used to decide which pages to display. Algorithms consider things like the title of the page, the text of the page, how many other web sites link to the page, and even what text web sites that link to a page use to describe it.
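
As a rough illustration of ranking, the sketch below combines three of the signals named on this slide: a title match, how often the term appears in the text, and how many sites link to the page. The pages, the weights (3.0 and 0.1), and the `score` and `rank` functions are invented for this example; real search engines use their own, far more elaborate criteria.

```python
# Hypothetical pages: title, body text, and number of inbound links.
PAGES = {
    "a": {"title": "Bread recipes", "text": "bread bread flour yeast", "inlinks": 2},
    "b": {"title": "Travel notes", "text": "bread in paris", "inlinks": 5},
    "c": {"title": "Car repair", "text": "engine oil", "inlinks": 9},
}

def score(page, term):
    """Toy score: a title match counts most, then term frequency, then links."""
    title_words = page["title"].lower().split()
    text_words = page["text"].lower().split()
    if term not in title_words and term not in text_words:
        return 0.0  # the page does not mention the term at all
    title_hit = 3.0 if term in title_words else 0.0
    return title_hit + text_words.count(term) + 0.1 * page["inlinks"]

def rank(pages, term):
    """Return the names of matching pages, best score first."""
    scored = [(score(page, term), name) for name, page in pages.items()]
    return [name for s, name in sorted(scored, reverse=True) if s > 0]

print(rank(PAGES, "bread"))  # ['a', 'b']
```

Page "c" has the most inbound links but never mentions the term, so it does not appear in the results at all.
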
  • Search engines, review: a series of computer programs that find and save files at a very fast rate. When combined with algorithms designed to sort content based on text queries, search engines become a useful tool to find a little bit of information in that vast collection of files known as the World Wide Web.
  • Which search engine is best? Need to understand how each search engine works. Check out the Bruce Clay, Inc. search engine relationship chart: http://www.bruceclay.com/searchenginerelationshipchart.htm
  • Invisible Web (or Deep Web): some pages and links are excluded from most search engines by policy. Others are excluded because search engine spiders cannot access them. Pages that are excluded are referred to as the “Invisible Web” (or “Deep Web”). You don't see these pages in search engine results. Estimated to be two to three or more times larger than the “visible web”.
  • Why invisible pages? If a search engine doesn’t locate a Web page, it’s because technical barriers prohibit access, or because choices or decisions made by the search engine (policy) exclude the page.
  • Technical barriers: typing or judgment is required; searchable specialized databases; logins and/or passwords required.
  • Policy issues: page format; non-HTML pages; script-based programs (those URLs with a “?”).
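
To make the “URLs with a ?” point concrete, here is a small sketch that uses Python's standard `urllib.parse` module to check whether a URL carries a query string. The function name `has_query_string` and the example.com URLs are made up for illustration; whether a given search engine actually skips such pages is a matter of its own policy.

```python
from urllib.parse import urlsplit

def has_query_string(url):
    """Return True if the URL has a non-empty part after a '?'."""
    return urlsplit(url).query != ""

# Hypothetical example URLs.
print(has_query_string("http://example.com/search?q=cats"))  # True
print(has_query_string("http://example.com/about.html"))     # False
```
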
  • Research issues: different search tools give different results; failure to retrieve does not mean that there is nothing available; develop a search strategy; learn the search engine’s search tips; evaluation.
  • A selection of engines •Google www.google.com •Vivisimo www.vivisimo.com •Ask.com www.ask.com •Yahoo! www.yahoo.com •Open Directory www.dmoz.org •Ixquick www.ixquick.com/ •Mamma www.mamma.com/ •Gigablast www.gigablast.com/ •Search.com www.search.com/
  • Failure to retrieve: crawling Web pages and locating sites for search engines is based on using links from one page to reach other pages to crawl; documents with few links tend to be overlooked; if pages are never discovered, they are not available to researchers. Failure to retrieve can also be linked to the search query used, or search strategy.
  • Search strategy: three main considerations in the search process are relevance, precision, and recall.
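
Precision and recall have standard definitions in information retrieval: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The sketch below computes both for a made-up search; the document IDs and the function name `precision_recall` are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of document IDs."""
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 results came back, 3 documents were truly relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
precision, recall = precision_recall(retrieved, relevant)
print(precision)  # 0.5 (2 of the 4 results were relevant)
print(recall)     # 2 of the 3 relevant documents were found, about 0.67
```

A narrower query tends to raise precision at the cost of recall, and a broader one does the opposite.
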
  • Successful search strategy: ability to create an exact match between search statement and documents sought; size and content of the search engine selected; search engine’s search tools.
  • Process involves: consultation of definition tools (subject dictionaries, thesauri, etc.); subject familiarization, e.g. if searching on medical topics, become familiar with basic terminology; the same goes for research in any other subject area.
  • Formulating a strategy: be logical; spend time on search term selection and combining, to reduce the time spent eliminating irrelevant search results; search engines are good for searching on unusual or unique keywords, and for combining keywords; be creative and flexible; look for subtle connections; be prepared to make intuitive leaps.
  • Simplified search strategy: formulation of the research question and its scope; identification of concepts within the question; identification of search terms to describe those concepts; consideration of synonyms and variations of those terms; preparation of the search logic; readiness to revise and redo a search.
  • Boolean logic: describes certain logical operations that are used to combine search terms; the basic Boolean operators are AND, OR and NOT.
  • AND: limits results to those items that contain both, or all, of the search terms in the query; a search query with the AND operator will retrieve only those items containing both, or all, of the search terms.
  • OR: helpful in the first phases of a search, especially if the searcher is unsure of what information is available on the topic or what words are used to categorize it; when used between two words, it instructs the search tools to retrieve any record containing either of the words.
  • NOT: the third of the most common Boolean operators; used to eliminate records containing a particular word or combination of words from the search results.
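
The three Boolean operators map directly onto set operations. The sketch below builds a tiny word-to-document index and then combines lookups with set intersection (AND), union (OR), and difference (NOT). The four sample documents and the helper names `build_index` and `lookup` are invented for this example.

```python
# Hypothetical documents, keyed by ID.
DOCS = {
    1: "cats and dogs as pets",
    2: "caring for cats",
    3: "training dogs",
    4: "tropical fish",
}

def build_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

INDEX = build_index(DOCS)

def lookup(word):
    """Return the IDs of documents containing the word (empty set if none)."""
    return INDEX.get(word.lower(), set())

cats_and_dogs = lookup("cats") & lookup("dogs")  # AND: both terms present
cats_or_dogs = lookup("cats") | lookup("dogs")   # OR: either term present
cats_not_dogs = lookup("cats") - lookup("dogs")  # NOT: first term without the second

print(cats_and_dogs)  # {1}
print(cats_or_dogs)   # {1, 2, 3}
print(cats_not_dogs)  # {2}
```

AND narrows the result set, OR widens it, and NOT removes records, which matches the roles described on the three slides above.
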
  • Search engine search tips: check the Help files of a search engine. Some search engines allow you to apply date restrictions to a search. Word order in natural language searching can greatly influence the search. A question phrased in different ways can produce different results. An added influence is the weight some search engines place on words located earlier in the search query.
  • + sign: ensures that a search engine finds pages that have all the words you enter, not just some of them.
  • - sign: a search engine will find pages that have one word on them but not another word.
  • Phrase searching: ensures that terms appear in the order they are entered; placing the phrase within quotation marks tells the search engine to retrieve pages where the terms appear exactly in the order specified.
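
A small matcher can show how the + sign, the - sign, and quoted phrases differ. This is only a sketch under simplifying assumptions: the `matches` function is hypothetical, text is split on whitespace with no punctuation handling, and words without a sign are treated as optional (at least one must appear). Real search engines each have their own syntax rules, so check their Help files.

```python
import shlex

def matches(query, text):
    """Check text against +required, -excluded, "quoted phrase" and plain terms."""
    words = text.lower().split()
    padded = " " + " ".join(words) + " "  # lets us match whole words in order
    optional_seen = 0
    optional_hits = 0
    # shlex.split keeps a quoted phrase together as one token (quotes removed).
    for token in shlex.split(query.lower()):
        if token.startswith("-"):
            if token[1:] in words:
                return False  # an excluded word is present
        elif token.startswith("+") or " " in token:
            term = token.lstrip("+")
            if " " + term + " " not in padded:
                return False  # a required word or exact phrase is missing
        else:
            optional_seen += 1
            if token in words:
                optional_hits += 1
    # Plain words are optional in this sketch: any one of them is enough.
    return optional_seen == 0 or optional_hits > 0

print(matches('+cats -dogs', 'caring for cats'))     # True
print(matches('+cats -dogs', 'cats and dogs'))       # False
print(matches('"caring for"', 'caring for cats'))    # True
print(matches('"for caring"', 'caring for cats'))    # False
```

The last two calls use the same words in a different order, which is exactly the distinction phrase searching makes.
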
  • Web page evaluation: before you leave the list of search results -- before you click and get interested in anything written on the page -- glean all you can from the URLs of each page. Choose pages most likely to be reliable and authentic.
  • Main evaluation points: accuracy, authority, objectivity, currency, coverage.
  • Terminology: concept search, full-text index, fuzzy search, index, keyword search, precision.
  • Terminology, cont.: proximity search, query-by-example, recall, relevancy, stemming, stop words, thesaurus.
  • Resources and sources Final tip—links on Web pages may lead to other relevant sites, but be careful of going off on tangents
  • Resources: 20 great Google secrets http://www.pcmag.com/article2/0,4149,1306756,00.asp; WWW Virtual Library http://vlib.org; SearchEngineShowdown http://www.searchengineshowdown.com/
  • Resources: Best Search Tools Chart http://www.infopeople.org/search/chart.html; Searching the Internet Effectively http://www2.vuw.ac.nz/staff/alastair_smith/searching/; Finding Images Online http://www.tasi.ac.uk/resources/searchingresources.html; FindSounds http://www.findsounds.com/
  • Resources: SwitchBoard http://www.switchboard.com/; AnyWho http://www.anywho.com/; Yahoo! PeopleSearch http://people.yahoo.com/; Web 2.0 http://www.go2web20.net/; Library 2.0 http://instructionwiki.org/Library_2.0_in_15_minutes_a_day
  • Overall search engine info: Best General Search Engines http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
  • Questions?