Search Engines
and Online Research

   February 27, 2008
   IST 523
   Denise A. Garofalo
Session5

Presentation for the Feb. 27 class on the Internet and searching

Transcript of "Session5"

  1. Search Engines and Online Research
     February 27, 2008
     IST 523
     Denise A. Garofalo
  2. What are search engines?
     • designed to make surfing the web simple, fast and rewarding for Internet users
     • designed to search out Web pages one at a time and collect the results
  3. What do search engines do?
     • gather together information
     • store it in a database
     • allow access to a list of individual pages based on a word, or set of words, that you submit in the form of a query
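The gather, store, and query steps on slide 3 can be sketched with a small inverted index. This is a minimal illustration, not how any particular search engine is built; the page URLs, the page text, and the function names `build_index` and `search` are all made up for the example.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word of the query, sorted by URL."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)

# Hypothetical pages; the URLs and text are invented for illustration.
pages = {
    "example.org/cats": "cats are small furry pets",
    "example.org/dogs": "dogs are loyal pets",
    "example.org/fish": "fish live in water",
}
index = build_index(pages)
print(search(index, "pets"))
print(search(index, "furry pets"))
```

The database here is just a dictionary in memory; the point is only that the query is answered from stored data, not by visiting the pages again.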
  4. How do search engines work?
     • they send out computer programs known as “spiders” or “robots” to search the web
     • spiders are interested in reading and storing the actual text that is shown on a web page, not graphics, etc.
  5. How, continued…
     • spider begins by visiting a single Web page
        • it saves the text that it finds there
        • after it has collected the information on that page, it looks for a link that will take it to another page
        • when it reaches the next page, it starts the process all over again
     • by following these steps over and over again, search engines are able to find and index far more web pages than a human being could
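The visit, save, follow-a-link loop from slide 5 can be sketched as follows. To keep the example runnable without a network connection, it crawls a hypothetical in-memory "web" (`FAKE_WEB`) instead of real sites; the class and function names are assumptions for illustration, and a real spider would also need politeness rules, error handling, and URL normalization.

```python
from collections import deque
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collects the visible text and the link targets of one HTML page."""
    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())

# Hypothetical pages standing in for the Web (assumption for this sketch).
FAKE_WEB = {
    "/home": '<h1>Home</h1><a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<p>About us</p><a href="/home">Back</a>',
    "/news": '<p>Latest news</p>',
}

def crawl(start, fetch):
    """Visit pages starting at `start`, save their text, and follow links."""
    saved = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in saved:
            continue              # already visited
        html = fetch(url)
        if html is None:
            continue              # page could not be retrieved
        parser = PageParser()
        parser.feed(html)
        parser.close()
        saved[url] = " ".join(parser.text_parts)  # store text only, no graphics
        queue.extend(parser.links)                # links lead to the next pages
    return saved

store = crawl("/home", FAKE_WEB.get)
for url, text in store.items():
    print(url, "->", text)
```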
  6. More how…
     • search engines set up spiders to begin their searches at web sites known as directories
        • large web sites that contain lists of links that have been collected by human beings
     • no way for spiders to find every page on the World Wide Web
        • millions of web pages do not have any links to them from other sites
        • without these links, spiders can’t find and index those pages
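Why unlinked pages stay undiscovered can be shown with a toy link graph. In this sketch the page names and the `links` table are invented; the spider starts from a directory page and can only reach what some chain of links leads to.

```python
def reachable(links, seeds):
    """Return every page a spider can reach by following links from the seeds."""
    seen = set()
    stack = list(seeds)
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        stack.extend(links.get(page, []))
    return seen

# Hypothetical link graph: page -> pages it links to.
links = {
    "directory": ["a", "b"],
    "a": ["c"],
    "b": [],
    "c": ["a"],
    "orphan": ["a"],  # links out, but no other page links to it
}
found = reachable(links, ["directory"])
missed = set(links) - found
print(sorted(missed))  # ['orphan']
```

The page named `orphan` exists and even links to other pages, but because nothing links to it, a crawl that starts at the directory never finds it.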
  7. How do search engines show the results?
     • Sites are ranked based on the textual content of a web page
     • A special set of criteria, or algorithm, is used to decide which pages to display
     • Algorithms consider things like the title of the page, the text of the page, how many other web sites link to the page, and even what text web sites that link to a page use to describe it
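A toy scoring function can make the four signals on slide 7 concrete. Everything in this sketch is an assumption: the weights are arbitrary, the pages and anchor texts are invented, and real ranking algorithms are far more elaborate and are not published in this form.

```python
def score(page, query, inbound_anchor_texts):
    """Score one page using title, body text, inbound link count, and anchor text."""
    terms = query.lower().split()
    title = page["title"].lower().split()
    body = page["text"].lower().split()
    anchors = " ".join(inbound_anchor_texts).lower().split()
    total = 0.0
    for term in terms:
        total += 3.0 * title.count(term)    # words in the page title
        total += 1.0 * body.count(term)     # words in the page text
        total += 2.0 * anchors.count(term)  # words other sites use to describe the page
    total += 0.5 * len(inbound_anchor_texts)  # how many other sites link to the page
    return total

# Hypothetical pages and the anchor text of links pointing at them.
pages = {
    "a": {"title": "Search engines explained", "text": "how search engines rank pages"},
    "b": {"title": "Cooking tips", "text": "search for recipes"},
}
inbound = {"a": ["guide to search engines", "search basics"], "b": []}

query = "search engines"
ranked = sorted(pages, key=lambda p: score(pages[p], query, inbound[p]), reverse=True)
print(ranked)
```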
  8. Search engines--review
     • a series of computer programs that find and save files at a very fast rate
     • when combined with algorithms designed to sort content based on text queries, search engines become a useful tool to find a little bit of information in that vast collection of files known as the World Wide Web
  9. Which search engine is best?
     • Need to understand how each search engine works
     • Check out the Bruce Clay, Inc. search engine relationship chart: http://www.bruceclay.com/searchenginerelationshipchart.htm
  10. Invisible Web (or Deep Web)
     • Some pages and links are excluded from most search engines by policy
     • Others are excluded because search engine spiders cannot access them
     • Pages that are excluded are referred to as the “Invisible Web” (or “Deep Web”)
        • you don't see these pages in search engine results
        • estimated to be two to three or more times larger than the “visible web”
  11. Why invisible pages?
     • If a search engine doesn’t locate a Web page it’s because:
        • Technical barriers prohibit access
        • Choices or decisions made by the search engine (policy) exclude the page
  12. Technical barriers
     • Typing or judgment is required
        • Searchable specialized databases
        • Logins and/or passwords required
  13. Policy issues
     • Page format
     • Non-HTML pages
     • Script-based programs (those URLs with a “?”)
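The "URLs with a ?" test on slide 13 can be approximated by checking whether a URL carries a query string. This is a sketch of one possible check, not any search engine's actual exclusion policy; the function name and the example URLs are assumptions.

```python
from urllib.parse import urlparse

def is_script_based(url):
    """Return True when the URL has a non-empty query string after a '?'."""
    return bool(urlparse(url).query)

# Hypothetical URLs for illustration.
print(is_script_based("http://example.org/catalog/search?title=whales"))  # True
print(is_script_based("http://example.org/guides/whales.html"))           # False
```

Note that a URL ending in a bare "?" has an empty query string, so this check would treat it as an ordinary page.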
  14. Research issues
     • Different search tools give different results
     • Failure to retrieve does not mean that there is nothing available
     • Develop a search strategy
     • Learn the search engine’s search tips
     • Evaluation
  15. A selection of engines
     • Google           www.google.com
     • Vivisimo         www.vivisimo.com
     • Ask.com          www.ask.com
     • Yahoo!           www.yahoo.com
     • Open Directory   www.dmoz.org
     • Ixquick          www.ixquick.com/
     • Mamma            www.mamma.com/
     • Gigablast        www.gigablast.com/
     • Search.com       www.search.com/
  16. Failure to retrieve
     • crawling Web pages and locating sites for search engines is based on using links from one page to reach other pages to crawl
     • documents with few links tend to be overlooked
     • if pages are never discovered, they are not available to researchers
     • Failure to retrieve can also be linked to the search query used, or search strategy
  17. Search strategy
     • three main considerations in the search process
        • Relevance
        • Precision
        • Recall
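The slide lists precision and recall without defining them. Under the standard information-retrieval definitions, precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. A minimal sketch, with made-up document IDs:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents came back, 3 relevant documents exist.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r)
```

Here two of the four results are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67). Narrowing a search tends to raise precision at the cost of recall, and broadening it does the reverse.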
  18. Successful search strategy
     • ability to create an exact match between search statement and documents sought
     • size and content of the search engine selected
     • search engine’s search tools
  19. Process
     • involves consultation of definition tools
        • subject dictionaries
        • thesauri, etc.
     • subject familiarization
        • e.g., if searching on medical topics, become familiar with basic terminology
        • same goes for research in any other subject area
  20. Formulating a strategy
     • be logical
     • spend time on search term selection and combining to reduce the time spent eliminating irrelevant search results
     • search engines are good for searching on unusual or unique keywords, and for combining keywords
     • be creative and flexible
     • look for subtle connections
     • be prepared to make intuitive leaps
  21. Simplified search strategy
     • Formulation of the research question and its scope
     • Identification of concepts within the question
     • Identification of search terms to describe those concepts
     • Consideration of synonyms and variations of those terms
     • Preparation of the search logic
     • Readiness to revise and redo a search
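The middle steps of this strategy (concepts, search terms, synonyms, search logic) can be sketched as a small helper that turns groups of synonyms into a Boolean search statement. The research question, the terms, and the function name `build_query` are hypothetical, and the exact operator syntax varies from one search tool to another.

```python
def build_query(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for synonyms in concepts:
        # Quote multi-word terms so they are searched as phrases.
        terms = [f'"{t}"' if " " in t else t for t in synonyms]
        group = " OR ".join(terms)
        groups.append(f"({group})" if len(terms) > 1 else group)
    return " AND ".join(groups)

# Hypothetical question: "How does reading affect teenagers?"
concepts = [
    ["teenagers", "adolescents", "young adults"],
    ["reading", "literacy"],
]
print(build_query(concepts))
# (teenagers OR adolescents OR "young adults") AND (reading OR literacy)
```

Revising a search then amounts to editing the synonym lists and rebuilding the statement.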
  22. Boolean logic
     • describes certain logical operations that are used to combine search terms
     • basic Boolean operators are AND, OR and NOT
  23. AND
     • limits results to those items that contain both, or all, of the search terms in the query
     • a search query with the AND operator will retrieve only those items containing all of the search terms
  24. OR
     • helpful in the first phases of a search
        • especially if the searcher is unsure of what information is available on the topic or what words are used to categorize it
     • when used between two words, it instructs the search tool to retrieve any record containing either of the words
  25. NOT
     • The third of the most common Boolean operators
     • used to eliminate records containing a particular word or combination of words from the search results
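The three operators map directly onto set operations. In this sketch each search term is associated with the set of record numbers that contain it; the terms and record numbers are invented for illustration.

```python
# Hypothetical index: search term -> record numbers containing that term.
index = {
    "cats": {1, 2, 3},
    "dogs": {2, 3, 4},
    "training": {3, 4, 5},
}

cats_and_dogs = index["cats"] & index["dogs"]          # AND: records with both terms
cats_or_dogs = index["cats"] | index["dogs"]           # OR: records with either term
dogs_not_training = index["dogs"] - index["training"]  # NOT: drop records with the second term

print(sorted(cats_and_dogs))      # [2, 3]
print(sorted(cats_or_dogs))       # [1, 2, 3, 4]
print(sorted(dogs_not_training))  # [2]
```

AND narrows the result set, OR widens it, and NOT removes records, which is why OR suits the early, exploratory phase of a search.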
  26. Search engine search tips
     • Check the Help files of a search engine
     • Some search engines allow you to apply date restrictions to a search
     • Word order in natural language searching can greatly influence the search
     • A question phrased in different ways can produce different results
     • An added influence is the weight some search engines place on words located earlier in the search query
  27. + sign
     • ensures that a search engine finds pages that have all the words you enter, not just some of them
  28. - sign
     • a search engine will find pages that have one word on them but not another word
  29. Phrase searching
     • ensures that terms appear in the order they are entered
     • placing the phrase within quotation marks tells the search engine to retrieve pages where the terms appear exactly in the order specified
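The + sign, - sign, and quoted-phrase rules can be sketched as a simple matcher. The rules coded here are assumptions made for illustration (each engine defines its own syntax, so check its Help files): `+word` must appear, `-word` must not appear, a quoted phrase must appear with its words in order, and of any plain words at least one must appear. Punctuation in the page text is not handled.

```python
import re

def matches(text, query):
    """Return True if the page text satisfies the +, -, and "phrase" rules of the query."""
    padded = " " + " ".join(text.lower().split()) + " "
    optional_hits = []
    for sign, term in re.findall(r'([+-]?)("[^"]+"|\S+)', query.lower()):
        is_phrase = term.startswith('"')
        term = term.strip('"')
        present = f" {term} " in padded  # whole words, in the order given
        if sign == "-":
            if present:
                return False             # excluded word found
        elif sign == "+" or is_phrase:
            if not present:
                return False             # required word or phrase missing
        else:
            optional_hits.append(present)
    return any(optional_hits) if optional_hits else True

# Hypothetical page text.
page = "The search engine indexes cats and birds"
print(matches(page, "+cats -dogs"))        # True
print(matches(page, "+cats -birds"))       # False
print(matches(page, '"search engine"'))    # True
print(matches(page, '"engine search"'))    # False
```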
  30. Web page evaluation
     • Before you leave the list of search results -- before you click and get interested in anything written on the page -- glean all you can from the URLs of each page
     • choose pages most likely to be reliable and authentic
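Some of what a URL reveals can be pulled apart programmatically. This sketch only extracts the host name and its final suffix (such as `edu` or `com`); the function name `url_clues` is an assumption, and judging what those clues say about reliability remains up to the researcher.

```python
from urllib.parse import urlparse

def url_clues(url):
    """Return the host name and its final suffix (e.g. 'edu') from a URL."""
    host = urlparse(url).hostname or ""
    suffix = host.rsplit(".", 1)[-1] if "." in host else ""
    return {"host": host, "suffix": suffix}

print(url_clues("http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html"))
print(url_clues("http://www.findsounds.com/"))
```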
  31. Main evaluation points
     • Accuracy
     • Authority
     • Objectivity
     • Currency
     • Coverage
  32. Terminology
     • Concept search
     • Full-text index
     • Fuzzy search
     • Index
     • Keyword search
     • Precision
  33. Terminology, cont.
     • Proximity search
     • Query-by-example
     • Recall
     • Relevancy
     • Stemming
     • Stop words
     • Thesaurus
  34. Resources and sources
     • Final tip: links on Web pages may lead to other relevant sites, but be careful of going off on tangents
  35. Resources
     • 20 great Google secrets http://www.pcmag.com/article2/0,4149,1306756,00.asp
     • WWW Virtual Library http://vlib.org
     • SearchEngineShowdown http://www.searchengineshowdown.com/
  36. Resources
     • Best Search Tools Chart http://www.infopeople.org/search/chart.html
     • Searching the Internet Effectively http://www2.vuw.ac.nz/staff/alastair_smith/searching/
     • Finding Images Online http://www.tasi.ac.uk/resources/searchingresources.html
     • FindSounds http://www.findsounds.com/
  37. Resources
     • SwitchBoard http://www.switchboard.com/
     • AnyWho http://www.anywho.com/
     • Yahoo! PeopleSearch http://people.yahoo.com/
     • Web 2.0 http://www.go2web20.net/
     • Library 2.0 http://instructionwiki.org/Library_2.0_in_15_minutes_a_day
  38. Overall search engine info
     • Best General Search Engines http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
  39. Questions?