Internet Research: Finding Websites, Blogs, Wikis, and More

The Internet vs. the Web
Internet: “the world’s largest computer network made up of millions of computers. It’s really nothing more than the ‘plumbing’ that allows information of various kinds to flow from computer to computer around the world.”
Web: “one of many interfaces to the Internet, making it easy to retrieve text, pictures, and multimedia files from computers without having to know complicated commands.”
Other Internet protocols and interfaces include e-mail, chat rooms and bulletin boards, Internet mailing lists, newsgroups, and databases accessed via Web interfaces.

Search Engines
Search engines are “databases containing full-text indexes of Web pages,” somewhat like the white pages of a telephone directory.

Issues with Search Engines
• Crawling the Web can be expensive.
• Web crawlers are “dumb.”
• Users can have unrealistic expectations and limited search skills.
• Because people want immediate results, they often do not search thoroughly.
• Search engines are biased toward text, though this is changing.

Main Functional Parts of a Search Engine
• Crawler or spider: a computer program that “crawls” a website and sends information back to the database
• Database: the collection of information from the websites crawled
• Indexing program: a program that indexes the words in the database
• Retrieval engine: the computer program that takes your keywords and brings back the hits
• HTML interface: what you see on the search engine’s website

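To make the division of labor among these parts concrete, here is a toy sketch in Python. It assumes the crawler has already fetched a few pages into a dictionary (the URLs and texts are invented for illustration), has the indexing program build an inverted index (a map from each word to the pages that contain it), and has the retrieval engine return the pages containing every keyword. Real search engines are vastly larger and more sophisticated; this only illustrates the roles.

```python
import re
from collections import defaultdict

# Hypothetical "crawled" pages. In a real engine the crawler (spider)
# would fetch these from the Web and send them back to the database.
DATABASE = {
    "http://example.com/a": "Blogs offer breaking news and commentary",
    "http://example.com/b": "Wikis support collaborative authoring",
    "http://example.com/c": "Search engines index blogs and wikis",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(database):
    """Indexing program: map each word to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in database.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def retrieve(index, query):
    """Retrieval engine: return pages containing every keyword (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


index = build_index(DATABASE)
print(sorted(retrieve(index, "blogs")))
# prints ['http://example.com/a', 'http://example.com/c']
```

A query such as "blogs wikis" narrows the hits to the single page that contains both words.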
Typical Retrieval and Ranking Factors
• Popularity of the page
• Frequency of terms
• Number of query terms that are matched
• Rarity of terms
• Weighting by field
• Proximity of terms
• Weighting according to the order in which the searcher entered terms
• Word variants (and/or truncation)
• Case sensitivity
• Analysis of documents in the database
• Relevance feedback applied to retrieved records
• Date

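Two of these factors, frequency of terms and rarity of terms, are often combined in a weighting scheme called TF-IDF (term frequency times inverse document frequency). The sketch below assumes that scheme and three invented pages; it is not the formula any particular search engine uses, and it ignores the other factors listed, such as popularity and proximity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_scores(documents, query):
    """Score each document by summing tf * idf over the query terms.

    tf  = how often the term occurs in the document (frequency of terms)
    idf = log(N / df), larger for terms found in few documents (rarity of terms)
    """
    n = len(documents)
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    scores = {}
    for name, tf in counts.items():
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for c in counts.values() if term in c)
            if df and tf[term]:
                score += tf[term] * math.log(n / df)
        scores[name] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical pages used only for illustration.
docs = {
    "page1": "invisible web invisible web tools",
    "page2": "web directories and web rings",
    "page3": "deep web content",
}
ranking = tf_idf_scores(docs, "invisible web")
print([name for name, _ in ranking])
# prints ['page1', 'page2', 'page3']
```

Because “web” appears on every page, its inverse document frequency is log(3/3) = 0, so only the rarer term “invisible” affects the ranking.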
Comparing Results from Major Search Engines
Thumbshots.com Ranking

Information Needed for Reviews of Internet Search Tools*
• Default operation
• Advanced searching
• Operators (Boolean, proximity, truncation, etc.)
• Case sensitivity
• Field searching
• Limiters
• Stop words
• Sorting/ranking
• Display
• Other features
• Strengths and weaknesses
* See the Search Engine Features Chart for explanations.

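Several items in this checklist (Boolean operators, truncation, and stop words) can be seen at work in a small query matcher. The syntax here is an assumption made for illustration: terms are implicitly ANDed, a leading `-` means NOT, a trailing `*` means truncation, and the stop-word list is hypothetical. Each real search tool defines its own syntax and defaults, which is exactly what a review should record.

```python
import re

# Hypothetical stop-word list; real tools differ (and some use none).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(term, words):
    """Truncation: a term ending in '*' matches any word with that prefix."""
    if term.endswith("*"):
        prefix = term[:-1]
        return any(w.startswith(prefix) for w in words)
    return term in words


def matches(query, text):
    """Evaluate a query with implicit AND, '-' for NOT, '*' for truncation."""
    words = set(tokenize(text))
    for raw in query.lower().split():
        negate = raw.startswith("-")
        term = raw.lstrip("-")
        if not term or term in STOP_WORDS:
            continue  # stop words are ignored
        found = term_matches(term, words)
        if found == negate:  # required term missing, or excluded term present
            return False
    return True


text = "The blogger posted about weblogs and wikis"
print(matches("blog*", text))  # True: truncation matches "blogger"
print(matches("weblog", text))  # False: only "weblogs" occurs
```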
Sample Search Engine
AOL Search

List of Search Engines
4R x T Wiki: Search Engines

Meta and Multi Search Engines
Both meta and multi search engines search other search engines, directories, and so on rather than their own databases. A meta search engine combines the results from all of the sources it searches into a single list. A multi search engine displays the results from each database separately.

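The difference between the two can be sketched as two functions over the same per-source result lists. The merging strategy shown for the meta search engine (round-robin interleaving with duplicates removed) is only one plausible assumption; actual metasearch engines use a variety of ranking methods, often undisclosed. The source names and results are hypothetical.

```python
from itertools import zip_longest


def multi_search(results_by_source):
    """Multi search engine: keep each source's results in its own list."""
    return {source: list(hits) for source, hits in results_by_source.items()}


def meta_search(results_by_source):
    """Meta search engine: combine all sources into one de-duplicated list.

    Results are interleaved round-robin (first hit from each source, then the
    second hit from each source, and so on); a URL seen earlier is skipped.
    """
    merged, seen = [], set()
    for tier in zip_longest(*results_by_source.values()):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


results = {
    "engine_a": ["site1", "site2", "site3"],
    "engine_b": ["site2", "site4"],
}
print(meta_search(results))  # ['site1', 'site2', 'site4', 'site3']
print(multi_search(results))  # each engine's list shown separately
```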
Additional Information for Metasearch Engine Demonstrations
• Databases (search engines, directories, etc.) available for searching

Sample Metasearch Engine
Search.com

List of Meta and Multi Search Engines
4R x T Wiki: Meta and Multi Search Engines

Web Directories
Web directories are “collections of links to Web pages and sites that are arranged by subject,” somewhat like the yellow pages of a telephone directory.

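The subject arrangement of a directory can be pictured as a tree of categories. The sketch below assumes a tiny, hypothetical hierarchy stored as nested dictionaries, with the links filed in each category under a `_links` key; browsing means following a category path down the tree.

```python
# A hypothetical subject hierarchy: categories contain subcategories and,
# under the "_links" key, the sites that editors have filed there.
DIRECTORY = {
    "Reference": {
        "Libraries": {"_links": ["http://example.org/library-a"]},
        "Dictionaries": {"_links": ["http://example.org/dict"]},
    },
    "Computers": {
        "Internet": {
            "Searching": {"_links": ["http://example.org/search-guide"]},
        },
    },
}


def browse(directory, path):
    """Follow a category path such as 'Computers/Internet/Searching'.

    Returns the subcategories at that point and the links filed there.
    """
    node = directory
    for name in path.split("/"):
        node = node[name]  # raises KeyError if the category does not exist
    subcategories = sorted(k for k in node if k != "_links")
    return subcategories, node.get("_links", [])


print(browse(DIRECTORY, "Reference"))
# prints (['Dictionaries', 'Libraries'], [])
```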
Web Directory Models
• Closed models rely on paid workers to choose links and are subject to some quality control (e.g., Yahoo!, About.com).
• Open models rely on volunteers and can develop quality-control problems (e.g., Open Directory Project).

Issues with Web Directories
• Directories are inherently small.
• They may have unseen editorial policies.
• They are not always current.
• They may provide lopsided coverage.
• They may charge for listings.

Advantages of Web Directories
• Human beings assign websites to specific categories, which makes your hits more relevant.
• The databases are small, so you get fewer hits.

Sample Directory
Yahoo! Directory

List of Directories
4R x T Wiki: Directories

Invisible Web
“Text pages, files, or other often high-quality authoritative information available via the World Wide Web that general-purpose search engines cannot, due to technical limitations, or will not, due to deliberate choice, add to their indices of Web pages. Sometimes also referred to as the ‘Deep Web’ or ‘dark matter.’”

Four Types of Invisibility
• Opaque Web: “files that can be, but are not, included in search engine indices”
• Private Web: “technically indexable Web pages that have deliberately been excluded from search engines”
• Proprietary Web: “content that’s only accessible to users willing to register to use [it]”
• Truly Invisible Web: material that a search engine’s Web crawler cannot index for technical reasons

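One common way site owners deliberately keep pages out of search engines (the Private Web) is the robots exclusion protocol: a `robots.txt` file at the root of a site tells well-behaved crawlers which paths not to fetch. Python’s standard `urllib.robotparser` module can evaluate such rules. The `robots.txt` contents, crawler name, and URLs below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that asks all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules directly, no network access

for url in ("http://example.com/public/page.html",
            "http://example.com/private/report.html"):
    allowed = parser.can_fetch("ExampleCrawler", url)
    print(url, "->", "indexable" if allowed else "excluded (Private Web)")
```

Note that `robots.txt` is a request, not access control: the excluded pages remain technically indexable, which is what distinguishes the Private Web from the Truly Invisible Web.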
Sample Invisible Web Search Tool
IncyWincy: The Invisible Web Search Engine

List of Invisible Web Directories and Search Engines
4R x T Wiki: Invisible Web Search Tools

Weblogs
A weblog is “a Web site with frequent, dated entries listed in reverse chronological order. The entries have links and commentary and often an opportunity for others to comment.”
“Enter the Web log. Quickly conjugated to ‘Weblog,’ the shift of a space makes ‘we blog,’ and the shortened version is ‘blog.’ It has become the ‘in’ technology of the moment on the Net.”

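“Reverse chronological order” simply means sorting entries by date, newest first. A minimal sketch, assuming each entry is just a posting date and a placeholder title (the `Entry` class and the sample posts are hypothetical):

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    """One dated weblog entry (hypothetical, minimal fields)."""
    posted: date
    title: str


def reverse_chronological(entries):
    """Return entries newest first, the order in which a weblog displays them."""
    return sorted(entries, key=lambda e: e.posted, reverse=True)


posts = [
    Entry(date(2002, 7, 1), "First post"),
    Entry(date(2002, 9, 15), "Third post"),
    Entry(date(2002, 8, 3), "Second post"),
]
for post in reverse_chronological(posts):
    print(post.posted.isoformat(), post.title)
```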
Advantages and Disadvantages of Blogs
“Despite the many purely personal-focused blogs and opinionated pontificating of others, Weblogs offer access to breaking news, rumors, evaluations, and other information that might not otherwise be readily available from our traditional databases. Above and beyond their information value, the software for creating blogs is basic content management software, and it can fulfill purposes well beyond the keeping of an online diary.”

Sample Blog
ResourceShelf

Sample Blog Search Engine
Google Blog Search

List of Blog Search Engines
4R x T Wiki: Blog and Social Media Search Engines

Wikis
A wiki is a “type of website that allows the visitors themselves to easily add, remove and otherwise edit and change some available content, sometimes without the need for registration. This ease of interaction and operation makes a wiki an effective tool for collaborative authoring. The term wiki can also refer to the collaborative software itself (wiki engine) that facilitates the operation of such a website, or to certain specific wiki sites, including the computer science site (an original wiki), WikiWikiWeb, and the online encyclopedias such as Wikipedia.”

Sample Wikis
• WikiWikiWeb
• Wookieepedia

List of Wiki Directories and Search Engines
4R x T Wiki: Wiki Directories and Search Engines

Web Rings
“Similar sites are grouped together in rings and each site is linked to another by a simple navigation bar. Rings form a concentration of sites, allowing visitors to quickly find what they are looking for. Each Ring is created and maintained by an individual web site owner called the RingMaster. RingMasters determine the look and feel of the Ring, approve and manage member sites, and encourage other sites to join. RingMasters help to develop virtual communities based on the Ring topic.”

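The ring structure described above behaves like a circular list: following “next” from the last member site wraps around to the first. The sketch below models only that navigation; the class, its method names, and the member sites are hypothetical, and real ring navigation bars often include further links (for example, to a random member) that are not shown.

```python
class WebRing:
    """A hypothetical model of a web ring's navigation bar."""

    def __init__(self, topic, ringmaster):
        self.topic = topic
        self.ringmaster = ringmaster  # the site owner who maintains the ring
        self.members = []  # member sites, in ring order

    def approve(self, site):
        """The RingMaster approves a site; it joins at the end of the ring."""
        if site not in self.members:
            self.members.append(site)

    def next_site(self, site):
        """Site reached by the "next" link; wraps from last to first."""
        i = self.members.index(site)
        return self.members[(i + 1) % len(self.members)]

    def previous_site(self, site):
        """Site reached by the "previous" link; wraps from first to last."""
        i = self.members.index(site)
        return self.members[(i - 1) % len(self.members)]


ring = WebRing("Science fiction fan sites", "ringmaster.example.com")
for member in ["site-a.example.com", "site-b.example.com", "site-c.example.com"]:
    ring.approve(member)

print(ring.next_site("site-c.example.com"))  # site-a.example.com
print(ring.previous_site("site-a.example.com"))  # site-c.example.com
```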
Finding Web Rings
• WebRing Directory and Online Community
• Ringlink Webring Directory

Finding Listservs and Groups
• CataList
• Google Groups
• Ning Social Networks
• Yahoo! Groups

Finding Message Boards and Forums
BoardReader.com

Finding Websites Using Social Bookmarking Services
• Delicious (search box on main page)
• Explore tags
• Digg
• Furl
• StumbleUpon
• Wikipedia's list of social bookmarking sites

Bookmark Search Engines
• thagoo
• Xmarks

Using Google Alerts
Google Alerts

Miscellaneous
• Browsys Finder
• findingDulcinea
• iResearch Reporter
• Joongel
• Symbaloo

List of Other Search Tools
4R x T Wiki: Other Search Tools

Sources
Curling, Cindy. “A Closer Look at Weblogs.” LLRX.com 15 Oct. 2001. 8 July 2002 <http://www.llrx.com/columns/notes46.htm>.
Notess, Greg R. “The Blog Realm: News Sources, Searching with Daypop, and Content Management.” Online 26.5 (Sep./Oct. 2002). 20 June 2003 <http://www.infotoday.com/online/sep02/OnTheNet.htm>.
Sherman, Chris, and Gary Price. The Invisible Web: Uncovering Sources Search Engines Can’t See. Medford, NJ: Information Today-CyberAge Books, 2001.
“Wiki.” 8 Oct. 2006. Wikipedia, the Free Encyclopedia. 8 Oct. 2006 <http://en.wikipedia.org/wiki/Wiki>.
