7. Explore the deep web (databases)
www.invisibleweb.com/
www.search.com/
www.webdata.com/
8. PRACTICAL STEPS FOR WEB RESEARCH:
How to formulate queries
•Identify your concepts
•List keywords for each concept
•Specify the logical relationships among your keywords
9. TIPS ON CONDUCTING SEARCHES
1. Read the directions
2. Use Boolean logic
3. Use synonyms
4. Check your spelling
5. Use capitalization if case sensitive
6. Repeat using alternative terms
7. Try different sources within search engines
8. Try different search engines
9. Try search engines which search multiple search engines
10. What if I have too many results?
• Field search
• Add concept words
• Use vocabulary specific to the topic
• Link terms with “and” (+)
• Use term proximity operators
• Enclose phrases within quotation marks
• Use “not” to keep out unwanted terms
11. What if I have too few results?
• Drop off the least important concept
• Use more general vocabulary
• Add alternate terms or spellings
• Try the “related documents” option
12. Keeping track of where you have been
•Who
•How
•Where
•What
•Why
•When
13. Your assignment:
Do the internet activity
Use any of the search techniques to obtain your information
Keep track of the who, how, where, what, why, and when
14. References
Cohen, L. (2001, May). Conducting Research on the Internet. http://library.albany.edu/internet/research.html
Keeping an Internet Activity Log. http://schoolnet.ca/aboriginal/lessons/surf-log-e.htm
Editor's Notes
If you know the Internet address of a site you wish to visit, you can use a Web browser to access that site. All you need to do is type the URL in the appropriate location window. URL stands for Uniform Resource Locator. The URL specifies the Internet address of the electronic document. Every file on the Internet, no matter what its access protocol, has a unique URL. Web browsers use the URL to retrieve the file from the host computer and the directory in which it resides. This file is then displayed on the user's computer monitor.
This is the format of the URL: protocol://host/path/filename
For example:
http://www.house.gov/agriculture/schedule.htm - a hypertext file on the Web
ftp://ftp.uu.net/graphics/picasso - a file at an FTP site
telnet://opac.albany.edu - a Telnet connection
Any of these addresses can be typed into the location window of a Web browser.
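The URL anatomy described above can be taken apart programmatically. As a minimal sketch, Python's standard `urllib.parse` module splits a URL into the same protocol, host, and path components named in the notes:

```python
from urllib.parse import urlparse

# Split a URL of the form protocol://host/path/filename into its components.
url = "http://www.house.gov/agriculture/schedule.htm"
parts = urlparse(url)

print(parts.scheme)  # the protocol: "http"
print(parts.netloc)  # the host: "www.house.gov"
print(parts.path)    # the path and filename: "/agriculture/schedule.htm"
```

The same call handles the other protocols shown above (`ftp://…`, `telnet://…`), since the scheme is simply whatever precedes the `://`.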
Browsing home pages on the Web is a haphazard but interesting way of finding desired material on the Internet. Because the creator of a home page programs each link, you never know where these links might lead. High quality starting pages will contain high quality links. The University Libraries Web site contains quality links leading into the World Wide Web, and is a good place to start your exploration. This site is located at http://library.albany.edu/.
Join any of the thousands of e-mail discussion groups or Usenet newsgroups. These groups cover a wealth of topics. You can ask questions of the experts and read the answers to questions that others ask. Belonging to these groups is somewhat like receiving a daily newspaper on topics that interest you. These groups provide a good way of keeping up with what is being discussed on the Internet about your subject area. In addition, they can help you find out how to locate information--both online and offline--that you want.
E-mail discussion groups can be associated with academic institutions. Many topics are scholarly in nature, and it is not unusual for experts in the field to be among the participants. In contrast, Usenet newsgroups cover a far wider variety of topics and participants have a range of expertise. Be careful to evaluate the knowledge and opinions offered in any discussion forum. Note also that a small number of e-mail groups are cross-posted as Usenet newsgroups. For example, the early music e-mail group EARLYM-L also exists as the newsgroup rec.music.early.
E-mail discussion groups are managed by software programs. There are three in common use: Listserv, Majordomo, and Listproc. The commands for using these programs are similar.
A list of Usenet newsgroups can be accessed from within a newsreader program. Web browser suites such as Netscape Communicator include a newsreader. This offers the convenience of Usenet access in a graphical environment as a part of the Web experience.
A good Web-based directory to assist in locating e-mail discussion groups and Usenet newsgroups is Liszt, located at http://www.liszt.com/.
An Internet search engine allows the user to enter keywords relating to a topic and retrieve information about Internet sites containing those keywords. Search engines are available for many of the Internet protocols. For example, Archie searches for files stored at anonymous FTP sites.
Search engines located on the World Wide Web have become quite popular as the Web itself has become the Internet's environment of choice. Web search engines have the advantage of offering access to a vast range of information resources located on the Internet. Many search engines compile a database spanning multiple Internet protocols, including HTTP, FTP, and Usenet. They may also search multimedia or other file types on the deep Web, often accessible as separate searches. Web search engines tend to be developed by private companies, though most of them are available free of charge.
A Web search engine service consists of three components:
*Spider: Program that traverses the Web from link to link, identifying and reading pages
*Index: Database containing a copy of each Web page gathered by the spider
*Search engine mechanism: Software that enables users to query the index and that usually returns results in term relevancy ranked order
Keep in mind that spiders are indiscriminate. Be aware that some of the resources they collect may be outdated, inaccurate, or incomplete. Others, of course, may come from responsible sources and provide you with valuable information. Be sure to evaluate all your search results carefully.
With most search engines, you fill out a form with your search terms and then ask that the search proceed. The engine searches its index and generates a page with links to those resources containing some or all of your terms. These resources are usually presented in term ranked order. For example, a document will appear higher in your list of results if your search term appears many times, near the beginning of the document, close together in the document, in the document title, etc. These may be thought of as first generation search engines.
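The index-and-rank mechanism can be illustrated with a toy example. The sketch below uses made-up documents and simple term-frequency scoring; real first generation engines also weigh term position, proximity, and appearance in the title, as described above:

```python
from collections import Counter

# Toy "index": a few hypothetical documents, each treated as a bag of words.
documents = {
    "doc1": "budget talks stall as budget deadline nears",
    "doc2": "early music recordings reviewed",
    "doc3": "budget negotiations continue in washington",
}

def search(index, term):
    """Rank documents by how often the term appears (term frequency)."""
    scores = {name: Counter(text.split())[term] for name, text in index.items()}
    return [name for name, score in
            sorted(scores.items(), key=lambda kv: -kv[1]) if score > 0]

print(search(documents, "budget"))  # doc1 ranks first: "budget" appears twice
```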
A newer development in search engine technology is the ordering of search results by concept, keyword, site, links, or popularity. Engines that support these features may be thought of as second generation search engines. These engines offer improvements in the ranking of results, in part because they insert a human element into determining what is relevant. For example, Google ranks a result according to the number of highly ranked Web pages that link to it; a page in turn becomes highly ranked when other highly ranked pages link to it. This scheme represents an intriguing melding of technology and human judgment.
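The link-based ranking idea can be sketched as a small power iteration over a made-up link graph. This is an illustration of the general PageRank-style technique, not Google's actual algorithm or data:

```python
# Hypothetical link graph: links[p] lists the pages that p links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

damping = 0.85
rank = {page: 1.0 / len(links) for page in links}

for _ in range(50):  # iterate until the ranks settle
    new_rank = {page: (1 - damping) / len(links) for page in links}
    for page, outlinks in links.items():
        # Each page shares its rank equally among the pages it links to.
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

# C is linked to by both A and B, so it ends up with the highest rank.
best = max(rank, key=rank.get)
print(best)
```

The key property is circular: a page's rank depends on the ranks of the pages linking to it, which is why an iterative computation is needed.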
All search engines have rules for formulating queries. It is imperative that you read the help files at the site before proceeding. Online tutorials can also help you learn the rules. A short list of recommended tutorials appears at the end of this file.
Recommended starting points:
1.Start with Google. This is a second generation search engine that ranks pages by the number of links from pages ranked high by the service. These highly ranked pages, in turn, are also determined by the number of links to them. The idea here is that a high quality page will be found and linked to from another high quality page. Many users find that Google does an excellent job of finding Web documents relevant to their topics.
2.Another interesting choice is Direct Hit. Calling itself a Popularity Engine, this second generation engine ranks results according to sites other searchers have chosen from their results to similar queries. The service also tracks roughly how long users spend at the selected sites. Depending on the topic, retrieval can be quite rewarding.
3.Also try Northern Light. This unique engine clusters results into Custom Search Folders, which contain specific subtopics and sites retrieved by your search. Northern Light therefore gives you quick access to aspects of your topic that interest you. Query Server does a similar job and derives its content from multiple search sites.
4.MetaCrawler is a good site to try if your topic is obscure or if you want to retrieve results from a variety of search engines with a single search statement. This service searches multiple search engines simultaneously and offers useful search options. MetaCrawler returns your results in a single list and removes the duplicate files. This type of search processing is called meta searching. Other recommended meta search engines include Ixquick Metasearch and ProFusion.
For a more extensive list of recommended Web search engines, see Internet Search Engines.
An increasing number of universities, libraries, companies, organizations, and even volunteers are creating subject directories to catalog portions of the Internet. These directories are organized by subject and consist of links to Internet resources relating to these subjects. The major subject directories available on the Web tend to have overlapping but different databases. Most directories provide a search capability that allows you to query the database on your topic of interest.
When to use directories? Directories are useful for general topics, for topics that need exploring, and for browsing.
There are two basic types of directories: academic and professional directories often created and maintained by subject experts to support the needs of researchers, and directories contained on commercial portals that cater to the general public and are competing for traffic. Be sure you use the directory that appropriately meets your needs.
*INFOMINE, from the University of California, is a good example of an academic subject directory
*Yahoo! is the most famous example of a commercial portal
Subject directories differ significantly in selectivity. For example, the famous Yahoo! site does not carefully evaluate user-submitted content when adding Web pages to its database; it is therefore NOT a reliable research source and should not be used for this purpose. In contrast, the Argus Clearinghouse selects only a small number of the subject guides submitted for inclusion and rates them against a standard. Consider the policies of any directory you visit, bearing in mind that not all directory services are willing to disclose their policies or the names and qualifications of their site reviewers. A number of subject directories consist of links accompanied by annotations that describe or evaluate site content. A well-written annotation from a known reviewer is more useful than one written by the site creator, as is usually the case with Yahoo!
It is useful to understand that certain directories are the result of many years of intellectual effort. For this reason, it is important to consult subject directories when doing research on the Web.
The University Libraries Web site includes a list of Internet Subject Directories.
Recommended starting points:
*If you want to explore a large number and variety of sources, try The Librarians' Index to the Internet. Supported by a federal grant, a large number of indexers select and annotate Web resources across a broad range of topics. With its extensive but careful selection, objective and useful annotations, and hierarchical organization, LII might well be thought of as "the thinking person's Yahoo."
*The Argus Clearinghouse is one of the highest quality subject directories on the Internet. This site consists of rated collections of recommended sites organized into subject-specific guides. The guide authors are often specialists in the field. This site is highly recommended for academic research.
*The WWW Virtual Library is one of the oldest and most respected subject directories on the Web. This directory consists of individual subject collections, many of which are maintained at universities throughout the world.
*INFOMINE is a large directory of Web sites of scholarly interest compiled by the University of California. The directory may be browsed or searched by subject, keyword, or title. Each site listed is accompanied by a description.
The concept of the "deep" or "invisible" Web has emerged in recent months. This refers to content that is stored in databases accessible on the Web but not available via search engines. In other words, this content is "invisible" to search engines. This is because spiders cannot or will not enter into databases and extract content from them as they can from static Web pages. In the past, these databases were fewer in number and referred to as specialty databases, subject specific databases, and so on.
The only way to access information on the invisible Web is to search the databases themselves. Topical coverage runs the gamut from scholarly resources to commercial entities. Very current, dynamically changing information is likely to be stored in databases, including news, job listings, available airline flights, etc. As the number of Web-accessible databases grows, it will become essential that they be used to conduct successful information finding on the Web.
Other content not gathered by spiders includes non-textual files such as multimedia files, graphical files, and documents in non-standard formats such as Portable Document Format (PDF).
Keep in mind that many search engine sites and commercial portals feature searchable databases as part of their package of services. This phenomenon falls under the heading of converging content. For example, you can visit AltaVista and look up news, maps, jobs, auctions, items for purchase, etc., all things outside the purview of a spider-gathered index. As another example, Google integrates searches of PDF files into its general search service.
Here are a few examples of sites that collect content from the deep Web:
Direct Search
http://gwis2.circ.gwu.edu/~gprice/direct.htm
Large compilation of links to the search interfaces of a wide variety of research resources on the Web compiled by Gary Price of George Washington University [warning: large file]
The Invisible Web
http://www.invisibleweb.com/
Directory of over 10,000 databases, which offers the option to search for the database you need, from IntelliSeek
Search.Com
http://www.search.com/
Dozens of topic-based databases from CNET
WebData
http://www.webdata.com/
Collection of searchable databases on the Web organized into topics maintained by ExperTelligence, Inc.
When conducting any database search, you need to break down your topic into its component concepts. For example, if you want to find information on the budget negotiations between President Clinton and the Republicans, these are your concepts: CLINTON, REPUBLICANS, BUDGET.
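Combining those concepts with Boolean operators can be sketched as follows. The query syntax shown is illustrative; each engine's actual operators vary, so always check its help files first, as the tips above advise:

```python
# Build a Boolean query string from concept groups: synonyms within a
# concept are ORed together, and the concepts are ANDed with each other.
concepts = [
    ["Clinton", "president"],
    ["Republicans", "GOP"],
    ["budget", "appropriations"],
]

def build_query(concepts):
    groups = ["(" + " OR ".join(words) + ")" for words in concepts]
    return " AND ".join(groups)

print(build_query(concepts))
# (Clinton OR president) AND (Republicans OR GOP) AND (budget OR appropriations)
```

Dropping the least important concept group from the list is exactly the "too few results" remedy suggested earlier; adding a group narrows a search that returns too much.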
Who - The name of the site
How - How you got there - what search engine, etc.
Where - The actual internet address
What - A brief description of what the place is all about
Why - Reasons why you might want to return to this place
When - The best time (traffic-wise) to visit the site, and when you downloaded or printed it
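The six questions above map naturally onto a simple activity log. A sketch that appends each visited site as a row of a CSV file (the filename and the sample entry are illustrative):

```python
import csv
from datetime import datetime

# Columns follow the who/how/where/what/why/when scheme described above.
def log_site(path, who, how, where, what, why):
    """Append one visited site to the activity log, timestamping it."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [who, how, where, what, why, datetime.now().isoformat()]
        )

log_site(
    "surf_log.csv",
    who="University Libraries",
    how="Typed the URL directly",
    where="http://library.albany.edu/",
    what="Library starting page with quality links into the Web",
    why="Good starting point for future research",
)
```

A spreadsheet or word-processor table works just as well; the point is to record all six answers at the moment you visit the site, not afterwards from memory.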