2. What is a search engine?
Multiple servers that run a program called
a spider or a crawler
Crawlers build an index of web sites
They follow the links on each website and
crawl the linked pages
You can search the index by keyword
matching
Search engines don’t search the web, they
search the index, so current events may
not be indexed yet
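A minimal sketch of the crawl-and-index idea, assuming a tiny in-memory "web" rather than real HTTP requests. The TOY_WEB data and the crawl function are hypothetical names made up for illustration. It shows why a page that nothing links to (or that has not been crawled yet) cannot be found by keyword.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/": ("apple pie recipes", ["a.example/pie", "b.example/"]),
    "a.example/pie": ("classic apple pie", []),
    "b.example/": ("bass fishing tips", ["a.example/"]),
    "c.example/": ("breaking news today", []),  # nothing links here
}

def crawl(start_url, web):
    """Follow links breadth-first from start_url and build a keyword index."""
    index = defaultdict(set)          # keyword -> set of URLs containing it
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        for word in text.lower().split():
            index[word].add(url)
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl("a.example/", TOY_WEB)
print(sorted(index["apple"]))   # pages found for the keyword "apple"
print(sorted(index["news"]))    # empty: that page was never crawled
```

The search only ever consults the index, never the pages themselves, so anything the crawler has not reached is invisible.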
3. Surfing the Index
Search engines don’t search the web,
they search the index, so current
events may not be indexed yet
4. Directories can often be more
fruitful than search engines
Because directories are created by people
and not programs...
They are smaller but reflect evaluated
material instead of 'all of the web'
They are even less likely to be up to date
They usually take you to the front door of
a website
They are organized in a thoughtful manner
so you can browse
5. Directory Example
Open Directory Project
It helps you to think logically about
the information you need—the
structure is already there, you just
have to follow it
7. Number of pages indexed
The more pages indexed, the more likely
you are to find what you need
But it is hard to find a needle in a haystack
if the haystack is dumped on your head
Google— 4.3 Billion
Yahoo— 3.2 Billion
Teoma— 1 Billion
AltaVista and AllTheWeb—Acquired by
Yahoo and no longer available
Source: Infopeopleproject.org
8. Google’s Index
Google indexes about half the
searchable web, so perhaps 8 billion
searchable pages are out there
9. What do search engines not
search?
Private Databases
• Not fixed URLs
• Professional, academic
• Example at the end of presentation
(MERLOT)
Ask Jeeves only lists customers that
pay to have their site indexed
10. Market Share
comScore Media Metrix Search Engine Ratings
11. Which sites use what engine?
Google uses Google and is owned by Google
Yahoo uses Yahoo and is owned by Yahoo; it
used to use Google, and it recently
acquired Inktomi
AOL uses Google & Open Directory and is
owned by AOL
AltaVista uses Open Directory and Yahoo
and is owned by Yahoo
AlltheWeb uses, and is owned by, Yahoo
HotBot uses Google and is owned by Lycos
12. Why do we care?
If two sites use the same engine, you’ll
get the same results
comScore Media Metrix Search Engine Ratings
13. Two sites, same results
AltaVista—apple pie AltaVista
found 2,410,000 results
Yahoo.com--Results 1 - 10 of about
2,410,000 for apple pie
14. What does that mean?
Two forces now dominate the search
engine world in market share, index
size and unique searching
technologies, so general searches are
best done at either Yahoo or Google
15. How does Google match websites?
Page Rank
• Google interprets a link from page A to
page B as a vote by page A for page B.
It also analyzes the page that casts the
vote. If it’s an important page (many links
to it), its vote counts more heavily
Text Matching
• A page has to be both important (Page
Rank) and relevant (text-matching) to
be at the top of the list
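The "link as a vote" idea can be sketched as a simplified PageRank-style iteration. This is an illustration under stated assumptions, not Google's actual implementation: the damping value of 0.85, the iteration count, the page_rank function name and the four-page link graph are all assumptions made for the example.

```python
def page_rank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its vote over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A, C and D all link to B; B links to C.
links = {"A": ["B"], "B": ["C"], "C": ["B"], "D": ["B"]}
ranks = page_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

B collects the most votes. C has only one incoming link, but it comes from the important page B, so C still ranks above A and D.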
16. Matching (continued)
When engines rank results by text
matching, the location and frequency of
the text string are taken into account
Pages with the phrase 'apple pie' will rank
higher than pages that mention both
terms separately
Pages that mention apple pie repeatedly
rank higher than pages with fewer
occurrences
Pages with apple pie in the title of the
page rank higher
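A toy scoring function can make the three rules above concrete (phrase beats separate terms, more occurrences beat fewer, title beats body). The weights and the text_score name are illustrative assumptions; real engines do not publish their formulas.

```python
def text_score(query, title, body):
    """Toy relevance score; the weights are illustrative assumptions."""
    phrase = query.lower()
    terms = phrase.split()
    title_l, body_l = title.lower(), body.lower()

    # Every term must appear somewhere (crude substring AND matching).
    if not all(t in title_l or t in body_l for t in terms):
        return 0.0
    score = 1.0
    # Frequency: each occurrence of the exact phrase in the body adds to the score.
    score += 2.0 * body_l.count(phrase)
    # Location: the phrase in the title counts extra.
    if phrase in title_l:
        score += 5.0
    return score

# Hypothetical pages: name -> (title, body).
pages = {
    "separate": ("Desserts", "An apple a day. Pie is nice too."),
    "phrase": ("Desserts", "Our apple pie is popular."),
    "repeated": ("Desserts", "Apple pie! Everyone loves apple pie."),
    "in_title": ("Apple Pie Recipes", "Everyone loves apple pie."),
}
ranked = sorted(pages, key=lambda name: text_score("apple pie", *pages[name]),
                reverse=True)
print(ranked)
```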
17. Title Tag
When constructing a web page, the
title tag is important
Search engines look at it
18. Example of Title Tag Code
http://campuslife.wlu.edu
Source code: <title>Orientation
Programs--Washington and Lee
University</title>
In FrontPage, File/Save As/File
Name/Title
A Google search for “New Page 1” (the
default FrontPage title) returns
17,600,000 results
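To see what title a page actually exposes, the title element can be pulled out with Python's standard html.parser module. This is a minimal sketch: the TitleParser class and extract_title helper are names invented for the example, and the page string is a trimmed stand-in for the real source code.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title.split())  # collapse line breaks and spaces

page = ("<html><head><title>Orientation Programs--Washington and Lee "
        "University</title></head><body></body></html>")
print(extract_title(page))
```

Running the same helper over your own pages is a quick way to spot titles that were left at an editor default such as "New Page 1".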
19. Meta Tag Searching
Google does not search meta tags:
too much “meta tag spam”
Inktomi was the last major search
engine that used them; it has since
been bought by Yahoo
Teoma might use meta tags
20. Meta Tag
<head>
<title>Revisiting Meta Tags</title>
<meta name="authors" content="Danny Sullivan">
<meta name="date" content="20021205">
<meta name="channel" content="internet technology">
<meta name="description" content="Follow up to October 2002 article about the demise of the meta keywords tag.">
</head>
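For the curious, here is a sketch of how a program could read meta tags like those above, using Python's standard html.parser module. The MetaParser class is a hypothetical helper, not how any particular search engine works. Tag and attribute names are matched case-insensitively by the parser, so META NAME and meta name are treated alike.

```python
from html.parser import HTMLParser

class MetaParser(HTMLParser):
    """Collects name/content pairs from <meta> tags."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)      # attribute names arrive lowercased
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content.strip()

head = """<head>
<TITLE>Revisiting Meta Tags</TITLE>
<META NAME="authors" CONTENT=" Danny Sullivan">
<META NAME="date" CONTENT="20021205">
<META NAME="channel" CONTENT="internet technology">
</head>"""

parser = MetaParser()
parser.feed(head)
print(parser.meta)
```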
21. Keyword Searching
Be as specific as you can
Don’t use “car” if you can use “Toyota”
Search engines have a hard time
distinguishing between different meanings
of a word, e.g., hard exam, hard cider, hard
times, hard drive
A search engine can’t think for you: if you
put in “heart attack”, it won’t show pages
with “cardiac arrest”
22. Boolean Searching
George Boole, English
mathematician (died 1864), devised a
logical combinatorial system
AND, OR, NOT
Used to get more targeted results
Default Operator is AND at all major
search engines, so if you type in
apple pie, sites assume “apple AND
pie”
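Against an index, the three Boolean operators behave like set operations. A minimal sketch, assuming a made-up index that maps each keyword to the set of page numbers containing it (INDEX and lookup are hypothetical names):

```python
# Hypothetical toy index: keyword -> set of page ids containing it.
INDEX = {
    "apple": {1, 2, 3, 5},
    "pie": {2, 3, 4},
    "cobbler": {3, 6},
}

def lookup(term):
    """Return the set of pages for a keyword (empty if not indexed)."""
    return INDEX.get(term.lower(), set())

both = lookup("apple") & lookup("pie")      # apple AND pie: intersection
either = lookup("apple") | lookup("pie")    # apple OR pie: union
without = lookup("apple") - lookup("pie")   # apple NOT pie: difference

print(sorted(both), sorted(either), sorted(without))
```

AND can only shrink the result set, OR can only grow it, and NOT removes pages from it.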
23. Using Boolean Operators at
Google
Default Operator is AND
apple pie: 1,710,000
apple AND pie (+pie): 1,690,000; Google
shows a message that AND is the default
operator, but the different count shows
it does take word order into account
Fewer results, perhaps a little more
useful
24. Boolean Operator OR
apple OR pie—7,140,000
Use this if you don’t want to rule out
too much
Asthma, acute OR chronic
25. Boolean Operator NOT
apple NOT pie (–pie)
What will NOT do to the search
results?
816,000 results
Cut the results roughly in half
How could you use NOT to search for
information about bass fishing?
bass NOT guitar (when you want the
fish)
26. Be Careful with “NOT”
A search for 'apple pie NOT cobbler'
may remove useful results such as
"Aunt Sarah's Better Than Cobbler
Apple Pie"
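The pitfall is easy to reproduce. A small sketch, assuming a hypothetical list of page titles and a simple whole-word matcher (the matches function is invented for this example):

```python
titles = [
    "Classic Apple Pie",
    "Aunt Sarah's Better Than Cobbler Apple Pie",
    "Peach Cobbler",
]

def matches(title, required, excluded=()):
    """True if the title has every required word and no excluded word."""
    words = set(title.lower().split())
    return (all(w in words for w in required)
            and not any(w in words for w in excluded))

with_not = [t for t in titles if matches(t, ["apple", "pie"], excluded=["cobbler"])]
without_not = [t for t in titles if matches(t, ["apple", "pie"])]

print(with_not)      # Aunt Sarah's recipe is gone
print(without_not)   # both apple pie recipes are kept
```

NOT excludes on the mere presence of a word, regardless of what the page is actually about.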
28. Domain Restrict
apple pie site:www.allrecipes.com—733
More appropriate example:
admissions information—3,730,000
admissions information site:www.wlu.edu
(68 results)
www.wlu.edu, search
29. Exact Search
How do you get results that match
exactly?
Use quotation marks, e.g., “apple pie”
696,000 results on Google
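The quotation-mark and site: syntax can also be assembled by a small helper, which is handy when building search links for a web page. This is a sketch under stated assumptions: build_query and search_url are hypothetical helpers, and the base URL and the "q" parameter name are assumptions for illustration that may differ between search sites.

```python
from urllib.parse import urlencode

def build_query(terms, exact=False, site=None):
    """Compose a query string using quotation marks and the site: operator."""
    query = f'"{terms}"' if exact else terms
    if site:
        query += f" site:{site}"
    return query

def search_url(query, base="https://www.google.com/search"):
    # The base URL and the "q" parameter name are assumptions for illustration.
    return base + "?" + urlencode({"q": query})

exact_query = build_query("apple pie", exact=True)
site_query = build_query("admissions information", site="www.wlu.edu")

print(exact_query)
print(site_query)
print(search_url(exact_query))
```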
30. AltaVista & Google Cool Feature
Link—find out how many indexed
pages link to your page
http://www.altavista.com
link:leechapel.wlu.edu
AltaVista—92 (searches Yahoo)
Google—33
31. Cached Items
Google “takes a picture” (indexes a
site)
As web sites often do, the site goes
away
You can still look at the old site
through the cache
www.google.com
32. Meta Search Engine
What is a Meta Search Engine?
Search Engines that display results from
several sites at once
Dogpile--Google · Yahoo · Ask Jeeves
About · LookSmart · Overture
FindWhat
Caution: Dogpile inserts sites that have
paid for placement into the results from
various search engines without telling you
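One way a meta search engine could combine results is to interleave the ranked lists from each engine and drop duplicates. This is only a sketch of the general idea, not Dogpile's actual method. The merge_results function, the engine names and the URLs are all made up for the example.

```python
def merge_results(results_by_engine):
    """Interleave ranked lists from several engines, dropping duplicate URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in results_by_engine.values()), default=0)
    for position in range(longest):
        # Take the result at this rank from each engine in turn.
        for engine, results in results_by_engine.items():
            if position < len(results):
                url = results[position]
                if url not in seen:
                    seen.add(url)
                    merged.append((url, engine))
    return merged

# Hypothetical ranked results from two engines.
results = {
    "engine_a": ["pie.example/classic", "pie.example/dutch"],
    "engine_b": ["pie.example/dutch", "pie.example/vegan", "pie.example/history"],
}
merged = merge_results(results)
for url, engine in merged:
    print(url, "(first seen at", engine + ")")
```

Keeping track of which engine supplied each result is what would let a meta search engine label its sources, including any paid placements.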
33. Safe Search
Google—SafeSearch
Filter—preferences
Yahoo—SafeSearch
Filter—preferences
AltaVista—Settings, Family Filter, can
set a password
34. Advanced Settings
Most search sites have a link for
advanced settings, so you don’t have
to remember the particular syntax
for a particular type of search