The document discusses search engines and how they work to index the vast amount of information on the web. It explains that search engines build indexes by having software agents crawl the web, download pages, and extract key information to build searchable databases. It also notes that search engines compete based on factors like the size of their indexes, speed of searches, and relevance of results. Finally, it provides statistics on the size of indexes and recent indexing activity for some major search engines like Google, FAST, AltaVista, and others.
1. Introduction to Informatics - Fall 02
Accessing digital information: Search engines
I. The problem
• How to find the needle in the haystack
II. How search engines work
• Building the index
III. Types of search engines
IV. Problems with search engines
2. The Problem
The WWW contains more than 2.5 billion pages, with 7.3 million pages added each day
The surface Web contains 19 terabytes (trillions of bytes)
This is where most of our stuff is
There are 7,500 terabytes hidden in the "deep" Web
This is largely proprietary information, dynamically generated pages, or pages behind firewalls
http://www.howstuffworks.com/news-item127.htm
Without the URLs of the particular pages you want, you must rely on search engines to uncover potentially relevant pages
3. The web lacks bibliographic control standards that we take for granted in the print world
There is no equivalent to the ISBN to uniquely identify a document
There is no standard system of cataloguing or classification, analogous to those developed by the Library of Congress
There is no central catalogue of the Web's holdings
Many documents lack the name of the author and the date of publication
Updating? Version control? Not likely!
4. The net is not a digital library
It was not designed to support the organized publication and retrieval of information
It has become a chaotic repository for the output of hundreds of thousands of digital publishers
It is filled with an amazing variety of digital artifacts
The ephemeral mixes everywhere with works of lasting importance
The librarian's classification and selection skills must be complemented by the computer scientist's ability to automate the task of indexing, storing, and providing access to information
Lynch (1999)
http://www.sciam.com/0397issue/0397lynch.html
5. How can we find what we want when we want it?
This is becoming an increasingly important problem for people in the information professions
Strategies:
Follow links from page to page, hoping that you will stumble across the pages that will help you answer the question
Maintain a personal collection of bookmarks
Use search engines
~85% of all web sessions begin with or involve a search
6. Search engines are a critical web tool
The Web offers the choice of hundreds of different search tools
The problem is that each has its own database, command language, search capabilities, and method of displaying results
Each covers a different portion of the web with some overlap
This means that you have to learn a variety of search tools and develop effective search techniques to take advantage of Web resources
7. Search engine coverage relative to the estimated size of the publicly indexable web has decreased substantially since December 97
No engine indexes more than about 16% of the web
Combined coverage of eleven major search engines is 42% of the Web
Overlap between individual search engines is low, with approximately 40% of a given search engine's content unique
The biggest search engines are currently FAST and Google, with over 600 million pages indexed
http://healthlinks.washington.edu/hsl/liaisons/stanna/navweb/nav2.html
8. Search engines are typically more likely to index sites that have more links to them (more 'popular' sites)
Google does this
Google interprets a link from page A to page B as a vote, by page A, for page B
But Google looks at more than the sheer volume of votes, or links, a page receives
It also analyzes the page that casts the vote
Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important"
http://www.google.com/technology/index.html
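The "links as weighted votes" idea above can be sketched as a small PageRank power iteration. This is a minimal sketch, not Google's implementation: the four-page link graph is invented for illustration, and the 0.85 damping factor is the value given in the published PageRank paper.

```python
# Toy PageRank: links are "votes", and votes from pages that are themselves
# important count for more. The link graph below is an illustrative assumption.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}           # start with equal importance
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                     # dangling page: spread evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                for target in outlinks:          # each link is a weighted vote
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

graph = {"A": ["B"], "B": ["C"], "C": ["A", "B"], "D": ["C"]}
ranks = pagerank(graph)
```

Page D receives no links, so it ends up with the lowest rank even though it casts a vote for C; this mirrors the slide's point that importance flows through incoming links, not outgoing ones.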
9. Other difficulties with search engine coverage
Search engines are more likely to index US sites than non-US sites (AltaVista is an exception), and more likely to index .com sites than .edu sites
Indexing of new or modified pages by just one of the major search engines can take months
The pages that one engine indexes do not have extensive overlap with other engines' databases
Lawrence and Giles (1999)
http://www.wwwmetrics.com/
10. 85% of users use search engines to find information (GVU survey)
We use search engines to locate and buy goods and research many decisions
Search engines are currently lacking in timeliness and comprehensiveness and do not index sites equally
The current state of search engines can be compared to a phone book which is
Updated irregularly
Biased toward listing more popular information
Missing many pages
Filled with duplicate listings
11. Search engine indexing and ranking may have economic, social, political, and scientific effects
Indexing and ranking of online stores can substantially affect economic viability
Some search engines charge for placement, and companies are willing to pay
Delayed indexing of scientific research can lead to the duplication of work
Delayed or biased indexing may affect social or political decisions
12. What do search engines do?
They attempt to index and provide access to the "relevant web"
This is defined differently by different engines
It ranges from brute force indexing to the use of algorithms to gauge relevance and popularity
Search engines/tools have four components
The collection of entries for their databases
The structure of their database
The search process
The interface
13. Data collection is done by
Humans, who review and index in the employ of the search engine company (Yahoo!)
Humans, by self submission
Software
Software collection agents include automated robot wanderers, spiders, harvesters, bots, and crawlers
They roam the internet (mostly www, gopher and ftp sites) and bring back copies of resources
This actually means systematically downloading pages and following links
They sort, index, and create database entries out of them
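The download-and-follow-links loop can be sketched as a breadth-first traversal. To stay self-contained, this sketch crawls an in-memory dict of page → (text, outlinks) rather than real HTTP; the page names and contents are illustrative assumptions, and a real crawler would fetch pages over the network and parse HTML for links.

```python
from collections import deque

# Toy "web": page -> (text, outgoing links). Purely illustrative.
WEB = {
    "a.html": ("welcome page about search", ["b.html", "c.html"]),
    "b.html": ("crawling the web", ["c.html"]),
    "c.html": ("indexing pages", ["a.html"]),
}

def crawl(start):
    """Breadth-first crawl: download a page, keep a copy, follow its links."""
    seen, queue, copies = set(), deque([start]), {}
    while queue:
        url = queue.popleft()
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        text, outlinks = WEB[url]        # "download" the page
        copies[url] = text               # bring back a copy for the indexer
        queue.extend(outlinks)           # systematically follow links
    return copies

pages = crawl("a.html")
```

The `seen` set is what keeps the crawler from looping forever on cyclic links (note c.html links back to a.html above).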
14. The search component concerns the end user
It involves the interface between the human searcher and the indexed database of resources
Several factors determine the success of a search engine:
The size of the database
The content and coverage of the database
The currency of the entries and frequency of updating
The elimination of redundancy and dead links
The speed of searching
The availability of advanced search features
The interface design and ease of use
15. Search engines provide "electronic egalitarianism"
Indexing and cataloguing tools are highly democratic
They categorize information differently than human indexers do
Machine-based approaches to information gathering, organization, and retrieval provide uniform and equal access to all the information on the Net
This is the source of one of our problems with search engines
We type in a search request and receive thousands of URLs in response
These results frequently contain references to irrelevant Web sites while leaving out others that hold important material
16. Accessing digital information: Search engines
I. The problem
• How to find the needle in the haystack
II. How search engines work
• Building the index
III. Types of search engines
IV. Problems with search engines
17. How search engines work
Many search engines use two interdependent approaches
Browsing through subject trees and hierarchies
Keyword searching of an extensive database
A subject tree provides a structured and organized hierarchy of categories for browsing for information by subject
Under each category and/or sub-category, links to appropriate Web pages are listed
Web pages are assigned categories either by the author or by subject tree administrators
Many subject trees also have their own keyword searchable indexes
18. Search tools with elaborate subject trees present links with brief annotations
Examples include Yahoo, Galaxy, and the WWW Virtual Library
Search engines allow keyword searching of indexes
These are automatically compiled by robots and spiders, which are constantly collecting net resources
Searchers enter keywords to query the index
Some allow Boolean operators and other advanced features
Web pages and other Internet resources that satisfy the query are identified and listed
19. Search engines compete on the
Size of their indexes
Frequency of updating the index
Range of advanced search options
Speed of returning a result set
Result set presentation
Relevance of the items included in a result set
Design of the interface
Overall ease of use
Range of additional services offered
20. Claimed size and "obscure search" test results

Engine          Size (millions)  Expected Score  Actual Score  Rank
Google          560              1.0             1.0           1
FAST            340              2.0             1.8           2
Northern Light  265              3.0             2.3           3
HotBot          110              4.0             2.3           3
iWon            110              4.0             2.3           3
AltaVista       350              2.0             2.5           4
Yahoo-Google    560              1.0             3.0           5
Excite          250              3.0             3.0           5
Yahoo-Inktomi   110              4.0             4.3           6

Data from: July 2002, Search Engine Showdown
http://www.searchenginewatch.com/sereport/00/07-sizetest.html
21. How big are they?
[Chart: claimed index sizes for Google, FAST, AltaVista, Inktomi, and Northern Light]
SearchEngineWatch, 7/02
http://www.searchenginewatch.com/reports/sizes.html

22. How big are they?
[Chart: index sizes for Google, FAST, AltaVista, Inktomi, and Northern Light]
SearchEngineWatch, 7/02
http://www.searchenginewatch.com/reports/sizes.html

23. Recent activity (indexing)
[Chart: recent indexing activity for Google, FAST, AltaVista, Inktomi, and Northern Light]
SearchEngineWatch, 7/02
http://www.searchenginewatch.com/reports/sizes.html
24. Search engines are powered by robots, indexing software, and "ontologists" who classify, sort, and arrange the Web into a searchable matrix
The most popular search engines are always among the most visited sites on the Net
Competition is high for the advertising dollars that keep these search tools free of charge
Despite their similar approaches to scanning the Internet, search engines don't always turn up the same results
Depending on the type of search being conducted, one engine might give you more satisfactory results than another
25. Three methods for indexing web resources
Full text index
Includes all terms and URLs
Uses filters to remove words not important to searching
Keyword index
Based on the location and frequency of words and phrases
If a term is mentioned only once or twice, it won't be indexed
Human index
Created by individuals who review pages and select the best words and phrases to describe their content
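The contrast between full-text and keyword indexing above can be shown in a few lines. The stopword list and the "at least 3 occurrences" threshold are illustrative assumptions; real engines tune both.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to"}   # assumed filter list
MIN_FREQ = 3                                  # assumed keyword threshold

text = "the web the web the web a page of the web and a link to a page"
counts = Counter(w for w in text.split() if w not in STOPWORDS)

# Full-text index: every non-stopword term on the page.
full_text_terms = set(counts)
# Keyword index: only terms frequent enough to characterize the page;
# a term mentioned once or twice is dropped, per the slide.
keyword_terms = {term for term, n in counts.items() if n >= MIN_FREQ}
```

Here "web" (4 occurrences) survives into the keyword index, while "page" and "link" only make the full-text index.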
26. Engines use index searches, concept searches, or browsing
Index searching
Many search engines use this method because it casts a wider net than a catalog does
Results come from a dynamic index of pages and use an algorithm to sort documents to determine relevance
For instance, the number of times a keyword appears, as well as its proximity to the top of the document
They don't recognize context, synonyms, or homonyms
Searching "beat" returns Ginsberg and Burroughs but also pages on metronomes, raves, and gingersnaps
There are problems of redundancy and dead links
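The frequency-plus-position heuristic mentioned above can be sketched as a crude scoring function. The 0.5 position weight is an invented illustration, not a published formula, and note that the sketch reproduces the slide's "beat" problem: it scores string matches, not meanings.

```python
# Crude relevance score: keyword frequency plus a bonus for appearing
# near the top of the document. The 0.5 weight is an assumption.
def relevance(doc_words, keyword):
    freq = doc_words.count(keyword)
    if keyword in doc_words:
        first = doc_words.index(keyword)
        proximity = 1.0 / (1 + first)    # earlier occurrence -> bigger bonus
    else:
        proximity = 0.0
    return freq + 0.5 * proximity

doc1 = "beat poetry by ginsberg and the beat generation".split()
doc2 = "metronomes keep a steady beat".split()
score1 = relevance(doc1, "beat")   # frequent and at the very top
score2 = relevance(doc2, "beat")   # one late mention, still a match
```

Both documents match the query "beat" even though only one is about the Beat poets, which is exactly the homonym problem the slide describes.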
27. Concept searching
With this type of search, your search term is treated as a concept and not a keyword
If you type a word in the search box, you search for that word, other forms of the word, and synonyms
The search also includes other words that are highly statistically related to that word
A concept search looks for ideas related to a literal query
Excite uses this strategy
28. Browsing services exist in great numbers on the net
These are systematically grouped hotlists, starting points, or systematic lists of interesting resources
These pages are typically smaller and well-maintained
The browsing structures typically do not use a controlled system of knowledge-structuring or an established classification system
Selection, classification, and description of the resources are made by the list owner using idiosyncratic criteria
Browsing systems covering rapidly changing areas are more difficult to maintain because they often don't have automatic mechanisms for rapid and continual updating
29. Accessing digital information: Search engines
I. The problem
• How to find the needle in the haystack
II. How search engines work
• Building the index
III. Types of search engines
IV. Problems with search engines
30. What are the different types of search engines?
Single, niche, and multiple-threaded search engines
Single search engines
These engines operate alone
Your query is run against a single database and/or index
A directory search tool allows searches by subject matter
It is a hierarchical search that starts with a general subject heading and follows with more specific sub-headings
The information is reviewed and indexed by humans
However, the number of reviews is limited
Yahoo is an example
31. Niche search engines
These engines are like single search engines, except that they cover a restricted subset of resources
Examples might include engines for business, engineering, physics, or government information
A very restricted version of a niche engine only allows you to search that site
Northern Light is a good example
32. Multiple-threaded search engines
These are also called meta-search engines
These engines submit your query to two or more search engines simultaneously
They gather and display the results as a single page
These engines compete on the basis of the number and variety of engines they allow you to search
These engines are becoming more popular
One problem is the amount of redundancy in the returns
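The fan-out-and-merge behaviour, including the redundancy problem the slide notes, can be sketched as follows. The two stub engines and their result lists are illustrative assumptions standing in for real search backends.

```python
# Meta-search sketch: send the query to several engines, merge the result
# lists in order, and drop duplicate URLs across engines.
def engine_a(query):
    return ["x.com", "y.com", "z.com"]   # stub results, purely illustrative

def engine_b(query):
    return ["y.com", "w.com"]            # note the overlap with engine_a

def meta_search(query, engines):
    merged, seen = [], set()
    for engine in engines:
        for url in engine(query):        # query each engine in turn
            if url not in seen:          # deduplicate across result sets
                seen.add(url)
                merged.append(url)
    return merged

results = meta_search("informatics", [engine_a, engine_b])
```

Without the `seen` check, "y.com" would appear twice, which is exactly the redundancy users see when a meta-search engine skips deduplication.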
33. Accessing digital information: Search engines
I. The problem
• How to find the needle in the haystack
II. How search engines work
• Building the index
III. Types of search engines
IV. Problems with search engines
34. What are the problems with search engines?
There are weaknesses and problems common to all attempts to index the Internet
These are still more important than the limitations of single search services
The theoretical problem of indexing virtual hypertext
It is not economical, and not even possible, to index all information on the Internet in "full text"
It is necessary to define the limits of documents and information units in order to allow target-oriented access while searching
In comparison with the world of printed information, this involves considerable difficulties
35. The information units are considerably smaller and less defined
"Containers" like a book, a series, a journal title, or an issue do not occur often
The information units range in size from a whole server or service to single text strings or icons
The mix of different types of information on the net makes uniform and homogeneous indexing and searching impossible
Document types include: directories, lists, menus, full-text of every-day electronic mail, scientific articles and books, field-structured database records, software, audio, video, images, and numerical information
36. Considering the great number of authors on the net and their differing abilities, the quality of input into the search services varies a great deal
Often it is so poor that the search results are seriously degraded
There is incorrect, uncontrolled HTML coding and incomplete use of important content-describing metadata like titles or keywords
Functional text markup is often incorrect or abused for layout purposes, and the reverse also occurs: layout markup is abused as functional characterization
37. Other problems include
Terminological weaknesses, incorrect formulation of titles and headings, and ambiguity
Inability to distinguish between permanent and temporary documents
There are problems with harvesting methods, indexing programs, IR methods, and user interfaces
Performance problems
38. Dead links on search engines

Search engine   % Dead links  % 400 errors
AltaVista       13.7%         9.3%
Excite          8.7%          5.7%
Northern Light  5.7%          2.0%
Google          4.3%          3.3%
HotBot          2.3%          2.0%
FAST            2.3%          1.8%
MSN Inktomi     1.7%          1.0%
Anzwers         1.3%          0.07%

Data from: Aug. 14, 2001, Search Engine Showdown
http://www.notess.com/search/stats/dead.shtml