4. WEB CRAWLER
• A web crawler (also known as a web spider or web robot) is a
program or automated script which browses the World Wide
Web in a methodical, automated manner.
• This process is called Web crawling or spidering.
• They keep an index of the words they find and where they
find them, forming a large database.
• The crawler will periodically return to the sites to check for
any information that has changed. The frequency with which
this happens is determined by the administrators of the
search engine.
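The indexing step above can be sketched in Python. This is a minimal, hypothetical illustration (the class and function names are invented for this example): it parses a page, records each word in an inverted index mapping words to the URLs where they occur, and collects outgoing links for the crawler to visit next. A real crawler would also fetch pages over the network, deduplicate URLs, and respect revisit schedules.

```python
from html.parser import HTMLParser
from collections import defaultdict

class WordIndexer(HTMLParser):
    """Collects visible text and outgoing links from one HTML page."""
    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Remember every hyperlink target so the crawler can follow it later.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Split visible text into words.
        self.words.extend(data.split())

def index_page(url, html, index):
    """Add every word on the page to the inverted index (word -> set of URLs).

    Returns the list of links found on the page, i.e. the crawl frontier.
    """
    parser = WordIndexer()
    parser.feed(html)
    for word in parser.words:
        index[word.lower()].add(url)
    return parser.links

# Tiny demonstration with an in-memory page (no network access).
index = defaultdict(set)
page = '<html><body><p>Hello crawler</p><a href="/next">more</a></body></html>'
next_links = index_page("http://example.com/", page, index)
print(sorted(index["hello"]))  # pages containing the word "hello"
print(next_links)              # links to crawl next
```

In a full crawler, `next_links` would be resolved to absolute URLs and pushed onto a queue, and the loop would repeat until the queue is empty or a page budget is exhausted.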
6. • Search engines are supposed to rank results by popularity
or relevance, but many studies have shown that other factors
influence the rankings.
• These include political, economic, or social biases. Companies
can pay to advertise and obtain a higher ranking. Politically,
search results may be removed because they violate
local laws.
• The fastest crawlers can traverse up to 10 million pages
per day.
7. • Crawlers may follow invalid (broken) links.
• Crawler traffic can slow down a website's loading
speed.
• To prevent a search engine from indexing a
webpage, the following line can be added to its HTML:
• <meta name="robots" content="noindex">
• Web crawlers can be written in any programming
language, such as C++, Java, or Python.
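A well-behaved crawler checks for the robots meta directive before indexing a page. The sketch below (the class and function names are invented for this illustration) parses a page's `<meta>` tags and reports whether indexing is allowed; directive names are case-insensitive per common convention.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Looks for a <meta name="robots" content="..."> directive in a page."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        d = dict(attrs)
        # Attribute values may be missing, so fall back to empty strings.
        if (d.get("name") or "").lower() == "robots":
            directives = {t.strip().lower()
                          for t in (d.get("content") or "").split(",")}
            if "noindex" in directives:
                self.noindex = True

def may_index(html):
    """Return False if the page opts out of indexing via the robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not parser.noindex

blocked = '<html><head><meta name="robots" content="noindex"></head></html>'
allowed = '<html><head><title>ok</title></head></html>'
print(may_index(blocked))  # False: the page asked not to be indexed
print(may_index(allowed))  # True: no robots directive, indexing is allowed
```

A production crawler would combine this per-page check with the site-wide rules in `robots.txt` (which Python can read via the standard-library `urllib.robotparser` module).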