Web Crawling and Indexing in Information Retrieval.pptx

Web Crawling and Indexing
Presented by:
Amit Kumar
Ajit Kumar
Deepak Rathore

What is Web Crawling?
Crawling refers to following the links on a page to new pages,
and continuing to find and follow links on new pages to other
new pages.
The process of crawling needs to start somewhere. Google uses an
initial “seed list” of trusted websites that tend to link to many
other sites.
Crawling the Internet is a continual process for a search engine. It
never really stops.

Web Crawler
• A web crawler is an Internet bot that systematically browses the World Wide Web.
• It is typically operated by search engines for the purpose of Web indexing. The activity itself is also called web spidering.
• Each web crawler has an assigned job.
• Web crawler examples: Googlebot, Bingbot, Yahoo Slurp.

General Web Crawler Algorithm
1. Start: Start with a list of initial URLs, called the seeds.
2. Visit: Visit these URLs.
3. Retrieve: Retrieve the required information from each page.
4. Identify: Identify all the hyperlinks on the page.
5. Add: Add the links to the queue of URLs to visit, called the crawl frontier.
6. Visit: Repeatedly visit the URLs from the crawl frontier, applying the same steps to each new page.
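
The steps above can be sketched in Python as a breadth-first crawl. This is a minimal illustration, not how any production crawler is built: the `crawl` function, the `LinkExtractor` class, the `max_pages` limit, and the in-memory `FAKE_WEB` site are all hypothetical names introduced here, and `fetch` is assumed to be any function that returns a page's HTML or `None`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text or None."""
    frontier = deque(seeds)   # crawl frontier: URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, so none is visited twice
    pages = {}                # url -> HTML retrieved from that URL

    while frontier and len(pages) < max_pages:
        url = frontier.popleft()          # visit the next URL
        html = fetch(url)                 # retrieve the page
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)                 # identify the hyperlinks
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute) # add new links to the frontier
    return pages


# Hypothetical three-page site held in memory instead of the real Web
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A</a>',
}

crawled = crawl(["http://example.test/"], FAKE_WEB.get)
print(list(crawled))
```

The sketch uses a queue and a `seen` set rather than literal recursion, so pages that link back to each other are visited only once. A real crawler would also add politeness delays, error handling, and a robots.txt check.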

Indexing and Rendering
• Indexing is storing and organizing the information found on the pages. The bot renders the code on the page in the same way a browser does.
• Rendering is interpreting the HTML, CSS, and JavaScript on the page to build the visual representation of exactly what you see in your web browser.
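
One common way to store and organize page text for search is an inverted index, which maps each word to the pages that contain it. The sketch below is an illustration under simple assumptions (plain text already extracted from each page, lowercase word matching, no ranking); the names `build_index`, `search`, and `PAGES` are hypothetical and do not describe any particular search engine.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep pages matching all words
    return sorted(results)


# Hypothetical page texts, standing in for crawled and rendered pages
PAGES = {
    "http://example.test/a": "Web crawling follows links to new pages",
    "http://example.test/b": "Indexing stores and organizes page content",
    "http://example.test/c": "Crawling and indexing feed the search results",
}

index = build_index(PAGES)
print(search(index, "crawling"))
print(search(index, "crawling indexing"))
```

Looking up a word is then a dictionary access instead of a scan over every stored page, which is the main reason to build the index ahead of query time.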

Differences and Importance
• What is the difference between crawling and indexing?
  • Crawling is the discovery of pages and links that lead to more pages.
  • Indexing is storing, analyzing, and organizing the content and connections between pages.
• Importance of Crawling and Indexing for your Website
  • This is where your search engine optimization starts. If Google can’t crawl your website, your pages are unlikely to appear in its search results. Make sure your robots.txt file does not block pages you want crawled.
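
A robots.txt file can be checked with Python's standard `urllib.robotparser` module. The sketch below parses an example file from a string instead of downloading one; the rules in `ROBOTS_TXT`, the user agent name `ExampleBot`, and the `example.test` URLs are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is kept out of /private/
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a crawler calling itself "ExampleBot" may fetch each URL
home_allowed = parser.can_fetch("ExampleBot", "http://example.test/index.html")
private_allowed = parser.can_fetch("ExampleBot", "http://example.test/private/data.html")
print(home_allowed, private_allowed)
```

For a live site, `set_url()` followed by `read()` would fetch the file over the network instead of calling `parse()` on a string.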
