Web Crawling and Indexing
Presented by:-
Amit Kumar
Ajit Kumar
Deepak Rathore
What is Web Crawling?
Crawling refers to following the links on a page to new pages,
and continuing to find and follow links on new pages to other
new pages.
The process of crawling needs to start somewhere. Google uses an
initial “seed list” of trusted websites that tend to link to many
other sites.
Crawling the Internet is a continual process for a search engine. It
never really stops.
Web Crawler
• A web crawler is an Internet bot that systematically browses the World Wide Web.
• It is typically operated by a search engine for the purpose of web indexing. Crawling is also called web spidering.
• Each web crawler has an assigned job, such as gathering pages for a particular search engine.
• Examples of web crawlers: Googlebot, Bingbot, Yahoo! Slurp.
General Web Crawler Algorithm
Start: Begin with a list of initial URLs, called the seeds.
Visit: Visit these URLs.
Retrieve: Retrieve the required information from each page.
Identify: Identify all the hyperlinks on the page.
Add: Add the links to the queue of URLs still to visit, called the crawl frontier.
Visit: Repeatedly visit the URLs from the crawl frontier, applying the same steps to each page.
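The steps above can be sketched in Python as a breadth-first crawl. This is a minimal illustration, not how any real search engine works: `crawl`, `LinkParser`, and the `FAKE_WEB` dictionary are hypothetical names, and pages are fetched from an in-memory dictionary instead of the network so the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text or None (assumed callback)."""
    frontier = deque(seeds)   # the crawl frontier: URLs still to visit
    seen = set(seeds)         # URLs already queued, so none is visited twice
    pages = {}                # url -> retrieved HTML
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)                 # visit the URL
        if html is None:
            continue                      # page could not be retrieved
        pages[url] = html                 # retrieve the information
        parser = LinkParser()
        parser.feed(html)                 # identify the hyperlinks
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                frontier.append(link)     # add to the frontier
    return pages


# Toy in-memory "web" standing in for real HTTP requests (illustrative only).
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.com/b": '<a href="/c#top">C</a>',
}

pages = crawl(["http://example.com/"], FAKE_WEB.get)
print(list(pages))
```

The loop uses a queue instead of recursion, which keeps the crawl breadth-first and avoids deep call stacks. The `seen` set and the `max_pages` limit are assumptions added so the sketch terminates; a real crawler would also need politeness delays and error handling.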
Indexing and Rendering
• Indexing is storing and organizing the information found on the pages. The bot renders the code on the page in the same way a browser does.
• Rendering is interpreting the HTML, CSS, and JavaScript on the page to build the visual representation of exactly what you see in your web browser.
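One common way to store and organize page content is an inverted index, which maps each word to the pages that contain it. The sketch below is an assumption about how indexing might look in its simplest form; `build_index`, `search`, and `SAMPLE_PAGES` are hypothetical names. It only extracts text from static HTML and does not render CSS or JavaScript the way a browser would.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text found between HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, html in pages.items():
        extractor = TextExtractor()
        extractor.feed(html)
        text = " ".join(extractor.chunks).lower()
        for word in re.findall(r"[a-z0-9]+", text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Illustrative pages; the URLs and contents are made up for this example.
SAMPLE_PAGES = {
    "http://example.com/a": "<h1>Web crawling</h1><p>Crawlers follow links.</p>",
    "http://example.com/b": "<h1>Web indexing</h1><p>An index maps words to pages.</p>",
}

index = build_index(SAMPLE_PAGES)
print(search(index, "web indexing"))
```

Looking up a word in the index is a dictionary access, so a query does not need to rescan every stored page.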
Differences and Importance
• What is the difference between crawling and indexing?
  • Crawling is the discovery of pages and of the links that lead to more pages.
  • Indexing is storing, analyzing, and organizing the content of pages and the connections between them.
• Importance of crawling and indexing for your website
  • This is where your search engine optimization starts. If Google can't crawl your website, your pages are unlikely to be indexed or to appear in search results. Make sure your robots.txt file does not block the pages you want crawled.
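A quick way to check what a robots.txt file allows is Python's standard `urllib.robotparser` module. The sketch below parses an example file from a string so it runs offline; the rules, the `ExampleBot` user agent, the `is_allowed` helper, and the URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical rules for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules without a network request


def is_allowed(url, user_agent="ExampleBot"):
    """Return True if the parsed rules let user_agent fetch url."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("http://example.com/index.html"))
print(is_allowed("http://example.com/private/data.html"))
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()` to download the file before calling `can_fetch()`.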

Web Crawling and Indexing in Information Retrieval.pptx
