Web Crawler
●   Each search engine uses
    a crawler or spider.
●   A web crawler is a
    computer program that
    browses the WWW in a
    methodical manner.
●   A web spider is a kind of
    web crawler.
●   This process is called
    Web crawling or
    spidering.
●   Image source :
    http://www.codeproject.com/KB/IP/Crawler.aspx
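
As a rough illustration of "browsing in a methodical manner", the sketch below visits pages breadth-first and follows each link at most once. It is only a sketch under stated assumptions: the `crawl` function, the `fetch` callback, and the three-page `PAGES` site are all hypothetical and not taken from the slides, and the pages are held in memory so nothing is downloaded.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Visit pages breadth-first, fetching each discovered URL at most once.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that returns the HTML of a URL, or None if the page is unavailable.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical three-page site held in memory so the sketch runs offline.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}

print(crawl("http://example.test/", PAGES.get))
```

A real crawler would replace `PAGES.get` with a function that downloads pages, and would normally also respect robots.txt and rate limits, which this sketch leaves out.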
Spider
A spider is a program that crawls the Internet in a specific way
for a specific purpose. Spiders are the basis for modern search
engines, such as Google and AltaVista.
These spiders automatically retrieve data from the Web and pass it
on to other applications that index the contents of the Web site
for the best set of search terms.
Source: http://www.ibm.com/developerworks/linux/library/l-spider/
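
The hand-off described above, where retrieved data is passed on to another application, might look roughly like this. The names `TextExtractor`, `retrieve`, and `handle_document` are hypothetical, and the single page is a made-up example; the sketch only shows markup being stripped and the remaining text being handed to a callback.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps the text between tags and drops the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def retrieve(pages, handle_document):
    """Pass the text of each retrieved page to another application.

    `handle_document` stands in for the indexing application; it is an
    assumption of this sketch, called as handle_document(url, text).
    """
    for url, html in pages.items():
        extractor = TextExtractor()
        extractor.feed(html)
        extractor.close()
        handle_document(url, " ".join(extractor.parts))


# Hypothetical retrieved page; a real spider would download it.
pages = {
    "http://example.test/": "<h1>Web crawler</h1><p>Browses the Web.</p>",
}

received = []
retrieve(pages, lambda url, text: received.append((url, text)))
print(received)
```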
Information Indexing
Documents from an agent are indexed by indexing software.
[Diagram labels: Agents, Documents, Indexing Software (extract words or something), Index, Database]

● Information is put into a database.
● There are many different types of indexing.
● The kind of index built determines how the information will be displayed.
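
One common kind of index is an inverted index, which maps each word to the documents that contain it. The slides do not say which kind of index is meant, so the sketch below is only one possibility; `build_index` and the two sample documents are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        # Extract words: runs of letters or digits, case-insensitive.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


# Hypothetical documents delivered by an agent.
documents = {
    "http://example.test/a": "A web crawler browses the Web",
    "http://example.test/b": "A spider is a kind of web crawler",
}

index = build_index(documents)
print(sorted(index["crawler"]))
print(sorted(index["spider"]))
```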
Searching and Visiting

To visit web pages related to your search keywords,
you type the keywords into a search page.

Some search engines allow you to use
several keywords in one search.
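
A search with several keywords can be sketched as a set intersection over an index. This assumes that every keyword must match (AND semantics), which the slides do not specify; `search_all` and the tiny hand-written `index` are hypothetical.

```python
def search_all(index, query):
    """Return the pages that contain every keyword in the query."""
    keywords = query.lower().split()
    if not keywords:
        return set()
    # Look up each keyword; an unknown keyword matches no pages.
    matches = [index.get(word, set()) for word in keywords]
    return set.intersection(*matches)


# Hypothetical index: word -> pages containing that word.
index = {
    "web": {"/a", "/b"},
    "crawler": {"/a", "/b"},
    "spider": {"/b"},
}

print(sorted(search_all(index, "web crawler")))
print(sorted(search_all(index, "web spider")))
```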
Searching

The engine searches the database for your keywords.
Results are returned as an HTML document.
Some additional information is included with the results.
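
Returning results as an HTML document could look roughly like the sketch below. The slides do not say what the additional information is, so the short snippet under each link is an assumption, as are the `render_results` function and the sample result.

```python
from html import escape


def render_results(query, results):
    """Build an HTML results page from (title, url, snippet) tuples."""
    items = []
    for title, url, snippet in results:
        items.append(
            f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a>'
            f"<p>{escape(snippet)}</p></li>"
        )
    return (
        f"<h1>Results for {escape(query)}</h1>\n"
        "<ul>\n" + "\n".join(items) + "\n</ul>"
    )


# Hypothetical search result.
page = render_results(
    "web crawler",
    [("Web Crawler", "http://example.test/a", "A program that browses the WWW.")],
)
print(page)
```

Escaping the query and result fields keeps user-supplied text from being interpreted as markup in the results page.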
Visiting


If you are interested in a title on the results
page, you click the link and go directly to that page.
Search engines or databases do not store
the documents of the indexed sites.

Week10 Web Presentation