By
Vikram Parmar (110173116010)
Hitesh Mavani (100170116023)

Internal Guide:
Mr. Safvan Vahora
Abstract


Nowadays there are many websites that provide student details
as well as college details, along with rankings and fee
structures; these sites have a legal statement for providing
this kind of information to the world.



Our project, Web Crawler with SEO Analysis, gathers information
from those sites and presents it in chronological order, so that
a student can find his/her results as well as college details
and can also compare results with another student by providing
a unique number.



The system will be built using Microsoft technology (VB.NET,
C#), with Microsoft SQL Server as the RDBMS.



With the help of SEO techniques, most search engines will be
able to find the data on our site, so a student who wants
information about a result need only type a registration
number, enrollment number, or roll number directly into the
search engine.



We are going to develop a desktop application for harvesting
data and a web application that provides the data online so
that students can search for their data.
Objective


Web scraping is the process of extracting and creating a
structured representation of data from a website. HTML, the
markup language used to structure data on web pages, is subject
to change when, for instance, the look-and-feel is updated. Since
current techniques for web scraping are based on the markup, a
change may lead to the extraction of incorrect data.
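
The dependence on markup can be illustrated with a minimal sketch. It is written in Python with the standard library only, purely for illustration (the project itself targets .NET). The table layout, the field names `roll_no`, `name`, and `marks`, and both sample pages are assumptions made up for this example, not taken from any real results site.

```python
from html.parser import HTMLParser


class ResultTableParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:       # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


def scrape_results(html):
    """Map each 3-cell row to a record, assuming the order roll, name, marks."""
    parser = ResultTableParser()
    parser.feed(html)
    parser.close()
    records = []
    for cells in parser.rows:
        if len(cells) != 3:
            continue            # row does not match the expected layout
        roll_no, name, marks = cells
        records.append({"roll_no": roll_no, "name": name, "marks": int(marks)})
    return records


# Hypothetical page as the scraper was written for it.
OLD_PAGE = """
<table>
  <tr><th>Roll</th><th>Name</th><th>Marks</th></tr>
  <tr><td>101</td><td>Asha</td><td>78</td></tr>
  <tr><td>102</td><td>Ravi</td><td>64</td></tr>
</table>
"""

# Same data after a hypothetical redesign that swaps the first two columns.
NEW_PAGE = """
<table>
  <tr><th>Name</th><th>Roll</th><th>Marks</th></tr>
  <tr><td>Asha</td><td>101</td><td>78</td></tr>
  <tr><td>Ravi</td><td>102</td><td>64</td></tr>
</table>
"""

print(scrape_results(OLD_PAGE)[0])
print(scrape_results(NEW_PAGE)[0])
```

On the redesigned page the scraper still runs without error, but the name ends up in the `roll_no` field: the data is extracted, only incorrectly, which is the risk described above.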



Web Crawler with SEO Analysis shall provide many services aimed
specifically at students, such as checking results, sending
results to a friend's email address, and downloading exam-related
materials as well as some paid eBooks from other popular websites
such as Flipkart and Amazon.



Our main objective is to scrape (automatically copy) data from
other sites and provide it to students who need to check their
results and compare them with those of other students.
Introduction


A Web crawler is one type of bot or software agent. In general, it
starts with a list of URLs to visit, called the seeds. As the crawler visits
these URLs, it identifies all the hyperlinks in the page and adds
them to the list of URLs to visit, called the crawl frontier.
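
The seed-and-frontier loop described above can be sketched as follows. This is an illustrative Python sketch, not the project's .NET code. The function name `crawl`, the `get_links` callback, the `max_pages` limit, and the `example.edu` URLs are assumptions introduced for the example; a small in-memory dictionary stands in for real page fetches so the sketch runs without network access.

```python
from collections import deque


def crawl(seeds, get_links, max_pages=100):
    """Breadth-first crawl: visit the seeds, then each newly found link once."""
    frontier = deque(seeds)   # the crawl frontier: URLs waiting to be visited
    seen = set(seeds)         # every URL ever queued, so none is queued twice
    visited = []              # URLs in the order they were visited
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# A tiny in-memory "web" (hypothetical URLs) standing in for HTTP fetches.
FAKE_WEB = {
    "http://example.edu/": [
        "http://example.edu/results",
        "http://example.edu/fees",
    ],
    "http://example.edu/results": [
        "http://example.edu/",
        "http://example.edu/results/2013",
    ],
    "http://example.edu/fees": [],
    "http://example.edu/results/2013": [],
}

order = crawl(["http://example.edu/"], lambda url: FAKE_WEB.get(url, []))
print(order)
```

A queue gives breadth-first order, so pages close to the seeds are visited first; the `seen` set is what stops the crawler from looping when pages link back to each other.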



Our system, on the other hand, focuses on harvesting the data
that resides on live pages. The process starts with the initial
URL of the website, after which we must manually check which
pages, and how many, are going to be harvested.



Harvesting data is a legally questionable activity, but nowadays
no one wants to copy and paste data by hand into spreadsheets
or other document formats, so here the copying is written once
and reused for all of them.
Scope


Web Crawler with SEO Analysis is both a desktop-based and a
web-based system. The desktop application has no scope for
ordinary users and is intended only for the admin, while the
web-based system is entirely user-oriented and also lets the
admin add and remove users.



In the desktop application, only the admin can generate student
unique numbers. Generation is automated, and each number is
then stored automatically in the local server database.
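
One way to generate and store such numbers is sketched below, under stated assumptions. The slides do not specify the format of the unique number or the database schema, so the `STU000001` format, the `student` table, and the column names are hypothetical. Python's built-in `sqlite3` module is used here only as a stand-in for the SQL Server database named in the project.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) student table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS student ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " enrollment_no TEXT NOT NULL UNIQUE,"
        " unique_no TEXT UNIQUE)"
    )
    return conn


def register_student(conn, enrollment_no):
    """Insert a student and return the automatically generated unique number."""
    with conn:  # commits on success, rolls back if an error is raised
        cur = conn.execute(
            "INSERT INTO student (enrollment_no) VALUES (?)", (enrollment_no,)
        )
        # Derive the number from the row id assigned by the database,
        # so two students can never receive the same value.
        unique_no = f"STU{cur.lastrowid:06d}"
        conn.execute(
            "UPDATE student SET unique_no = ? WHERE id = ?",
            (unique_no, cur.lastrowid),
        )
    return unique_no


conn = open_store()
print(register_student(conn, "ENR-0001"))
print(register_student(conn, "ENR-0002"))
```

Letting the database hand out the sequence keeps generation and storage in one step, and the `UNIQUE` constraint on `enrollment_no` rejects an attempt to register the same student twice.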



If a website uses a CAPTCHA to block bots, its pages cannot be
scraped. One of the most critical limitations of our project is
therefore that it must ignore sites secured by such anti-robot
protection.



Another main limitation of our system is that we have to check
manually which pages we are going to use for scraping data;
however, our crawler first determines all the links of the
targeted site and stores them in a file.
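
The link-collection step can be sketched as below, again in illustrative Python rather than the project's .NET code. The helper names `site_links` and `save_links`, the sample page, and the `example.edu` URLs are assumptions for this example. The sketch resolves relative links, keeps only links on the same host as the target site, and writes one URL per line so the pages can be reviewed by hand afterwards.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def site_links(base_url, html):
    """Return the unique same-host links found in html, as absolute URLs."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute = urljoin(base_url, href).split("#", 1)[0]
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


def save_links(links, path):
    """Write one URL per line so the list can be checked manually."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(links) + "\n")


# Hypothetical page content for the target site.
PAGE = (
    '<a href="/results">Results</a> '
    '<a href="fees.html#top">Fees</a> '
    '<a href="http://other.example.com/">Ad</a> '
    '<a href="/results">Results again</a>'
)

links = site_links("http://example.edu/index.html", PAGE)
print(links)
```

Calling `save_links(links, "links.txt")` would then produce the file that the admin reviews before choosing which pages to scrape; the file name is arbitrary.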
Technology Used

Application architecture: .NET Technology

Programming language: VB.NET, C#

Testing browsers: Mozilla Firefox, Google Chrome

Internet technology: Google Analytics, Webmaster, Directory submission

Database: SQL Server 2008

Operating system: Windows XP or later
Module


There are three modules across two systems.

1) Standalone Application

Admin

2) Web Application

Admin

Student
Standalone Application


The admin is the one who can generate unique numbers
automatically and run the scraper. The admin can:

Generate automated unique numbers.

Crawl the links of the targeted website.

Harvest data from the targeted sites.

Build a database from the scraped sites.
Desktop application


Admin module: used to insert, update, delete, and save data
whenever a change is required in the databases or across the
whole site, because our system is highly dynamic and needs
changes on a regular basis.



Registered user: a registered user receives daily updates from
our automated system and so can stay in touch.



Visitor: the visitor module is only for viewing data on our
site; a visitor has no further contact with our system.
Use case diagram
Context level DFD
1st level DFD
2nd level DFD
E-R diagram
Expected Outcome


Students can find their results.

Students can compare their results with those of other
students in the same class.

Students can also find exam-based materials.

Students can send results to a friend via our system.

Students can download the exam timetable.

Students can also find the projects of other students
around the globe.

Students can ask questions related to the community.
Conclusion



The web crawler has met our goals of making web scraping easy
and automatic retraining possible. It has proved a convenient
tool for extracting data from the web. Site Scraper has been
tested over different domains with high effectiveness. We
believe that Site Scraper can provide a robust and flexible
solution for the problems of dealing with web data.
THANK YOU

Web crawler with seo analysis
