Web crawler with SEO analysis


Published in: Technology, Design

  1. 1. By Vikram Parmar (110173116010) and Hitesh Mavani (100170116023). Internal Guide: Mr. Safvan Vahora
  2. 2. Abstract
     - Many websites now publish student and college details, including ranks and fee structures, and carry a legal statement permitting them to provide this information publicly.
     - Our project, Web Crawler with SEO Analysis, collects information from those sites and presents it in chronological order, so that a student can find his or her results and college details and compare results with another student by providing a unique number.
     - The system will be built using Microsoft technology (VB.NET, C#), with Microsoft SQL Server as the RDBMS.
     - With the help of SEO, most search engines should be able to find the data on our site, so a student who wants a result only needs to type a registration number, enrollment number, or roll number directly into the search engine.
     - We will develop a desktop application for harvesting data and a web application that lets students search their data online.
  3. 3. Objective
     - Web scraping is the process of extracting data from a website and creating a structured representation of it. HTML, the markup language used to structure data on web pages, is subject to change, for instance when the look and feel is updated. Since current web scraping techniques are based on the markup, such a change may lead to the extraction of incorrect data.
     - Web Crawler with SEO Analysis shall provide several services aimed specifically at students, such as checking results, sending results to a friend's email address, and downloading exam-related materials as well as some paid eBooks from popular websites such as Flipkart and Amazon.
     - Our main objective is to scrape (copy automatically) data from other sites and provide it to students who need to check their results and compare them with those of other students.
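To illustrate the markup-based extraction described in the objective, here is a minimal sketch in Python (chosen for brevity; the project itself targets VB.NET and C#). It assumes a hypothetical results page whose data sits in a plain HTML table. The column names, the sample rows, and the names `TableScraper` and `scrape_results` are illustrative and do not come from the project.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_results(html):
    """Turn a results table into a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment; real result pages will differ.
SAMPLE_PAGE = """
<table>
  <tr><th>Enrollment</th><th>Name</th><th>Marks</th></tr>
  <tr><td>000000000001</td><td>Student A</td><td>81</td></tr>
  <tr><td>000000000002</td><td>Student B</td><td>74</td></tr>
</table>
"""

records = scrape_results(SAMPLE_PAGE)
print(records)
```

Because the rows are keyed by the header text, renaming a column or moving the data out of the table on the source site would silently change or empty the output, which is the fragility the objective describes.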
  4. 4. Introduction
     - A web crawler is a type of bot or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in each page and adds them to the list of URLs to visit, called the crawl frontier.
     - Our system, by contrast, focuses on harvesting the data that resides on live pages. The process starts with the initial URL of the website, after which it must be checked manually which pages, and how many, are to be harvested.
     - Harvesting data can be considered a kind of illegal activity, but nowadays no one wants to copy and paste data by hand into spreadsheets or other document formats, so a write-once tool that automates the copying is built here.
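The seed and crawl-frontier loop described in the introduction can be sketched as follows. This is an illustrative Python sketch, not the project's VB.NET/C# implementation. To stay self-contained it crawls a small in-memory dictionary (`FAKE_SITE`, with made-up `example.test` URLs) instead of the network; the `fetch` parameter, `crawl`, and `LinkParser` are assumed names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl: visit the seeds, then every newly found link.

    `fetch` is any callable mapping a URL to its HTML text, or None
    when the page cannot be retrieved.
    """
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, to avoid revisits
    visited = []              # URLs fetched successfully, in visit order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            link = urljoin(url, href)   # resolve relative links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical four-page site standing in for real HTTP requests.
FAKE_SITE = {
    "http://example.test/": '<a href="/results">Results</a> <a href="/colleges">Colleges</a>',
    "http://example.test/results": '<a href="/">Home</a> <a href="/results/2013">2013</a>',
    "http://example.test/colleges": '<a href="/">Home</a>',
    "http://example.test/results/2013": "No links here.",
}

order = crawl(["http://example.test/"], FAKE_SITE.get)
print(order)
```

Passing the fetch function in as a parameter keeps the frontier logic separate from the download step, so a real HTTP client could be substituted without changing the loop.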
  5. 5. Scope
     - Web Crawler with SEO Analysis is both a desktop-based and a web-based system. The desktop application has no user-level scope and is intended only for the admin, while the web-based system is user-oriented and also lets the admin add and remove users.
     - In the desktop application, only the admin can generate a student's unique number, which is done automatically and then stored in the local server database.
     - If a website uses a CAPTCHA to block bots, its pages cannot be scraped. One of the most critical limitations of our project is therefore that it must skip sites protected by such anti-bot security.
     - Another main limitation is that we have to check manually which pages will be used for scraping; the crawler does, however, first determine all the links of the targeted site and store them in a file.
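The last point, collecting the target site's links into a file for manual review, might look like the Python sketch below (again illustrative; the project uses VB.NET and C#). The same-host rule, the file name `links.txt`, and the function names are assumptions, and the sample hrefs are invented.

```python
import os
import tempfile
from urllib.parse import urljoin, urlparse


def site_links(page_url, hrefs):
    """Resolve hrefs against page_url and keep only links on the same host."""
    host = urlparse(page_url).netloc
    kept = []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        # Drop mailto: and other non-web schemes, and links to other hosts.
        if parts.scheme in ("http", "https") and parts.netloc == host:
            if absolute not in kept:
                kept.append(absolute)
    return kept


def save_links(links, path):
    """Write one URL per line so the admin can review or trim the list."""
    with open(path, "w", encoding="utf-8") as handle:
        for link in links:
            handle.write(link + "\n")


def load_links(path):
    """Read the (possibly hand-edited) list back, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Hypothetical hrefs as they might appear on the site's start page.
hrefs = [
    "/results",
    "colleges.html",
    "http://other.test/ad",
    "mailto:admin@example.test",
    "/results",
]
links = site_links("http://example.test/index.html", hrefs)
path = os.path.join(tempfile.mkdtemp(), "links.txt")
save_links(links, path)
print(load_links(path))
```

A plain text file with one URL per line keeps the manual step simple: the admin deletes the lines that should not be harvested before the scraper reads the file back.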
  6. 6. Technology Used
     - Application architecture: .NET technology
     - Programming languages: VB.NET, C#
     - Testing browsers: Mozilla Firefox, Google Chrome
     - Internet technology: Google Analytics, Webmaster, directory submission
     - Database: SQL Server 2008
     - Operating system: Windows XP or later
  7. 7. Module
     - There are three modules across two systems.
     - 1) Standalone application: Admin
     - 2) Web application: Admin, Student
  8. 8. Standalone Application
     - The admin can generate unique numbers automatically and run the scraper:
     - Generate automated unique numbers.
     - Crawl the links of the targeted website.
     - Harvest data from the targeted sites.
     - Build a database from the scraped sites.
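One way to combine the first and last of these functions, generating a unique number and building the database, is to let the database assign the number when a scraped record is inserted. The Python sketch below uses an in-memory SQLite database purely as a stand-in for the project's SQL Server; the `student` table, its columns, and the function names are assumptions, not the project's schema.

```python
import sqlite3


def open_store():
    """Create an in-memory database with one table for scraped records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student ("
        " unique_no INTEGER PRIMARY KEY AUTOINCREMENT,"  # generated number
        " enrollment TEXT NOT NULL UNIQUE,"
        " name TEXT NOT NULL,"
        " marks INTEGER NOT NULL)"
    )
    return conn


def store_record(conn, enrollment, name, marks):
    """Insert one scraped record and return its generated unique number."""
    cur = conn.execute(
        "INSERT INTO student (enrollment, name, marks) VALUES (?, ?, ?)",
        (enrollment, name, marks),
    )
    conn.commit()
    return cur.lastrowid


def find_by_unique_no(conn, unique_no):
    """Look a student up by unique number; returns a tuple or None."""
    return conn.execute(
        "SELECT enrollment, name, marks FROM student WHERE unique_no = ?",
        (unique_no,),
    ).fetchone()


# Hypothetical records, as they might come out of the scraper.
conn = open_store()
first = store_record(conn, "000000000001", "Student A", 81)
second = store_record(conn, "000000000002", "Student B", 74)
print(first, second, find_by_unique_no(conn, second))
```

The `UNIQUE` constraint on `enrollment` means that harvesting the same page twice raises an error instead of issuing a second unique number to the same student.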
  9. 9. Desktop application
     - Admin module: inserts, updates, deletes, and saves data whenever a change is required in the databases or across the whole site, because our system is highly dynamic and needs changes on a regular basis.
     - Registered user: a registered user can receive daily updates through our automated system and so stay in touch.
     - Visitor: the visitor module is only for viewing data on our site; a visitor has no need to stay in touch with the system.
  10. 10. Use case diagram
  11. 11. Context level DFD
  12. 12. 1st level DFD
  13. 13. 2nd level DFD
  14. 14. E-R diagram
  15. 15. Expected Outcome
     - Students can find their results.
     - Students can compare their results with those of other students in the same class.
     - Students can also find exam-related materials.
     - Students can send results to friends via our system.
     - Students can download exam timetables.
     - Students can also find projects by other students around the globe.
     - Students can ask questions related to the community.
  16. 16. Conclusion
     - The web crawler has met our goals of making web scraping easy and automatic retraining possible. It has proved to be a convenient tool for extracting data from the web. Site Scraper has been tested on different domains with high effectiveness. We believe that Site Scraper can provide a robust and flexible solution to the problems of dealing with web data.
  18. 18. THANK YOU