Google crawling


  1. Google Crawling (Naveen Gujar)
  2. Google Crawlers: GOOGLEBOT, IMAGEBOT, MEDIABOT, ADSBOT, GOOGLEBOT-MOBILE, FEEDFETCHER-GOOGLE
  3. GOOGLEBOT: the Google crawler that reads web pages. File formats read by Googlebot:
       1. Adobe PDF documents and PostScript (ps)
       2. Microsoft Office document types: Excel, Word, PowerPoint
       3. Lotus document types: wk1, wk2, wk3, wk4, wk5, wki, wks, wku
       4. Lotus WordPro (lwp), MacWrite (mw), Rich Text Format (rtf), text files (txt)
  4. More on GOOGLEBOT
       1. File formats avoided: exe, dll, zip, dmg
       2. Can be kept away from certain pages, and so steered toward the pages you do want crawled, through the use of a robots.txt file
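
As a rough illustration of the robots.txt point on slide 4, the sketch below uses Python's standard `urllib.robotparser` to ask whether a given crawler may fetch a URL. The robots.txt content, the paths, and the `can_crawl` helper are made up for this example; only the `Googlebot` user-agent token is real.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /private/,
# every other crawler is kept out entirely. Paths are illustrative only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from memory, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://example.com/index.html"),
        ("Googlebot", "https://example.com/private/report.pdf"),
        ("SomeOtherBot", "https://example.com/index.html"),
    ]:
        print(agent, url, can_crawl(agent, url))
```

Note that robots.txt only expresses which paths a crawler may or may not fetch; it does not redirect a crawler to a page.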
  5. GOOGLEBOT partners
       • FRESHBOT: crawls updated pages on the web.
       • DEEPBOT: follows as many links and downloads as many pages as possible.
  6. MEDIABOT
       • Used for serving contextually relevant ads on publisher sites.
       • Purpose: to analyze the content of web pages so that AdSense can serve meaningful ads on the site.
       • This crawler should not be restricted on sites that use AdSense.
  7. IMAGEBOT
       • Scours the web for images to add to Google's image index.
       • The ranking of an image for a particular keyword depends on its filename, surrounding text, alt text and page title.
       • If a website is not focused on image inventory and downloads, it makes sense to restrict IMAGEBOT using robots.txt.
       • Restricting IMAGEBOT also saves some bandwidth.
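
Slides 6 and 7 together suggest a policy of blocking the image crawler while leaving the AdSense crawler unrestricted. A minimal sketch of generating and checking such a robots.txt follows. It assumes the user-agent tokens `Googlebot-Image` and `Mediapartners-Google` for what the slides call IMAGEBOT and MEDIABOT; the helper names and URLs are hypothetical.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked: Iterable[str], unrestricted: Iterable[str]) -> str:
    """Build a robots.txt with one group per crawler.

    "Disallow: /" blocks the whole site for that crawler;
    an empty "Disallow:" leaves it unrestricted.
    """
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked]
    groups += [f"User-agent: {agent}\nDisallow:\n" for agent in unrestricted]
    return "\n".join(groups)  # blank line between groups


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Assumed tokens: Googlebot-Image (image crawler), Mediapartners-Google (AdSense).
    text = build_robots_txt(["Googlebot-Image"], ["Mediapartners-Google"])
    print(text)
    print(is_allowed(text, "Googlebot-Image", "https://example.com/photos/a.jpg"))
    print(is_allowed(text, "Mediapartners-Google", "https://example.com/article.html"))
```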
  8. ADSBOT
       • Serves a very specific purpose as far as crawling is concerned.
       • Geared to inform the Google AdWords program by analyzing the content of the landing pages that ads point to.
       • This content analysis helps determine the Quality Score for a particular ad.
       • The Quality Score, together with the bid amount and CTR (click-through rate), is used by Google to determine the ranking score of an ad for a particular keyword.
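
Slide 8 does not give the ranking formula. Purely as an illustration, the sketch below assumes the commonly cited simplification that an ad's ranking score is its maximum bid multiplied by its Quality Score, with CTR already folded into the Quality Score. The `Ad` class, the `rank_ads` function and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ad:
    name: str
    max_bid: float      # advertiser's maximum bid per click (made-up values)
    quality_score: int  # assumed 1-10 scale; assumed to already reflect CTR

    @property
    def rank_score(self) -> float:
        # Assumption for this sketch only: ranking score = bid x Quality Score.
        return self.max_bid * self.quality_score


def rank_ads(ads: List[Ad]) -> List[Ad]:
    """Order ads from highest to lowest ranking score."""
    return sorted(ads, key=lambda ad: ad.rank_score, reverse=True)


if __name__ == "__main__":
    ads = [Ad("high bid, weak landing page", 2.0, 4),
           Ad("low bid, strong landing page", 1.0, 10)]
    for ad in rank_ads(ads):
        print(ad.name, ad.rank_score)
```

Under this assumption an ad with a lower bid can outrank a higher bidder when its landing page earns a better Quality Score, which is why the landing-page crawl matters.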
  9. GOOGLEBOT-MOBILE
       • Google uses a specific crawler for indexing mobile content.
       • Google indexes public mobile content.
       • If the content appears to be available only to a subset of mobile users, it is NOT indexed.
       • Users can search the mobile web on their mobile devices using Google Mobile Web Search.
  10. Getting Your Mobile Content Indexed. The steps are roughly the same as for non-mobile content:
       • Submit Mobile Sitemaps to the Google mobile index in the same way that non-mobile Sitemaps are submitted.
       • Create and add Mobile Sitemaps to your Google Webmaster Tools account in a similar way to Sitemaps for non-mobile content.
       • If your mobile site has changed, you can resubmit your Sitemap.
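
The slides do not show what a Mobile Sitemap looks like. The sketch below generates one with Python's standard `xml.etree.ElementTree`, assuming the Mobile Sitemap convention of an ordinary sitemaps.org `urlset` whose entries each carry an empty `mobile:mobile` element. The mobile namespace URI, the `build_mobile_sitemap` helper and the example URLs are assumptions to verify against Google's current documentation.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed namespace for the mobile extension; not taken from the slides.
MOBILE_NS = "http://www.google.com/schemas/sitemap-mobile/1.0"


def build_mobile_sitemap(urls):
    """Return a Mobile Sitemap XML string for the given mobile page URLs."""
    ET.register_namespace("", SITEMAP_NS)        # default namespace, no prefix
    ET.register_namespace("mobile", MOBILE_NS)   # serialized as mobile:...
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        # The empty marker element flags the URL as mobile content.
        ET.SubElement(entry, f"{{{MOBILE_NS}}}mobile")
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical mobile URLs.
    print(build_mobile_sitemap([
        "http://m.example.com/index.html",
        "http://m.example.com/news.html",
    ]))
```

Regenerating this file and resubmitting it covers the case where the mobile site has changed.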
  11. FEEDFETCHER-GOOGLE: Google's RSS and Atom feed crawler
       • Covers all blogs published through Blogger, WordPress, TypePad, etc.
       • Covers blogs written in English, French, German, Italian, Spanish, Brazilian Portuguese, etc.
       • On average a feed is crawled no more than once an hour, depending on how often the blog is updated.
       • If your blog publishes a site feed in any format and pings an update service, the contents of the feed will be indexed in Blog Search.
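
Slide 11 mentions pinging an update service without naming a protocol. One widely used convention is the XML-RPC `weblogUpdates.ping` call, which takes the blog's name and URL. The sketch below only builds the request body with Python's standard `xmlrpc.client` and sends nothing. The blog name, the URL and the `build_ping_payload` helper are hypothetical, and whether a given update service accepts this call is an assumption.

```python
import xmlrpc.client


def build_ping_payload(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps(
        (blog_name, blog_url),            # positional parameters of the call
        methodname="weblogUpdates.ping",  # assumed update-service method
    )


if __name__ == "__main__":
    payload = build_ping_payload("Example Blog", "http://blog.example.com/")
    print(payload)
    # Sending it would mean POSTing this body to the update service's
    # XML-RPC endpoint, which this sketch deliberately leaves out.
```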
