Privacy and Google Search Engine Indexing

    Privacy and Google Search Engine Indexing - Presentation Transcript

    1. Wasim Rangoonwala http://techwasim.blogspot.com http://www.auto-insurancerates.com
      • “Privacy is the claim of individuals, groups or institutions to determine for themselves when, how, and to what extent information about them is communicated to others.”
        • Alan Westin, Privacy and Freedom, 1967
    2.  
    3. What are WWW Robots? A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document and then recursively retrieving all documents that it references. Web robots are sometimes referred to as web wanderers, web crawlers, spiders, or bots.
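The traversal described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three pages in `FAKE_WEB` and the `example.com` URLs are hypothetical stand-ins for real HTTP fetches, and the names `LinkExtractor` and `crawl` are made up for this sketch. A real robot would also check robots.txt before each request.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web": URL -> HTML body (stands in for real HTTP fetches).
FAKE_WEB = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.com/a.html": '<a href="/b.html">B</a>',
    "http://example.com/b.html": '<a href="/">home</a> <a href="/missing.html">x</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch=FAKE_WEB.get):
    """Retrieve start_url, then every document reachable from it, once each."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        body = fetch(url)
        if body is None:  # document could not be retrieved: skip it
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

The `seen` set is what keeps the recursion from looping forever when pages link back to each other, as the home page and `b.html` do here.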
    4. Web Spiders / Robots Collecting Data
    5. Controlling how search engines access and index your website: Google refers to its spiders as Googlebot and Googlebot-Image. Google has a set of computers that continually crawl the web; together, these machines are known as the Googlebot. In general, you want Googlebot to access your site so that your web pages can be found by people searching on Google.
    6. Controlling how search engines access and index your website: One key question is: how does Google know which parts of a website the site owner wants to show up in search results? Can publishers specify that some parts of the site should be private and non-searchable? The good news is that those who publish on the web have a lot of control over which pages appear in search results and which pages are kept private. Answer: the robots.txt file.
    7. Controlling how search engines access and index your website
      • Robots.txt has been an industry standard for many years; it lets a site owner control how search engines access their website.
      • The robots.txt file contains a list of the pages that search engines shouldn't access.
      • You can exclude pages from Google's crawler by creating a text file called robots.txt and placing it in the root directory of your site.
      Making Use of Robots.txt File
    8. Controlling how search engines access and index your website
      • Examples of pages you may want to keep private from search engines:
      • A directory that contains internal logs.
      • News articles that require payment to access.
      • The administration area of a website: database configuration strings, stored passwords, credit card details.
      • Images that you want to keep private.
      Making Use of Robots.txt File (continued)
    9. Achieving Privacy through Robots.txt File
        # robots.txt file

        # Currently disallow all images to the Google Image bot
        User-agent: Googlebot-Image
        Disallow: /

        # Rules for Googlebot, Google's web crawler (put at end of file)
        User-agent: Googlebot
        Disallow: /admin/
        Disallow: /account_password.html
        Disallow: /address_book.html
        Disallow: /checkout_payment.html
        Disallow: /cookie_usage.html
        Disallow: /login.html
      Example of Robots.txt File
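One way to check what a robots.txt file like the one on this slide permits is Python's standard `urllib.robotparser` module. The sketch below is illustrative only: it uses a shortened copy of the slide's rules, and the `example.com` URLs and the `OtherBot` crawler name are hypothetical. Keep in mind that robots.txt is a request to well-behaved crawlers and compliance is voluntary, so it is not an access control mechanism.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules from the slide.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: Googlebot
Disallow: /admin/
Disallow: /account_password.html
Disallow: /login.html
"""


def build_parser(text):
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)

# Googlebot may fetch ordinary pages but not the listed paths.
print(rules.can_fetch("Googlebot", "http://example.com/index.html"))
print(rules.can_fetch("Googlebot", "http://example.com/admin/settings.html"))

# Googlebot-Image is shut out of the whole site.
print(rules.can_fetch("Googlebot-Image", "http://example.com/photo.jpg"))

# A crawler with no matching record is not restricted by this file.
print(rules.can_fetch("OtherBot", "http://example.com/admin/settings.html"))
```

Because no `User-agent: *` record is present, crawlers other than the two named ones are not restricted by this file at all.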
    10. Privacy through Robots <META> tag
      • You can use a special HTML <META> tag to tell robots not to index the content of a page, and/or not scan it for links to follow.
      • Example:
          <html>
          <head>
          <title>...</title>
          <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
          </head>
      • The "NAME" attribute must be "ROBOTS".
      • Valid values for the "CONTENT" attribute are: "INDEX", "NOINDEX", "FOLLOW", "NOFOLLOW". Multiple comma-separated values are allowed, but obviously only some combinations make sense. If there is no robots <META> tag, the default is "INDEX,FOLLOW", so there's no need to spell that out.
      Example of <META> Tag
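From the robot's side, honoring this tag means reading the page's `<META>` elements before indexing it. The following is a minimal sketch of that check using Python's standard `html.parser`; the class and function names (`RobotsMetaParser`, `robots_policy`) are hypothetical, and it handles only the four values listed on the slide.

```python
from html.parser import HTMLParser

PAGE = """
<html>
<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
"""


class RobotsMetaParser(HTMLParser):
    """Collects the comma-separated values of every robots <META> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_policy(html):
    """Return (may_index, may_follow); the default is INDEX, FOLLOW."""
    parser = RobotsMetaParser()
    parser.feed(html)
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


print(robots_policy(PAGE))
```

A page with no robots `<META>` tag yields `(True, True)`, matching the default described on the slide.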
    11. Search Engine Web Spider Names
      • Yahoo! Search: Yahoo Slurp
      • AltaVista: Scooter
      • Ask Jeeves: Ask Jeeves/Teoma
      • MSN Search: MSNbot
      • Visit http://www.robotstxt.org/db.html for more details on search engine web spider names.
    12. Bonus
    13. Google: Anatomy
      • Google Crawlers (Googlebot)
        • Multiple distributed crawlers
        • Each crawler maintains its own DNS cache
        • Each crawler keeps roughly 300 connections open at once
        • Fetched pages are sent to the store server
        • Originally written in Python
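The two ideas on this slide that translate most directly into code are the per-crawler DNS cache and the cap on open connections. The sketch below illustrates both with `asyncio`. It is not Google's implementation: `fake_resolve`, `crawl_all`, the `192.0.2.x` addresses, and the simulated fetch are all hypothetical stand-ins, and only the figure of 300 comes from the slide.

```python
import asyncio

MAX_CONNECTIONS = 300  # per-crawler figure quoted on the slide


async def fake_resolve(host):
    """Hypothetical stand-in for a real DNS query."""
    await asyncio.sleep(0)
    return "192.0.2." + str(len(host))


async def crawl_all(targets, resolve=fake_resolve, limit=MAX_CONNECTIONS):
    """Fetch every (host, path) pair using one DNS cache and a connection cap."""
    dns_cache = {}  # host -> pending or finished lookup task
    slots = asyncio.Semaphore(limit)
    stats = {"lookups": 0, "open": 0, "peak": 0}

    async def lookup(host):
        # Each host is resolved once; later requests reuse the cached task.
        if host not in dns_cache:
            stats["lookups"] += 1
            dns_cache[host] = asyncio.ensure_future(resolve(host))
        return await dns_cache[host]

    async def fetch(host, path):
        async with slots:  # at most `limit` fetches run at once
            stats["open"] += 1
            stats["peak"] = max(stats["peak"], stats["open"])
            ip = await lookup(host)
            await asyncio.sleep(0)  # stands in for the HTTP request itself
            stats["open"] -= 1
            return (ip, host, path)  # a real crawler would hand this to storage

    pages = await asyncio.gather(*(fetch(h, p) for h, p in targets))
    return pages, stats
```

For example, `asyncio.run(crawl_all([("example.com", "/"), ("example.com", "/about")]))` performs a single DNS lookup for the two requests.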
    14. Google: Technology
      • PageRank™ Algorithm
      • Hypertext-Matching Analysis
    15. Google Webmaster Central
      Webmaster Central offers services that let you:
      • see which parts of a site Googlebot had problems crawling
      • upload an XML Sitemap file
      • analyze and generate robots.txt files
      • remove URLs already crawled by Googlebot
      • specify the preferred domain
      • identify issues with title and description meta tags
      • understand the top searches used to reach a site
      • get a glimpse at how Googlebot sees pages
      • remove unwanted site links that Google may use in results
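As an illustration of the XML Sitemap file mentioned on this slide, a minimal file in the sitemaps.org format can be produced with Python's standard library. This is a sketch only: the function name `build_sitemap` and the `example.com` URLs are hypothetical, and it emits just the required `<loc>` element for each page.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML Sitemap listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "http://example.com/",
    "http://example.com/articles?id=1&page=2",
])
print(sitemap_xml)
```

The resulting text would be saved as a file on the site and then submitted through the Webmaster Central interface.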
    16. Protect your privacy on the Web
      • When surfing the internet, avoid “free” offers and protect your information!
      • Chatting: guard your information unless you are 100% sure who you are chatting with.
      • Cookies aren’t just for eating; they may be sending your personal information to others.
      • Protect your passwords like you would your wallet or car keys. Make them complicated!
      • E-mail is not secure and should never be thought of as private.
      • Don’t even open spam; download a spam buster!
      • Beware of phishing: fake e-mails sent to try to gain your personal and financial information.
    17.  
      • http://www.google.com/support/webmasters/bin/answer.py?answer=80553
      • http://www.google.com/bot.html
      • http://www.googleguide.com
      • http://www.searchengineposition.com
      • http://www.google-watch.org
      • http://www.robotstxt.org/db.html
      • http://www.googleblog.blogspot.com
      • For more details, visit http://techwasim.blogspot.com

    Wasim Rangoonwala, 2 years ago

    Presentation on how search engines index websites, h…

    © All Rights Reserved
