White Hat Cloaking
My SMX Advanced Presentation on White Hat Cloaking

  • Hi, my name is Hamlet Batista. Some of you know me from my blog, Hamlet Batista dot com. I'm sure that everybody here has been taught that cloaking is bad. Today I am here to tell you otherwise: I am here to convince you that you should cloak. Now, before you leave the room for fear of being penalized by Google, let me share some practical scenarios where good cloaking makes sense. I will contrast cloaking with other recommended alternatives and show why cloaking is still the better option. Hopefully, by the end of my presentation I will have convinced the search engineers on my panel as well.

Transcript

  • 1. White Hat Cloaking – Six Practical Applications Presented by Hamlet Batista
  • 2. Why white hat cloaking?
    • "Good" vs "bad" cloaking is all about your intention
    • Always weigh the risks versus the rewards of cloaking
    • Ask permission, or just don't call it cloaking!
    • Cloaking vs "IP delivery"
  • 3. Crash course in white hat cloaking
    • 1. When to cloak?
    • 2. How do we cloak?
    • 3. Practical scenarios and alternatives
      • Practical scenarios where good cloaking makes sense
    • 4. How can cloaking be detected?
    • 5. Risks and next steps
  • 4. When is it practical to cloak?
    • Content accessibility
      • Search-unfriendly content management systems
      • Rich media sites
      • Content behind forms
    • Membership sites
      • Free and paid content
    • Site structure improvements
      • Alternative to PR sculpting via "no-follow"
    • Geolocation/IP delivery
    • Multivariate testing
  • 5. Practical scenario #1: Proprietary website management systems that are not search-engine friendly
    Regular users see
    • URLs with many dynamic parameters
    • URLs with session IDs
    • URLs with canonicalization issues
    • Missing titles and meta descriptions
    Search engine robot sees
    • Search engine friendly URLs
    • URLs without session IDs
    • URLs with a consistent naming convention
    • Automatically generated titles and meta descriptions
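A minimal sketch of the URL side of scenario #1, assuming the session ID travels in the query string. The parameter names in SESSION_PARAMS and the function name robot_friendly_url are hypothetical and not taken from the presentation; a real content management system will use its own names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; adjust to the CMS in question.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def robot_friendly_url(url: str) -> str:
    """Return a consistent form of url: no session IDs, sorted parameters."""
    parts = urlsplit(url)
    # Drop session parameters and sort the rest so that equivalent URLs
    # always come out identical (one answer to canonicalization issues).
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    )
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


print(robot_friendly_url("http://Example.com/product.php?sid=abc123&id=7&cat=2"))
```

Sorting the parameters is one possible naming convention among several; the point is only that the robot always sees the same URL for the same content.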
  • 6. Practical scenario #2: Sites built completely in Flash, Silverlight or any other rich media technology
    Search engine robot sees
    • A text representation of all graphical (image) elements
    • A text representation of all motion (video) elements
    • A text transcription of all audio in the rich media content
  • 7. Practical scenario #3: Membership sites
    Search users see
    • Snippets of premium content on the SERPs
    • A registration form when they land on the site
    Members see
    • The same content search engine robots see
  • 8. Practical scenario #4: Sites requiring massive site structure changes to improve index penetration
    • Regular users follow a link structure designed for ease of navigation
    • Search engine robots follow a link structure designed for ease of crawling and deeper index penetration of the most important content
  • 9. Practical scenario #5: Sites using geolocation technology
    Regular users see
    • Content tailored to their geographical location and/or language
    Search engine robot sees
    • The same content consistently
  • 10. Practical scenario #6: Split testing organic search landing pages
    Each regular user sees
    • One of the content experiment alternatives
    Search engine robot sees
    • The same content consistently
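One way scenario #6 could be sketched: regular users are assigned to an experiment alternative by hashing a visitor ID, while anything identified as a robot always gets the same page. The variant names, the visitor_id argument and the pick_variant function are hypothetical; the slides do not prescribe an assignment method.

```python
import hashlib

# Hypothetical experiment arms; the first one is what robots always see.
VARIANTS = ["control", "variant_b", "variant_c"]


def pick_variant(visitor_id: str, is_robot: bool) -> str:
    """Pick the landing page alternative for this visitor."""
    if is_robot:
        # Search engine robots see the same content consistently.
        return VARIANTS[0]
    # Hash the visitor ID so a returning user keeps the same alternative.
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    return VARIANTS[int.from_bytes(digest[:4], "big") % len(VARIANTS)]


print(pick_variant("any-id", is_robot=True))
```

Hashing instead of picking at random keeps the assignment stable across visits without storing any state on the server.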
  • 11. How do we cloak?
    Search robot detection
    • By HTTP user agent
    • By IP address
    • By HTTP cookie test
    • By JavaScript/CSS test
    • By double DNS check
    • By visitor behavior
    • By combining all the techniques
    Content delivery
    • Presenting the equivalent of the inaccessible content to robots
    • Presenting the search-engine friendly content to robots
    • Presenting the content behind forms to robots
    Cloaking is performed with a web server script or module
  • 12. Robot detection by HTTP user agent
    A very simple robot detection technique
    Search robot HTTP request:
    66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
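The user agent check on this slide can be sketched as follows, assuming an access log in the format of the sample request. The token list ROBOT_TOKENS is illustrative only (just Googlebot appears in the slides), and the names LOG_PATTERN and claims_to_be_robot are hypothetical.

```python
import re

# Illustrative tokens only; only "googlebot" comes from the slide.
ROBOT_TOKENS = ("googlebot", "bingbot", "slurp")

# Fields of the sample log line: IP, timestamp, request, status, size,
# referer and user agent (a trailing field is ignored).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def claims_to_be_robot(user_agent: str) -> bool:
    """True when the User-Agent header names a known search robot."""
    agent = user_agent.lower()
    return any(token in agent for token in ROBOT_TOKENS)


LOG_LINE = (
    '66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] '
    '"GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" '
    '200 61477 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"'
)
match = LOG_PATTERN.match(LOG_LINE)
if match:
    print(match.group("ip"), claims_to_be_robot(match.group("agent")))
```

Anyone can send this header, so the check only tells us what a visitor claims to be.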
  • 13. Robot detection by HTTP cookie test
    Another simple robot detection technique, but weaker
    Search robot HTTP request:
    66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "Missing cookie info"
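A minimal sketch of the cookie test, assuming the site sets a test cookie on the first response and then looks for it on later requests. The cookie name cookie_test and both function names are hypothetical. The technique is weaker because regular users can also refuse cookies.

```python
from http.cookies import SimpleCookie

TEST_COOKIE = "cookie_test"  # hypothetical cookie name set on the first response


def has_test_cookie(cookie_header: str) -> bool:
    """True when the Cookie request header carries the test cookie."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return TEST_COOKIE in cookie


def possible_robot(cookie_header: str, pages_seen: int) -> bool:
    """Past the first page and still no test cookie: possibly a robot."""
    return pages_seen > 1 and not has_test_cookie(cookie_header)


print(possible_robot("", pages_seen=3))
```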
  • 14. Robot detection by JavaScript/CSS test
    Another option for robot detection
    DHTML content
    HTML code:
    <div id="header"><h1><a href="http://www.example.com" title="Example Site">Example site</a></h1></div>
    The CSS code is pretty straightforward: it swaps out anything in the h1 tag in the header with an image.
    CSS code:
    /* CSS Image replacement */
    #header h1 {margin:0; padding:0;}
    #header h1 a {
      display: block;
      padding: 150px 0 0 0;
      background: url(path to image) top right no-repeat;
      overflow: hidden;
      font-size: 1px;
      line-height: 1px;
      height: 0px !important;
      height /**/:150px;
    }
  • 15. Robot detection by IP address
    A more robust robot detection technique
    Search robot HTTP request:
    66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
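The IP address check might look like the sketch below. The network in KNOWN_ROBOT_NETWORKS is an assumption chosen so that it contains the address from the sample request; it is not an authoritative list of search robot addresses, and such lists have to be kept up to date.

```python
import ipaddress

# Assumed example range containing 66.249.66.1 from the sample request.
# A real list must come from a maintained source and be refreshed often.
KNOWN_ROBOT_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]


def ip_in_robot_list(ip: str) -> bool:
    """True when the visitor IP falls inside a known robot network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in KNOWN_ROBOT_NETWORKS)


print(ip_in_robot_list("66.249.66.1"))
```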
  • 16. Robot detection by double DNS check
    A more robust robot detection technique
    Search robot HTTP request
    • Reverse lookup: nslookup 66.249.66.1
      • Name: crawl-66-249-66-1.googlebot.com
      • Address: 66.249.66.1
    • Forward lookup: nslookup crawl-66-249-66-1.googlebot.com
      • Non-authoritative answer:
      • Name: crawl-66-249-66-1.googlebot.com
      • Address: 66.249.66.1
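The two lookups above can be sketched with the standard socket module: resolve the IP to a host name, check the domain, then resolve the host name back and confirm it returns the same IP. The function names and the injectable reverse and forward parameters are hypothetical conveniences (they let the logic be tried without network access); only the googlebot.com domain comes from the slide.

```python
import socket

ROBOT_DOMAINS = (".googlebot.com",)  # domain seen in the lookup on the slide


def default_reverse(ip: str) -> str:
    """Reverse DNS: IP address to host name."""
    return socket.gethostbyaddr(ip)[0]


def default_forward(host: str) -> list:
    """Forward DNS: host name to its list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def passes_double_dns_check(ip, reverse=default_reverse, forward=default_forward):
    """True when reverse and forward lookups agree and the domain is a robot's."""
    try:
        host = reverse(ip)
        if not host.lower().endswith(ROBOT_DOMAINS):
            return False
        # A spoofed reverse record fails here: the real host name
        # does not resolve back to the impostor's IP address.
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

In use, passes_double_dns_check("66.249.66.1") would perform both lookups over the network with the default resolvers.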
  • 17. Robot detection by visitor behavior
    Robots differ substantially from regular users when visiting a website
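The slide does not say which behaviors to look at, so the sketch below is purely an assumption: it flags visitors that fetch robots.txt, request pages much faster than a person would, or never load images, CSS or JavaScript. The VisitorStats fields, the thresholds and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VisitorStats:
    pages_requested: int
    assets_requested: int  # images, CSS and JavaScript files
    fetched_robots_txt: bool
    seconds_active: float


def looks_like_robot(stats: VisitorStats, max_pages_per_minute: float = 30.0) -> bool:
    """Heuristic only: flags a visitor as a possible robot, never a confirmed one."""
    if stats.fetched_robots_txt:
        return True
    minutes = max(stats.seconds_active, 1.0) / 60.0
    too_fast = stats.pages_requested / minutes > max_pages_per_minute
    # Browsers load page assets; many crawlers fetch only the HTML.
    no_assets = stats.pages_requested >= 5 and stats.assets_requested == 0
    return too_fast or no_assets


print(looks_like_robot(VisitorStats(200, 0, False, 60.0)))
```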
  • 18. Combining the best of all techniques
    • User behavior check: label as a possible robot any visitor with suspicious behavior
    • User agent check: label as a robot anything that identifies as such
    • IP address check: maintain a cache with a list of known search robots to reduce the number of verification attempts
    • Double DNS check: confirm it is a robot by doing a double DNS check. Also confirm suspect robots
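The combination described on this slide could be wired together roughly as below: consult the cache first, treat self-identified robots and suspicious visitors as candidates, and confirm candidates with an expensive check such as the double DNS lookup. The RobotClassifier class, its confirm callback and the "bot" substring test are hypothetical simplifications.

```python
from typing import Callable, Dict


class RobotClassifier:
    def __init__(self, confirm: Callable[[str], bool]):
        self._confirm = confirm  # expensive check, e.g. a double DNS lookup
        self._cache: Dict[str, bool] = {}  # IP address -> confirmed robot?

    def is_robot(self, ip: str, user_agent: str, suspicious_behavior: bool) -> bool:
        # 1. Cached verdicts avoid repeating the verification.
        if ip in self._cache:
            return self._cache[ip]
        # 2. Candidates: anything that identifies as a robot or behaves like one.
        claims_robot = "bot" in user_agent.lower()
        if not (claims_robot or suspicious_behavior):
            return False
        # 3. Confirm the candidate and remember the result.
        verdict = self._confirm(ip)
        self._cache[ip] = verdict
        return verdict


classifier = RobotClassifier(confirm=lambda ip: ip == "66.249.66.1")
print(classifier.is_robot("66.249.66.1", "Googlebot/2.1", False))
```

The lambda passed as confirm is a stand-in for a real verification step so the example runs without network access.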
  • 19. Clever cloaking detection
    A clever detection technique is to check the caches at the newest datacenters
    • IP-based detection techniques rely on an up-to-date list of robot IPs
    • Search engines change IPs on a regular basis
    • It is possible to identify those new IPs and check the cache
  • 20. Risks of cloaking
    Search engines do not want to accept any type of cloaking
    Survival tips
    • The safest way to cloak is to ask for permission from each of the search engines that you care about
    • Refer to it as IP delivery.
    • Cloaking: Serving different content to users than to Googlebot. This is a violation of our webmaster guidelines. If the file that Googlebot sees is not identical to the file that a typical user sees, then you're in a high-risk category. A program such as md5sum or diff can compute a hash to verify that two different files are identical.
    • http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html
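The md5sum comparison mentioned in the quote can be reproduced with Python's hashlib, as a rough sketch. The two byte strings stand in for the page as fetched by a regular user and as fetched by the robot; the function names are hypothetical.

```python
import hashlib


def md5_hex(content: bytes) -> str:
    """Hex MD5 digest, the same value md5sum prints for a file with this content."""
    return hashlib.md5(content).hexdigest()


def same_content(user_version: bytes, robot_version: bytes) -> bool:
    """True when both versions of the page are byte-for-byte identical."""
    return md5_hex(user_version) == md5_hex(robot_version)


user_page = b"<html><body>Welcome</body></html>"
robot_page = b"<html><body>Welcome, Googlebot</body></html>"
print(same_content(user_page, robot_page))
```

Any difference at all, even whitespace, changes the digest, which is why the quoted guideline treats non-identical files as high risk.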
  • 21. Next Steps
    • Make sure clients understand the risks/rewards of implementing white hat cloaking
    • More information and how to get started
      • How Google defines IP delivery, geolocation and cloaking http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html
      • First Click Free http://googlenewsblog.blogspot.com/2007/09/first-click-free.html
      • Good Cloaking, Evil Cloaking and Detection http://searchengineland.com/070301-065358.php
      • YADAC: Yet Another Debate About Cloaking Happens Again http://searchengineland.com/070304-231603.php
      • Cloaking is OK Says Google http://blog.venture-skills.co.uk/2007/07/06/cloaking-is-ok-says-google/
      • Advanced Cloaking Technique: How to feed password-protected content to search engine spiders http://hamletbatista.com/2007/09/03/advanced-cloaking-technique-how-to-feed-password-protected-content-to-search-engine-spiders/
  • 22. I would be happy to help. Feel free to contact me
    • Blog http://hamletbatista.com
    • LinkedIn http://www.linkedin.com/in/hamletbatista
    • Facebook http://www.facebook.com/people/Hamlet_Batista/613808617
    • Twitter http://twitter.com/hamletbatista
    • E-mail [email_address]