Making AJAX crawlable by search engines

Published in: Self Improvement
Transcript of "Making AJAX crawlable by search engines"

  1. Making AJAX crawlable
     Katharina Probst, Engineer, Google
     Bruce Johnson, Engineering Manager, Google
     In collaboration with: Arup Mukherjee, Erik van der Poel, Li Xiao, Google
  2. The problem of AJAX for web crawlers
     • Web crawlers don't always see what the user sees
       • JavaScript produces dynamic content that is not seen by crawlers
       • Example: A Google Web Toolkit application that looks like this to a user...
     • ...but a web crawler only sees this: <script src='showcase.js'></script>
  3. Why does this problem need to be solved?
     • Web 2.0: More content on the web is created dynamically (~69%)
     • Over time, this hurts search
     • Developers are discouraged from building dynamic apps
     • Not solving AJAX crawlability holds back progress on the web!
  4. A crawler's view of the web - with and without AJAX
  5. Goal: crawl and index AJAX
     • Crawling and indexing AJAX is needed for users and developers
     • Problem: Which AJAX states can be indexed?
       • Explicit opt-in needed by the web server
     • Problem: Don't want to cloak
       • Users and search engine crawlers need to see the same content
     • Problem: How could the logistics work?
       • That's the remainder of the presentation
  6. Possible solutions
     • Crawlers execute all the web's JavaScript
       • This is expensive and time-consuming
       • Only major search engines would even be able to do this, and probably only partially
       • Indexes would be more stale, resulting in worse search results
     • Web servers execute their own JavaScript at crawl time
       • Avoids above problems
       • Gives more control to webmasters
       • Can be done automatically
       • Does not require ongoing maintenance
  7. Overview of proposed approach - crawl time
     • Crawling is enabled by mapping between
       • "pretty" URLs: www.example.com/page?query#!mystate
       • "ugly" URLs: www.example.com/page?query&_escaped_fragment_=mystate
  8. Overview of proposed approach - search time
     • Nothing changes!
  9. Agreement between participants
     • Web servers agree to
       • opt in by indicating indexable states
       • execute JavaScript for ugly URLs (no user agent sniffing!)
       • not cloak by always giving same content to browser and crawler regardless of request (or risk elimination, as before)
     • Search engines agree to
       • discover URLs as before (Sitemaps, hyperlinks)
       • modify pretty URLs to ugly URLs
       • index content
       • display pretty URLs
  10. Summary: Life of a URL
     • http://example.com/stocks.html#GOOG could easily be changed to
     • http://example.com/stocks.html#!GOOG which can be crawled as
     • http://example.com/stocks.html?_escaped_fragment_=GOOG but will be displayed in the search results as
     • http://example.com/stocks.html#!GOOG
  11. Feedback is welcome
     • We are currently working on a proposal and prototype implementation
     • Check out the blog post on the Google Webmaster Central Blog: http://googlewebmastercentral.blogspot.com
     • We welcome feedback from the community at the Google Webmaster Help Forum (link is posted in the blog entry)
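
The pretty/ugly URL mapping from slides 7, 9 and 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenters' implementation: the function names `pretty_to_ugly`, `ugly_to_pretty` and `requested_state` are hypothetical, and percent-encoding the state is an assumption, since the slides do not say how special characters are escaped.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Parameter name taken from the slides.
ESCAPED_FRAGMENT = "_escaped_fragment_"


def pretty_to_ugly(url: str) -> str:
    """Crawler side: rewrite a '#!state' URL into its '_escaped_fragment_' form."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # no '#!' opt-in marker, so the URL is left unchanged
    # Assumption: the state is percent-encoded when moved into the query string.
    param = f"{ESCAPED_FRAGMENT}={quote(parts.fragment[1:], safe='')}"
    query = f"{parts.query}&{param}" if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def ugly_to_pretty(url: str) -> str:
    """Search-result side: restore the '#!state' URL shown to users."""
    parts = urlsplit(url)
    kept, state = [], None
    for pair in parts.query.split("&") if parts.query else []:
        name, _, value = pair.partition("=")
        if name == ESCAPED_FRAGMENT:
            state = unquote(value)
        else:
            kept.append(pair)
    if state is None:
        return url  # not an ugly URL
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, "&".join(kept), "!" + state)
    )


def requested_state(url: str):
    """Server side: return the AJAX state being requested, or None.

    The decision looks only at the URL, never at the User-Agent header,
    matching the "no user agent sniffing" point on slide 9.
    """
    for pair in urlsplit(url).query.split("&"):
        name, _, value = pair.partition("=")
        if name == ESCAPED_FRAGMENT:
            return unquote(value)
    return None


if __name__ == "__main__":
    pretty = "http://example.com/stocks.html#!GOOG"
    ugly = pretty_to_ugly(pretty)
    print(ugly)
    print(ugly_to_pretty(ugly))
    print(requested_state(ugly))
```

A server following this sketch would execute its own JavaScript for the state returned by `requested_state` and respond with the resulting HTML, which should match what a browser shows for the corresponding pretty URL.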