Your SlideShare is downloading. ×
0
rendre AJAX crawlable par les moteurs
rendre AJAX crawlable par les moteurs
rendre AJAX crawlable par les moteurs
rendre AJAX crawlable par les moteurs
rendre AJAX crawlable par les moteurs
rendre AJAX crawlable par les moteurs
rendre AJAX crawlable par les moteurs
rendre AJAX crawlable par les moteurs
rendre AJAX crawlable par les moteurs
rendre AJAX crawlable par les moteurs
rendre AJAX crawlable par les moteurs
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

rendre AJAX crawlable par les moteurs

746

Published on

Published in: Self Improvement
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
746
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
6
Comments
0
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Making AJAX crawlable Katharina Probst Engineer, Google Bruce Johnson Engineering Manager, Google in collaboration with: Arup Mukherjee, Erik van der Poel, Li Xiao , Google
  • 2. <ul><li>Web crawlers don't always see what the user sees </li></ul><ul><ul><li>JavaScript produces dynamic content that is not seen by crawlers </li></ul></ul><ul><ul><li>Example: A Google Web Toolkit application that looks like this to a user... </li></ul></ul><ul><li>     </li></ul><ul><li>       ...but a web crawler only sees this:            <script src='showcase.js'></script> </li></ul>The problem of AJAX for web crawlers
  • 3. <ul><ul><li>Web 2.0: More content on the web is created dynamically (~69%) </li></ul></ul><ul><ul><li>Over time, this hurts search </li></ul></ul><ul><ul><li>Developers are discouraged from building dynamic apps </li></ul></ul><ul><ul><li>Not solving AJAX crawlability holds back progress on the web! </li></ul></ul>Why does this problem need to be solved?
  • 4. A crawler's view of the web - with and without AJAX
  • 5. <ul><ul><li>Crawling and indexing AJAX is needed for users and developers </li></ul></ul><ul><ul><li>Problem: Which AJAX states can be indexed? </li></ul></ul><ul><ul><ul><li>Explicit opt-in needed by the web server </li></ul></ul></ul><ul><ul><li>Problem: Don't want to cloak </li></ul></ul><ul><ul><ul><li>Users and search engine crawlers need to see the same content </li></ul></ul></ul><ul><ul><li>Problem: How could the logistics work? </li></ul></ul><ul><ul><ul><li>That's the remainder of the presentation </li></ul></ul></ul>Goal: crawl and index AJAX
  • 6. <ul><ul><li>Crawlers execute all the web's JavaScript </li></ul></ul><ul><ul><ul><li>This is expensive and time-consuming </li></ul></ul></ul><ul><ul><ul><li>Only major search engines would even be able to do this, and probably only partially </li></ul></ul></ul><ul><ul><ul><li>Indexes would be more stale, resulting in worse search results </li></ul></ul></ul><ul><ul><li>Web servers execute their own JavaScript at crawl time </li></ul></ul><ul><ul><ul><li>Avoids above problems </li></ul></ul></ul><ul><ul><ul><li>Gives more control to webmasters </li></ul></ul></ul><ul><ul><ul><li>Can be done automatically </li></ul></ul></ul><ul><ul><ul><li>Does not require ongoing maintenance </li></ul></ul></ul>Possible solutions
  • 7. Overview of proposed approach - crawl time <ul><li>Crawling is enabled by mapping between   </li></ul><ul><ul><li>&quot;pretty&quot; URLs: www.example.com/page?query#!mystate </li></ul></ul><ul><ul><li>&quot;ugly&quot; URLs: www.example.com/page?query&_escaped_fragment_=mystate </li></ul></ul>
  • 8. Overview of proposed approach - search time Nothing changes!
  • 9. <ul><ul><li>Web servers agree to </li></ul></ul><ul><ul><ul><li>opt in by indicating indexable states  </li></ul></ul></ul><ul><ul><ul><li>execute JavaScript for ugly URLs (no user agent sniffing!)  </li></ul></ul></ul><ul><ul><ul><li>not cloak by always giving same content to browser and crawler regardless of request (or risk elimination, as before) </li></ul></ul></ul><ul><li>  </li></ul><ul><ul><li>Search engines agree to </li></ul></ul><ul><ul><ul><li>discover URLs as before (Sitemaps, hyperlinks) </li></ul></ul></ul><ul><ul><ul><li>modify pretty URLs to ugly URLs </li></ul></ul></ul><ul><ul><ul><li>index content </li></ul></ul></ul><ul><ul><ul><li>display pretty URLs </li></ul></ul></ul>Agreement between participants
  • 10. http://example.com/stocks.html#GOOG could easily be changed to   http://example.com/stocks.html#!GOOG   which can be crawled as   http://example.com/stocks.html?_escaped_fragment_=GOOG   but will be displayed in the search results as http://example.com/stocks.html#!GOOG Summary: Life of a URL
  • 11. <ul><ul><li>We are currently working on a proposal and prototype implementation </li></ul></ul><ul><ul><li>Check out the blog post on the Google Webmaster Central Blog: http://googlewebmastercentral.blogspot.com </li></ul></ul><ul><ul><li>We welcome feedback from the community at the Google Webmaster Help Forum (link is posted in the blog entry) </li></ul></ul>Feedback is welcome

×