Web Crawling: Scraping AJAX Sites

Challenges with crawling AJAX pages on the web and their solutions.

Published in: Technology
1 Comment
  • How do you deal with JavaScript invocation graphs and hot call conjunction queries? I feel these are some real AJAX crawl problems.

  1. Scraping AJAX Pages: Big Data made small
  2. What’s AJAX on a web page? (1) Filters (2) Load more results (3) Forms and others...
  3. GET vs. POST. GET: the client sends its parameters to the server in the URL, e.g. http://example.com?date=20140410. POST: the client sends the request to http://example.com and carries the parameters in the payload (form data, JSON strings, query parameters, view states, etc.).
  4. What makes crawling AJAX difficult?
  5. Challenge 1: JavaScript calls. Data is fetched from under JavaScript code. Solution: emulate JavaScript calls using headless browsers.
  6. Challenge 2: Fetch bandwidths. Incomplete page fetched because of low fetch age. Solution: optimize fetch limits. (Image credit: ticketmaster.com)
  7. Challenge 3: .NET architectures (ViewState). Solution: track states, pass event validations, restore states for mitigation.
  8. Challenge 4: Page encoding. Solution: send the request (content type, media type, accept field parameters) and parse responses in the same format as expected by the server.
  9. Use case: crawl ticketing sites
  10. Thank You! Have specific queries on AJAX crawling? Reach out to info@promptcloud.com.
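
The request-level points in the slides (GET vs. POST, ViewState tracking, and matching the content type the server expects) can be illustrated with a minimal Python sketch. It is not the deck's own implementation. The function and class names (`build_get`, `build_json_post`, `HiddenFieldParser`, `build_postback`) are hypothetical, the URL is the example.com address from the slides, and the hidden field names are the ones ASP.NET Web Forms pages conventionally use. The code only builds request objects and sends nothing over the network.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


def build_get(base_url, params):
    """GET: the parameters travel in the URL's query string."""
    return Request(f"{base_url}?{urlencode(params)}", method="GET")


def build_json_post(url, payload):
    """POST: the parameters travel in the body; headers declare the format."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json; charset=utf-8",  # what we send
        "Accept": "application/json",                       # what we can parse
    }
    return Request(url, data=body, headers=headers, method="POST")


class HiddenFieldParser(HTMLParser):
    """Collects <input type="hidden"> name/value pairs such as __VIEWSTATE."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(url, page_html, event_target):
    """Form postback: echo the page's hidden state fields back to the server."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    form = dict(parser.fields)             # __VIEWSTATE, __EVENTVALIDATION, ...
    form["__EVENTTARGET"] = event_target   # the control that "was clicked"
    body = urlencode(form).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    get_req = build_get("http://example.com", {"date": "20140410"})
    print(get_req.get_method(), get_req.full_url)
```

Passing any of these `Request` objects to `urllib.request.urlopen` would actually send it. A crawler would first fetch the page, feed its HTML to `build_postback`, and repeat that for every "load more" step, because the server issues a fresh view state with each response.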