Web Crawling- Scraping Ajax Sites

Challenges with crawling AJAX pages on the web and their solutions.

Published in: Technology
  • How do you deal with JavaScript Invocation Graphs and Hot Call Conjunctions Queries? These, I feel, are some real AJAX crawl problems.

  1. Scraping AJAX Pages. Big Data made small.
  2. What’s AJAX on a web page? (1) Filters (2) Load more results (3) Forms, and others...
  3. GET vs. POST. GET: Client to Server, with the parameters in the URL (http://example.com?date=20140410). POST: Client to Server at http://example.com, with the parameters in a Payload (Form Data, JSON Strings, Query Parameters, View States, etc.).
  4. What makes crawling AJAX difficult?
  5. Challenge 1- JavaScript Calls. Solution- Emulate JavaScript calls using headless browsers. Data fetched from under JavaScript code.
  6. Challenge 2- Fetch Bandwidths. Solution- Optimize fetch limits. Incomplete page fetched because of low fetch age. Image Credit: ticketmaster.com
  7. Challenge 3- .NET Architectures. Solution- Track states, pass event validations, restore states for mitigation. Viewstate.
  8. Challenge 4- Page Encoding. Solution- Send request (content type, media type, accept field parameters) and parse responses in same format as expected by server.
  9. Use Case- Crawl Ticketing Sites
  10. Thank You! Have specific queries on AJAX crawling? Reach out to info@promptcloud.com.
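
The GET vs. POST contrast on slide 3 can be sketched with Python's standard library. This is a minimal illustration, not the deck's own code: the helper names `build_get` and `build_post_json` are made up for this example, and the requests are only constructed, never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://example.com"  # URL taken from slide 3


def build_get(params):
    """GET: the parameters travel in the URL's query string."""
    return Request(f"{BASE_URL}?{urlencode(params)}", method="GET")


def build_post_json(payload):
    """POST: the parameters travel in the request body, here as a JSON string."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical helpers above; only the URL and the date value come from the slide.
get_req = build_get({"date": "20140410"})
post_req = build_post_json({"date": "20140410"})

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

A crawler that replays an AJAX call first has to work out which of the two shapes the page uses, since the same parameters sit in different places.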
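
For the state tracking on slide 7, one possible approach on ASP.NET Web Forms pages is to copy the hidden state fields from the last response into the next postback. The sketch below assumes the usual hidden-field names (`__VIEWSTATE`, `__EVENTVALIDATION`, `__EVENTTARGET`, `__EVENTARGUMENT`). The sample page, its field values, and the `grid$next` control are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, event_target, event_argument=""):
    """Build a form-encoded POST body that carries the page state forward."""
    parser = HiddenFieldParser()
    parser.feed(html)
    form = dict(parser.fields)  # includes __VIEWSTATE and __EVENTVALIDATION
    form["__EVENTTARGET"] = event_target      # control that "fired" the event
    form["__EVENTARGUMENT"] = event_argument
    return urlencode(form).encode("ascii")


# Hypothetical page fragment; real state values are much longer.
SAMPLE_PAGE = """
<form method="post" action="./Events.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
  <input type="hidden" name="__EVENTVALIDATION" value="/wEWAgKd" />
  <a href="javascript:__doPostBack('grid$next','')">Next</a>
</form>
"""

body = build_postback(SAMPLE_PAGE, "grid$next")
print(body)
```

Each response returns fresh state values, so the fields would need to be re-extracted after every postback rather than cached.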
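
The encoding advice on slide 8 can be illustrated as two steps: declare the media type and charset when sending, and decode the response using the charset the server declares. This is a sketch under assumptions: `build_request` and `decode_response` are hypothetical helpers, and falling back to UTF-8 when no charset is declared is a choice made for this example.

```python
from email.message import Message
from urllib.request import Request


def build_request(url, body, media_type, charset="utf-8", accept="*/*"):
    """Encode the body and declare its media type, charset and accepted reply type."""
    headers = {
        "Content-Type": f"{media_type}; charset={charset}",
        "Accept": accept,
    }
    return Request(url, data=body.encode(charset), headers=headers, method="POST")


def decode_response(raw, content_type, fallback="utf-8"):
    """Decode response bytes using the charset from the Content-Type header."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or fallback
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:  # server declared a charset Python does not know
        return raw.decode(fallback, errors="replace")


req = build_request(
    "http://example.com",
    "name=café",
    "application/x-www-form-urlencoded",
    charset="iso-8859-1",
    accept="application/json",
)
print(req.header_items())
print(decode_response(b"caf\xe9", "text/html; charset=ISO-8859-1"))
```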
