
AMS Spider

Atlas Systems case study


  1. INTERNAL WEB-CRAWLER: Case study of the Spider project
  2. OVERVIEW
     • An internal web crawler for the acquisition team that spiders the web to collect specific information based on keyword and URL seeds
     • The collected information is stored in a database as tickets, together with screenshots of the identified target pages; the acquisition team later reviews and closes the tickets
     • Admin tool interfaces manage the seeds and target data and also provide ticket management
  3. CHALLENGES
     • Controlling outdated and closed ads, offers, notices, invitations, referrals and similar information across the web had become an imperative and enormous task
     • Legal issues and notices were numerous because of the immense task of covering the entire web to ensure compliance with the agreements and terms made with affiliates, third-party service providers, carriers, integrators and others
     • Slippage and delays in the manual activity resulted in hefty fines and penalties, sometimes even lawsuits
     • Even after issues were manually identified and resolved, suspended or dead websites sometimes came back online, which made the manual task inconsistent and error-prone and led to repetitive, redundant work for dedicated teams while adding cost overhead
     • For a long time, issues could not be identified and resolved before complaints were raised
  4. SOLUTION
     • Implemented a web crawler that uses the most renowned search engines to automatically search for keywords, campaigns, affiliate ads and outdated information, all configured through a purpose-built admin / management tool
     • Search result parsing and depth traversal
     • Multi-threaded engine running parallel jobs with high accuracy (avoiding redundant results and tickets)
     • Proofing, analysis and validation of content in the site / page
     • Filtered search result storage with a stitched hierarchy of the resulting pages (depth)
     • Capture of each identified problematic web page as a full-length image / screenshot (including scrollable / hidden content)
     • Ticket creation from each result found, with email alerts / notifications
     • After ticket closure, rechecks for reappearance over a specified duration, based on the set status and configuration
  5. RESULTS
     • After the implementation and use of the tool, complaints from the respective parties dropped drastically (by about 90%)
     • Violations, if any, were reported to business teams immediately for further action (even before anyone could raise objections)
     • The business had valid and authentic campaign, ad, referral and offer information on the web; this avoided end-user confusion and helped uphold all third parties' interests by staying within the agreed terms and compliance
     • Helped the customer stay on good terms with all associated external parties
     • Reduced the cost overhead of maintaining large teams to perform these activities manually
     • Increased the efficiency of the internal arbitration teams with valid records and proofs
     TECHNOLOGY
     • Python Qt4 (headless browser tools), Java, PHP, MySQL, jQuery, plugins, JSON, etc.
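
The flow in the SOLUTION slide (seeded crawl, depth traversal, parallel jobs without redundant tickets, rechecks after closure) could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: the names `Ticket`, `crawl`, `recheck_due`, `FAKE_WEB` and `fake_fetch`, the default depth, worker count and recheck periods are all hypothetical, and an in-memory dictionary stands in for real search-engine queries and page fetching.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    """Hypothetical ticket record for one keyword hit on one page."""
    url: str
    keyword: str
    depth: int
    parent: Optional[str]  # page that linked here (the "stitched hierarchy")


def crawl(seed_urls, keywords, fetch, max_depth=2, workers=4):
    """Depth-limited crawl. fetch(url) must return (page_text, list_of_links).

    Each URL is fetched at most once, so no page yields duplicate tickets.
    """
    seeds = list(dict.fromkeys(seed_urls))  # drop duplicate seeds, keep order
    seen = set(seeds)
    frontier = [(url, None) for url in seeds]
    tickets = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for depth in range(max_depth + 1):
            if not frontier:
                break
            # Fetch one depth level in parallel; map() preserves input order.
            pages = list(pool.map(lambda item: fetch(item[0]), frontier))
            next_frontier = []
            for (url, parent), (text, links) in zip(frontier, pages):
                lowered = text.lower()
                for keyword in keywords:
                    if keyword.lower() in lowered:
                        tickets.append(Ticket(url, keyword, depth, parent))
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append((link, url))
            frontier = next_frontier
    return tickets


def recheck_due(closed_at, last_checked, now,
                window=timedelta(days=30), interval=timedelta(days=1)):
    """True while a closed ticket is inside its recheck window and the
    recheck interval has elapsed since the last check."""
    if now - closed_at > window:
        return False
    return now - last_checked >= interval


# Demo data (assumed): a tiny in-memory "web" instead of real HTTP requests.
FAKE_WEB = {
    "http://a.example": ("Home page", ["http://b.example", "http://c.example"]),
    "http://b.example": ("Old PROMO-2019 offer still live",
                         ["http://a.example", "http://c.example"]),
    "http://c.example": ("Nothing of interest", ["http://d.example"]),
    "http://d.example": ("promo-2019 banner", []),
}


def fake_fetch(url):
    return FAKE_WEB.get(url, ("", []))


if __name__ == "__main__":
    for ticket in crawl(["http://a.example"], ["promo-2019"], fake_fetch):
        print(ticket)
    closed = datetime(2020, 1, 1)
    print(recheck_due(closed, closed, datetime(2020, 1, 2)))
```

Processing one depth level at a time keeps the `seen` set on a single thread, so only the fetches run in parallel and no locking is needed for deduplication. A real deployment would replace `fake_fetch` with a headless-browser fetch that also captures the screenshot, and would persist tickets to a database rather than returning a list.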
