All you need to know about crawlers

Apr. 15, 2023

  1. All you need to know about crawlers: Noemi Ferrera, @TheTestLynx
  5. About me: Noemi Ferrera
      ● Currently working @Amazon. Disclaimer: I am not representing Amazon, and I am not talking about anything to do with my current, previous or future experience within Amazon.
      ● Developing and testing professionally since 2009: IBM, Microsoft, Dell, Netease…
      ● Over 20 presentations worldwide
      ● Author of the book “How to Test a Time Machine”
      ● Contact: https://thetestlynx.wordpress.com, @thetestlynx on Twitter
  6. Agenda
      ● What’s a crawler?
      ● Why and when do we need a crawler?
      ● Types of crawlers
      ● Components of a crawler
        ○ View/node
        ○ Arcs/links
        ○ Visited storage
        ○ Heat map
      ● Example
      ● What can go wrong
      ● What you need to succeed
  7. What’s a crawler? (Definition)
      A crawler is an automatic system that iterates through the parts of an application, with the objective of finding issues or exploring it. The application can be a web application, but also other types of applications.
  8. Why and when do we need a crawler?
      ● Discovery testing
      ● Finding particular common issues (e.g. 404 errors)
      ● Quick coverage
      ● Generally runs in production, or in pre-production (late in the cycle)
  9. Types of crawlers
      ● UI vs API
      ● View first vs arc first
      ● Exhaustive vs shortcutted
      ● Random vs smart
  10. Types of crawlers: UI vs API
      ● UI
        ○ Uses the UI to navigate through the application
        ○ Closer to the user’s behaviour
        ○ Checks elements, not only links
      ● API
        ○ Uses the API to navigate through the application
        ○ Faster to run
        ○ Focuses mostly on links and API endpoints
  11. Types of crawlers: view first vs arc first
      ● View first
        ○ Focuses first on the view, then navigates
        ○ Better when the application has many checks but not much navigation
      ● Arc first
        ○ Focuses first on the navigation, then checks the view
        ○ Better when views have few things to check but a long list of navigation points
  12. Types of crawlers: exhaustive vs shortcutted
      ● Exhaustive
        ○ Aims to visit the entire application
        ○ Better for smaller applications, or when there is enough time to cover it all
        ○ Might make too many calls or take too long to find issues
      ● Shortcutted
        ○ Stops after a number of visits
        ○ Better if the application is too big and there is not enough time to cover it all
        ○ Might stop after visiting the important parts of the application
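
The exhaustive/shortcutted difference can be sketched as a single visit budget. This is a minimal illustration, not the author's code: `crawl`, `get_links` and `check` are hypothetical names, with `get_links(view)` standing in for whatever navigation the crawler uses (clicks, API calls) and `check(view)` returning an issue description or `None`. With `max_visits=None` the loop is exhaustive; with a number it becomes shortcutted.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Optional


def crawl(
    start: Hashable,
    get_links: Callable[[Hashable], Iterable[Hashable]],
    check: Callable[[Hashable], Optional[str]],
    max_visits: Optional[int] = None,
) -> tuple[list, list]:
    """Breadth-first crawl; returns (views in visit order, issues found).

    max_visits=None -> exhaustive: visit every reachable view once.
    max_visits=N    -> shortcutted: stop after N visits.
    """
    visited: list = []
    seen = {start}  # views already queued, so each one is visited only once
    queue = deque([start])
    issues: list = []
    while queue:
        if max_visits is not None and len(visited) >= max_visits:
            break  # shortcut: the visit budget is spent
        view = queue.popleft()
        visited.append(view)
        problem = check(view)
        if problem is not None:
            issues.append((view, problem))
        for link in get_links(view):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, issues
```

For example, with a toy app `{"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/"], "/c": []}` and a check that flags `/c`, the exhaustive run visits `/`, `/a`, `/b`, `/c` and reports `/c`, while `max_visits=2` stops after `/` and `/a`. The sketch is breadth first with a visit cap; the Selenium example later in the talk instead recurses depth first with a depth cap (`top_level`).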
  13. Types of crawlers: random vs smart
      ● Random
        ○ Could be partially random
        ○ Likely needs to be shortcutted
      ● Smart
        ○ Uses some logic to give priority to parts of the application
        ○ Might stop after visiting the important parts of the application
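
A random crawler can be as simple as a random walk: from the current view, follow one outgoing link chosen at random. Since a walk can bounce between the same views forever, it needs a step limit, which is why the slide says it likely needs to be shortcutted. A minimal sketch over a hypothetical in-memory link graph (`random_walk`, `graph` and `steps` are illustrative names):

```python
import random


def random_walk(graph: dict[str, list[str]], start: str, steps: int, seed: int = 0) -> list[str]:
    """Visit up to `steps` views, following a randomly chosen link each time."""
    rng = random.Random(seed)  # seeded, so a run that finds an issue can be replayed
    path = [start]
    current = start
    for _ in range(steps - 1):
        links = graph.get(current, [])
        if not links:
            break  # dead end: nothing left to click
        current = rng.choice(links)
        path.append(current)
    return path
```

Seeding the generator is a design choice worth keeping: when a random run finds an issue, the same seed replays the same path.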
  14. Components of a crawler: key concepts
      ● View/node
      ● Arcs/links
      ● Visited storage
      ● Heat map
  16. Components of a crawler: view/node
      ● How to tell when you are in a different view?
        ○ Websites: the URL
        ○ External links: avoid navigating to them
        ○ Games or harder apps
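
For websites, the slide uses the URL to tell views apart. One possible refinement, sketched under the assumption that the `#fragment` and a trailing slash never change the view (true for many sites, but not for single-page apps that route on the fragment), is to normalize URLs before storing them, and to compare hosts to spot external links that should not be navigated. `view_key` and `is_external` are hypothetical helpers built on Python's standard `urllib.parse`:

```python
from urllib.parse import urlsplit, urlunsplit


def view_key(url: str) -> str:
    """Identify a web view by its URL, ignoring the #fragment and a trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"  # "/docs/" and "/docs" are the same view
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def is_external(url: str, site: str) -> bool:
    """True when `url` points to a different host than the site under test."""
    return urlsplit(url).netloc.lower() != urlsplit(site).netloc.lower()
```

For instance, `view_key("https://www.selenium.dev/documentation/#top")` gives `"https://www.selenium.dev/documentation"`, the same key as the URL without the fragment and trailing slash.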
  21. Components of a crawler: arcs/links
      ● How to navigate?
        ○ Clicks
        ○ API calls
        ○ Swiping and other actions
        ○ VR apps: other interactions
      ● Clickable objects?
        ○ Websites: href
        ○ All DOM objects
          ■ Containers?
        ○ Moving/changing objects
        ○ Dynamism
        ○ Hidden elements?
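
For websites, the simplest clickable objects are `<a>` tags with an `href`, and a page can mix partial (relative) and full hrefs. A minimal sketch using only Python's standard `html.parser` and `urllib.parse.urljoin` collects those links from raw HTML and resolves them to full URLs; `LinkCollector` and `extract_links` are hypothetical names. It only sees static links in the HTML source: the other cases on the slide (arbitrary DOM objects, containers, moving, dynamic or hidden elements) need a real browser, as in the Selenium example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors without a usable href
            # urljoin turns partial hrefs ("blog/", "/docs/") into full URLs
            self.links.append(urljoin(self.page_url, href))


def extract_links(html: str, page_url: str) -> list[str]:
    collector = LinkCollector(page_url)
    collector.feed(html)
    return collector.links
```

On a page at `https://www.selenium.dev/about/`, `href="/documentation/"` resolves to `https://www.selenium.dev/documentation/` and `href="blog/"` to `https://www.selenium.dev/about/blog/`.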
  22. Components of a crawler: visited storage
      [Diagram: index.html and a second view]
  25. Components of a crawler: heat map
      ● By usage
      ● By issues found
      ● By novelty
      ● Others
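
One way to use a heat map in a smart, shortcutted crawler is to score each view by the criteria on the slide (usage, issues found, novelty) and spend the visit budget on the hottest views first. In this minimal sketch, `ViewStats`, `heat` and `hottest_first` are hypothetical, the weights are arbitrary placeholders rather than values from the talk, and the usage numbers would have to come from some source such as analytics:

```python
import heapq
from dataclasses import dataclass


@dataclass
class ViewStats:
    usage: int = 0        # how often users reach the view (hypothetical data source)
    issues: int = 0       # issues found on the view in earlier runs
    is_new: bool = False  # novelty: view added or changed recently


def heat(s: ViewStats, w_usage: float = 1.0, w_issues: float = 5.0, w_novelty: float = 10.0) -> float:
    """Weighted heat score; the weights are placeholders, not values from the talk."""
    return w_usage * s.usage + w_issues * s.issues + w_novelty * s.is_new


def hottest_first(stats: dict[str, ViewStats], budget: int) -> list[str]:
    """Choose up to `budget` views to crawl, hottest first."""
    return heapq.nlargest(budget, stats, key=lambda view: heat(stats[view]))
```

With usage 100 for `/`, usage 40 plus 3 past issues for `/checkout`, and a new `/new-feature`, a budget of 2 picks `/` (score 100) and `/checkout` (score 55) before `/new-feature` (score 10). Tuning the weights changes what "smart" means, so they are worth revisiting as issues are found.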
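  26. Crawling with Selenium (example): start the crawler and explore the first view

      import sys

      import requests
      from selenium import webdriver
      from selenium.webdriver.common.by import By

      import view_class  # the author's view/node module (not shown in the talk)


      class WebCrawlerSelenium:
          def __init__(self):
              driver = webdriver.Chrome(...)  # driver options elided on the slide
              self.top_level = 10  # depth limit: this is a shortcutted crawler
              url = "https://www.selenium.dev"
              driver.get(url)
              view = view_class.ViewClass(url)
              try:
                  self.explore(view, [], 0, driver)  # explore the first view
              finally:
                  driver.close()  # close the browser even when sys.exit() stops the crawl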
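  27. Crawling with Selenium (example, continued): explore each level and initialize the linked views

      def explore(self, view, visited, current_level, driver):
          current_level = current_level + 1
          if current_level >= self.top_level:
              sys.exit("Max visit reached")  # stop the whole crawl at the depth limit
          visited.append(view.url)  # visited storage, keyed by URL
          self.check_status(view.url)
          if view.count == -1:  # linked views not initialized yet
              view.count = 0
              self.get_all_href(view, driver)  # adds to view.count
          while view.count > 0:  # get_next_view() consumes at least one action per call
              self.get_next_view(view, visited, current_level, driver)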
  28. Crawling with Selenium (example, continued 2): check the status with the API

      def check_status(self, url):
          # An HTTP request, not the UI, tells whether the view loads (e.g. no 404)
          status_code = requests.get(url).status_code
          if status_code < 200 or status_code >= 400:
              sys.exit("Error on url " + url)
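  29. Crawling with Selenium (example, continued 3): get all references for the node by finding the a tags and adding them to the actions

      def get_all_href(self, view, driver):
          for a_tag in driver.find_elements(By.TAG_NAME, "a"):  # every <a> on the page
              href = a_tag.get_attribute("href")  # full (absolute) URL
              if href and href not in view.actions:  # skip empty and duplicate links
                  view.count = view.count + 1
                  # map the full URL to the href as written in the DOM (may be partial)
                  view.actions[href] = a_tag.get_dom_attribute("href")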
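  30. Crawling with Selenium (example, continued 4): take the next unvisited action, initialize its view, click it and explore it

      def get_next_view(self, view, visited, current_level, driver):
          sub_url = None
          while view.actions:  # take actions from the last one found backwards
              url, dom_href = view.actions.popitem()
              view.count = len(view.actions)
              if url not in visited:
                  sub_url = url
                  break
          if sub_url is None:  # every linked view was already visited
              view.count = 0
              return
          subview = view_class.ViewClass(sub_url)
          if not self.try_click(dom_href, driver):  # UI navigation (API: requests.get)
              return  # could not click: skip this action
          self.explore(subview, visited, current_level, driver)
          driver.get(view.url)  # return to the parent view before its next action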
  31. Crawling with Selenium (example, continued 5): try to click the element (other actions could be added here)

      def try_click(self, href, driver):
          xpath = '//a[@href="' + href + '"]'  # matches the href as written in the DOM
          try:
              driver.find_element(By.XPATH, xpath).click()
              return True
          except Exception:  # not found, not clickable, stale...
              print("Could not find or click the element at " + xpath)
              return False
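
The slides use `view_class.ViewClass` without showing it. Judging only from how the example uses it (a `url`, a `count` that starts at -1 until the links are collected, and an `actions` dictionary), a hypothetical minimal stand-in could look like this; the author's real class may differ:

```python
class ViewClass:
    """Hypothetical stand-in for the author's view_class.ViewClass (not shown in the talk)."""

    def __init__(self, url: str):
        self.url = url     # a web view is identified by its URL
        self.count = -1    # -1 means "links not collected yet"; explore() checks this
        self.actions = {}  # full href -> href as written in the DOM
```

Saved as `view_class.py` next to the crawler, it is enough for `view_class.ViewClass(url)` in the example to resolve.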
  35. What could go wrong
      ● How to identify views? (Already covered)
        ○ External links?
        ○ Keep track of visited views
        ○ Top level (depth limit)
      ● How to identify navigation points/arcs? (Already covered)
        ○ Partial vs full hrefs
      ● Forms, e.g. login
      ● Pop-ups
      ● Cookies
      ● Dynamic objects
      ● Stale links
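
Dynamic objects and stale links often show up in Selenium as an element reference that stops being valid after the page re-renders. A common mitigation, sketched here as an assumption rather than something from the talk, is to retry the whole find-and-act step a few times, looking the element up again on each attempt. `with_retries` is a hypothetical helper:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.5,
) -> T:
    """Run `action`, retrying when it raises one of the `retry_on` exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller report the issue
            time.sleep(delay)  # give the page time to settle before retrying
    raise ValueError("attempts must be at least 1")
```

With Selenium, the call could be `with_retries(lambda: driver.find_element(By.XPATH, xpath).click(), (StaleElementReferenceException,))`, with the exception imported from `selenium.common.exceptions`. The lambda matters: it finds the element again on every attempt, because a stale reference cannot be reused.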
  36. What you need to succeed
      ● Know graphs, trees and types of traversals
        ○ Tracking visited nodes
      ● App knowledge
        ○ Experience or a tool (heat map generator…)
      ● What type of issues are you looking for?
        ○ API? UI? When do they happen?
      ● Make sure you cannot cover these with other testing!
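
The Selenium example in the talk traverses depth first through recursion, tracking visited views and stopping at `top_level`. An iterative version with an explicit stack makes the traversal order visible and avoids Python's recursion limit on deep sites. A minimal sketch over a hypothetical in-memory link graph (`dfs_order` and `max_depth` are illustrative names):

```python
def dfs_order(graph: dict[str, list[str]], start: str, max_depth: int) -> list[str]:
    """Depth-first traversal with a visited set and a depth cap (similar to top_level)."""
    visited: list[str] = []
    seen: set[str] = set()
    stack = [(start, 0)]  # (view, depth) pairs still to explore
    while stack:
        view, depth = stack.pop()
        if view in seen or depth > max_depth:
            continue  # already visited, or beyond the depth cap
        seen.add(view)
        visited.append(view)
        # push links in reverse so the first link is explored first, as in recursion
        for link in reversed(graph.get(view, [])):
            stack.append((link, depth + 1))
    return visited
```

For `{"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/"], "/c": ["/b"]}` starting at `/`, the order is `/`, `/a`, `/c`, `/b`; with `max_depth=1` it is `/`, `/a`, `/b`. Because the visited set is global, a view first reached along a deep path is not expanded again from a shallower one, a known trade-off of depth-limited depth-first search.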
  37. Summary
      ● What’s a crawler?
      ● Why and when do we need a crawler?
        ○ Discovery
        ○ Common issues
        ○ Quick coverage
      ● Types of crawlers
        ○ UI / API / mixed
        ○ View first / arc first
        ○ Exhaustive / shortcutted
        ○ Random / smart
      ● Components of a crawler
        ○ View/node
        ○ Arcs/links
        ○ Visited storage
        ○ Heat map
      ● Example
      ● What can go wrong
      ● What you need to succeed
  41. Thank you! Noemi Ferrera: https://thetestlynx.wordpress.com, @thetestlynx on Twitter

Editor's Notes

  1. In this presentation we will see a web app example
  2. In VR apps, looking at something for a while could be an action