All you need to know about crawlers
Noemi Ferrera
@TheTestLynx
About me
● Currently working at Amazon
○ Disclaimer: I am not representing Amazon, nor talking about anything to do with my current, previous or future experience within Amazon.
● Developing and testing professionally since 2009
○ IBM, Microsoft, Dell, Netease…
● Over 20 presentations worldwide
● Author of the book “How to Test a Time Machine”
● Contact:
○ https://thetestlynx.wordpress.com
○ @thetestlynx on Twitter
Agenda
● What’s a crawler?
● Why and when do we need a crawler?
● Types of crawlers
● Components of a crawler
○ View/node
○ Arcs/links
○ Visited storage
○ Heat map
● Example
● What can go wrong
● What you need to succeed
What’s a crawler?
A crawler is an automated system that iterates through the parts of an
application, with the objective of finding issues or exploring it.
The application can be a web application, but also any other type of application.
Definition
Why and when do we need a crawler?
● Discovery testing
● Finding particular common issues (e.g. 404)
● Quick coverage
● Generally runs in production, or in pre-prod (late in the cycle)
Types of Crawlers
● UI vs API
● View First vs Arc First
● Exhaustive vs Shortcutted
● Random vs Smart
Types of crawlers
UI vs API
● UI
○ Uses the UI to navigate through the application
○ Closer to the user’s behaviour
○ Checks elements, not only links
● API
○ Uses the API to navigate through the application
○ Faster to run
○ Focuses mostly on links and API points
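To illustrate the API side of the comparison above, here is a minimal sketch of how links could be collected from raw HTML without rendering a page. It uses only Python's standard `html.parser`; the names `LinkCollector` and `extract_links` are hypothetical and do not come from the talk.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html):
    """Return the hrefs in document order, without opening a browser."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

An API-style crawler would feed this with the body of an HTTP response. It only sees links, so checks on rendered elements would still need the UI approach.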
Types of crawlers
View first vs Arc first
● View first
○ Focuses first on the view, then navigates
○ Better when the application has many checks but not much navigation
● Arc first
○ Focuses first on the navigation, then checks the view
○ Better when views have few things to check but a long list of navigation points
Types of crawlers
Exhaustive vs Shortcutted
● Exhaustive
○ Aims to visit the entire application
○ Better for smaller applications, or when there is a lot of time to cover it all
○ Might make too many calls or take too long to find issues
● Shortcutted
○ Stops after a number of visits
○ Better if the application is too big and there is not enough time to cover it all
○ Might end after visiting the important parts of the application
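The difference between an exhaustive and a shortcutted crawler can be sketched with a single optional limit. This is an illustration on an in-memory graph, not the talk's implementation; `crawl`, `max_visits` and the `APP` dictionary are made-up names.

```python
from collections import deque


def crawl(graph, start, max_visits=None):
    """Visit views reachable from start; stop early if max_visits is set."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        if max_visits is not None and len(visited) >= max_visits:
            break  # shortcutted: give up after a fixed number of visits
        view = queue.popleft()
        visited.append(view)
        for linked in graph.get(view, []):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return visited


# Hypothetical application: each view maps to the views it links to.
APP = {
    "home": ["docs", "blog"],
    "docs": ["api", "home"],
    "blog": ["home"],
    "api": [],
}
```

Calling `crawl(APP, "home")` covers every view, while `crawl(APP, "home", max_visits=2)` stops after two. The limit could just as well be a time budget.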
Types of crawlers
Random vs Smart
● Random
○ Could be partially random
○ Likely needs to be shortcutted
● Smart
○ Uses some logic to give priority to parts of the application
○ Might end after visiting the important parts of the application
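One way to picture random versus smart crawling is as two ways of ordering the same list of links. The sketch below assumes a hypothetical `priority` mapping from link to score; how the scores are obtained is left open.

```python
import random


def order_links(links, priority=None, seed=None):
    """Return links in the order a crawler would try them.

    Without a priority mapping the order is random (seeded for repeatability).
    With one, higher-priority links come first and unknown links come last.
    """
    if priority is None:
        shuffled = list(links)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    return sorted(links, key=lambda link: priority.get(link, 0), reverse=True)
```

A partially random crawler could mix both: sort by priority first, then shuffle only the links that share the same score.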
Components of a crawler
● View/node
● Arcs/links
● Visited storage
● Heat map
Key concepts
Components of a crawler
● How to tell when you are in a different view?
○ Website: the URL
○ External links: avoid navigating to them
○ Games or harder apps
View/Node
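For a website, identifying a view by its URL and avoiding external links could look like the sketch below. It relies on the standard `urllib.parse` module. The rule that fragments and trailing slashes do not change the view is an assumption that will not hold for every application, and `view_id` and `is_external` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit


def view_id(url):
    """Reduce a URL to the key that identifies a view.

    Assumption: the fragment and a trailing slash do not change the view.
    Query strings are kept, since they often select different content.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def is_external(url, base_url):
    """True when url points to a different host than the application under test."""
    return urlsplit(url).netloc.lower() != urlsplit(base_url).netloc.lower()
```

Games and other applications without URLs need a different key, for example a screen name or a hash of the visible elements.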
Components of a crawler
● How to navigate?
○ Clicks
○ API calls
○ Swiping and other actions
○ VR apps - other interactions
● Clickable objects?
○ Websites - href
○ All DOM objects
■ Containers?
○ Moving/Changing objects
○ Dynamism
○ Hidden elements?
Arcs/links
Components of a crawler
[Figure: visited storage holding index.html, then index.html and a second view]
Visited storage
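Visited storage can be as small as a dictionary of views plus the set of arcs already followed. The class below is a sketch under that assumption; `VisitedStore` and its method names are invented for illustration.

```python
class VisitedStore:
    """Remembers which views were reached and through which arcs."""

    def __init__(self):
        self.visits = {}   # view -> number of times it was reached
        self.arcs = set()  # (from_view, to_view) pairs already followed

    def record(self, view, came_from=None):
        """Store a visit; return True if the view had not been seen before."""
        is_new = view not in self.visits
        self.visits[view] = self.visits.get(view, 0) + 1
        if came_from is not None:
            self.arcs.add((came_from, view))
        return is_new

    def __contains__(self, view):
        return view in self.visits
```

Recording `index.html` and then a second view reached from it leaves two entries in `visits` and one arc. The visit counts can later feed a heat map.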
Components of a crawler
● By usage
● By issues found
● By novelty
● Others
Heat map
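A heat map can be reduced to one score per view that combines the criteria listed above. The weights and signal names in this sketch (`usage`, `issues`, `novelty`) are assumptions chosen for the example, not values from the talk.

```python
import heapq


def heat(stats, weights):
    """Weighted sum of the signals recorded for one view."""
    return sum(weights[name] * stats.get(name, 0) for name in weights)


def hottest_first(views, weights):
    """Yield view names from the highest heat to the lowest."""
    # heapq is a min-heap, so the heat is negated to pop the hottest view first.
    heap = [(-heat(stats, weights), name) for name, stats in views.items()]
    heapq.heapify(heap)
    while heap:
        _, name = heapq.heappop(heap)
        yield name
```

A smart crawler could visit views in this order and stop once its budget runs out, so the hottest parts of the application are covered first.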
Crawling with Selenium
import sys
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
import view_class  # the speaker's own module, not shown on the slides
# Assumption: ViewClass(url) exposes url, count (starting at -1) and actions (a dict).

class WebCrawlerSelenium:
    def __init__(self):
        # Start the crawler
        driver = webdriver.Chrome(...)  # arguments elided on the slide
        self.top_level = 10
        url = "https://www.selenium.dev"
        driver.get(url)
        view = view_class.ViewClass(url)
        # Explore the first view
        self.explore(view, [], 0, driver)
        driver.close()
Example
Crawling with Selenium
    def explore(self, view, visited, current_level, driver):
        # Explore each level
        current_level = current_level + 1
        if current_level >= self.top_level:
            sys.exit("Max visit reached")
        visited.append(view.url)
        self.check_status(view.url)
        # Initialize the linked views
        if view.count == -1:
            view.count = 0
            self.get_all_href(view, driver)  # adds to view.count
        while view.count > 0:
            self.get_next_view(view, visited, current_level, driver)
Example cont…
Crawling with Selenium
    def check_status(self, url):
        # Check status with API
        status_code = requests.get(url).status_code
        if status_code < 200 or status_code >= 400:
            sys.exit("Error on url " + url)
Example cont 2 …
Crawling with Selenium
    def get_all_href(self, view, driver):
        # Get all references for the node, finding by <a> tag
        for a_tag in driver.find_elements(By.TAG_NAME, "a"):
            href = a_tag.get_attribute("href")  # full, resolved URL
            if href is None:
                continue
            view.count = view.count + 1
            # Add to actions: full URL -> href exactly as written in the DOM
            view.actions[href] = a_tag.get_dom_attribute("href")
Example cont 3 …
Crawling with Selenium
    def get_next_view(self, view, visited, current_level, driver):
        view.count = view.count - 1
        # Get all the urls and keep those not visited yet
        pending = [url for url in view.actions if url not in visited]
        if not pending:
            view.count = 0
            return
        sub_url = pending[-1]
        # Initialize the view
        subview = view_class.ViewClass(sub_url)
        # Click next action (UI navigation; an API crawler would call requests.get)
        driver.get(view.url)  # return to this view, since exploring may have left it
        if self.try_click(view.actions[sub_url], driver):
            # Explore next (implied by the slide's callout, not in the slide's code)
            self.explore(subview, visited, current_level, driver)
        else:
            visited.append(sub_url)  # do not retry a link that cannot be clicked
Example cont 4 …
Crawling with Selenium
    def try_click(self, href, driver):
        # Tries to click the element; other actions could be added here
        xpath = '//a[@href="' + href + '"]'
        try:
            element = driver.find_element(By.XPATH, xpath)
            element.click()
            return True
        except Exception:
            print("Could not find the xpath")
            return False
Example cont 5 …
What could go wrong
● How to identify views? (already covered)
○ External links?
○ Keep track of visited
○ Top level
● How to identify navigation points/arcs? (already covered)
○ Partial vs full hrefs
● Forms, e.g. login
● Pop-ups
● Cookies
● Dynamic objects
● Stale links
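The "partial vs full hrefs" pitfall can usually be handled by resolving every href against the URL of the page it was found on. This is a small sketch using the standard `urllib.parse.urljoin`. The helper names and the list of skipped prefixes are assumptions for illustration.

```python
from urllib.parse import urljoin


def resolve_href(page_url, href):
    """Turn an href as written in the page into a full URL."""
    return urljoin(page_url, href)


def is_navigable(href):
    """Skip hrefs that do not lead to another view."""
    return bool(href) and not href.startswith(("#", "mailto:", "javascript:", "tel:"))
```

Storing only resolved URLs in the visited storage avoids counting `/blog` and `https://example.com/blog` as two different views.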
What you need to succeed
● Know graphs, trees and types of traversal
○ Tracking visited nodes
● App knowledge
○ Experience or a tool (heat map generator…)
● What type of issues are you looking for?
○ API? UI? When do they happen?
● Make sure you cannot cover these with other testing!!!
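As a reminder of the graph knowledge mentioned above, here are the two classic traversals with visited tracking, run on a made-up graph that contains a cycle. The function names and the `SITE` graph are illustrative only.

```python
from collections import deque


def depth_first(graph, start, visited=None):
    """Follow each link as deep as possible before trying the next one."""
    if visited is None:
        visited = []
    visited.append(start)
    for neighbour in graph.get(start, []):
        if neighbour not in visited:
            depth_first(graph, neighbour, visited)
    return visited


def breadth_first(graph, start):
    """Visit all views one click away, then two clicks away, and so on."""
    visited = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


# Hypothetical site; "api" links back to "home", which forms a cycle.
SITE = {
    "home": ["docs", "blog"],
    "docs": ["api"],
    "blog": ["post"],
    "api": ["home"],
    "post": [],
}
```

Without the `visited` check, the `api` to `home` link would make either traversal loop forever, which is why tracking visited nodes matters.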
Summary
● What’s a crawler?
● Why and when do we need a crawler?
○ Discovery
○ Common issues
○ Quick coverage
● Types of crawlers
○ UI / API / mixed
○ View first / arc first
○ Exhaustive / shortcutted
○ Random / smart
● Components of a crawler
○ View/node
○ Arcs/links
○ Visited storage
○ Heat map
● Example
● What can go wrong
● What you need to succeed
Thank you!
https://thetestlynx.wordpress.com
@thetestlynx twitter
Noemi Ferrera
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 

All you need to know about crawlers

  • 3. All you need to know about crawlers Noemi Ferrera @TheTestLynx
  • 7. About me ● Currently working @Amazon. Disclaimer: I am not representing Amazon, nor talking about anything to do with my current, previous or future experience within Amazon. ● Development and testing professionally since 2009: IBM, Microsoft, Dell, Netease… ● Over 20 presentations worldwide ● Author of the book “How to Test a Time Machine” ● Contact: https://thetestlynx.wordpress.com, @thetestlynx on Twitter. Noemi Ferrera
  • 8. Agenda ● What’s a crawler? ● Why and when do we need a crawler? ● Types of crawlers ● Components of a crawler ○ View/node ○ Arcs/links ○ Visited storage ○ Heat map ● Example ● What can go wrong ● What you need to succeed All you need to know about crawlers…
  • 9. What’s a crawler? Definition: a crawler is an automatic system that iterates through the parts of an application with the objective of finding issues or exploring it. The application can be a web application, but also other types of applications.
  • 10. Why and when do we need a crawler? ● Discovery testing ● Finding particular common issues (e.g. 404) ● Quick coverage ● Generally runs in production, or in pre-prod (late)
  • 11. Types of Crawlers ● UI vs API ● View First vs Arc First ● Exhaustive vs Shortcutted ● Random vs Smart
  • 12. Types of crawlers: UI vs API ● UI: uses the UI to navigate through the application; closer to the user’s behaviour; checks elements, not only links ● API: uses the API to navigate through the application; faster to run; focuses mostly on links and API points
  • 13. Types of crawlers: View First vs Arc First ● View First: focuses first on the view, then navigates; better when the application has many checks but not much navigation ● Arc First: focuses first on the navigation, then checks the view; better when views have few things to check but a long list of navigation points
  • 14. Types of crawlers: Exhaustive vs Shortcutted ● Exhaustive: aims to visit the entire application; better for smaller applications or when there is a lot of time to cover it all; might make too many calls or take too long finding issues ● Shortcutted: stops after a number of visits; better if the application is too big and there is not enough time to cover it all; might end after visiting the important parts of the application
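The difference between an exhaustive and a shortcutted crawl can be sketched as a single traversal with an optional visit budget. This is only an illustration, not the presenter's code: the `crawl` function, the `max_visits` parameter and the `SITE` map are hypothetical names, and the application is modelled as a plain dictionary rather than a real site.

```python
from collections import deque


def crawl(graph, start, max_visits=None):
    """Breadth-first crawl over `graph` (view -> list of linked views).

    With max_visits=None the crawl is exhaustive; otherwise it is
    shortcutted and stops once that many views have been visited.
    """
    visited = []        # views in the order they were visited
    seen = {start}      # views already queued, so loops are not followed twice
    queue = deque([start])
    while queue:
        if max_visits is not None and len(visited) >= max_visits:
            break       # shortcut: the visit budget is used up
        view = queue.popleft()
        visited.append(view)
        for linked in graph.get(view, []):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return visited


# Hypothetical site map, used only for illustration.
SITE = {
    "home": ["about", "docs"],
    "about": ["home"],
    "docs": ["api", "home"],
    "api": [],
}

print(crawl(SITE, "home"))                # exhaustive
print(crawl(SITE, "home", max_visits=2))  # shortcutted
```

With a budget of 2 the crawl never reaches `docs` or `api`, which is the trade-off the slide describes: less time spent, but parts of the application are left unvisited.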
  • 15. Types of crawlers: Random vs Smart ● Random: could be partially random; likely needs to be shortcutted ● Smart: uses some logic to give priority to parts of the application; might end after visiting the important parts of the application
  • 16. Components of a crawler ● View/node ● Arcs/links ● Visited storage ● Heat map Key concepts
  • 18. Components of a crawler: View/Node ● How to tell when you are in a different one? ○ Website: URL ○ External links: avoid navigation ○ Games or harder apps
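For a website, telling views apart by URL and avoiding external links could look roughly like the sketch below. The helper names `view_id` and `is_external` are hypothetical, and treating the query string and fragment as irrelevant to the view is an assumption that will not hold for every application.

```python
from urllib.parse import urlsplit, urlunsplit


def view_id(url):
    """Reduce a URL to the identity of its view (scheme, host, path)."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Assumption: query strings and fragments do not change the view.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def is_external(url, base_url):
    """True when `url` points to a different host than `base_url`."""
    return urlsplit(url).netloc.lower() != urlsplit(base_url).netloc.lower()


print(view_id("https://Example.com/docs/#intro"))
print(is_external("https://other.org/x", "https://example.com/"))
```

A crawler can store `view_id(url)` in its visited storage and skip any link for which `is_external` is true.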
  • 23. Components of a crawler: Arcs/links ● How to navigate? ○ Clicks ○ API calls ○ Swiping and other actions ○ VR apps: other interactions ● Clickable objects? ○ Websites: href ○ All DOM objects ■ Containers? ○ Moving/changing objects ○ Dynamism ○ Hidden elements?
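For the simplest case on that slide (websites, where arcs are `href` attributes), the links of a page can be collected from static HTML with the standard library alone. This is a minimal sketch with hypothetical names (`HrefCollector`, `find_arcs`, `PAGE`); it does not handle dynamic or hidden elements, which need a real browser such as the Selenium example later in the deck.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def find_arcs(html):
    """Return the hrefs found in a piece of static HTML."""
    collector = HrefCollector()
    collector.feed(html)
    return collector.hrefs


# Hypothetical page fragment, used only for illustration.
PAGE = (
    '<p><a href="/docs">Docs</a> <a name="top">Top</a> '
    '<a href="https://example.com/blog">Blog</a></p>'
)

print(find_arcs(PAGE))
```

The anchor without an `href` is skipped, since it is not a navigation point.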
  • 25. Components of a crawler ● View/node ● Arcs/links ● Visited storage ● Heat map Key concepts [Diagram labels: index.html, Second view]
  • 27. Components of a crawler: Heat map ● By usage ● By issues found ● By novelty ● Others
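One way a heat map could drive a "smart" crawler is by ordering the links of a view so that the hottest are followed first. The sketch below is an assumption about how that might be wired, not something shown in the slides: `order_by_heat` and `HEAT` are hypothetical, and the scores could come from usage, issues found, novelty or any other criterion.

```python
import heapq


def order_by_heat(links, heat):
    """Return links from hottest to coldest.

    Links missing from the heat map get a score of 0.
    """
    # heapq is a min-heap, so scores are negated; the index keeps ties
    # in their original order.
    heap = [(-heat.get(link, 0), i, link) for i, link in enumerate(links)]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _, _, link = heapq.heappop(heap)
        ordered.append(link)
    return ordered


# Hypothetical heat map (for example, visits per day), for illustration only.
HEAT = {"/checkout": 9, "/search": 5, "/about": 1}

print(order_by_heat(["/about", "/new", "/checkout", "/search"], HEAT))
```

Combined with a visit budget, this ordering makes it more likely that a shortcutted crawl stops only after the important parts have been covered.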
  • 28. Crawling with Selenium. Example: start the crawler and explore the first view.
      import sys
      import requests
      from selenium import webdriver
      from selenium.webdriver.common.by import By
      import view_class  # the presenter's own module; it is not shown on the slides

      class WebCrawlerSelenium:
          def __init__(self):
              driver = webdriver.Chrome(...)
              self.top_level = 10
              url = "https://www.selenium.dev"
              driver.get(url)                    # start the crawler
              view = view_class.ViewClass(url)
              self.explore(view, [], 0, driver)  # explore the first view
              driver.close()
  • 29. Crawling with Selenium. Example cont.: explore each level and initialize the linked views.
      # Assumption: ViewClass exposes url, count (initially -1) and actions (a dict).
      def explore(self, view, visited, current_level, driver):
          current_level = current_level + 1
          if current_level >= self.top_level:
              sys.exit("Max visit reached")
          visited.append(view.url)  # store URLs so that "in visited" works on hrefs
          self.check_status(view.url)
          if view.count == -1:
              view.count = 0
              self.get_all_href(view, driver)  # adds to view.count
          while view.count > 0:
              self.get_next_view(view, visited, current_level, driver)
  • 30. Crawling with Selenium. Example cont. 2: check the status with the API.
      def check_status(self, url):
          status_code = requests.get(url).status_code
          if status_code < 200 or status_code >= 400:
              sys.exit("Error on url " + url)
  • 31. Crawling with Selenium. Example cont. 3: get all references for the node by finding the a tags and adding them to the actions.
      def get_all_href(self, view, driver):
          for a_tag in driver.find_elements(By.TAG_NAME, "a"):
              href = a_tag.get_attribute("href")  # resolved (full) URL
              if href:
                  view.count = view.count + 1
                  # keep the href as written in the DOM, which the XPath in try_click needs
                  view.actions[href] = a_tag.get_dom_attribute("href")
  • 32. Crawling with Selenium. Example cont. 4: get the urls, initialize the next view, click the next action and explore it.
      def get_next_view(self, view, visited, current_level, driver):
          view.count = view.count - 1  # one fewer link left to try
          pending = [(url, raw) for url, raw in view.actions.items() if url not in visited]
          if not pending:
              view.count = 0           # nothing left to visit from this view
              return
          sub_url, raw_href = pending[0]
          subview = view_class.ViewClass(sub_url)
          self.try_click(raw_href, driver)  # UI navigation; an API crawler would use requests.get
          self.explore(subview, visited, current_level, driver)
  • 33. Crawling with Selenium. Example cont. 5: try to click the element. Other actions could be added here.
      def try_click(self, href, driver):
          xpath = '//a[@href="' + href + '"]'
          try:
              element = driver.find_element(By.XPATH, xpath)
              element.click()
          except Exception:
              print("Could not find the xpath")
  • 37. What could go wrong ● How to identify views? (Already covered) ○ External links? ○ Keep track of visited ○ Top level ● How to identify navigation points/arcs? (already covered) ○ Partial vs full hrefs ● Forms, ex. login ● Pop-ups ● Cookies ● Dynamic objects ● Stale links
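The "partial vs full hrefs" problem can be handled by resolving every href against the URL of the page it was found on before storing or comparing it. A minimal sketch, assuming the hrefs have already been extracted as strings; `resolve_links` is a hypothetical helper, and the list of skipped prefixes is an illustrative choice rather than a complete one.

```python
from urllib.parse import urljoin


def resolve_links(page_url, hrefs):
    """Turn partial hrefs into full URLs, skipping non-navigable ones."""
    full = []
    for href in hrefs:
        # In-page anchors, mail links and script links do not lead to a new view.
        if href.startswith(("mailto:", "javascript:", "#")):
            continue
        full.append(urljoin(page_url, href))
    return full


print(resolve_links(
    "https://example.com/docs/intro.html",
    ["setup.html", "/about", "#top", "mailto:a@example.com", "https://other.org/x"],
))
```

Resolving first means the visited storage compares like with like, so the same view is not crawled once under its partial href and again under its full one.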
  • 38. What you need to succeed ● Know: graphs, trees, types of traversals ○ Tracking visited nodes ● App knowledge ○ Experience or tool (heat map generator…) ● What type of issues are you looking for? ○ API? UI? When do they happen? ● Make sure you cannot cover these with other testing!
  • 39. Summary ● What’s a crawler? ● Why and when do we need a crawler? ○ Discovery ○ Common issues ○ Quick coverage ● Types of crawlers ● Components of a crawler ● Example ● What can go wrong ● What you need to succeed
  • 40. Summary ● What’s a crawler? ● Why and when do we need a crawler? ● Types of crawlers ○ UI / API / MIXED ○ VIEW FIRST / ARC FIRST ○ EXHAUSTIVE / SHORTCUTTED ○ RANDOM / SMART ● Components of a crawler ● Example ● What can go wrong ● What you need to succeed
  • 41. Summary ● What’s a crawler? ● Why and when do we need a crawler? ● Types of crawlers ● Components of a crawler ○ View/node ○ Arcs/links ○ Visited storage ○ Heat map ● Example ● What can go wrong ● What you need to succeed
  • 42. Summary ● What’s a crawler? ● Why and when do we need a crawler? ● Types of crawlers ● Components of a crawler ● Example ● What can go wrong ● What you need to succeed

Editor's Notes

  1. In this presentation we will see a web app example
  2. VR - looking for a while could be an action