DevTools to crawl Webpages.
DevTools




09.05.12   @chrschneider   2

                                    "… Apache … toolset of low level Java components
                                    focused on HTTP and associated protocols."



  ●   HttpComponents Core
          … is a set of low level HTTP transport components

  ●   HttpComponents Client
          … provides reusable components for client-side ... HTTP connection
          management.

  ●   HttpComponents AsyncClient (DEV)
          … ability to handle a great number of concurrent connections ... more ...
          performance in terms of a raw data throughput.

  ●   Commons HttpClient (Legacy)
         … All users of Commons HttpClient 3.x are strongly encouraged to upgrade to
         HttpClient 4.1.



                                                      HttpComponents Client




       Example Components

           ●   Get, Post, Delete, … Request Objects

           ●   Cookie Manager

           ●   SSL

           ●   Content Encoding Aware

           ●   HTTP Authentication (Basic, Digest, ...)
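
       Note: two of the components listed above come down to simple protocol mechanics.
       The sketch below uses only JDK classes (not HttpComponents) to illustrate what
       Basic authentication and gzip content decoding amount to on the wire. The class
       HttpPlumbingSketch and its method names are hypothetical, and Java 9 or newer is
       assumed for readAllBytes(). With HttpComponents Client you would normally let
       the library take care of both.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper for illustration; not part of HttpComponents. */
final class HttpPlumbingSketch {

    /** Builds the value of an "Authorization" request header for HTTP Basic authentication. */
    static String basicAuthHeader(String user, String password) {
        byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }

    /** Compresses text the way a server might for a "Content-Encoding: gzip" response. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the gzip trailer has been written.
        return buffer.toByteArray();
    }

    /** Decodes a gzip-encoded response body back into text (UTF-8 assumed). */
    static String gunzip(byte[] body) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Credentials taken from the well-known RFC example.
        System.out.println(basicAuthHeader("Aladdin", "open sesame"));
        // Round trip: compress like a server, decode like a client.
        System.out.println(gunzip(gzip("<html>hello</html>")));
    }
}
```

       Basic authentication only encodes the credentials, it does not encrypt them,
       so it is usually combined with SSL.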





                                                      HttpComponents Client Example




           public final static void main(final String[] args) throws Exception
           {

                final HttpClient httpclient = new DefaultHttpClient();
                try
                {
                      final HttpGet httpget = new HttpGet("http://www.google.com/");

                      System.out.println("executing request " + httpget.getURI());

                      // Create a response handler
                      final ResponseHandler<String> responseHandler = new BasicResponseHandler();
                      final String responseBody = httpclient.execute(httpget, responseHandler);
                      System.out.println("----------------------------------------");
                      System.out.println(responseBody);
                      System.out.println("----------------------------------------");

                }
                finally
                {
                      httpclient.getConnectionManager().shutdown();
                }
           }


                                                              http://hc.apache.org/httpcomponents-client-ga/examples.html



                      HttpComponents Client




               Demo








           Netty … is an asynchronous event-driven network application framework for rapid
           development of maintainable high performance protocol servers & clients.




                          See: http://netty.io/





                                      HtmlUnit … is a "GUI-Less browser for Java programs"


 Features (excerpt):
  ● Support for the HTTP and HTTPS protocols

  ● Support for cookies

  ● Ability to specify whether failing responses from the server should throw exceptions
    or should be returned as pages of the appropriate type (based on content type)
  ● Ability to customize the request headers being sent to the server

  ● Support for HTML responses



   ●   Support for submitting forms
   ●   Support for clicking links
   ●   Support for walking the DOM model of the HTML document
   ●   JavaScript support
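
 Note: "walking the DOM" is the core of most crawling tasks. As a library-independent
 illustration, the sketch below uses the JDK's own XML parser to collect the link
 targets of a page. It assumes the markup is well-formed XML, which real-world HTML
 often is not (one reason to use a tool such as HtmlUnit). DomWalkSketch and
 collectLinks are hypothetical names and not HtmlUnit API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper for illustration; uses only JDK classes. */
final class DomWalkSketch {

    /** Parses well-formed markup and returns the href of every <a> element in document order. */
    static List<String> collectLinks(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            List<String> links = new ArrayList<>();
            walk(doc.getDocumentElement(), links);
            return links;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    /** Depth-first traversal: inspect the node itself, then recurse into its children. */
    private static void walk(Node node, List<String> links) {
        if (node instanceof Element) {
            Element element = (Element) node;
            if ("a".equals(element.getTagName()) && element.hasAttribute("href")) {
                links.add(element.getAttribute("href"));
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), links);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><div><a href=\"/events\">Events</a></div>"
                + "<p><a href=\"/about\">About</a> <a name=\"top\">no href</a></p></body></html>";
        System.out.println(collectLinks(page)); // prints [/events, /about]
    }
}
```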





                                                  HtmlUnit … is a "GUI-Less browser for Java programs"



      @Test
      public void homePage() throws Exception
      {
            final WebClient webClient = new WebClient();
            final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");

           System.out.println(page.getTitleText());

           assertEquals("Welcome to HtmlUnit", page.getTitleText());

           final String pageAsXml = page.asXml();
           assertTrue(pageAsXml.contains("<body class=\"composite\">"));

           final String pageAsText = page.asText();
           assertTrue(pageAsText.contains("Support for the HTTP and HTTPS protocols"));

           webClient.closeAllWindows();
      }




                                                                               http://htmlunit.sourceforge.net/gettingStarted.html



                                                                 HtmlUnit … is a "GUI-Less browser for Java programs"




           @Test
           public void   getElements() throws Exception
           {
                 final   WebClient webClient = new WebClient();
                 final   HtmlPage page = webClient.getPage("http://some_url");
                 final   HtmlDivision div = page.getHtmlElementById("some_div_id");
                 final   HtmlAnchor anchor = page.getAnchorByName("anchor_name");

                 webClient.closeAllWindows();
           }


                                                                                                       Luxury :)



     Note: HTML tables are supported as well. There are easy wrapper classes to walk through them. … Handy!
     http://htmlunit.sourceforge.net/table-howto.html




                                                                                                       http://htmlunit.sourceforge.net/gettingStarted.html



                                             Selenium … automates browsers. That's it.




    Selenium-WebDriver supports the following browsers along with the
    operating systems these browsers are compatible with.
      ●    Google Chrome 12.0.712.0+
      ●    Internet Explorer 6, 7, 8, 9 - 32 and 64-bit where applicable
      ●    Firefox 3.0, 3.5, 3.6, 4.0, 5.0, 6, 7
      ●    Opera 11.5+
      ●    HtmlUnit 2.9
      ●    Android – 2.3+ for phones and tablets (devices & emulators)
      ●    iOS 3+ for phones (devices & emulators) and 3.2+ for tablets (devices
           & emulators)



                                Selenium … automates browsers. That's it.




                           The Selenium Family

           Selenium IDE



                                               Also C#, Python, Ruby, ...
           Selenium WebDriver

                                                               Also on Windows and Mac



           Selenium Grid




                                Selenium … automates browsers. That's it.




                           The Selenium Family

                                    … create quick bug reproduction scripts
           Selenium IDE
                                    … create scripts to aid in automation-aided
                                    exploratory testing


           Selenium WebDriver       … create robust, browser-based regression
                                    automation

                                    … scale and distribute scripts across many
                                    environments
           Selenium Grid

                                                                     http://seleniumhq.org/



                                            Requirements for Selenium WebDriver with Firefox
                                                             (and HtmlUnit)




              Dependencies                                        Browser Binaries
   <dependency>
         <groupId>org.seleniumhq.selenium</groupId>
         <artifactId>selenium-java</artifactId>
         <version>2.21.0</version>
   </dependency>

   <dependency>
         <groupId>org.seleniumhq.selenium</groupId>
         <artifactId>selenium-htmlunit-driver</artifactId>
         <version>2.21.0</version>
   </dependency>

   <dependency>
         <groupId>org.seleniumhq.selenium</groupId>
         <artifactId>selenium-firefox-driver</artifactId>
         <version>2.21.0</version>
   </dependency>

                                                                           That's it.





                                                               Basic Selenium example




    @Test
    public void testSeleniumWithFirefox() throws InterruptedException
    {
          final WebDriver webDriver = new FirefoxDriver();

           webDriver.get("http://www.majug.de");

           final WebElement veranstaltungenLink = webDriver.findElement(By.linkText("Veranstaltungen"));

           veranstaltungenLink.click();

           // Close the browser
           Thread.sleep(5000);
           webDriver.quit();
    }





                                        Selenium WebDriver Locator Strategies




 It's also possible to call findElements(...) to get a List<WebElement>:

               List<WebElement> hits = webDriver.findElements(By.tagName("a"));





                                      Selenium WebDriver Interactions




  Once you have a webElement, you can...

     ●   webElement.click() it

     ●   webElement.sendKeys(...) to it

     ●   webElement.submit() on it.


  It is also possible to perform "Actions" like DoubleClick, DragAndDrop, ClickAndHold, …
  with the "Actions" class.





                           Selenium WebDriver




              Demo





                                                        Selenium WebDriver Pitfalls




    Newbie Pitfalls:

    ●   Selenium doesn't wait until the whole page is loaded (keyword: implicit wait)
    ●   webElement.findElement(By.xpath("//...")) starts from the root of the DOM (use ".//..." instead)
    ●   Google brings up "Selenium RC" solutions. This is the old Selenium project.
    ●   A reference to a WebElement becomes invalid if the driver "moves" to
        another page.
    ●   Firefox doesn't run on our CI because it is a headless system (try Xvfb)
    ●   New XPath 2.0 functions (like ends-with(...)) fail. This is because Selenium
        uses the driver's native XPath engine. For Firefox, this means XPath 1.0 today.
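
    Note: the two XPath pitfalls can be reproduced without a browser. The sketch below
    uses the XPath 1.0 engine that ships with the JDK, not Selenium, so it only
    illustrates the XPath semantics. It shows that "//a" ignores the context element
    while ".//a" stays inside it, and one common XPath 1.0 substitute for ends-with(...).
    XPathPitfallSketch, countFromMenu and hrefEndsWith are hypothetical names, and the
    sample page is made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical demo class; uses the JDK's XPath 1.0 engine instead of a browser. */
final class XPathPitfallSketch {

    // Made-up sample page: two links in the menu, one in the footer.
    static final String PAGE =
            "<html><body>"
            + "<div id=\"menu\"><a href=\"/home\">Home</a><a href=\"/files/report.pdf\">Report</a></div>"
            + "<div id=\"footer\"><a href=\"/imprint\">Imprint</a></div>"
            + "</body></html>";

    /** Counts the nodes an expression matches when evaluated with the "menu" div as context node. */
    static int countFromMenu(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node menu = (Node) xpath.evaluate("//div[@id='menu']", doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xpath.evaluate(expression, menu, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * XPath 1.0 substitute for ends-with(@href, suffix), built from substring() and
     * string-length(). Assumes the suffix contains no single quote.
     */
    static String hrefEndsWith(String suffix) {
        return ".//a[substring(@href, string-length(@href) - string-length('" + suffix
                + "') + 1) = '" + suffix + "']";
    }

    public static void main(String[] args) {
        System.out.println(countFromMenu("//a"));                // 3: searches the whole document
        System.out.println(countFromMenu(".//a"));               // 2: only links inside the menu
        System.out.println(countFromMenu(hrefEndsWith(".pdf"))); // 1: the report link
    }
}
```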




Any questions?
Thank you for your attention!

Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Innoplexia DevTools to Crawl Webpages

  • 1. DevTools to crawl Webpages.
  • 2. DevTools 09.05.12 @chrschneider 2
  • 3. "… Apache … toolset of low level Java components focused on HTTP and associated protocols."
      ● HttpComponents Core: … is a set of low level HTTP transport components
      ● HttpComponents Client: … provides reusable components for client-side ... HTTP connection management.
      ● HttpComponents AsyncClient (DEV): … ability to handle a great number of concurrent connections ... more ... performance in terms of a raw data throughput.
      ● Commons HttpClient (Legacy): … All users of Commons HttpClient 3.x are strongly encouraged to upgrade to HttpClient 4.1.
  • 4. HttpComponents Client: Example Components
      ● Get, Post, Delete, … Request Objects
      ● Cookie Manager
      ● SSL
      ● Content Encoding Aware
      ● HTTP Authentication (Basic, Digest, ...)
  • 5. HttpComponents Client Example
      public final static void main(final String[] args) throws Exception
      {
          final HttpClient httpclient = new DefaultHttpClient();
          try
          {
              final HttpGet httpget = new HttpGet("http://www.google.com/");

              System.out.println("executing request " + httpget.getURI());

              // Create a response handler
              final ResponseHandler<String> responseHandler = new BasicResponseHandler();
              final String responseBody = httpclient.execute(httpget, responseHandler);
              System.out.println("----------------------------------------");
              System.out.println(responseBody);
              System.out.println("----------------------------------------");
          }
          finally
          {
              httpclient.getConnectionManager().shutdown();
          }
      }
      Source: http://hc.apache.org/httpcomponents-client-ga/examples.html
  • 6. HttpComponents Client Demo
  • 7. Netty … is an asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers & clients. See: http://netty.io/
  • 8. HtmlUnit … is a "GUI-Less browser for Java programs"
      Features (excerpt):
      ● Support for the HTTP and HTTPS protocols
      ● Support for cookies
      ● Ability to specify whether failing responses from the server should throw exceptions or should be returned as pages of the appropriate type (based on content type)
      ● Ability to customize the request headers being sent to the server
      ● Support for HTML responses
      ● Support for submitting forms
      ● Support for clicking links
      ● Support for walking the DOM model of the HTML document
      ● JavaScript support
  • 9. HtmlUnit … is a "GUI-Less browser for Java programs"
      @Test
      public void homePage() throws Exception {
          final WebClient webClient = new WebClient();
          final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
          System.out.println(page.getTitleText());
          assertEquals("Welcome to HtmlUnit", page.getTitleText());

          final String pageAsXml = page.asXml();
          assertTrue(pageAsXml.contains("<body class=\"composite\">"));

          final String pageAsText = page.asText();
          assertTrue(pageAsText.contains("Support for the HTTP and HTTPS protocols"));

          webClient.closeAllWindows();
      }
      Source: http://htmlunit.sourceforge.net/gettingStarted.html
  • 10. HtmlUnit … is a "GUI-Less browser for Java programs"
      @Test
      public void getElements() throws Exception {
          final WebClient webClient = new WebClient();
          final HtmlPage page = webClient.getPage("http://some_url");
          final HtmlDivision div = page.getHtmlElementById("some_div_id");
          final HtmlAnchor anchor = page.getAnchorByName("anchor_name");

          webClient.closeAllWindows();
      }
      Luxury :)
      Note: HTML tables are supported as well. There are simple wrapper classes for walking through them. … Handy!
      http://htmlunit.sourceforge.net/table-howto.html
      http://htmlunit.sourceforge.net/gettingStarted.html
  • 11. Selenium … automates browsers. That's it.
      Selenium-WebDriver supports the following browsers along with the operating systems these browsers are compatible with.
      ● Google Chrome 12.0.712.0+
      ● Internet Explorer 6, 7, 8, 9 - 32 and 64-bit where applicable
      ● Firefox 3.0, 3.5, 3.6, 4.0, 5.0, 6, 7
      ● Opera 11.5+
      ● HtmlUnit 2.9
      ● Android – 2.3+ for phones and tablets (devices & emulators)
      ● iOS 3+ for phones (devices & emulators) and 3.2+ for tablets (devices & emulators)
  • 12. Selenium … automates browsers. That's it.
      The Selenium Family
      Selenium IDE
      Also C#, Python, Ruby, ...
      Selenium WebDriver
      Also on Windows and Mac
      Selenium Grid
  • 13. Selenium … automates browsers. That's it.
      The Selenium Family
      ● Selenium IDE: … create quick bug reproduction scripts; … create scripts to aid in automation-aided exploratory testing
      ● Selenium WebDriver: … create robust, browser-based regression automation
      ● Selenium Grid: … scale and distribute scripts across many environments
      http://seleniumhq.org/
  • 14. Requirements for Selenium WebDriver with Firefox (and HtmlUnit)
      Dependencies:
          <dependency>
              <groupId>org.seleniumhq.selenium</groupId>
              <artifactId>selenium-java</artifactId>
              <version>2.21.0</version>
          </dependency>
          <dependency>
              <groupId>org.seleniumhq.selenium</groupId>
              <artifactId>selenium-htmlunit-driver</artifactId>
              <version>2.21.0</version>
          </dependency>
          <dependency>
              <groupId>org.seleniumhq.selenium</groupId>
              <artifactId>selenium-firefox-driver</artifactId>
              <version>2.21.0</version>
          </dependency>
      Browser Binaries
  • 15. Basic Selenium example
      @Test
      public void testSeleniumWithFirefox() throws InterruptedException {
          final WebDriver webDriver = new FirefoxDriver();
          webDriver.get("http://www.majug.de");

          final WebElement veranstaltungenLink = webDriver.findElement(By.linkText("Veranstaltungen"));
          veranstaltungenLink.click();

          // Keep the browser open for five seconds, then close it
          Thread.sleep(5000);
          webDriver.quit();
      }
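The example on slide 15 looks up the link immediately after `get(...)`. If the target element is added to the page late (for example by JavaScript), an immediate lookup can fail, which is what Selenium's implicit and explicit waits address. The idea behind such a wait, retrying a lookup until it succeeds or a timeout expires, can be sketched in plain Java without any Selenium dependency. `PollingWaitSketch` and `waitFor` are hypothetical names chosen for illustration; they are not part of the Selenium API, and the "lookup" in `main` is simulated by a timer.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper illustrating the polling idea behind a wait.
// Not part of Selenium; the names are assumptions made for this sketch.
class PollingWaitSketch {

    // Calls lookup repeatedly until it yields a value or the timeout expires.
    static <T> Optional<T> waitFor(Supplier<Optional<T>> lookup,
                                   Duration timeout,
                                   Duration pollInterval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            final Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty(); // timed out without a hit
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        // Simulated "element" that only becomes available after 300 ms.
        final long readyAt = System.currentTimeMillis() + 300;
        final Supplier<Optional<String>> lookup = () ->
                System.currentTimeMillis() >= readyAt
                        ? Optional.of("Veranstaltungen")
                        : Optional.<String>empty();

        final Optional<String> link =
                waitFor(lookup, Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println(link.orElse("timed out"));
    }
}
```

With a real driver, the lookup would wrap an element search; in practice the waits that ship with Selenium are the better choice over a hand-written loop.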
  • 16. Selenium WebDriver Locator Strategies
      It's also possible to call findElements(...) to get a List<WebElement>:
      List<WebElement> hits = webDriver.findElements(By.tagName("a"));
  • 17. Selenium WebDriver Interactions
      Once you have a WebElement, you can...
      ● webElement.click() it
      ● webElement.sendKeys(...) to it
      ● webElement.submit() on it.
      It is also possible to perform "Actions" like DoubleClick, DragAndDrop, ClickAndHold, … with the "Actions" class.
  • 18. Selenium WebDriver Demo
  • 19. Selenium WebDriver Pitfalls
      Newbie Pitfalls:
      ● Selenium doesn't wait until the whole page is loaded (keyword: implicit wait).
      ● webElement.findElement(By.xpath("//...")) starts from the root of the DOM, not from the element (use ".//..." instead).
      ● Googling brings up "Selenium RC" solutions. That is the old Selenium project.
      ● A reference to a WebElement becomes invalid if the driver "moves" to another page.
      ● Firefox doesn't run on our CI because it is a headless system (try Xvfb).
      ● New XPath 2.0 functions (like ends-with(...)) fail. This is because Selenium uses the driver's native XPath engine. For Firefox this means XPath 1.0 today.
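The two XPath pitfalls on slide 19 can be reproduced without a browser. The sketch below uses the XPath 1.0 engine that ships with the JDK (`javax.xml.xpath`) instead of Selenium, so it only illustrates the XPath semantics, not WebDriver itself. The sample markup, the class name `XPathScopeSketch`, and the helper `count` are assumptions made for this example. It shows that `//a` evaluated against an element still searches the whole document, that `.//a` stays inside the element, and one common XPath 1.0 substitute for `ends-with(...)` built from `substring` and `string-length`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration with the JDK's XPath 1.0 engine; not Selenium code.
class XPathScopeSketch {

    // Hypothetical, well-formed sample page with three links in two containers.
    static final String PAGE =
            "<html><body>"
          + "<div id='nav'><a href='/home.html'>Home</a><a href='/events.pdf'>Events</a></div>"
          + "<div id='footer'><a href='/imprint.html'>Imprint</a></div>"
          + "</body></html>";

    // XPath 1.0 replacement for ends-with(@href, '.pdf'), relative to the context node.
    static final String ENDS_WITH_PDF =
            ".//a[substring(@href, string-length(@href) - string-length('.pdf') + 1) = '.pdf']";

    // Evaluates expression with the div of the given id as context node
    // and returns the number of matching nodes.
    static int count(String contextDivId, String expression) {
        try {
            final Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            final XPath xpath = XPathFactory.newInstance().newXPath();
            final Node context = (Node) xpath.evaluate(
                    "//div[@id='" + contextDivId + "']", doc, XPathConstants.NODE);
            final NodeList hits = (NodeList) xpath.evaluate(
                    expression, context, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("nav", "//a"));         // 3: absolute path ignores the context element
        System.out.println(count("nav", ".//a"));        // 2: relative path stays inside the nav div
        System.out.println(count("nav", ENDS_WITH_PDF)); // 1: only /events.pdf ends with ".pdf"
    }
}
```

The same expressions should carry over to `By.xpath(...)` lookups on a WebElement, since the scoping rules come from XPath itself rather than from the driver.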
  • 20. Any questions? Thank you for your attention!