Web Pages Classification using
     Concept Analysis

                Porfirio Tramontana
                Anna Rita Fasolino
          Dipartimento di Informatica e Sistemistica
             University of Naples Federico II, Italy


               Giuseppe A. Di Lucca
      RCOST – Research Centre on Software Technology
           University of Sannio, Benevento, Italy


The Context

    With the diffusion of new paradigms and technological
     solutions for the Web (RIA and Ajax, SOA, …), existing
     Web Applications are rapidly becoming legacy

    A strategic objective: integrating these existing
     applications with the new platforms

    An open issue: defining effective approaches for
     analysing and classifying the Web Applications User
     Interface


Web Pages Classification using Concept Analysis - ICSM 2007 – Paris – 4/10/2007
Automatic Web Pages Classification
     Web Applications interact with users by means of Client
      Pages that are dynamically generated

     Given a classification of Built Client pages in a set of
      equivalence classes
          … classes may correspond to the different reached Web
           Application scenarios
          … classes may correspond to the different results of the
           execution of test cases

     The problem:
          To propose a technique for the automatic identification of the
           equivalence class a Web page belongs to

A possible application scenario
      Designing wrappers that encapsulate the original
       UI with the aim of exporting a renewed interface,
       such as a Web Service one
             The possible reached scenarios represent the classes
              to recognise.
             The Wrapper has to automatically identify the class in
              order to know the state of the interaction with the UI
              and the actions to be performed
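     [Figure: interaction states Login Request, Password Request, Access Permitted and Access Denied, linked by Login and Password actions; the state reached after the Password action is marked "?"]
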
    G. Di Lorenzo, A. R. Fasolino, L. Melcarne, P. Tramontana, V. Vittorini, “Turning Web Applications into Web Services by Wrapping
    Techniques”, to appear in the 14th Working Conference on Reverse Engineering, WCRE 2007, Vancouver, BC, Canada
The proposed solution
     To find, for any class, a combination of Web
      page features that is satisfied only by pages
      belonging to that class (Classification model)

     We are interested in a solution that is:
          Accurate
                 The classification approach should be very reliable, in order
                  to support automatic tasks
          Efficient
                 The classification approach should support a fast
                  classification of dynamic built client pages
The proposed approach

Three-step iterative process:

1.   Feature Candidature
     •    Generation of a set of features, representing page
          characteristics

2.   Feature Reduction
     •    Proposition of a combination of the candidate features allowing
          the identification of the class of any Web page

3.   Feature Validation
     •    Evaluation of the accurateness of the proposed classification
          model (i.e. the set of discriminating expressions)

[Figure: process diagram with Training Set, Feature Candidature, Feature Reduction, Test Set, Feature Validation and Classification Model; a Classifier assigns an Html page to Class 1, Class 2, …, Class n]

The resulting classification model allows a Classifier module to automatically
determine the class any Web page belongs to
1 - Feature Candidature
      A feature is a characteristic retrievable in a Web page
       interface
           It is retrievable by analysing its HTML code
                  A feature can be associated with an XPath query (if the
                   HTML code is XHTML-compatible)
                  Examples: //a/text()="Activities", //h1/img,
                   /html/body/div[4]/form

      A set of features that could be useful for discriminating a
       class is proposed
           Ideally, we search for features that are retrievable only in
            the pages belonging to a class
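As an illustration of how XPath-based features could be evaluated on a page, here is a minimal sketch using only the Python standard library. The sample page, the feature names and the helper evaluate_features are hypothetical. ElementTree supports only a subset of XPath, so the boolean test on the link text is rewritten here as .//a[.='Activities'] and all paths are relative to the root element; this is an assumption of the sketch, not the tooling used in the study.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML-compatible page used only for illustration.
PAGE = """<html><body>
<h1><img src="logo.png"/></h1>
<a href="activities.php">Activities</a>
</body></html>"""

# Hypothetical candidate features, written in the XPath subset that
# ElementTree understands (the [.='text'] predicate needs Python 3.7+).
FEATURES = {
    "has_activities_link": ".//a[.='Activities']",
    "has_h1_image": ".//h1/img",
    "has_form": ".//form",
}


def evaluate_features(xhtml, features):
    """Return feature name -> True/False for one page."""
    root = ET.fromstring(xhtml)
    # A feature is true when the query selects at least one node.
    return {name: root.find(path) is not None for name, path in features.items()}


row = evaluate_features(PAGE, FEATURES)
print(row)
```

Each call produces one row of true/false values, one value per candidate feature.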
2 - Feature Reduction
     The goal of the Feature Reduction step is to
      obtain a Discriminating Expression for each
      class, i.e. a combination of features that is true
      for each Web page belonging to the class, false
      elsewhere

     The Feature Reduction is executed on the set of
      candidate features
          A training set of pre-classified pages is adopted



Evaluation Matrix

  M(i,j) is the value of the Feature j for the Web Page i

  [Figure: evaluation matrix with the Training Set Pages as rows and the Candidate Features as columns]

  Class C1={P1, P3}
  Class C2={P2, P4}
  Class C3={P5, P6, P7}

The features with all true (or false) values are deleted: they are surely
not useful to discriminate Classes
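The deletion of constant features can be sketched as follows. The matrix values and the feature names f1, f2, f3 are made up for illustration, and the dictionary layout (page name to feature values) is an assumption of this sketch.

```python
def drop_constant_features(matrix):
    """Remove the features that have the same value on every page.

    matrix maps a page name to a dict: feature name -> True/False.
    """
    features = list(next(iter(matrix.values())))
    keep = [
        f for f in features
        # A feature is kept only if both True and False occur in its column.
        if len({row[f] for row in matrix.values()}) > 1
    ]
    return {page: {f: row[f] for f in keep} for page, row in matrix.items()}


# Hypothetical evaluation matrix: f1 is always true, f3 is always false.
matrix = {
    "P1": {"f1": True, "f2": True, "f3": False},
    "P2": {"f1": True, "f2": False, "f3": False},
    "P3": {"f1": True, "f2": True, "f3": False},
}
reduced = drop_constant_features(matrix)
print(reduced)
```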




Concept Analysis


 • Concept Analysis is adopted to
 find features that characterise
 classes of equivalent pages

 • A Concept Lattice may be
 automatically built on the basis of
 the Evaluation Matrix values
 (evaluated on the Training Set)


                                                                  Attributes ≡ Candidate Features

                                                                  Objects ≡ Training Set Pages
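
To make the objects/attributes view concrete, the following sketch enumerates the formal concepts of a tiny context by brute force. The context (pages P1 to P3, features f1 to f3) is hypothetical, and the enumeration is exponential in the number of features, so it is only meant to show what a concept is, not how a lattice would be built in practice.

```python
from itertools import combinations


def extent(context, attrs):
    """Pages (objects) that satisfy every feature in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)


def intent(context, objs, all_attrs):
    """Features (attributes) shared by every page in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def concepts(context):
    """All (extent, intent) pairs, found by closing every feature subset."""
    all_attrs = frozenset().union(*context.values())
    found = set()
    for size in range(len(all_attrs) + 1):
        for attrs in combinations(sorted(all_attrs), size):
            ext = extent(context, frozenset(attrs))
            found.add((ext, intent(context, ext, all_attrs)))
    return found


# Hypothetical context: page -> set of features that are true on it.
context = {"P1": {"f1", "f2"}, "P2": {"f1"}, "P3": {"f2", "f3"}}
lattice = concepts(context)
for ext, itt in sorted(lattice, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(ext), sorted(itt))
```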

Feature Classification
        According to Eisenbarth, Koschke and Simon*, features are classified, with
        respect to a class C, as:

             • Specific: if it is satisfied by all and only the pages of the class C;
             • Relevant: if it is satisfied by all the pages of the class C, but also
             by other pages from other classes;
             • CSPC (i.e., Conditionally Specific): if it is satisfied by a subset of
             pages of the class C and by no other page from other classes;
             • Shared: if it is satisfied by a subset of pages of the class C and by
             other pages from other classes too;
             • Irrelevant: if it is not satisfied by any page of the class C.

        [Figure: diagram of the pages P1–P7 with regions labelled Spec, Rel, CSPC, Shared and Irrelevant]

  *     T. Eisenbarth, R. Koschke, D. Simon, “Locating features in source code”, IEEE Transactions on Software
  Engineering, Volume 29, Issue 3, IEEE CS Press, March 2003, pp. 210–224
For the class Successful Login

     • Logout is Specific
     • Not(Incorrect) is Relevant
     • AdminLogged is CSPC
     • Not(AdminLogged) is Shared
     • Login is Irrelevant
Feature Classification Evaluation

    The Classification of the features may be
     evaluated on the basis of the Concept
     Analysis results:
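    \[
    \begin{aligned}
    \mathit{Specific}(C) &= \bigcap_{p \in C} \mathit{Intent}(OC(p)) \;-\; \bigcup_{p \in TS - C} \mathit{Intent}(OC(p)) \\
    \mathit{Relevant}(C) &= \bigcap_{p \in C} \mathit{Intent}(OC(p)) \;-\; \mathit{Specific}(C) \\
    \mathit{CSPC}(C) &= \bigcup_{p \in C} \mathit{Intent}(OC(p)) \;-\; \bigcup_{p \in TS - C} \mathit{Intent}(OC(p)) \;-\; \mathit{Specific}(C) \\
    \mathit{Shared}(C) &= \bigcup_{p \in C} \mathit{Intent}(OC(p)) \;-\; \mathit{Specific}(C) \;-\; \mathit{Relevant}(C) \;-\; \mathit{CSPC}(C) \\
    \mathit{Irrelevant}(C) &= F \;-\; \mathit{Specific}(C) \;-\; \mathit{Relevant}(C) \;-\; \mathit{CSPC}(C) \;-\; \mathit{Shared}(C)
    \end{aligned}
    \]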
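
The five categories can be computed with plain set operations. The sketch below assumes that each page is represented by the set of features it satisfies (used here in place of the intent of its object concept) and that F is the union of all those sets. The page data is hypothetical and was chosen to mirror the Successful Login example; the function name classify_features is also an assumption.

```python
def classify_features(pages, classes, target):
    """Split the features into the five categories for the class target.

    pages maps a page to the set of features it satisfies;
    classes maps a class name to the set of its training pages.
    """
    inside = [pages[p] for p in classes[target]]
    outside = [pages[p] for c, ps in classes.items() if c != target for p in ps]
    all_features = set().union(*pages.values())

    in_all = set.intersection(*inside)   # satisfied by every page of the class
    in_any = set().union(*inside)        # satisfied by at least one page of the class
    out_any = set().union(*outside)      # satisfied by at least one other page

    specific = in_all - out_any
    relevant = in_all - specific
    cspc = in_any - out_any - specific
    shared = in_any - specific - relevant - cspc
    irrelevant = all_features - specific - relevant - cspc - shared
    return {"specific": specific, "relevant": relevant, "cspc": cspc,
            "shared": shared, "irrelevant": irrelevant}


# Hypothetical training pages (feature values are made up).
pages = {
    "P1": {"Logout", "Not(Incorrect)", "AdminLogged"},
    "P2": {"Logout", "Not(Incorrect)", "Not(AdminLogged)"},
    "P3": {"Login", "Not(AdminLogged)"},
    "P4": {"Login", "Not(Incorrect)", "Not(AdminLogged)"},
}
classes = {"SuccessfulLogin": {"P1", "P2"}, "FailedLogin": {"P3"}, "LoginForm": {"P4"}}
groups = classify_features(pages, classes, "SuccessfulLogin")
print(groups)
```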
Rules for generating Discriminating Expressions
• Our goal is to obtain combinations of features (Discriminating Expressions) that are specific
for a single class, i.e. that are true for any page belonging to the class and false elsewhere

[Figure: diagram of the pages P1–P7 with regions labelled Spec, Rel1, Rel2, CSPC1 and CSPC2]

1.    If the class C includes at least one specific feature, any of these
      specific features can be a candidate discriminating expression of the class.

2.    If there are no specific features, then a candidate discriminating
      expression can be obtained by considering the logical AND of the
      relevant features
      •       If there is a pair of features f1 and f2 such that f1 ==> f2
              (i.e. Extent(AC(f1)) ⊆ Extent(AC(f2))), then it is possible to
              simplify the expression by discarding the feature f2 from it.

3.    If there are neither specific nor relevant features, then a
      candidate discriminating expression can be obtained by
      considering the logical OR of the CSPC features
      •       If there is a pair of features f1 and f2 such that f1 ==> f2, it is possible to
              simplify the expression by discarding the feature f1 from it.
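
The three rules can be sketched as a small function. The representation of an expression as a pair (operator, feature list), the names simplify and candidate_expression, and the sample extents are assumptions made for this illustration. Features with identical extents imply each other, so the sketch keeps only one of them instead of discarding both.

```python
def simplify(features, extents, keep_smallest):
    """Drop the redundant features of an AND (keep_smallest) or OR expression.

    extents maps a feature to the set of pages that satisfy it, so
    f1 ==> f2 holds when extents[f1] is a subset of extents[f2].
    """
    kept = []
    for f in sorted(features):
        if keep_smallest:
            # AND: f is redundant if another feature implies it.
            dominated = any(extents[g] < extents[f] for g in features)
        else:
            # OR: f is redundant if it implies another feature.
            dominated = any(extents[g] > extents[f] for g in features)
        duplicate = any(extents[g] == extents[f] for g in kept)
        if not dominated and not duplicate:
            kept.append(f)
    return kept


def candidate_expression(groups, extents):
    """Apply rules 1 to 3 and return (operator, features), or None."""
    if groups["specific"]:
        return ("and", [sorted(groups["specific"])[0]])      # rule 1
    if groups["relevant"]:
        return ("and", simplify(groups["relevant"], extents, True))   # rule 2
    if groups["cspc"]:
        return ("or", simplify(groups["cspc"], extents, False))       # rule 3
    return None


# Hypothetical extents: r1 ==> r2 and c1 ==> c2.
extents = {
    "r1": {"P1", "P2"}, "r2": {"P1", "P2", "P3"},
    "c1": {"P1"}, "c2": {"P1", "P2"},
}
by_relevant = candidate_expression(
    {"specific": set(), "relevant": {"r1", "r2"}, "cspc": set()}, extents)
by_cspc = candidate_expression(
    {"specific": set(), "relevant": set(), "cspc": {"c1", "c2"}}, extents)
print(by_relevant, by_cspc)
```

A candidate obtained from rule 2 or rule 3 is only a candidate: it still has to be checked against the training and test pages.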
3 - Discriminating Expression Validation

       Recall And Precision:
       \[
       \mathit{Recall} = \frac{\#(\mathit{pagesCorrectlyAttributedToClassC})}{\#(\mathit{AnalysedPagesBelongingToClassC})}
       \qquad
       \mathit{Precision} = \frac{\#(\mathit{pagesCorrectlyAttributedToClassC})}{\#(\mathit{pagesAttributedToClassC})}
       \]




       Failed Classification: the page class has not been identified,
        because no discriminating feature (or more than one) has been
        satisfied by the page;
       Incorrect Classification: the page has been attributed to an
        incorrect class;
       Correct Classification: the page has been attributed to the correct
        class
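
A minimal sketch of the validation step follows. It assumes a classification model stored as class name to (operator, feature list), and it computes recall with the usual definition (correctly attributed pages over analysed pages belonging to the class). The model, the pages and the function names are hypothetical.

```python
def matches(expression, page_features):
    """Evaluate an ("and" | "or", [features]) expression on one page."""
    operator, features = expression
    combine = all if operator == "and" else any
    return combine(f in page_features for f in features)


def classify(page_features, model):
    """Return the class whose expression is satisfied, or None when the
    classification fails (no expression, or more than one, is satisfied)."""
    hits = [cls for cls, expr in model.items() if matches(expr, page_features)]
    return hits[0] if len(hits) == 1 else None


def recall_precision(outcomes, cls):
    """outcomes is a list of (actual class, attributed class or None)."""
    correct = sum(1 for actual, got in outcomes if actual == cls and got == cls)
    belonging = sum(1 for actual, _ in outcomes if actual == cls)
    attributed = sum(1 for _, got in outcomes if got == cls)
    recall = correct / belonging if belonging else 0.0
    precision = correct / attributed if attributed else 0.0
    return recall, precision


# Hypothetical model and test pages: (features true on the page, actual class).
model = {"Success": ("and", ["Logout"]), "Failure": ("and", ["Incorrect"])}
test_pages = [
    ({"Logout"}, "Success"),               # correct classification
    ({"Incorrect"}, "Failure"),            # correct classification
    ({"Logout", "Incorrect"}, "Success"),  # failed: two expressions satisfied
    (set(), "Failure"),                    # failed: no expression satisfied
    ({"Incorrect"}, "Success"),            # incorrect classification
]
outcomes = [(actual, classify(features, model)) for features, actual in test_pages]
print(recall_precision(outcomes, "Success"), recall_precision(outcomes, "Failure"))
```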

A Classification Example

• Web Pages related to the AdminBooks page of the open source Web application
Bookstore have been considered
• Five classes of Web pages
• 25 Web pages training set
• 65 Web pages test set

[Figure: examples labelled Empty List, Single Page, First Page, Central Page and Last Page]
A Classification Example

[Figure: diagram labelled Empty List, Single Page, First Page, Central Page and Last Page]
Candidate features
• 8 candidate features have been considered
• f1, f2, f3, f4 have been discarded (always true)
• Negated features have been introduced
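
Introducing negated features amounts to adding, for every remaining feature, a column holding the opposite value. A minimal sketch, assuming the dictionary layout page name to feature values and the Not(...) naming used elsewhere in these slides (the values themselves are made up):

```python
def add_negated_features(matrix):
    """Add a Not(f) column for every feature f of the evaluation matrix."""
    return {
        page: {**row, **{f"Not({f})": not value for f, value in row.items()}}
        for page, row in matrix.items()
    }


# Hypothetical values for two of the remaining features.
matrix = {"P1": {"f5": True, "f6": False}, "P2": {"f5": False, "f6": False}}
extended = add_negated_features(matrix)
print(extended["P1"])
```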




Discriminating Expressions – First iteration

Training Set Validation: There is not enough precision!

A new iteration of feature candidature is needed
New features have been added:
    1. We need to distinguish when the words “Next” and “Previous” are
       links or simple labels
    2. We need to distinguish between empty page and single page
Discriminating Expressions – Second Iteration

Training Set Validation is OK but there are some problems in a page belonging
to the Test Set (assigned to Central Page instead of Last Page)

The incorrectly assigned page has been added to the Training Set and the
Candidate Discriminating Expressions have been evaluated

[Figure: Modified Expressions]

Recall and Precision are now 100% both on the Training Set and on the Test Set
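The iteration described here (validate, move the wrongly classified pages into the Training Set, rebuild the expressions) can be sketched as a loop. Everything below is a simplified stand-in: build_model only uses specific features, pages are (feature set, class) pairs with made-up values, and moving the page out of the Test Set is an assumption of this sketch.

```python
def build_model(training):
    """Toy model: for each class, the features satisfied by all of its
    training pages and by no training page of another class."""
    model = {}
    for cls in {c for _, c in training}:
        inside = [f for f, c in training if c == cls]
        outside = [f for f, c in training if c != cls]
        model[cls] = set.intersection(*inside) - set().union(*outside)
    return model


def classify(features, model):
    """Return the single class whose features are all present, else None."""
    hits = [c for c, needed in model.items() if needed and needed <= features]
    return hits[0] if len(hits) == 1 else None


def refine(training, test, max_rounds=10):
    """Rebuild the model until every remaining test page is classified correctly."""
    training, test = list(training), list(test)
    for round_no in range(1, max_rounds + 1):
        model = build_model(training)
        wrong = [(f, c) for f, c in test if classify(f, model) != c]
        if not wrong:
            return model, round_no
        training.extend(wrong)                      # enlarge the Training Set
        test = [page for page in test if page not in wrong]
    return model, max_rounds


# Hypothetical pages: (features true on the page, actual class).
training = [({"a", "x"}, "A"), ({"b"}, "B")]
test = [({"a"}, "A"), ({"b", "y"}, "B")]
model, rounds = refine(training, test)
print(model, rounds)
```

In this toy run the first model requires both a and x for class A, so the test page with only a is not recognised; after it joins the Training Set the second model requires just a.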
Discussion
    Process Effort
         In the case study, about 100 minutes were needed to
          obtain the classification model
         The activities requiring human intervention were the
          most critical
                Training and Test Set Collection step
                      The reliability of the classification model depends on the
                       representativeness of the Training and Test Sets
                      Web pages resulting from the execution of a Test Suite may be
                       good candidates to fill in the Training and the Test Set
                Feature Candidature step
                      Heuristic techniques allowing a semi-automatic generation of
                       features are currently under experimentation




Further application scenarios

    Support to testing automation
          The possible reached pages are arranged in a set of
           equivalence classes, corresponding to different test results
          The proposed discriminating expressions enable the
           automatic evaluation of the testing results

    Support to dynamic analysis
          The recognition of the reached interaction scenarios enables
           the automatic collection of logs of the visited use cases
           and scenarios, also providing useful information for user profiling




Conclusions and Future Work

    A process for the definition of a classification model
     allowing the automatic identification of classes of Built
     Client pages has been defined and validated
    The classification model allows the run-time identification
     of the class of a Built Client page, with short
     response time and high precision

    Future Work:
          We are working on improving the process, in order to reduce
           the effort required by the manual tasks
          Further experimentation will be carried out in order to assess the
           scalability of the approach


                                                                                  23
Web Pages Classification using Concept Analysis - ICSM 2007 – Paris – 4/10/2007
24
Web Pages Classification using Concept Analysis - ICSM 2007 – Paris – 4/10/2007

More Related Content

Viewers also liked

Recovering a Business Object Model from Web Applications
Recovering a Business Object Model from Web ApplicationsRecovering a Business Object Model from Web Applications
Recovering a Business Object Model from Web Applications
Porfirio Tramontana
 
Considering Context Events in Event-Based Testing of Mobile Applications
Considering Context Events in Event-Based Testing of Mobile Applications Considering Context Events in Event-Based Testing of Mobile Applications
Considering Context Events in Event-Based Testing of Mobile Applications
Porfirio Tramontana
 
Using GUI Ripping for Automated Testing of Android Apps
Using GUI Ripping for Automated Testing of Android AppsUsing GUI Ripping for Automated Testing of Android Apps
Using GUI Ripping for Automated Testing of Android Apps
Porfirio Tramontana
 
Reverse Engineering Techniques: from Web Applications to Rich Internet Applic...
Reverse Engineering Techniques: from Web Applications to Rich Internet Applic...Reverse Engineering Techniques: from Web Applications to Rich Internet Applic...
Reverse Engineering Techniques: from Web Applications to Rich Internet Applic...
Porfirio Tramontana
 
Techniques and Tools for Rich Internet Applications Testing
Techniques and Tools for Rich Internet Applications TestingTechniques and Tools for Rich Internet Applications Testing
Techniques and Tools for Rich Internet Applications Testing
Porfirio Tramontana
 
Reverse Engineering Finite State Machines from Rich Internet Applications
Reverse Engineering Finite State Machines from Rich Internet ApplicationsReverse Engineering Finite State Machines from Rich Internet Applications
Reverse Engineering Finite State Machines from Rich Internet Applications
Porfirio Tramontana
 
Comprehending Web Applications by a Clustering Based Approach
Comprehending Web Applications by a Clustering Based Approach Comprehending Web Applications by a Clustering Based Approach
Comprehending Web Applications by a Clustering Based Approach
Porfirio Tramontana
 
Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...
Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...
Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...
Porfirio Tramontana
 
Comprehending Ajax Web Applications by the DynaRIA Tool
Comprehending Ajax Web Applications by the DynaRIA ToolComprehending Ajax Web Applications by the DynaRIA Tool
Comprehending Ajax Web Applications by the DynaRIA Tool
Porfirio Tramontana
 
DynaRIA: a Tool for Ajax Web Application Comprehension
DynaRIA: a Tool for Ajax Web Application ComprehensionDynaRIA: a Tool for Ajax Web Application Comprehension
DynaRIA: a Tool for Ajax Web Application Comprehension
Porfirio Tramontana
 
3D Printing for Surgical Innovation: A Primer
3D Printing for Surgical Innovation:  A Primer3D Printing for Surgical Innovation:  A Primer
3D Printing for Surgical Innovation: A Primer
Nigel Parsad
 
Recovering Interaction Design Patterns in Web Applications
Recovering Interaction Design Patterns in Web Applications Recovering Interaction Design Patterns in Web Applications
Recovering Interaction Design Patterns in Web Applications
Porfirio Tramontana
 
WARE: a tool for the Reverse Engineering of Web Applications
WARE: a tool for the Reverse Engineering of Web Applications WARE: a tool for the Reverse Engineering of Web Applications
WARE: a tool for the Reverse Engineering of Web Applications
Porfirio Tramontana
 
Standard 3D medical images and printing
Standard 3D medical images and printingStandard 3D medical images and printing
Standard 3D medical images and printing
Young Moon
 

Viewers also liked (14)

Recovering a Business Object Model from Web Applications
Recovering a Business Object Model from Web ApplicationsRecovering a Business Object Model from Web Applications
Recovering a Business Object Model from Web Applications
 
Considering Context Events in Event-Based Testing of Mobile Applications
Web Pages Classification using Concept Analysis

  • 1. Web Pages Classification using Concept Analysis
      Porfirio Tramontana, Anna Rita Fasolino
      Dipartimento di Informatica e Sistemistica, University of Naples Federico II, Italy
      Giuseppe A. Di Lucca
      RCOST – Research Centre on Software Technology, University of Sannio, Benevento, Italy
  • 2. The Context
      • With the diffusion of new paradigms and technological solutions for the Web (RIA and Ajax, SOA, …), existing Web Applications are rapidly becoming legacy
      • A strategic objective: integrating these existing applications with the new platforms
      • An open issue: defining effective approaches for analysing and classifying Web Application User Interfaces
      Web Pages Classification using Concept Analysis - ICSM 2007 – Paris – 4/10/2007
  • 3. Automatic Web Pages Classification
      • Web Applications interact with users by means of Client Pages that are dynamically generated
      • Given a classification of Built Client Pages into a set of equivalence classes
          • … classes may correspond to the different Web Application scenarios reached
          • … classes may correspond to the different results of the execution of test cases
      • The problem:
          • To propose a technique for automatically identifying the equivalence class a Web page belongs to
  • 4. A possible application scenario
      • Designing wrappers that encapsulate the original UI with the aim of exporting a renewed interface, such as a Web Service one
          • The possible reached scenarios represent the classes to recognise
          • The Wrapper has to automatically identify the class in order to know the state of the interaction with the UI and the actions to be performed
      [Figure: login interaction diagram with the labels Login Request, Password Request, Access Permitted and Access Denied]
      G. Di Lorenzo, A. R. Fasolino, L. Melcarne, P. Tramontana, V. Vittorini, “Turning Web Applications into Web Services by Wrapping Techniques”, to appear in the 14th Working Conference on Reverse Engineering, WCRE 2007, Vancouver, BC, Canada
  • 5. The proposed solution
      • To find, for each class, a combination of Web page features that is satisfied only by pages belonging to that class (Classification model)
      • We are interested in a solution that is:
          • Accurate: the classification approach should be very reliable, in order to support automatic tasks
          • Efficient: the classification approach should support fast classification of dynamically built client pages
  • 6. The proposed approach
      Three-step iterative process:
      1. Feature Candidature
          • Generation of a set of features, representing page characteristics
      2. Feature Reduction
          • Proposition of a combination of the candidate features allowing the identification of the class of any Web page
      3. Feature Validation
          • Evaluation of the accuracy of the proposed classification model (i.e. the set of discriminating expressions)
      The resulting classification model allows a Classifier module to automatically determine the class any Web page belongs to
      [Figure: process diagram with the elements Training Set, Feature Candidature, Feature Reduction, Test Set, Feature Validation, Classification Model, HTML page, Classifier, and Class 1, Class 2, …, Class n]
  • 7. 1 - Feature Candidature
      • A feature is a characteristic retrievable in a Web page interface
          • It is retrievable by analysing the page's HTML code
          • A feature can be associated with an XPath query (if the HTML code is XHTML-compatible)
      • A set of features that could be useful for discriminating a class is proposed
          • Ideally, we look for features that are retrievable only in the pages belonging to one class
      [Figure: example page annotated with the XPath queries //a/text()="Activities", //h1/img and /html/body/div[4]/form]
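The idea of a feature as an XPath query over an XHTML-compatible page can be sketched as follows. This is only an illustration, not the authors' tool: the page markup, the feature names and the `evaluate_features` helper are assumptions, and Python's standard `xml.etree.ElementTree` supports only a subset of XPath, so the slide's `//a/text()="Activities"` is approximated by `.//a[.='Activities']`.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML page; the markup is invented for illustration.
PAGE = """<html><body>
  <h1><img src="logo.png" alt="logo"/></h1>
  <a href="/activities">Activities</a>
  <div><form action="/login"><input name="user"/></form></div>
</body></html>"""

# Candidate features as name -> ElementTree path (XPath subset).
# The names are hypothetical; the paths mirror the style of the slide's queries.
FEATURES = {
    "activities_link": ".//a[.='Activities']",
    "h1_image": ".//h1/img",
    "body_form": "./body/div/form",
    "logout_link": ".//a[.='Logout']",
}

def evaluate_features(xhtml, features):
    """Return name -> bool: whether each feature's path matches the page."""
    root = ET.fromstring(xhtml)
    # 'is not None' is required: an Element without children can be falsy.
    return {name: root.find(path) is not None
            for name, path in features.items()}

row = evaluate_features(PAGE, FEATURES)
print(row)
```

Each call produces the truth values of all candidate features for one page, i.e. one row of the kind of matrix used in the later steps.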
  • 8. 2 - Feature Reduction
      • The goal of the Feature Reduction step is to obtain a Discriminating Expression for each class, i.e. a combination of features that is true for every Web page belonging to the class and false for every other page
      • Feature Reduction is executed on the set of candidate features
      • A training set of pre-classified pages is used
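A minimal sketch of what it means for an expression to be discriminating on a training set: it must hold on every page of the class and on no other page. The training pages, their feature sets and the helper name `is_discriminating` are hypothetical; the feature names are borrowed from the login example used later in the deck.

```python
# Hypothetical training set: page -> set of features that hold on it.
TRAINING_SET = {
    "P1": {"Logout", "AdminLogged"},
    "P2": {"Login"},
    "P3": {"Logout"},
    "P4": {"Login", "Incorrect"},
}
SUCCESSFUL_LOGIN = {"P1", "P3"}  # pre-classified pages of one class

def is_discriminating(expression, cls, training_set):
    """True if expression holds on every page of cls and on no other page."""
    return all(expression(features) == (page in cls)
               for page, features in training_set.items())

# Two candidate expressions over the feature sets.
has_logout = lambda f: "Logout" in f
not_incorrect = lambda f: "Incorrect" not in f

print(is_discriminating(has_logout, SUCCESSFUL_LOGIN, TRAINING_SET))
print(is_discriminating(not_incorrect, SUCCESSFUL_LOGIN, TRAINING_SET))
```

In this invented data set `has_logout` separates the class from the rest, while `not_incorrect` also holds on page P2 and is therefore rejected.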
  • 9. Evaluation Matrix  M(i,j) is the value of the Feature j for the Web Page i (rows: Training Set Pages; columns: Candidate Features)  Example classes: Class C1={P1, P3}, Class C2={P2, P4}, Class C3={P5, P6, P7}  The features with all true (or all false) values are deleted: they cannot be useful to discriminate Classes
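The pruning of constant features can be sketched as follows. The matrix is represented as a dictionary of rows (page to feature values); the `drop_constant_features` name and the sample values are illustrative assumptions, not data from the case study.

```python
def drop_constant_features(matrix):
    """Remove features whose value is the same for every training page.

    matrix: {page: {feature: bool}}, one row per Training Set page.
    """
    pages = list(matrix)
    features = list(matrix[pages[0]])
    kept = [f for f in features
            if len({matrix[p][f] for p in pages}) > 1]  # both values occur
    return {p: {f: matrix[p][f] for f in kept} for p in pages}

# Invented values: f1 is always true, f3 is always false, f2 varies.
M = {
    "P1": {"f1": True, "f2": True,  "f3": False},
    "P2": {"f1": True, "f2": False, "f3": False},
    "P3": {"f1": True, "f2": True,  "f3": False},
}
reduced = drop_constant_features(M)
```

Only `f2` survives here, since it is the only feature that takes different values on different pages.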
  • 10. Concept Analysis • Concept Analysis is adopted to find features that characterise classes of equivalent pages • A Concept Lattice may be automatically built on the basis of the Evaluation Matrix values (evaluated on the Training Set)  Attributes ≡ Candidate Features  Objects ≡ Training Set Pages
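As a rough illustration of the underlying machinery, the two derivation operators of Formal Concept Analysis, together with the object concept OC(p) and the attribute concept AC(f), can be sketched directly on the evaluation matrix. This sketch does not build the whole lattice; function names and the sample context are assumptions made for the example.

```python
def intent(pages, matrix):
    """Features (attributes) satisfied by every page in `pages`."""
    result = set(next(iter(matrix.values())))  # start from all features
    for p in pages:
        result &= {f for f, value in matrix[p].items() if value}
    return result

def extent(features, matrix):
    """Pages (objects) that satisfy every feature in `features`."""
    return {p for p, row in matrix.items() if all(row[f] for f in features)}

def object_concept(page, matrix):
    """OC(p): the smallest concept whose extent contains the page."""
    i = intent({page}, matrix)
    return extent(i, matrix), i

def attribute_concept(feature, matrix):
    """AC(f): the largest concept whose intent contains the feature."""
    e = extent({feature}, matrix)
    return e, intent(e, matrix)

# Invented context: three pages, three features.
CONTEXT = {
    "P1": {"a": True,  "b": True,  "c": False},
    "P2": {"a": True,  "b": False, "c": False},
    "P3": {"a": False, "b": False, "c": True},
}
oc_p1 = object_concept("P1", CONTEXT)
ac_a = attribute_concept("a", CONTEXT)
```

Each concept is returned as an (extent, intent) pair, so `oc_p1` pairs the set of pages sharing all of P1's features with those features.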
  • 11. Feature Classification  According to Eisenbarth, Koschke and Simon*, a feature is classified, with respect to a class C, as: • Specific: if it is satisfied by all and only the pages of the class C; • Relevant: if it is satisfied by all the pages of the class C, but also by other pages from other classes; • CSPC (i.e., Conditionally Specific): if it is satisfied by a subset of pages of the class C and by no other page from other classes; • Shared: if it is satisfied by a subset of pages of the class C and by other pages from other classes too; • Irrelevant: if it is not satisfied by any page of the class C. [Figure: Venn diagrams of pages P1–P7 illustrating the Spec, Rel, CSPC, Shared and Irrelevant cases for a class C] * T. Eisenbarth, R. Koschke, D. Simon, “Locating features in source code”, IEEE Transactions on Software Engineering, Volume 29, Issue 3, IEEE CS Press, March 2003, pp. 210–224
  • 12. Example: for the class Successful Login • Logout is Specific • Not(Incorrect) is Relevant • AdminLogged is CSPC • Not(AdminLogged) is Shared • Login is Irrelevant
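The five categories can be checked with plain set operations on the extent of a feature (the training pages that satisfy it). In the sketch below the feature names come from the Successful Login example, while the page sets are invented so that each feature falls into the category stated on the slide; `classify_feature` is a hypothetical helper.

```python
def classify_feature(extent, cls):
    """Classify a feature with respect to class `cls`.

    extent: set of training pages satisfying the feature.
    cls:    set of training pages belonging to the class.
    """
    inside = extent & cls    # pages of the class that satisfy the feature
    outside = extent - cls   # pages of other classes that satisfy it
    if not inside:
        return "Irrelevant"
    if inside == cls:
        return "Relevant" if outside else "Specific"
    return "Shared" if outside else "CSPC"

# Invented training set: P1, P2 are Successful Login pages; P3, P4 are not.
SUCCESSFUL_LOGIN = {"P1", "P2"}
EXTENTS = {
    "Logout": {"P1", "P2"},
    "Not(Incorrect)": {"P1", "P2", "P3"},
    "AdminLogged": {"P1"},
    "Not(AdminLogged)": {"P2", "P3", "P4"},
    "Login": {"P3", "P4"},
}
kinds = {f: classify_feature(e, SUCCESSFUL_LOGIN) for f, e in EXTENTS.items()}
```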
  • 13. Feature Classification Evaluation  The classification of the features may be evaluated on the basis of the Concept Analysis results (TS is the Training Set, F the set of candidate features, OC(p) the object concept of page p):
    $\mathrm{Specific}(C) = \bigcap_{p \in C} \mathrm{Intent}(OC(p)) - \bigcup_{p \in TS - C} \mathrm{Intent}(OC(p))$
    $\mathrm{Relevant}(C) = \bigcap_{p \in C} \mathrm{Intent}(OC(p)) - \mathrm{Specific}(C)$
    $\mathrm{CSPC}(C) = \bigcup_{p \in C} \mathrm{Intent}(OC(p)) - \bigcup_{p \in TS - C} \mathrm{Intent}(OC(p)) - \mathrm{Specific}(C)$
    $\mathrm{Shared}(C) = \bigcup_{p \in C} \mathrm{Intent}(OC(p)) - \mathrm{Specific}(C) - \mathrm{Relevant}(C) - \mathrm{CSPC}(C)$
    $\mathrm{Irrelevant}(C) = F - \mathrm{Specific}(C) - \mathrm{Relevant}(C) - \mathrm{CSPC}(C) - \mathrm{Shared}(C)$
  • 14. Rules for generating Discriminating Expressions • Our goal is to obtain combinations of features (Discriminating Expressions) that are specific for a single class, i.e. that are true for any page belonging to the class and false elsewhere 1. If the class C includes at least one specific feature, any of these specific features can be a candidate discriminating expression of the class. 2. If there are no specific features, then a candidate discriminating expression can be obtained by considering the logical AND of the relevant features • If there is a pair of features f1 and f2 such that f1 ⇒ f2 (i.e. Extent(AC(f1)) ⊆ Extent(AC(f2))), then it is possible to simplify the expression by discarding the feature f2 from it. 3. If there are neither specific nor relevant features, then a candidate discriminating expression can be obtained by considering the logical OR of the CSPC features • If there is a pair of features f1 and f2 such that f1 ⇒ f2, it is possible to simplify the expression by discarding the feature f1 from it. [Figure: Venn diagrams of pages P1–P7 showing the Spec, Rel1, Rel2, CSPC1 and CSPC2 features for a class C]
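The three rules can be sketched as a small function over feature extents. This is an illustrative reading of the rules, not the authors' implementation: the `candidate_expression` name, the tuple representation of an expression, and the sample extents are assumptions. The result is only a candidate, which still has to go through validation.

```python
def candidate_expression(cls, extents):
    """Apply the three rules to the training-set extents of the features.

    cls:     set of training pages in the class.
    extents: {feature: set of training pages satisfying it}.
    Returns ("feature" | "and" | "or", [feature names]) or None.
    """
    specific = sorted(f for f, e in extents.items() if e == cls)
    relevant = sorted(f for f, e in extents.items() if e > cls)       # proper superset
    cspc = sorted(f for f, e in extents.items() if e and e < cls)     # proper subset
    if specific:   # rule 1: any specific feature is enough
        return ("feature", [specific[0]])
    if relevant:   # rule 2: AND of relevant features; f1 => f2 makes f2 redundant
        kept = [f2 for f2 in relevant
                if not any(extents[f1] < extents[f2] for f1 in relevant)]
        return ("and", kept)
    if cspc:       # rule 3: OR of CSPC features; f1 => f2 makes f1 redundant
        kept = [f1 for f1 in cspc
                if not any(extents[f1] < extents[f2] for f2 in cspc)]
        return ("or", kept)
    return None
    # Note: features with identical extents are all kept (strict inclusion only).

# Invented extents illustrating rules 2 and 3.
rule2 = candidate_expression(
    {"P1", "P2"},
    {"r1": {"P1", "P2", "P3"},
     "r2": {"P1", "P2", "P3", "P4"},
     "r3": {"P1", "P2", "P4"}})
rule3 = candidate_expression(
    {"P1", "P2", "P3"},
    {"c1": {"P1"}, "c2": {"P1", "P2"}, "c3": {"P3"}})
```

In the rule 2 example, `r2` is implied by both `r1` and `r3`, so the candidate reduces to `r1 AND r3`, whose extent is exactly the class.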
  • 15. 3 - Discriminating Expression Validation  Recall and Precision:
    $\mathrm{Recall} = \dfrac{\#(\mathit{pagesCorrectlyAttributedToClassC})}{\#(\mathit{AnalysedPagesBelongingToClassC})}$
    $\mathrm{Precision} = \dfrac{\#(\mathit{pagesCorrectlyAttributedToClassC})}{\#(\mathit{pagesAttributedToClassC})}$
     Failed Classification: the page class has not been identified, because no discriminating expression (or more than one) has been satisfied by the page;  Incorrect Classification: the page has been attributed to an incorrect class;  Correct Classification: the page has been attributed to the correct class
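A minimal sketch of the validation step, under two assumptions: discriminating expressions are represented as Python predicates over a page's feature values, and recall is computed with the standard definition (correctly attributed pages over pages belonging to the class). Feature names, class names and page data are invented for the example.

```python
def classify(row, expressions):
    """Return the single class whose expression holds, or None (failed)."""
    matches = [c for c, expr in expressions.items() if expr(row)]
    return matches[0] if len(matches) == 1 else None

def recall_precision(actual, predicted, cls):
    """Per-class recall and precision; None when a denominator is zero."""
    attributed = [p for p in predicted if predicted[p] == cls]
    correct = [p for p in attributed if actual[p] == cls]
    belonging = [p for p in actual if actual[p] == cls]
    recall = len(correct) / len(belonging) if belonging else None
    precision = len(correct) / len(attributed) if attributed else None
    return recall, precision

# Hypothetical expressions over two hypothetical features.
EXPRESSIONS = {
    "Central": lambda r: r["next_link"] and r["prev_link"],
    "Last": lambda r: r["prev_link"] and not r["next_link"],
}
PAGES = {
    "P1": {"next_link": True,  "prev_link": True},
    "P2": {"next_link": False, "prev_link": True},
    "P3": {"next_link": True,  "prev_link": True},   # will be misclassified
    "P4": {"next_link": False, "prev_link": False},  # no expression holds
}
ACTUAL = {"P1": "Central", "P2": "Last", "P3": "Last", "P4": "Last"}

predicted = {p: classify(row, EXPRESSIONS) for p, row in PAGES.items()}
central = recall_precision(ACTUAL, predicted, "Central")
last = recall_precision(ACTUAL, predicted, "Last")
```

Here P3 is an incorrect classification and P4 a failed one, so both lower the recall of the Last class, while only P3 lowers the precision of the Central class.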
  • 16. A Classification Example • Web pages related to the AdminBooks page of the open source Web application Bookstore have been considered • Five classes of Web pages: Empty List, Single Page, First Page, Central Page, Last Page • 25 Web pages in the training set • 65 Web pages in the test set
  • 17. A Classification Example [Figure: sample pages of the classes Empty List, Single Page, First Page, Central Page and Last Page]
  • 18. Candidate features • 8 candidate features have been considered • f1, f2, f3, f4 have been discarded (always true) • Negated features have been introduced
  • 19. Discriminating Expressions – First iteration  Training Set Validation: there is not enough precision! A new iteration of feature candidature is needed  New features have been added: 1. We need to distinguish whether the words “Next” and “Previous” are links or simple labels 2. We need to distinguish between an empty page and a single page
  • 20. Discriminating Expressions – Second Iteration  Training Set Validation is OK, but there is a problem with one page belonging to the Test Set (assigned to Central Page instead of Last Page)  The incorrectly assigned page has been added to the Training Set and the Candidate Discriminating Expressions have been evaluated again  With the modified expressions, Recall and Precision are now 100% both on the Training Set and on the Test Set
  • 21. Discussion  Process Effort:  In the case study, about 100 minutes were needed to obtain the classification model  The activities requiring human intervention were the most critical  Training and Test Set Collection step:  The reliability of the classification model depends on the representativeness of the Training and Test Sets  Web pages resulting from the execution of a Test Suite may be good candidates to fill in the Training and the Test Set  Feature Candidature step:  Heuristic techniques allowing a semi-automatic generation of features are currently under experimentation
  • 22. Further application scenarios  Support to testing automation:  The possible reached pages are arranged in a set of equivalence classes, corresponding to different test results  The proposed discriminating expressions make the automatic evaluation of the testing results possible  Support to dynamic analysis:  The recognition of the reached interaction scenarios makes it possible to automatically collect logs of the visited use cases and scenarios, also providing useful information for user profiling
  • 23. Conclusions and Future Work  A process for the definition of a classification model allowing the automatic identification of classes of Built Client pages has been defined and validated  The classification model allows the run-time identification of the class of a Built Client page, with short response time and high precision  Future Work:  We are working on improving the process, in order to reduce the effort required by the manual tasks  Further experimentation will be carried out in order to assess the scalability of the approach

Editor's Notes

  1. For example, the ones generated by a single server page
  2. Intent and Object Concept are evaluated on the Concept Lattice
  3. Comparison with other techniques: there are a number of other techniques in the literature that address similar problems: techniques based on clone analysis; techniques based on artificial intelligence. Our technique is characterised by very high reliability and a sustainable effort for the execution of the