WARE: a tool for the Reverse
        Engineering of Web Applications


                      Anna Rita Fasolino
     G. A. Di Lucca, F. Pace, P. Tramontana, U. De Carlini

                  Dipartimento di Informatica e Sistemistica
                  University of Naples Federico II, Italy



CSMR 2002, Budapest, Hungary
                                                1
Web Applications (WA):
               problems and open issues

  • The pressing market demand of web applications
      – WAs developed in very short time


  • The continuously changing needs of the evolving
    application domains
      – WAs frequently and rapidly modified with ad hoc approaches

  • The lack of method in producing and maintaining
    Web applications
      – low quality software, with disordered architecture, and
        inadequate and incomplete documentation

CSMR 2002, Budapest, Hungary
                                                      2
Web applications: a recent classification


  • Class 1: primarily static web sites

  • Class 2: sites providing client-side interaction
      – Based on an event model exploiting some script code


  • Class 3: applications with dynamic content
      – Pages are dynamically created on the fly, with the support of
        various WWW rechnologies (CGI, JSP, PHP, Javascript,
        XML, ODBC,…ASP,… )
                                          Tilley and Huang, ICSE 2001


CSMR 2002, Budapest, Hungary
                                                        3
Dynamic web applications


  • Due to the large number of employed technologies,
    understanding, maintaining and evolving a Class 3
    application is a complex task
  • The need for specific Reverse Engineering
    techniques and processes to recover:
      – Static and dynamic aspects of the applications
      – Suitable representation models
  • The need for reverse engineering tools that support
    the extraction and the abstraction of the needed
    information from a WA

CSMR 2002, Budapest, Hungary
                                                     4
Reverse engineering Web Applications

  • A reverse engineering process to recover the
    following views:
      – The static architecture of the WA
      – The dynamic interactions between its components
      – The final behavior offered

  • Extended UML diagrams to represent these views
      – Class diagrams to model the architecture
      – Sequence and Collaboration diagrams to represent the
        dynamic model
      – Use case diagrams to describe the behavior of the WA



CSMR 2002, Budapest, Hungary
                                                   5
The Reverse Engineering process


                            STATIC    Components and
   Web Application                     Direct Relations
                           ANALYSIS
    Source Code                         between them


                                      BEHAVIORAL          Behavioral
                                       ANALYSIS             Model




                           DYNAMIC     Dynamic Interactions
           WA in           ANALYSIS    between Components
         Execution




CSMR 2002, Budapest, Hungary
                                                   6
Views obtainable by static analysis

 •   A coarse grained view:
      – Web pages and Hypertextual links between pages :
      – Pages are distinguished into server and client pages, static and
        dynamic pages, simple and framed pages

 •   A finer grained view:
      – inner page components and relationships
      – input/output form, text box, anchors, scripts, applets, text, images,
        multimedia objects (sounds, movies), …...
      – Page components may be active components (e.g.,scripts or
        applets)
      – The relationships include: submit, build, redirect, include

 •   A UML class diagram representing both views



CSMR 2002, Budapest, Hungary
                                                              7
The meta-model of a WA




CSMR 2002, Budapest, Hungary
                                     8
The WARE tool


• Designed to support the reverse engineering of a WA
     – executes the static analysis of the WA source code
     – populates a repository with the extracted information
     – supports the user in abstracting the WA models


• Three main components:
     – Interface Layer
     – Service Layer
     – Repository



CSMR 2002, Budapest, Hungary
                                                      9
Interface layer
                                        Graphical Visualizer
                             WARE
                              GUI         VCG        Dotty        RIGI


  WARE
Architecture
                          Service Layer
                          Parsers
                                                      IRF Translator
                               HTML     JS
                  WA           ASP                  Query Executor
                 Source                 ….
                  Files        VBS
                                                      UML Diagram
                               PHP                     Abstractor


                           Extractor                           Abstractor




                                       IRF             Diagrams

                                             DBR         Repository


CSMR 2002, Budapest, Hungary
                                                                            10
The Service layer

  • The Extractor component
      – The Extractor parses the WA source code and produces an
        Intermediate Representation Form (IRF)
      – Several distinct parsers (for HTML, JavaScript, VBScript, ASP and
        PHP technologies)
      – New parsers can be added as the technology evolves
  • The Abstractor component
      – It implements abstraction operations necessary for reconstructing
        the more abstract views of the WA
      – It includes three main components
          • A Translator that populates the relational database from the IRF
          • A Query Executor that implements predefined queries over the
            database
          • The UML Diagram Abstractor that produces the class diagrams of a
            WA at various degrees of detail and other relevant information



CSMR 2002, Budapest, Hungary
                                                             11
The conceptual schema of the Repository
      Java Applet        Media         Flash Object       Mail Address         Other Object




                         Web Object                            redirect                                                                                         Server Class
                                                                                                                                    Server Function




  Client Function
                                                        Static Page         Built Page                                   redirect               Server Script
                                                                                                           Web Page

                                                                                                                                                                  Session Variable


                    Client Script                                                         Parameter


                                        Modify Tag              Client Page
                                                                                                                                    Server Page       include                  Server Cookie

                          event
                                                                               Anchor


  Client Module                     HTML Tag
                                                                                                                                             Interface Object
                                                                            Download           Hyperlink

                                                                                                               Submits
                                                                  Form
                                                                                                                         DB Interface         Mail Interface      Server File Interface        Other Interface

                                        Frameset

                                                                                         Generic File
                                                                    Field




                                                      Select      Button      Textarea




                                          Frame
                                                               source


CSMR 2002, Budapest, Hungary
                                                                                                                                                                12
The Interface Layer

  • It provides the user interface for accessing the WARE
    facilities:
       – Reverse Engineering for parsing the WA and producing the
         IRF
       – Comprehension for executing comprehension-related
         activities, such as exploring the inventory of the WA
         components and their source code, computing the
         reachability set of a component, creating clusters of related
         components according to a given clustering criterion
       – Query (Predefined query and Parametric query) for
         activating the Abstractor’s functions, and graphically
         visualizing the recovered information


CSMR 2002, Budapest, Hungary
                                                       13
Computation of
                                                                   the Reachability
                                                                     set of a WA
                                                                     component




Figure 4.4 – b: WA Components reachable starting from a selected
                         component




     Grouping the WA
     components into
     cohesive clusters


CSMR 2002, Budapest, Hungary
                                                                    14
An experiment

  • Carried out for assessing the adequacy of the tool functions in
    supporting maintenance tasks

  • The tool has been used to understand and redocument existing
    WAs

  • An example: A WA implementing a ‘Juridical Laboratory’
      –   201 files, in 19 directories, sizing 4 Mbytes
      –   55 Static Pages (55 HTML files in 10 directories)
      –   19 Server Pages (19 ASP files in 4 directories)
      –   No design documentation available
  • The WARE tool was preliminarly used to perform:
      – The static analysis of the application, and for producing an
        inventory of the WA components

CSMR 2002, Budapest, Hungary
                                                              15
First step: the static analysis


  • Main results:
      – The population of the repository automatically produced by
        the tool
      – The inventory of the WA components
      – Graphical representations of the class diagram of the WA at
        various degrees of detail

          • Coarse-grained representations (including just the pages)
          • Fine-grained representation (including inner page components)




CSMR 2002, Budapest, Hungary
                                                         16
The inventory of the WA components
                      Component type             # Detected
              Server page                            19
              Client Static page                     55
              Client Built page                      14
              External web page                      3
              Client script block                    53
              Function in Client script block        19
              Form                                   11
              Input/ output field                    71
              Submit Operation (POST method)         4
              Submit Operation (GET method)          7
              Anchor to files to be downloaded      111
              Anchor to Hypertextual link            49
              Data File                              61
              Server script block                    76
              Function in Server scripts             4
              Database Interface Object              29
              Mail Interface Object                  3
              Image file                             65
              Redirect operation in server           7
              blocks
CSMR 2002, Budapest, Hungary
                                                              17
The Class diagram of the WA




CSMR 2002, Budapest, Hungary
                                    18
The second phase: formulating and validating
        hypotheses about the WA behavior


  • Driven by the graphical representations, notable sub-graphs
    were looked for (isolated sub-graphs, sub-trees, strongly
    connected components, …)

  • A tentative hypothesis about the behavior implemented by each
    sub-graph was formulated with the support of:
      – the names of the components, their source code analysis, and by
        tracing the application execution

  • Hypotheses were validated by the source code execution




CSMR 2002, Budapest, Hungary
                                                         19
Results from the sub-graph analysis

  • Four isolated sub-graphs
       – Three small ones without static client pages
           • Two of them implemented server-side functions for the web
             administrator (management of the mailing list, and of the
             registered users list)
           • The third one resulted from an incorrectly made maintenance
             operation
       – A large one, rooted in the home page
           • Nine user functions could be associated with nine notable sub-
             graphs contained in it




CSMR 2002, Budapest, Hungary
                                                          20
The sub-graphs in the WA class diagram

                                      Professional Yellow Pages


   Mailing List
                                                         Judicial
                                                         Proceedings
                                                     Judicial
                                                     Case
 News                                                Studies
                   Home
                                                      Web Searching
                               Chat


                                                          Forum


                                      About
CSMR 2002, Budapest, Hungary
                                                    21
Modeling the WA behavior
                  About
                                                                      • A use case was defined and
                                                                        associated with each notable
                      News

                                      Home
                                                                        sub-graph

                      Judicial Case Studies
                                                                              • The scenarios describing the
                                                                                use cases were defined with
     User

                                        Judicial Proceedings
                                                                                the support of the tool (the
                                                                                interactions between objects
                  Web Searching
                                  Professionals Yellow Pages                    involved in the sub-graph
Registered User
                                                               WA Administrator were searched for and
                    Forum                 Mailing List                          analyzed)

                     Chat




  CSMR 2002, Budapest, Hungary
                                                                                           22
Modeling the WA dynamic view




CSMR 2002, Budapest, Hungary
                                   23
Conclusions


  • The experiments we carried out showed that the
    reverse engineering tool WARE can be used to
    support the comprehension of Web Applications to be
    maintained
  • It supports the reconstruction of various UML
    diagrams from undocumented applications:
      – Static views are automatically produced by the tool
      – Behavioral and dynamic views can be semi-automatically
        obtained with the tool assistance
  • A reverse engineering process based on the WARE
    tool could be defined

CSMR 2002, Budapest, Hungary
                                                   24
Future work


  • Specific clustering criteria may support a more
    effective analysis of the class diagrams recovered by
    the tool
  • A clustering approach should take into account both
    the topology of the graphs, and the typology of the
    connections in the graph
  • The automatic clustering of Was will be investigated
    in the context of large size web applications




CSMR 2002, Budapest, Hungary
                                             25

WARE: a tool for the Reverse Engineering of Web Applications

  • 1.
    WARE: a toolfor the Reverse Engineering of Web Applications Anna Rita Fasolino G. A. Di Lucca, F. Pace, P. Tramontana, U. De Carlini Dipartimento di Informatica e Sistemistica University of Naples Federico II, Italy CSMR 2002, Budapest, Hungary 1
  • 2.
    Web Applications (WA): problems and open issues • The pressing market demand of web applications – WAs developed in very short time • The continuously changing needs of the evolving application domains – WAs frequently and rapidly modified with ad hoc approaches • The lack of method in producing and maintaining Web applications – low quality software, with disordered architecture, and inadequate and incomplete documentation CSMR 2002, Budapest, Hungary 2
  • 3.
    Web applications: arecent classification • Class 1: primarily static web sites • Class 2: sites providing client-side interaction – Based on an event model exploiting some script code • Class 3: applications with dynamic content – Pages are dynamically created on the fly, with the support of various WWW rechnologies (CGI, JSP, PHP, Javascript, XML, ODBC,…ASP,… ) Tilley and Huang, ICSE 2001 CSMR 2002, Budapest, Hungary 3
  • 4.
    Dynamic web applications • Due to the large number of employed technologies, understanding, maintaining and evolving a Class 3 application is a complex task • The need for specific Reverse Engineering techniques and processes to recover: – Static and dynamic aspects of the applications – Suitable representation models • The need for reverse engineering tools that support the extraction and the abstraction of the needed information from a WA CSMR 2002, Budapest, Hungary 4
  • 5.
    Reverse engineering WebApplications • A reverse engineering process to recover the following views: – The static architecture of the WA – The dynamic interactions between its components – The final behavior offered • Extended UML diagrams to represent these views – Class diagrams to model the architecture – Sequence and Collaboration diagrams to represent the dynamic model – Use case diagrams to describe the behavior of the WA CSMR 2002, Budapest, Hungary 5
  • 6.
    The Reverse Engineeringprocess STATIC Components and Web Application Direct Relations ANALYSIS Source Code between them BEHAVIORAL Behavioral ANALYSIS Model DYNAMIC Dynamic Interactions WA in ANALYSIS between Components Execution CSMR 2002, Budapest, Hungary 6
  • 7.
    Views obtainable bystatic analysis • A coarse grained view: – Web pages and Hypertextual links between pages : – Pages are distinguished into server and client pages, static and dynamic pages, simple and framed pages • A finer grained view: – inner page components and relationships – input/output form, text box, anchors, scripts, applets, text, images, multimedia objects (sounds, movies), …... – Page components may be active components (e.g.,scripts or applets) – The relationships include: submit, build, redirect, include • A UML class diagram representing both views CSMR 2002, Budapest, Hungary 7
  • 8.
    The meta-model ofa WA CSMR 2002, Budapest, Hungary 8
  • 9.
    The WARE tool •Designed to support the reverse engineering of a WA – executes the static analysis of the WA source code – populates a repository with the extracted information – supports the user in abstracting the WA models • Three main components: – Interface Layer – Service Layer – Repository CSMR 2002, Budapest, Hungary 9
  • 10.
    Interface layer Graphical Visualizer WARE GUI VCG Dotty RIGI WARE Architecture Service Layer Parsers IRF Translator HTML JS WA ASP Query Executor Source …. Files VBS UML Diagram PHP Abstractor Extractor Abstractor IRF Diagrams DBR Repository CSMR 2002, Budapest, Hungary 10
  • 11.
    The Service layer • The Extractor component – The Extractor parses the WA source code and produces an Intermediate Representation Form (IRF) – Several distinct parsers (for HTML, JavaScript, VBScript, ASP and PHP technologies) – New parsers can be added as the technology evolves • The Abstractor component – It implements abstraction operations necessary for reconstructing the more abstract views of the WA – It includes three main components • A Translator that populates the relational database from the IRF • A Query Executor that implements predefined queries over the database • The UML Diagram Abstractor that produces the class diagrams of a WA at various degrees of detail and other relevant information CSMR 2002, Budapest, Hungary 11
  • 12.
    The conceptual schemaof the Repository Java Applet Media Flash Object Mail Address Other Object Web Object redirect Server Class Server Function Client Function Static Page Built Page redirect Server Script Web Page Session Variable Client Script Parameter Modify Tag Client Page Server Page include Server Cookie event Anchor Client Module HTML Tag Interface Object Download Hyperlink Submits Form DB Interface Mail Interface Server File Interface Other Interface Frameset Generic File Field Select Button Textarea Frame source CSMR 2002, Budapest, Hungary 12
  • 13.
    The Interface Layer • It provides the user interface for accessing the WARE facilities: – Reverse Engineering for parsing the WA and producing the IRF – Comprehension for executing comprehension-related activities, such as exploring the inventory of the WA components and their source code, computing the reachability set of a component, creating clusters of related components according to a given clustering criterion – Query (Predefined query and Parametric query) for activating the Abstractor’s functions, and graphically visualizing the recovered information CSMR 2002, Budapest, Hungary 13
  • 14.
    Computation of the Reachability set of a WA component Figure 4.4 – b: WA Components reachable starting from a selected component Grouping the WA components into cohesive clusters CSMR 2002, Budapest, Hungary 14
  • 15.
    An experiment • Carried out for assessing the adequacy of the tool functions in supporting maintenance tasks • The tool has been used to understand and redocument existing WAs • An example: A WA implementing a ‘Juridical Laboratory’ – 201 files, in 19 directories, sizing 4 Mbytes – 55 Static Pages (55 HTML files in 10 directories) – 19 Server Pages (19 ASP files in 4 directories) – No design documentation available • The WARE tool was preliminarly used to perform: – The static analysis of the application, and for producing an inventory of the WA components CSMR 2002, Budapest, Hungary 15
  • 16.
    First step: thestatic analysis • Main results: – The population of the repository automatically produced by the tool – The inventory of the WA components – Graphical representations of the class diagram of the WA at various degrees of detail • Coarse-grained representations (including just the pages) • Fine-grained representation (including inner page components) CSMR 2002, Budapest, Hungary 16
  • 17.
    The inventory ofthe WA components Component type # Detected Server page 19 Client Static page 55 Client Built page 14 External web page 3 Client script block 53 Function in Client script block 19 Form 11 Input/ output field 71 Submit Operation (POST method) 4 Submit Operation (GET method) 7 Anchor to files to be downloaded 111 Anchor to Hypertextual link 49 Data File 61 Server script block 76 Function in Server scripts 4 Database Interface Object 29 Mail Interface Object 3 Image file 65 Redirect operation in server 7 blocks CSMR 2002, Budapest, Hungary 17
  • 18.
    The Class diagramof the WA CSMR 2002, Budapest, Hungary 18
  • 19.
    The second phase:formulating and validating hypotheses about the WA behavior • Driven by the graphical representations, notable sub-graphs were looked for (isolated sub-graphs, sub-trees, strongly connected components, …) • A tentative hypothesis about the behavior implemented by each sub-graph was formulated with the support of: – the names of the components, their source code analysis, and by tracing the application execution • Hypotheses were validated by the source code execution CSMR 2002, Budapest, Hungary 19
  • 20.
    Results from thesub-graph analysis • Four isolated sub-graphs – Three small ones without static client pages • Two of them implemented server-side functions for the web administrator (management of the mailing list, and of the registered users list) • The third one resulted from an incorrectly made maintenance operation – A large one, rooted in the home page • Nine user functions could be associated with nine notable sub- graphs contained in it CSMR 2002, Budapest, Hungary 20
  • 21.
    The sub-graphs inthe WA class diagram Professional Yellow Pages Mailing List Judicial Proceedings Judicial Case News Studies Home Web Searching Chat Forum About CSMR 2002, Budapest, Hungary 21
  • 22.
    Modeling the WAbehavior About • A use case was defined and associated with each notable News Home sub-graph Judicial Case Studies • The scenarios describing the use cases were defined with User Judicial Proceedings the support of the tool (the interactions between objects Web Searching Professionals Yellow Pages involved in the sub-graph Registered User WA Administrator were searched for and Forum Mailing List analyzed) Chat CSMR 2002, Budapest, Hungary 22
  • 23.
    Modeling the WAdynamic view CSMR 2002, Budapest, Hungary 23
  • 24.
    Conclusions •The experiments we carried out showed that the reverse engineering tool WARE can be used to support the comprehension of Web Applications to be maintained • It supports the reconstruction of various UML diagrams from undocumented applications: – Static views are automatically produced by the tool – Behavioral and dynamic views can be semi-automatically obtained with the tool assistance • A reverse engineering process based on the WARE tool could be defined CSMR 2002, Budapest, Hungary 24
  • 25.
    Future work • Specific clustering criteria may support a more effective analysis of the class diagrams recovered by the tool • A clustering approach should take into account both the topology of the graphs, and the typology of the connections in the graph • The automatic clustering of Was will be investigated in the context of large size web applications CSMR 2002, Budapest, Hungary 25