Ph.D. Dissertation Forum – ICSM 2005
Ph.D. Dissertation
Reverse Engineering Web Applications
Porfirio Tramontana
University of Naples “Federico II”
Web Applications: open problems
• In recent years, demand for Web Applications has grown strongly, because the spread of the World Wide Web has made many services available all over the world
• Web Applications have been developed with immature design methodologies and technologies
• Today, many legacy Web Applications need maintenance and re-engineering
Ph.D. Thesis Goals
• To propose models, methods and tools supporting the Reverse Engineering and Comprehension of Web Applications
• Reverse Engineering and comprehension are fundamental tasks for efficiently supporting the maintenance, testing and quality assessment of Web Applications
Peculiarities of script-based Web Applications
• Page-based
• Client-server architecture
• Interpreted languages
• Client pages may be generated “on the fly”
• Client pages are executed in a browser (and the designer does not know which browser will be used)
• HTML interpreters are fault-tolerant
• ... and so on ...
A process for the Reverse Engineering of Web Applications
[Figure: the reverse engineering process. Extraction: static analysis of the WA source code and dynamic analysis of the WA execution. Abstraction: functional clustering, assignment of concepts, identification of interaction design patterns, identification of cloned components, business level UML diagram abstractions and maintainability assessment. Products: groups of pages realizing Web Application use cases, concepts describing reverse engineering artifacts, interaction design patterns, cloned components, and structural and business level UML diagrams.]
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Reverse Engineering Web Applications: the WARE approach”, Journal of Software Maintenance and Evolution: Research and Practice, Volume 16, Issue 1-2, January-April 2004, pp. 71-101
Analysis of Web Applications
1) Static analysis of the source code
A multi-language parser has been implemented that analyses the source code of server pages, client pages and script modules. While server pages are analysed, facts about the client pages they build are also recorded. Static analysis results are stored in an intermediate form and then used to populate a relational database.
2) Dynamic analysis
Built client pages are analysed in order to add to the database facts observed while executing the application.
The adopted reference model extends the one proposed by Conallen for the forward engineering of Web Applications.
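The step from parsed pages to a relational database can be pictured with a minimal Python sketch. It assumes a single client page and a hypothetical one-table schema (`link`, with source page, relation and target). The real WARE parser is multi-language and its schema is not described here. The standard-library `html.parser` and `sqlite3` modules are used to record hyperlink, form-submission and frame-source facts.

```python
import sqlite3
from html.parser import HTMLParser


class ClientPageFactExtractor(HTMLParser):
    """Collects hyperlink, form-submission and frame-source facts from one client page."""

    def __init__(self) -> None:
        super().__init__()
        self.facts: list[tuple[str, str]] = []  # (relation, target)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.facts.append(("hyperlink", attributes["href"]))
        elif tag == "form" and attributes.get("action"):
            self.facts.append(("submits", attributes["action"]))
        elif tag == "frame" and attributes.get("src"):
            self.facts.append(("source", attributes["src"]))


def store_facts(conn: sqlite3.Connection, page: str, html: str) -> None:
    """Parses `html` and inserts the facts found in `page` (hypothetical schema)."""
    extractor = ClientPageFactExtractor()
    extractor.feed(html)
    extractor.close()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS link (source_page TEXT, relation TEXT, target TEXT)"
    )
    conn.executemany(
        "INSERT INTO link VALUES (?, ?, ?)",
        [(page, relation, target) for relation, target in extractor.facts],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical page content, for illustration only.
    store_facts(
        conn,
        "/autenticazionedocente.html",
        '<form action="/check.asp"><input name="code"></form><a href="/index.html">Home</a>',
    )
    for row in conn.execute("SELECT * FROM link ORDER BY relation"):
        print(row)
```

In the same spirit, built client pages captured while the application runs could be fed to the same extractor, so that dynamically observed facts land in the same tables as the static ones.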
Model of Web Applications
[Figure: class diagram of the reference model. Classes: Web Page, Static Page, Client Page, Built Page, Server Page, Client Script, Client Function, Client Module, Server Script, Server Function, Server Class, Session Variable, Server Cookie, HTML Tag, Form, Field, Textarea, Select, Button, Frameset, Frame, Anchor, Hyperlink, Parameter, Web Object, Java Applet, Media, Flash Object, Other Object, Interface Object, DB Interface, Mail Interface, Server File Interface, Other Interface, Mail Address, Generic File, Download. Relationships: Submits, include, source, redirect, event, Modify Tag.]
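A few of the model's classes and relationships can be encoded as plain Python data types, as a rough sketch. Only a subset is shown: Static Page, Server Page and Built Page, plus the builds, submits, redirect and include relationships. Storing relationships as a flat list of typed edges is an assumption for illustration, not the structure of the original model.

```python
from dataclasses import dataclass, field
from enum import Enum


class PageKind(Enum):
    # Subset of the model's page classes (the full model has many more).
    STATIC = "Static Page"
    SERVER = "Server Page"
    BUILT = "Built Page"


class Relation(Enum):
    # Subset of the model's relationships between pages.
    BUILDS = "builds"
    SUBMITS = "submits"
    REDIRECT = "redirect"
    INCLUDE = "include"


@dataclass(frozen=True)
class WebPage:
    name: str
    kind: PageKind


@dataclass
class WebApplicationModel:
    """Pages plus a flat list of typed edges (an assumed representation)."""
    pages: dict[str, WebPage] = field(default_factory=dict)
    edges: list[tuple[str, Relation, str]] = field(default_factory=list)

    def add_page(self, name: str, kind: PageKind) -> None:
        self.pages[name] = WebPage(name, kind)

    def relate(self, source: str, relation: Relation, target: str) -> None:
        # Both endpoints must already be registered as pages.
        if source not in self.pages or target not in self.pages:
            raise KeyError(f"unknown page in {source} -> {target}")
        self.edges.append((source, relation, target))

    def targets(self, source: str, relation: Relation) -> list[str]:
        """Returns the pages reached from `source` through `relation`."""
        return [t for s, r, t in self.edges if s == source and r is relation]


if __name__ == "__main__":
    model = WebApplicationModel()
    model.add_page("/check.asp", PageKind.SERVER)
    model.add_page("/areadocente.html", PageKind.STATIC)
    # Illustrative edge only; not taken verbatim from the original model.
    model.relate("/check.asp", Relation.REDIRECT, "/areadocente.html")
    print(model.targets("/check.asp", Relation.REDIRECT))
```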
WARE (Web Application Reverse Engineering) tool
[Figure: WARE Architecture. Input: WA source files. Interface layer: WARE GUI and Graphical Visualizer (Dotty, VCG, Rigi). Service layer: Extractor (HTML parsers and ASP, VBS, PHP, JS, …. parsers), IRF Translator, Abstractor, Query Executor. Repository: IRF, DBR, Diagrams. Output: UML diagrams.]
[Figure: Detail Class Diagram abstracted by WARE, relating the pages /autenticazionedocente.html, /check.asp and /areadocente.html through Submit, Builds and Redirect relationships.]
G.A. Di Lucca, A.R. Fasolino, U. De Carlini, F. Pace, P. Tramontana, “WARE: a tool for the Reverse Engineering of Web Applications”, Proc. of 6th IEEE European Conference on Software Maintenance and Reengineering, CSMR 2002, IEEE CS Press, Los Alamitos, CA, pp. 241-250
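WARE relies on external graphical visualizers such as Dotty, the interactive graph viewer distributed with Graphviz. The sketch below turns a list of (source, relation, target) facts into Graphviz DOT text that such a viewer can display. The edges in the usage example are an illustrative reading of the page names in the detail class diagram, not a faithful copy of it.

```python
def to_dot(facts: list[tuple[str, str, str]], graph_name: str = "wa") -> str:
    """Renders (source, relation, target) facts as a Graphviz digraph."""

    def quote(text: str) -> str:
        # DOT identifiers containing '/' or '.' must be double-quoted.
        return '"' + text.replace('"', '\\"') + '"'

    lines = [f"digraph {graph_name} {{"]
    for source, relation, target in facts:
        lines.append(f"  {quote(source)} -> {quote(target)} [label={quote(relation)}];")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative facts, loosely based on the page names in the figure.
    facts = [
        ("/autenticazionedocente.html", "submit", "/check.asp"),
        ("/check.asp", "redirect", "/areadocente.html"),
    ]
    print(to_dot(facts))
```

Writing the output to a `.dot` file lets any Graphviz-based tool render it. Keeping the generator independent of the viewer mirrors the separation between the service layer and the graphical visualizers.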
Functional Clustering of Web Pages
• Goal:
To group together the subsets of components that realize Web Application functionalities
• Proposed Technique:
A hierarchical clustering algorithm that groups Web Application pages into subsets, maximizing cohesion within each subset and minimizing coupling between subsets
G.A. Di Lucca, A.R. Fasolino, U. De Carlini, F. Pace, P. Tramontana, “Comprehending Web Applications by a Clustering Based Approach”, Proc. of 10th IEEE Workshop on Program Comprehension, IWPC 2002, pp. 261-270
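Hierarchical clustering can be illustrated with a small agglomerative sketch. Several choices here are assumptions made for the example: the coupling weights between pages (for instance, counts of links, submits or includes between two pages), the average-linkage merge rule and the stopping threshold. The paper defines its own coupling measure and cut criterion. Repeatedly merging the most strongly coupled clusters keeps strongly connected pages together (high cohesion) and leaves weakly connected groups apart (low coupling).

```python
from itertools import combinations


def average_linkage(a: frozenset, b: frozenset, weight: dict) -> float:
    """Mean coupling over all page pairs split across clusters a and b."""
    total = sum(weight.get(frozenset((p, q)), 0.0) for p in a for q in b)
    return total / (len(a) * len(b))


def cluster_pages(pages: list[str], weight: dict, threshold: float) -> list[set[str]]:
    """Merges the two most coupled clusters until no pair exceeds `threshold`."""
    clusters = [frozenset([p]) for p in pages]
    while len(clusters) > 1:
        best_pair, best_score = None, threshold
        for a, b in combinations(clusters, 2):
            score = average_linkage(a, b, weight)
            if score > best_score:
                best_pair, best_score = (a, b), score
        if best_pair is None:
            break  # remaining clusters are only loosely coupled
        a, b = best_pair
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return [set(c) for c in clusters]


if __name__ == "__main__":
    # Hypothetical pages and coupling weights, keyed by unordered page pairs.
    weights = {
        frozenset(("login.html", "check.asp")): 3.0,
        frozenset(("news.asp", "news_detail.asp")): 2.0,
        frozenset(("check.asp", "news.asp")): 0.5,
    }
    pages = ["login.html", "check.asp", "news.asp", "news_detail.asp"]
    print(cluster_pages(pages, weights, threshold=1.0))
```

With these weights the sketch yields two groups, the login pages and the news pages. Lowering the threshold would eventually merge them as well, which is how the hierarchy can be cut at different levels.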
Concept Assignment
• Goal:
  • To identify the most relevant concepts in client pages, in order to suggest a semantic description of client pages and of functional clusters of pages
• Proposed Technique:
  • Heuristic algorithms based on Information Retrieval
  • Candidate concepts are searched for in the textual content of client pages
  • Single common words and short word sequences are candidate concepts
[Figure: class diagram of the model used for concept assignment. Classes include Web Page, Client Page (File name), Static Client Page, Built Client Page, Server Page, Tag, Attribute, Text, Data Component, Control Component, Word, StopWord and Concept. Associations include <<builds>> (Server Page to Built Client Page), nested in, has synonym and has stem; Tag and Text carry a Weight.]
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, U. De Carlini, “Supporting Concept Assignment in the Comprehension of Web Applications”, Proceedings of the 28th IEEE Annual International Computer Software and Applications Conference, COMPSAC 2004
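The candidate-concept heuristic can be sketched in Python as follows. Words and two-word sequences are taken from the text of a client page, stop words are discarded, and each occurrence is weighted by the most important tag enclosing it. Several elements are placeholders chosen for illustration, not the paper's actual heuristics: the stop-word list, the tag weights and the suffix-stripping "stemmer". Synonyms are not handled.

```python
import re
from collections import Counter
from html.parser import HTMLParser

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "for", "in", "your"}  # hypothetical list
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "b": 1.5}  # hypothetical; other tags weigh 1.0


def crude_stem(word: str) -> str:
    # Placeholder stemmer: strips a plural "s"; a real tool would use a proper stemmer.
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


class WeightedTextCollector(HTMLParser):
    """Collects text chunks together with the weight of their enclosing tags."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: list[str] = []
        self.chunks: list[tuple[str, float]] = []  # (text, weight)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop up to and including the matching tag (tolerates sloppy HTML).
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        weight = max((TAG_WEIGHTS.get(t, 1.0) for t in self.open_tags), default=1.0)
        self.chunks.append((data, weight))


def candidate_concepts(html: str, top: int = 5) -> list[tuple[str, float]]:
    """Returns the `top` highest-scoring words and word pairs."""
    collector = WeightedTextCollector()
    collector.feed(html)
    collector.close()
    scores: Counter = Counter()
    for text, weight in collector.chunks:
        words = [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())
                 if w not in STOP_WORDS]
        for word in words:
            scores[word] += weight
        # Word pairs are formed after stop-word removal.
        for first, second in zip(words, words[1:]):
            scores[f"{first} {second}"] += weight
    return scores.most_common(top)


if __name__ == "__main__":
    page = ("<html><head><title>Exam Reservations</title></head>"
            "<body><p>Book your exam reservation</p></body></html>")
    print(candidate_concepts(page))
```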
Identification of Interaction Design Patterns
• Goal:
  • To identify repetitive structures in Web client pages
  • These structures can be related to known programming patterns
• Proposed Technique:
  • A statistical methodology based on features extracted from the source code of client pages
  • Features: presence, number and size of forms, tables, input fields, frames, common keywords and so on
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Recovering Interaction Design Patterns in Web Applications”, submitted to 9th IEEE European Conference on Software Maintenance and Reengineering, CSMR 2005
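The features listed above can be counted with a short sketch. It extracts presence and number of selected tags, the size of the largest form, and presence of keywords in the page text. Which tags and keywords to count is an assumption here. The statistical model that relates the resulting feature vectors to known patterns is not reproduced.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Optional

COUNTED_TAGS = ("form", "table", "input", "select", "textarea", "frame", "iframe")  # assumed set
KEYWORDS = ("login", "password", "search", "register")  # hypothetical keyword list


class PageFeatureExtractor(HTMLParser):
    """Counts selected tags, measures form size and collects page text."""

    def __init__(self) -> None:
        super().__init__()
        self.tag_counts: Counter = Counter()
        self.text_parts: list[str] = []
        self.max_form_inputs = 0  # size of the largest form, in input fields
        self._inputs_in_form: Optional[int] = None  # None when outside any form

    def handle_starttag(self, tag, attrs):
        if tag in COUNTED_TAGS:
            self.tag_counts[tag] += 1
        if tag == "form":
            self._inputs_in_form = 0
        elif tag in ("input", "select", "textarea") and self._inputs_in_form is not None:
            self._inputs_in_form += 1

    def handle_endtag(self, tag):
        if tag == "form" and self._inputs_in_form is not None:
            self.max_form_inputs = max(self.max_form_inputs, self._inputs_in_form)
            self._inputs_in_form = None

    def handle_data(self, data):
        self.text_parts.append(data.lower())


def extract_features(html: str) -> dict[str, int]:
    """Returns a flat feature vector (as a dict) for one client page."""
    parser = PageFeatureExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    features = {f"num_{tag}": parser.tag_counts[tag] for tag in COUNTED_TAGS}
    features["max_form_size"] = parser.max_form_inputs
    features.update({f"kw_{kw}": int(kw in text) for kw in KEYWORDS})
    return features


if __name__ == "__main__":
    page = '<form><input name="u"><input type="password" name="p"></form><p>Login</p>'
    print(extract_features(page))
```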
Identification of cloned components
• Goals:
  • Re-engineering of cloned components via code transformations
  • Classification of built client pages
  • Identification of reusable programming patterns
• Proposed Techniques:
  • Extraction of features from the structure of client pages and from the source code of server pages
  • Computation of distance measures between pages (Euclidean distance, Levenshtein edit distance)
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, U. De Carlini, “Identifying Reusable Components in Web Applications”, IASTED International Conference on Software Engineering, SE 2004, pp. 526-531
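The two distance measures named above can be sketched as follows. The Levenshtein edit distance is computed over two sequences (for example, the sequences of HTML tags of two pages), and the Euclidean distance over numeric feature vectors. Representing a page by its tag sequence, and the example values, are assumptions for illustration. Small distances flag candidate clones.

```python
import math
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete item_a
                current[j - 1] + 1,                    # insert item_b
                previous[j - 1] + (item_a != item_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


if __name__ == "__main__":
    # Hypothetical tag sequences of two client pages.
    page_a = ["html", "body", "form", "input", "input", "table"]
    page_b = ["html", "body", "form", "input", "table"]
    print(levenshtein(page_a, page_b))      # one extra <input> in page_a
    print(euclidean([1, 2, 0], [1, 0, 0]))  # hypothetical feature vectors
```

The edit distance captures differences in page structure, while the Euclidean distance compares aggregate features. This is why the two can complement each other when classifying pages.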
Abstraction of Business Level Models
• Goals:
  • To abstract object-oriented business level models of Web Applications
• Proposed Techniques:
  • Classes and attributes are identified by analysing the data exchanged between users, Web pages and databases
  • Class methods are identified by analysing the functions implemented by clusters of pages
  • Relationships between classes are identified by analysing data structures and data flow among pages
[Figure: example business object model with classes Teacher (Name, Surname, E-mail, Phone number, Password, Code), Student (Name, Surname, E-mail, Password, Code, Phone number), Course (Academic year, Code, Name), Exam (Date, Time, Classroom), Exam Reservation (Date), Tutoring (Date, Start time, End time), Tutoring request (Date) and News (Number, Date, Text).]
G.A. Di Lucca, A.R. Fasolino, U. De Carlini, P. Tramontana, “Recovering a Business Object Model from Web Applications”, Proceedings of the 27th IEEE Annual International Computer Software and Applications Conference, COMPSAC 2003, pp. 348-353
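The first technique, identifying classes and attributes from exchanged data, can be roughly sketched by matching the field names of forms against the columns of database tables. A table whose columns overlap a form's fields becomes a candidate class, and the shared names become its attributes. The matching rule, the minimum overlap and the sample form are assumptions; the table names are taken loosely from the example diagram. The paper's actual heuristics are not reproduced here.

```python
def normalize(name: str) -> str:
    # Treat "E-mail", "e_mail" and "email" as the same attribute name.
    return "".join(ch for ch in name.lower() if ch.isalnum())


def candidate_classes(forms: dict[str, set[str]],
                      tables: dict[str, set[str]],
                      min_overlap: int = 2) -> dict[str, set[str]]:
    """Maps table names to the columns they share with at least one form."""
    classes: dict[str, set[str]] = {}
    for form_fields in forms.values():
        form_keys = {normalize(f) for f in form_fields}
        for table, columns in tables.items():
            shared = {c for c in columns if normalize(c) in form_keys}
            if len(shared) >= min_overlap:
                classes.setdefault(table, set()).update(shared)
    return classes


if __name__ == "__main__":
    # Hypothetical form and tables, for illustration only.
    forms = {"student_registration": {"name", "surname", "e_mail", "password"}}
    tables = {
        "Student": {"Name", "Surname", "E-mail", "Password", "Code", "Phone number"},
        "Exam": {"Date", "Time", "Classroom"},
    }
    print(candidate_classes(forms, tables))
```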
Maintainability Model
• Goals:
  • To propose models and methods for assessing the maintainability of Web Applications
• Proposed Models and Techniques:
  • Adapting the Oman model (conceived for traditional applications) to Web Applications
  • Selection of a set of product metrics and proposal of a maintainability index that can be computed with negligible effort and time
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, C.A. Visaggio, “Towards the definition of a maintainability model for web applications”, Proceedings of the Eighth IEEE European Conference on Software Maintenance and Reengineering, CSMR 2004, pp. 279-287
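For reference, the sketch below computes the classic three-metric maintainability index from the work of Oman and colleagues on traditional software. It takes the average Halstead volume, average (extended) cyclomatic complexity and average lines of code per module. It is shown only as the traditional starting point. The Web-specific metrics and index proposed in the paper are not reproduced.

```python
import math


def maintainability_index(avg_halstead_volume: float,
                          avg_cyclomatic_complexity: float,
                          avg_loc: float) -> float:
    """Classic three-metric MI: 171 - 5.2 ln(aveV) - 0.23 aveV(g') - 16.2 ln(aveLOC)."""
    if avg_halstead_volume <= 0 or avg_loc <= 0:
        raise ValueError("volume and LOC must be positive (they enter logarithms)")
    return (171
            - 5.2 * math.log(avg_halstead_volume)
            - 0.23 * avg_cyclomatic_complexity
            - 16.2 * math.log(avg_loc))


if __name__ == "__main__":
    # Hypothetical module averages, for illustration only.
    print(round(maintainability_index(1000.0, 10.0, 200.0), 1))
```

Higher values indicate code that is easier to maintain. Because every input is a cheap product metric, an index of this shape can be computed with little effort, which matches the goal stated above.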
Current and future work
• Techniques for the dynamic analysis of Web Applications
• Accessibility assessment of client pages
• Migration from Web Applications to Web Services
• Testing of Web Applications
• Mutation testing techniques
• Maintainability assessment
• Definition of ageing measures for Web Applications
G.A. Di Lucca, M. Di Penta, A.R. Fasolino, P. Tramontana, “Supporting Web Application Evolution by Dynamic Analysis”, IWPSE 2005
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Web Site Accessibility: Identifying and Fixing of Accessibility Problems in Client Page Code”, WSE 2005

Editor's Notes

  • #4 RE + C: reducing the impact of maintenance; RE: automating test creation and execution; RE: computing sizing metrics for the assessment
  • #5 Not necessary: a short description is enough, such as: Web Applications based on scripting languages such as ASP, PHP and, on the client side, JavaScript have been studied because they were the most common technologies for producing Web Applications
  • #6 Mention existing work on trace analysis
  • #7 Mention existing work on trace analysis
  • #8 This model is an extension of the one proposed by Jim Conallen
  • #9 WARE actually integrates with many other tools that have been developed. IRF: Intermediate Representation Form
  • #10 3) Clustering: grouping of the components that collaborate to implement the application's functionalities. 4) Abstraction of UML diagrams: production of detail diagrams based on the extracted information