Reverse Engineering Web Applications

Ph.D. Dissertation Forum – ICSM 2005
Ph.D. Dissertation
Reverse Engineering Web Applications
Porfirio Tramontana
University of Naples “Federico II”

Web Applications: open problems
• In recent years, demand for Web Applications has grown strongly, driven by the spread of the World Wide Web, which makes many services available all over the world
• Web Applications have been developed with immature design methodologies and technologies
• Nowadays, there are many legacy Web Applications in need of maintenance and re-engineering

Ph.D. Thesis Goals
• To propose models, methods and tools
supporting Reverse Engineering and
Comprehension of Web Applications
• Reverse Engineering and comprehension are
fundamental tasks needed to efficiently support
maintenance, testing and quality assessment of
Web Applications

Peculiarities of script-based Web Applications
• Page based
• Client-server architecture
• Interpreted languages
• Client pages may be generated “on the fly”
• Client pages are executed in a browser (and the designer doesn’t know what kind of browser will be used)
• HTML interpreters are fault tolerant
• ... and so on ...

A process for the Reverse Engineering of Web Applications
[Figure: the reverse engineering process, organized in two phases, Extraction and Abstraction.]
• Extraction: Static Analysis of the WA source code; Dynamic Analysis of the WA execution
• Abstraction activities: Identification of cloned components; Identification of Interaction Design Patterns; Assignment of Concepts; Functional Clustering; Business Level UML Diagram Abstractions
• Artifacts produced: cloned components; Interaction Design Patterns; concepts describing Reverse Engineering artifacts; groups of pages realizing Web Application use cases; Structural and Business Level UML diagrams; maintainability assessment
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Reverse Engineering Web Application: the WARE approach”, Journal of Software Maintenance and Evolution:
Research and Practice, Volume 16, Issue 1-2, Date: January - April 2004, Pages: 71-101

Analysis of Web Applications
1) Static analysis of the source code
A multi-language parser that analyses the source code of server pages, client pages and script modules has been implemented.
During the analysis of server pages, facts about the client pages built by those server pages are also recorded.
Static analysis results are stored in an intermediate form and are used to populate a relational database.
2) Dynamic analysis
Built client pages are analysed in order to add to the database facts that have been observed by executing the application.
The reference model adopted is an extension of the one proposed by Conallen for the forward engineering of Web Applications.
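
The idea of recording extracted facts in a relational database can be sketched as follows. This is only an illustration: the table layout, the relation names and the page names are assumptions made for the example, not the schema used by the WARE tool.

```python
import sqlite3


def build_repository(facts):
    """Load extracted facts into an in-memory relational store.

    Each fact is a tuple:
    (source, source_kind, relation, target, target_kind, origin)
    where origin is "static" or "dynamic" (hypothetical labels).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE component (name TEXT PRIMARY KEY, kind TEXT)")
    conn.execute(
        "CREATE TABLE relation (source TEXT, kind TEXT, target TEXT, origin TEXT)"
    )
    for source, source_kind, relation, target, target_kind, origin in facts:
        # A component may appear in many facts; store it only once.
        conn.execute("INSERT OR IGNORE INTO component VALUES (?, ?)", (source, source_kind))
        conn.execute("INSERT OR IGNORE INTO component VALUES (?, ?)", (target, target_kind))
        conn.execute(
            "INSERT INTO relation VALUES (?, ?, ?, ?)",
            (source, relation, target, origin),
        )
    conn.commit()
    return conn


# Hypothetical facts: two found by static analysis, one observed at run time.
facts = [
    ("login.html", "client page", "submits", "check.asp", "server page", "static"),
    ("check.asp", "server page", "redirects", "home.html", "client page", "static"),
    ("check.asp", "server page", "builds", "check.asp#built1", "built client page", "dynamic"),
]
conn = build_repository(facts)
dynamic_rows = conn.execute(
    "SELECT source, kind, target FROM relation WHERE origin = 'dynamic'"
).fetchall()
component_count = conn.execute("SELECT COUNT(*) FROM component").fetchone()[0]
```

Keeping the origin of each fact in its own column lets later queries separate what was read from the source code from what was only observed during execution.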

Model of Web Applications
[Figure: class diagram of the Web Application reference model.]
• Classes shown: Web Page, Server Page, Client Page, Static Page, Built Page, Frameset, Frame, Form, Field, Textarea, Select, Button, Hyperlink, Anchor, Parameter, HTML Tag, Web Object, Java Applet, Media, Flash Object, Other Object, Mail Address, Generic File, Download, Server Script, Server Function, Server Class, Server Cookie, Session Variable, Interface Object, DB Interface, Mail Interface, Server File Interface, Other Interface, Client Script, Client Function, Client Module
• Relationship labels shown: submits, include, source, redirect, event, modify tag

WARE (Web Application Reverse Engineering) tool
[Figure: WARE Architecture. Labels shown: Interface layer (WARE GUI; Graphical Visualizer: Dotty, VCG, RIGI); Service Layer (Extractor with parsers for HTML, ASP, VBS, PHP, JS, ...; Abstractor with IRF Translator, Query Executor, UML Diagrams Abstractor); Repository (IRF, DBR, Diagrams); WA Source Files.]
[Figure: Detail Class Diagram abstracted by WARE, showing the pages /areadocente.html, /check.asp and /autenticazionedocente.html connected by Submit, Redirect and Builds relationships.]
G. A. Di Lucca, A.R. Fasolino, U. De Carlini, F. Pace, P. Tramontana, “WARE: a tool for the Reverse Engineering of web Applications”, Proc. of 6th
IEEE European Conference on Software Maintenance and Reengineering, CSMR 2002, IEEE CS Press, Los Alamitos, CA, Pages:241 - 250

Functional Clustering of Web Pages
• Goal:
To cluster together subsets
of components realizing Web
Application functionalities
• Proposed Technique:
Hierarchical clustering
algorithm, grouping Web
Application pages in
subsets, maximizing the
cohesion and minimizing the
coupling between them
G. A. Di Lucca, A.R. Fasolino, U. De Carlini, F. Pace, P. Tramontana,
“Comprehending Web Applications by a Clustering Based Approach”, Proc. of 10th
IEEE Workshop on Program Comprehension, IWPC 2002, Pages:261 - 270
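
A minimal sketch of hierarchical (agglomerative) clustering of pages is given below. It is not the algorithm of the cited paper: the coupling measure (number of links between two groups), the stopping rule and the page names are all assumptions chosen to keep the example small.

```python
from itertools import combinations


def links_between(a, b, links):
    """Count links that connect a page in cluster a with a page in cluster b."""
    return sum(1 for s, t in links if (s in a and t in b) or (s in b and t in a))


def cluster_pages(pages, links, target_count):
    """Repeatedly merge the two most strongly linked clusters.

    Merging strongly linked clusters removes inter-cluster links (coupling)
    and turns them into intra-cluster links (cohesion).
    """
    clusters = [frozenset([p]) for p in sorted(pages)]
    while len(clusters) > target_count:
        best = max(
            combinations(clusters, 2),
            key=lambda pair: links_between(pair[0], pair[1], links),
        )
        if links_between(best[0], best[1], links) == 0:
            break  # remaining clusters are unrelated; stop merging
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return clusters


# Hypothetical pages and links (source page, target page).
pages = ["login.html", "check.asp", "home.html", "news.asp", "news_list.html"]
links = [
    ("login.html", "check.asp"),
    ("check.asp", "login.html"),
    ("check.asp", "home.html"),
    ("news.asp", "news_list.html"),
]
clusters = cluster_pages(pages, links, target_count=2)
```

With these inputs the authentication pages end up in one group and the news pages in another, which is the kind of grouping a maintainer could then map to a use case.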

Concept Assignment
• Goal:
  • To identify the most relevant concepts in client pages, in order to suggest a semantic description of client pages and of functional clusters of pages
• Proposed Technique:
  • Heuristic algorithms based on Information Retrieval
  • Candidate concepts are searched for in the textual content of client pages
  • Single common words and short word sequences are the candidate concepts
[Figure: class diagram of the concept assignment model. Classes shown: Web Page, Server Page, Client Page (File name), Static Client Page, Built Client Page, Control Component, Data Component, Tag (Name, Weight), Attribute (Name), Text (Weight), Word, StopWord, Concept. Relationship labels shown: <<builds>>, nested in, has synonym, has stem.]
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, U. De Carlini, “Supporting Concept Assignment in the Comprehension of Web Applications”, Proceedings of the 28th IEEE Annual International Computer Software and Applications Conference, COMPSAC 2004
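
The search for candidate concepts in page text can be illustrated with the sketch below. It is a simplification under stated assumptions: the stop-word list, the per-tag weights and the sample page are invented for the example, and synonyms and stems (which appear in the model) are not handled.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stop words and tag weights, chosen only for this example.
STOP_WORDS = {"a", "the", "of", "and", "to", "in", "for", "your"}
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0}


class WeightedText(HTMLParser):
    """Scores words and two-word sequences by the tags that enclose them."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.scores = Counter()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        weight = max((TAG_WEIGHTS.get(t, 1.0) for t in self.stack), default=1.0)
        words = re.findall(r"[a-z]+", data.lower())
        for i, word in enumerate(words):
            if word in STOP_WORDS:
                continue
            self.scores[word] += weight
            # Adjacent pairs of non-stop words are candidates too.
            if i + 1 < len(words) and words[i + 1] not in STOP_WORDS:
                self.scores[word + " " + words[i + 1]] += weight


def candidate_concepts(html, top=3):
    parser = WeightedText()
    parser.feed(html)
    parser.close()
    return [term for term, _ in parser.scores.most_common(top)]


page = (
    "<html><head><title>Exam reservation</title></head><body>"
    "<h1>Exam reservation</h1>"
    "<p>Select the exam and confirm your reservation.</p></body></html>"
)
concepts = candidate_concepts(page)
```

Weighting text by its enclosing tag reflects the intuition that words in a title or heading say more about a page's purpose than words in body text.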

Interaction Design Patterns Identification
• Goal:
  • To identify repetitive structures in Web client pages
  • These structures can be related to known Programming Patterns
• Proposed Technique:
  • Statistical methodology based on features extracted from the source code of client pages
  • Features include the presence, quantity and size of forms, tables, input fields, frames, common keywords and so on
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Recovering Interaction Design Patterns in Web Applications”, submitted to 9th IEEE European Conference on Software Maintenance and Reengineering, CSMR 2005
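
The feature extraction step can be sketched as below. Only the counting of structural elements is shown; the choice of tags is an assumption for the example, and the statistical methodology that maps feature values to patterns is not reproduced here.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical feature set: one count per tag of interest.
FEATURE_TAGS = ("form", "table", "input", "select", "textarea", "frame")


class FeatureCounter(HTMLParser):
    """Counts occurrences of the tags listed in FEATURE_TAGS."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in FEATURE_TAGS:
            self.counts[tag] += 1


def page_features(html):
    """Return the feature vector of a client page, ordered as FEATURE_TAGS."""
    parser = FeatureCounter()
    parser.feed(html)
    parser.close()
    return [parser.counts[tag] for tag in FEATURE_TAGS]


login_page = (
    '<form action="check.asp"><input name="user">'
    '<input name="password" type="password"><input type="submit"></form>'
)
features = page_features(login_page)
```

A page with one form and a few input fields, as in this hypothetical login page, yields a vector that could then be compared against the typical values of a known pattern.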

Identification of cloned components
• Goals:
  • Re-engineering of cloned components via code transformations
  • Classification of Built Client Pages
  • Identification of reusable Programming Patterns
• Proposed Techniques:
  • Extraction of features from the structure of client pages and from the source code of server pages
  • Computation of distance measures between pages (Euclidean distance, Levenshtein edit distance)
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, U. De Carlini, “Identifying Reusable Components in Web Applications”, IASTED International
Conference on Software Engineering, SE 2004, pp.526-531
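
The two distance measures named above can be sketched as follows. The representations are assumptions for the example: pages are compared either as numeric feature vectors (Euclidean distance) or as sequences of tag names (Levenshtein edit distance); any threshold used to declare two pages clones would have to be chosen separately.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(
                min(
                    prev[j] + 1,             # delete x
                    curr[j - 1] + 1,         # insert y
                    prev[j - 1] + (x != y),  # substitute, free if equal
                )
            )
        prev = curr
    return prev[-1]


# Hypothetical tag sequences of two client pages that differ by one element.
page_a = ["html", "body", "form", "input", "input"]
page_b = ["html", "body", "form", "input", "select", "input"]
tag_distance = levenshtein(page_a, page_b)
vector_distance = euclidean([1, 0, 2, 0], [1, 0, 2, 1])
```

A distance of zero means the two pages are indistinguishable under the chosen representation; small non-zero distances point to near-clones worth inspecting.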

Abstraction of Business Level Models
• Goals:
  • To abstract object-oriented business level models of Web Applications
• Proposed Techniques:
  • Classes and attributes are identified by analysing the data exchanged between users, Web pages and databases
  • Class methods are identified by analysing the functions implemented by clusters of pages
  • Relationships between classes are identified by analysing data structures and data flow among pages
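[Figure: example of a recovered business level class diagram. Classes and attributes shown:]
• Tutoring request: Date
• Teacher: Name, Surname, E-mail, Phone number, Password, Code
• Tutoring: Date, Start time, End time
• News: Number, Date, Text
• Student: Name, Surname, E-mail, Password, Code, Phone number
• Exam: Date, Time, Classroom
• Course: Academic year, Code, Name
• Exam Reservation: Date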
G.A. Di Lucca, A.R. Fasolino, U. De Carlini, P. Tramontana, “Recovering a Business Object Model from Web Applications”, Proceedings of the 27th IEEE Annual International Computer Software and Applications Conference, COMPSAC 2003, Pages: 348 - 353

Maintainability Model
• Goals:
  • To propose models and methods for assessing the maintainability of Web Applications
• Proposed Models and Techniques:
  • Adaptation to Web Applications of the Oman model (conceived for traditional applications)
  • Selection of a set of product metrics and proposal of a maintainability index that can be calculated with negligible effort and time
G.A. Di Lucca, A.R.Fasolino, P.Tramontana, C.A.Visaggio, “Towards the definition of a maintainability model for web applications”, Proceedings of
the Eighth IEEE European Conference on Software Maintenance and Reengineering, CSMR 2004, pages:279 - 287

Current and future work
• Techniques for the dynamic analysis of Web Applications
• Accessibility assessment of client pages
• Migration from Web Applications to Web Services
• Testing of Web Applications
• Mutation Testing techniques
• Maintainability assessment
• Definition of ageing measures for Web Applications
G.A. Di Lucca, M. Di Penta, A.R. Fasolino, P. Tramontana, “Supporting Web Application Evolution by Dynamic Analysis”, IWPSE 2005
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Web Site Accessibility: Identifying and Fixing of Accessibility Problems in Client Page Code”, WSE 2005
