Reverse Engineering Web Applications

The heterogeneous and dynamic nature of components making up a Web Application, the lack of effective programming mechanisms for implementing basic software engineering principles in it, and undisciplined development processes induced by the high pressure of a very short time-to-market, make Web Application maintenance a challenging problem. A relevant issue consists of reusing the methodological and technological experience in the sector of traditional software maintenance, and exploring the opportunity of using Reverse Engineering to support effective Web Application maintenance.
The Ph.D. Thesis presents an approach for Reverse Engineering Web Applications. The approach includes the definition of Reverse Engineering methods and supporting software tools that help maintainers understand existing undocumented Web Applications to be maintained or evolved, through the reconstruction of UML diagrams. Validation experiments showed the usefulness of the proposed approach and highlighted possible areas for improving its effectiveness.

  • RE + C: reduces the impact of maintenance. RE: automates the creation and execution of tests. RE: computes sizing metrics for the assessment
  • Not necessary: a brief description is enough, such as: Web Applications based on scripting languages such as ASP, PHP and Javascript on the client side, have been studied because they were the most common technologies to produce Web Applications
  • Mention the existing work on trace analysis
  • This model is an extension of the one proposed by Jim Conallen
  • WARE actually integrates with many other tools that have been developed. IRF: Intermediate Representation Form
  • 3) Clustering: grouping of components that collaborate to implement the functionalities of the application. 4) Abstraction of UML diagrams: production of detailed diagrams based on the extracted information
  • Reverse Engineering Web Applications

    1. Ph.D. Dissertation Forum – ICSM 2005
       Ph.D. Dissertation: Reverse Engineering Web Applications
       Porfirio Tramontana, University of Naples “Federico II”
    2. Web Applications: open problems
       • In recent years, demand for Web Applications has grown strongly, driven by the spread of the World Wide Web, which makes many services available all over the world
       • Web Applications have been developed with immature design methodologies and technologies
       • Today, a number of legacy Web Applications need maintenance and re-engineering
    3. Ph.D. Thesis Goals
       • To propose models, methods and tools supporting Reverse Engineering and Comprehension of Web Applications
       • Reverse Engineering and comprehension are fundamental tasks needed to efficiently support maintenance, testing and quality assessment of Web Applications
    4. Peculiarities of script-based Web Applications
       • Page based Client-Server Architecture
       • Interpreted languages
       • Client pages may be generated “on the fly”
       • Client pages are executed in a browser (and the designer doesn’t know what kind of browser will be used)
       • HTML interpreters are fault tolerant
       • ... and so on ...
    5. A process for the Reverse Engineering of Web Applications
       [Figure: process diagram with two phases, Extraction and Abstraction. Inputs: WA Source Code, WA Execution. Activities: Static Analysis, Dynamic Analysis, Identification of cloned components, Identification of Interaction Design Patterns, Assignment of Concepts, Functional Clustering, Business Level UML Diagram Abstractions, Maintainability assessment. Outputs: Cloned components, Interaction Design Patterns, Concepts describing Reverse Engineering artifacts, Groups of pages realizing Web Application use cases, Structural and Business Level UML diagrams.]
       G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Reverse Engineering Web Application: the WARE approach”, Journal of Software Maintenance and Evolution: Research and Practice, Volume 16, Issue 1-2, January - April 2004, Pages: 71-101
    6. Analysis of Web Applications
       1) Static analysis of the source code
          • A multi-language parser that analyses the source code of Server pages, Client pages and Script modules has been implemented.
          • During the analysis of server pages, facts about the client pages they build are also recorded.
          • Static analysis results are stored in an intermediate form and are used to populate a relational database.
       2) Dynamic analysis
          • Built Client pages are analysed in order to add to the database facts observed by executing the application.
       The reference model adopted is an extension of the one proposed by Conallen for the forward engineering of Web Applications.
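The static analysis step described in slide 6 can be illustrated with a minimal sketch. It is not the WARE parser: it assumes a single HTML client page, uses Python's standard `html.parser` and `sqlite3` modules, and the `fact` table, the relation names (`link`, `submit`, `include`) and the sample page content are all hypothetical choices made for illustration.

```python
import sqlite3
from html.parser import HTMLParser


class PageFactExtractor(HTMLParser):
    """Collects (source page, relation, target) facts from one HTML page."""

    def __init__(self, page_name):
        super().__init__()
        self.page_name = page_name
        self.facts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.facts.append((self.page_name, "link", attributes["href"]))
        elif tag == "form" and attributes.get("action"):
            self.facts.append((self.page_name, "submit", attributes["action"]))
        elif tag == "script" and attributes.get("src"):
            self.facts.append((self.page_name, "include", attributes["src"]))


def extract_facts(page_name, html_source):
    """Parse one page and return the facts found in it."""
    parser = PageFactExtractor(page_name)
    parser.feed(html_source)
    return parser.facts


def store_facts(connection, facts):
    """Store facts in a relational table (hypothetical schema)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS fact (source TEXT, relation TEXT, target TEXT)"
    )
    connection.executemany("INSERT INTO fact VALUES (?, ?, ?)", facts)
    connection.commit()


# Hypothetical page content, for illustration only.
sample_page = """
<html><body>
  <a href="index.html">Home</a>
  <form action="check.asp" method="post"><input name="password"></form>
</body></html>
"""

connection = sqlite3.connect(":memory:")
facts = extract_facts("autenticazionedocente.html", sample_page)
store_facts(connection, facts)
rows = connection.execute(
    "SELECT target FROM fact WHERE relation = 'submit'"
).fetchall()
print(rows)
```

Once facts sit in a relational store, later abstraction steps can be expressed as queries, which is presumably one reason for populating a database rather than keeping the parse results in memory.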
    7. Model of Web Applications
       [Figure: UML class diagram of the Web Application model. Labels recoverable from the diagram: Static Page, DB Interface, Java Applet, Textarea, Select, Button, Media, Flash Object, Mail Address, Mail Interface, Server File Interface, Other Object, Generic File, Download, Parameter, Other Interface, Hyperlink, Frame, Web Object, Frameset, Anchor, Field, Server Function, Server Class, Interface Object, Built Page, Form, Server Script, Session Variable, Server Cookie, Server Page, HTML Tag, Web Page, Client Page, Client Script, Modify Tag, Client Function, Client Module. Relationship labels include Submits, include, source, redirect and event.]
    8. WARE (Web Application Reverse Engineering) tool
       [Figure: WARE Architecture. Labels recoverable from the diagram: WARE-Tool, WA Source Files, Interface layer, WARE GUI, Service Layer, Extractor, Abstractor, Parsers (HTML, ASP, VBS, PHP, JS, ….), IRF Translator, Query Executor, UML Diagrams Abstractor, Graphical Visualizer (Dotty, VCG, RIGI), IRF, DBR, Diagrams, Repository.]
       [Figure: Detail Class Diagram abstracted by WARE. It shows the pages /autenticazionedocente.html, /check.asp and /areadocente.html connected by Submit, Builds and Redirect relationships.]
       G. A. Di Lucca, A.R. Fasolino, U. De Carlini, F. Pace, P. Tramontana, “WARE: a tool for the Reverse Engineering of web Applications”, Proc. of 6th IEEE European Conference on Software Maintenance and Reengineering, CSMR 2002, IEEE CS Press, Los Alamitos, CA, Pages: 241 - 250
    9. Functional Clustering of Web Pages
       • Goal: To cluster together subsets of components realizing Web Application functionalities
       • Proposed Technique: Hierarchical clustering algorithm, grouping Web Application pages in subsets, maximizing the cohesion and minimizing the coupling between them
       G. A. Di Lucca, A.R. Fasolino, U. De Carlini, F. Pace, P. Tramontana, “Comprehending Web Applications by a Clustering Based Approach”, Proc. of 10th IEEE Workshop on Program Comprehension, IWPC 2002, Pages: 261 - 270
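The idea of grouping pages hierarchically by how strongly they are connected can be sketched as a simple agglomerative procedure. This is a simplified stand-in, not the algorithm from the cited paper: it assumes that coupling is the summed weight of links between two clusters, merges the most strongly coupled pair at each step, and stops at a hypothetical `min_coupling` threshold. Page names and link weights are invented for the example.

```python
from itertools import combinations


def coupling(cluster_a, cluster_b, links):
    """Total weight of links connecting the two clusters, in either direction."""
    return sum(
        weight
        for (source, target), weight in links.items()
        if (source in cluster_a and target in cluster_b)
        or (source in cluster_b and target in cluster_a)
    )


def cluster_pages(pages, links, min_coupling=1):
    """Merge the most coupled pair of clusters until coupling drops below the threshold."""
    clusters = [frozenset([page]) for page in pages]
    while len(clusters) > 1:
        best_pair = max(
            combinations(clusters, 2),
            key=lambda pair: coupling(pair[0], pair[1], links),
        )
        if coupling(best_pair[0], best_pair[1], links) < min_coupling:
            break
        clusters = [cluster for cluster in clusters if cluster not in best_pair]
        clusters.append(best_pair[0] | best_pair[1])
    return clusters


# Hypothetical pages and weighted links (e.g. number of hyperlinks or submits).
pages = ["login.html", "check.asp", "area.html", "news.asp", "newsdetail.asp"]
links = {
    ("login.html", "check.asp"): 3,
    ("check.asp", "area.html"): 2,
    ("news.asp", "newsdetail.asp"): 3,
    ("area.html", "news.asp"): 1,
}

clusters = cluster_pages(pages, links, min_coupling=2)
for cluster in clusters:
    print(sorted(cluster))
```

Merging strongly coupled pages turns their links into intra-cluster links, so cohesion inside each group rises while the remaining inter-cluster coupling falls. The weak link between `area.html` and `news.asp` stays below the threshold, so the login pages and the news pages end up in separate groups.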
    10. Concept Assignment
       • Goal: To identify the most relevant concepts in client pages, in order to suggest a semantic description of client pages and of functional clusters of pages
       • Proposed Technique:
          • Heuristic algorithms based on Information Retrieval
          • Candidate concepts are searched for in the textual content of client pages
          • Single common words and short word sequences are candidate concepts
       [Figure: UML class diagram of the conceptual model. Labels recoverable from the diagram: Server Page, Built Client Page, Static Client Page, Client Page (File name), Web Page, Data Component, Control Component, Text, Word, StopWord, Concept, AttributeName, TagName, Weight; relationships <<builds>>, has synonym, has stem, nested in.]
       G.A. Di Lucca, A.R. Fasolino, P. Tramontana, U. De Carlini, “Supporting Concept Assignment in the Comprehension of Web Applications”, Proceedings of the 28th IEEE Annual International Computer Software and Applications Conference, COMPSAC 2004
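A minimal sketch of candidate-concept extraction in the spirit of slide 10 follows. It assumes plain page text as input, a tiny hypothetical stop-word list, and raw frequency as the only weight; stemming, synonyms and tag-based weights, which the slide's model mentions, are left out. The function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "for", "in", "your"}


def candidate_concepts(text, max_length=2):
    """Count single words and short word sequences that contain no stop word."""
    words = re.findall(r"[a-z]+", text.lower())
    candidates = Counter()
    for length in range(1, max_length + 1):
        for start in range(len(words) - length + 1):
            sequence = words[start:start + length]
            if any(word in STOP_WORDS for word in sequence):
                continue
            candidates[" ".join(sequence)] += 1
    return candidates


def top_concepts(text, count=3):
    """Rank candidates by frequency, breaking ties alphabetically."""
    ranked = sorted(
        candidate_concepts(text).items(),
        key=lambda item: (-item[1], item[0]),
    )
    return [concept for concept, _ in ranked[:count]]


page_text = "Exam reservation: select the exam and confirm your exam reservation"
print(top_concepts(page_text))
```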
    11. Interaction Design Patterns Identification
       • Goal:
          • To identify repetitive structures in Web Client pages
          • These structures can be related to known Programming Patterns
       • Proposed Technique:
          • Statistical methodology based on features extracted from the source code of client pages
          • Presence, quantity and dimension of forms, tables, input fields, frames, common keywords and so on
       G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Recovering Interaction Design Patterns in Web Applications”, submitted to 9th IEEE European Conference on Software Maintenance and Reengineering, CSMR 2005
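The feature extraction that such a statistical method relies on can be sketched as counting selected tags in a client page. Only the counting step is shown: the choice of `FEATURE_TAGS` is an assumption based on the elements named in the slide, and the statistical classification itself is not reproduced because the slides do not describe it.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed feature set, taken from the elements named in the slide.
FEATURE_TAGS = ("form", "table", "input", "frame")


class TagCounter(HTMLParser):
    """Counts how many times each start tag occurs in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def feature_vector(html_source):
    """Return the tag counts in the fixed order given by FEATURE_TAGS."""
    counter = TagCounter()
    counter.feed(html_source)
    return [counter.counts[tag] for tag in FEATURE_TAGS]


# Hypothetical login page, for illustration only.
login_page = (
    '<form action="check.asp">'
    '<input name="user"><input name="password" type="password">'
    '<input type="submit">'
    "</form>"
)
print(feature_vector(login_page))
```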
    12. Identification of cloned components
       • Goals:
          • Re-Engineering of cloned components via code transformations
          • Classification of Built Client Pages
          • Identification of reusable Programming Patterns
       • Proposed Techniques:
          • Extraction of features in the structure of Client pages and in the source code of server pages
          • Computation of distance measures between pages (Euclidean distance, Levenshtein edit distance)
       G.A. Di Lucca, A.R. Fasolino, P. Tramontana, U. De Carlini, “Identifying Reusable Components in Web Applications”, IASTED International Conference on Software Engineering, SE 2004, pp. 526-531
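The two distance measures named in slide 12 can be sketched as follows. The Euclidean distance is computed over numeric feature vectors and the Levenshtein edit distance over sequences (here, sequences of tag names). How pages are encoded as vectors or sequences, and which threshold marks two pages as clones, are assumptions that the slides do not specify.

```python
import math


def euclidean_distance(features_a, features_b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(features_a, features_b)))


def levenshtein_distance(sequence_a, sequence_b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence_a into sequence_b."""
    previous_row = list(range(len(sequence_b) + 1))
    for i, item_a in enumerate(sequence_a, start=1):
        current_row = [i]
        for j, item_b in enumerate(sequence_b, start=1):
            substitution_cost = 0 if item_a == item_b else 1
            current_row.append(min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # substitution or match
            ))
        previous_row = current_row
    return previous_row[-1]


# Hypothetical tag sequences of two client pages that differ by one input field.
page_a_tags = ["html", "body", "form", "input", "input"]
page_b_tags = ["html", "body", "form", "input", "input", "input"]

print(levenshtein_distance(page_a_tags, page_b_tags))
print(euclidean_distance([1, 0, 2, 0], [1, 0, 3, 0]))
```

A distance of zero indicates identical encodings. Small non-zero distances suggest near-clones that may be worth inspecting by hand.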
    13. Abstraction of Business Level Models
       • Goals:
          • To abstract object oriented business level models of Web Applications
       • Proposed Techniques:
          • Classes and attributes are identified by analysing the data that are exchanged between user, Web pages and databases
          • Class methods are identified by analysing the functions implemented by clusters of pages
          • Relationships between classes are identified by analysing data structures and data flow among pages
       [Figure: example of a recovered business-level class diagram. Classes and attributes, as far as recoverable: Tutoring request (Date); Teacher (Name, Surname, E-mail, Phone number, Password, Code); Tutoring (Date, Start time, End time); News (Number, Date, Text); Student (Name, Surname, E-mail, Password, Code, Phone number); Exam (Date, Time, Classroom); Course (Academic year, Code, Name); Exam Reservation (Date).]
       G.A. Di Lucca, A.R. Fasolino, U. De Carlini, P. Tramontana, “Recovering a Business Object Model from Web Applications”, Proceedings of the 27th IEEE Annual International Computer Software and Applications Conference, COMPSAC 2003, Pages: 348 - 353
    14. Maintainability Model
       • Goals:
          • To propose models and methods for the assessment of the maintainability of Web Applications
       • Proposed Models and Techniques:
          • Adapting to Web Applications the Oman model (conceived for traditional applications)
          • Selection of a set of product metrics and proposal of a maintainability index that can be calculated with negligible effort and time
       G.A. Di Lucca, A.R. Fasolino, P. Tramontana, C.A. Visaggio, “Towards the definition of a maintainability model for web applications”, Proceedings of the Eighth IEEE European Conference on Software Maintenance and Reengineering, CSMR 2004, pages: 279 - 287
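For orientation, the commonly cited three-metric maintainability index for traditional software, usually associated with Oman's work, can be computed as below. This is the traditional formula only: the slides do not give the Web-specific metric set or the adapted index, so this sketch must not be read as the model proposed in the thesis. The parameter names are illustrative.

```python
import math


def maintainability_index(avg_halstead_volume, avg_cyclomatic_complexity,
                          avg_lines_of_code):
    """Classic three-metric maintainability index for traditional software.

    All inputs are per-module averages; volume and lines of code must be
    positive. Higher values indicate code that is easier to maintain.
    """
    return (
        171
        - 5.2 * math.log(avg_halstead_volume)
        - 0.23 * avg_cyclomatic_complexity
        - 16.2 * math.log(avg_lines_of_code)
    )


# Hypothetical averages for two systems, for illustration only.
small_system = maintainability_index(200.0, 3.0, 40.0)
large_system = maintainability_index(2000.0, 15.0, 400.0)
print(small_system > large_system)
```

An index of this kind is cheap to compute because all of its inputs can be collected automatically from source code, which matches the slide's requirement of negligible effort and time.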
    15. Current and future work
       • Techniques for the dynamic analysis of Web Applications
       • Accessibility assessment of Client pages
       • Migration from Web Applications to Web Services
       • Testing of Web Applications
          • Mutation Testing techniques
       • Maintainability assessment
          • Definition of ageing measures for Web Applications
       G.A. Di Lucca, M. Di Penta, A.R. Fasolino, P. Tramontana, “Supporting Web Application Evolution by Dynamic Analysis”, IWPSE 2005
       G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Web Site Accessibility: Identifying and Fixing of Accessibility Problems in Client Page Code”, WSE 2005