Software Archaeology

As Java deployments have become more complex, it has become harder and harder to get good insight into the structure and execution flow of the application. This applies particularly where middleware and third-party components or legacy software that has little or no design documentation is involved. This session introduces you to software archaeology: how to build an understanding of the layout and execution of your Java application from the deployed application itself.

Presented at JavaOne 2012.


  1. Chris Bailey – IBM Java Service Architect
     2nd October 2012
     Software Archaeology: Rediscovering Your Architecture
     © 2012 IBM Corporation
  2. Important Disclaimers
     THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
     WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
     ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES.
     ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.
     IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.
     IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
     NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
       - CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS
  3. Introduction to the speaker
     ■ 12 years' experience developing and deploying Java SDKs
     ■ Recent work focus:
       – Java applications in the cloud
       – Java usability and quality
       – Debugging tools and capabilities
       – Requirements gathering
       – Highly resilient and scalable deployments
     ■ My contact information:
       – baileyc@uk.ibm.com
       – http://www.linkedin.com/profile/view?id=3100666
  4. Goals of the Talk
     ■ Introduce the concept of Software Archaeology and closing the software lifecycle
     ■ Discuss some of the available methodologies
     ■ Show how to rediscover your architecture with a sample application
  5. Agenda
     ■ The Software Lifecycle
     ■ Software Archaeology
     ■ An Introduction to UML
     ■ Digging into applications
  6. The Software Lifecycle
     “A software development process, also known as a software development life-cycle (SDLC), is a structure imposed on the development of a software product. Similar terms include software life cycle and software process. It is often considered a subset of systems development life cycle” - Wikipedia
     ■ Should be a closed loop, iterative process
     ■ Design should relate to Requirements
     ■ Implementation should relate to Design
     ■ Even as the application evolves
  7. The Waterfall Model
     ■ Not uncommon for applications to be more like the waterfall model
     ■ Software lifecycle becomes linear...
     ■ Changes needed between:
       – Requirements and Design
       – Design and Implementation
       … and as a result of maintenance become lost
     ■ Result is that deployments do not reflect their original designs
  8. Legacy Systems and Code
     ■ Old applications or code that have been inherited
     ■ Typically in maintenance, and providing a valuable function
     ■ However it often provides challenges:
       – Maintenance cost is usually high
         • Outside of normal vendor support
       – Little or no requirement and design documentation
         • Potentially even without source code for some parts!
  9. Software Archaeology
     “..the study of poorly documented or undocumented legacy software implementations, as part of software maintenance. .. includes the reverse engineering of software modules, and the application of a variety of tools and processes for extracting and understanding program structure and recovering design information. .. may reveal dysfunctional team processes which have produced poorly designed or even unused software modules.” - Wikipedia
     ■ Software archaeology makes it possible to build designs from legacy systems and code
  10. An Introduction to Unified Modelling Language (UML)
     “...a standardized general-purpose modeling language in the field of object-oriented software engineering. The standard is managed, and was created, by the Object Management Group. ...offers a standard way to visualize a system's architectural blueprints” - Wikipedia
     ■ A unified standard modelling notation
     ■ Allows the creation of structure and design plans
     ■ UML provides two categories of diagram:
       – Behaviour Diagrams: show the dynamic behaviour between the elements of the system
       – Structural Diagrams: show the static structure of the system being modelled
  11. UML Behaviour Diagrams
     ■ Use-case diagram
       – Used to visualize the functional requirements of a system
       – Shows the relationship of "actors" to essential processes
       – Often used to show the relationship between use cases
  12. UML Behaviour Diagrams
     ■ Activity Diagram
       – Shows flow of control during an activity
       – Best used to model higher-level processes
       – "Less technical" than sequence diagrams
  13. UML Behaviour Diagrams
     ■ Statechart Diagram
       – Shows the different states that a class can be in
       – Shows the transitions from state to state
       – Usually only covers "interesting" classes
  14. UML Behaviour Diagrams
     ■ Sequence Diagram
       – Shows the interactions between objects in the sequential order that those interactions occur
       – Usually applied to code objects, but can also be applied to business objects
       – Can be used for communication between teams or organisations
  15. UML Structural Diagrams
     ■ Deployment Diagram
       – Shows how a system will be physically deployed in the hardware environment
       – Shows where the different entities of the system will physically run
       – Shows how they will communicate with each other
       – Models the physical deployment
  16. UML Structural Diagrams
     ■ Component Diagram
       – Shows the dependencies that the software has on the other software components
       – Usually shown at a high level with large-grain components
       – Or at the component package level
  17. UML Structural Diagrams
     ■ Class Diagrams:
       – Show the static structures of the system, generally implementation classes
       – Show how the different entities relate to each other
     ■ Object Diagrams:
       – Show a complete or partial view of the structure at a specific time
       – Focus on a set of object instances and attributes, and the links between the instances
       – More concrete than class diagrams
       – Often used to provide examples, or act as test cases for the class diagrams
  18. Software Archaeology: Methodologies
     ■ Various methods exist for Software Archaeology, including:
       – Static analysis
         • Analysis of source code by tooling or by hand
       – Trace based analysis
         • Injecting trace into running application
       – Debugger based analysis
         • Stepping through code under a debugger
  19. Digging into a Simple Application
     ■ A simple application: “JavaGrep”
     ■ Usage: JavaGrep <pattern> <list of files>
     ■ Where:
       – Pattern is a search term
       – List of files is one or more files to search
  20. Digging into a Simple Application
     ■ JavaGrep.java:
       – Compile regexp
       – Add files to fileList
       – Create FileScanner
       – scan()
       – Get matching lines
       – Print matching lines
       – For Each File
       – Print summary
  21. Digging into a Simple Application
     ■ FileScanner.java:
       – Create LineNumberReader
       – Get Line
       – For Each Line
       – Check for match
       – Add to matchLines
  22. Digging into a Simple Application
       – Run regexp matcher against line
       – Get ArrayList of matching lines
       – Get scanned line count
       – Get matched line count
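The three slides above only carry the callouts from the code screenshots, so the following is a minimal sketch of what an application with that shape might look like, not the presenter's source. Class, method and field names are taken from the trace and dump output later in the deck; the method bodies, the output format, printing the summary once per file, and keeping the reader as a local variable rather than a field are all assumptions.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical reconstruction: names follow the deck, bodies are assumed.
class FileScanner {
    private final String _fileName;
    private final Pattern _pattern;
    private final ArrayList<String> _matchLines = new ArrayList<>();
    private int _matchCount;
    private int _lineCount;

    FileScanner(String fileName, Pattern pattern) {
        _fileName = fileName;
        _pattern = pattern;
    }

    // Reads the file line by line and returns the number of matching lines.
    int scan() throws IOException {
        try (LineNumberReader reader = new LineNumberReader(new FileReader(_fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                _lineCount++;
                if (scanLine(line)) {
                    // getLineNumber() is 1 after the first line has been read.
                    _matchLines.add(reader.getLineNumber() + ": " + line);
                    _matchCount++;
                }
            }
        }
        return _matchCount;
    }

    // Runs the regexp matcher against a single line.
    boolean scanLine(String line) {
        return _pattern.matcher(line).find();
    }

    ArrayList<String> getMatchLines() { return _matchLines; }
    int getMatchCount() { return _matchCount; }
    int getLineCount() { return _lineCount; }
}

class JavaGrep {
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: JavaGrep <pattern> <list of files>");
            return;
        }
        Pattern pattern = Pattern.compile(args[0]);
        List<String> fileList = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            fileList.add(args[i]);
        }
        for (String file : fileList) {
            FileScanner scanner = new FileScanner(file, pattern);
            scanner.scan();
            for (String match : scanner.getMatchLines()) {
                System.out.println(file + ":" + match);
            }
            // Assumed summary format; the slides do not show it.
            System.out.println(file + ": " + scanner.getMatchCount()
                    + " of " + scanner.getLineCount() + " lines matched");
        }
    }
}
```

Even for a program this small, the call structure (main creates a FileScanner per file, scan calls scanLine per line) is what the sequence and class diagrams in the following slides try to recover.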
  23. Understanding the code behaviour
     ■ Dynamic behaviour at code level is the method call graph
       – Interaction between classes and objects
       – Sequence of methods called and returned
     ■ Best fit in UML is the sequence diagram
  24. Generating a sequence diagram
     ■ Static Analysis
       – Limited tooling exists
       – Really needs to be done manually
     ■ Runtime Analysis
       – Activate tracing mechanism(s)
       – Run the application
         • Code coverage testing
         • QA or Performance Testing
         • Production!
  25. Generated Sequence Diagram: Static Analysis
     ■ Static Analysis Results:
       – Requires reading of the code!
       – Accurate diagram
       – Limited to available source
         • JavaGrep
         • FileScanner
       – Possible to include other calls
         • Limited to single extra call
  26. Runtime Analysis: Profiling Output
     ■ Profiling output:
       – Full graph of all running methods
       – Access to call frequency/cost information
       – Requires methods to be run...
       – No call ordering or count data
  27. Runtime Analysis: Profiling Output
     ■ Digging into FileScanner.scan():
       – FileScanner.scanLine()
       – LineNumberReader.readLine()
       – StringBuilder.append()
       – ArrayList.<init>()
       – StringBuilder.append()
       – ArrayList.add()
  28. Generated Sequence Diagram: Runtime Analysis
     ■ Profiling Analysis:
       – Scope to cover all code
       – No ordering information
       – No call count information
  29. Runtime Analysis: Limited tracing to add order and count information
       14:38:09.260*0x6db300 mt.3 > JavaGrep.<clinit>()V Bytecode static method
       14:38:09.260 0x6db300 mt.9 < JavaGrep.<clinit>()V
       14:38:09.260 0x6db300 mt.3 > JavaGrep.main([Ljava/lang/String;)V
       14:38:09.281 0x6db300 mt.0 > FileScanner.<init>(Ljava/lang/String;Ljava/util/regex/Pattern;)V
       14:38:09.283 0x6db300 mt.6 < FileScanner.<init>(Ljava/lang/String;Ljava/util/regex/Pattern;)V
       14:38:09.283 0x6db300 mt.0 > FileScanner.scan()I
       14:38:09.292 0x6db300 mt.0 > FileScanner.scanLine(Ljava/lang/String;)............ ........ .... .....
       14:38:09.296 0x6db300 mt.6 < FileScanner.scanLine(Ljava/lang/String;)Z
       14:38:09.296 0x6db300 mt.6 < FileScanner.scan()I
       14:38:09.296 0x6db300 mt.0 > FileScanner.getMatchLines()Ljava/util/ArrayList;
       14:38:09.296 0x6db300 mt.6 < FileScanner.getMatchLines()Ljava/util/ArrayList;
       14:38:09.296 0x6db300 mt.0 > FileScanner.getMatchCount()I
       14:38:09.296 0x6db300 mt.6 < FileScanner.getMatchCount()I
       14:38:09.297 0x6db300 mt.0 > FileScanner.getMatchCount()I
       14:38:09.297 0x6db300 mt.6 < FileScanner.getMatchCount()I
       14:38:09.297 0x6db300 mt.0 > FileScanner.getLineCount()I
       14:38:09.297 0x6db300 mt.6 < FileScanner.getLineCount()I
       14:38:09.297 0x6db300 mt.9 < JavaGrep.main([Ljava/lang/String;)V
     ■ Order is: scan() -> getMatchLines() -> getMatchCount() -> getMatchCount() -> getLineCount()
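Reading the order and counts off a trace by eye stops scaling quickly, so here is a small sketch of how the method-entry order could be extracted mechanically. It assumes only the line layout visible in the slide (a `>` marker for method entry, followed by the qualified method name and its signature); the class name `TraceOrder` and the regular expression are illustrative, not part of any IBM tooling.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: recovers call order and call counts from trace lines
// shaped like the ones on the slide.
class TraceOrder {
    // An entry line contains " > " followed by Class.method and then "(".
    // Exit lines use "<" instead and are skipped.
    private static final Pattern ENTRY = Pattern.compile("\\s>\\s+([\\w$.<>]+)\\(");

    // Returns the qualified method names in the order they were entered.
    static List<String> entryOrder(List<String> traceLines) {
        List<String> order = new ArrayList<>();
        for (String line : traceLines) {
            Matcher m = ENTRY.matcher(line);
            if (m.find()) {
                order.add(m.group(1));
            }
        }
        return order;
    }

    // Counts how often each method was entered, keeping first-seen order.
    static Map<String, Integer> callCounts(List<String> order) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String method : order) {
            counts.merge(method, 1, Integer::sum);
        }
        return counts;
    }
}
```

The ordered list maps directly onto the messages of a sequence diagram, and the counts show repeated calls such as the two `getMatchCount()` entries in the trace.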
  30. Generated Sequence Diagram: Runtime Analysis
     ■ Runtime Analysis:
       – Scope to cover all code
       – Ordering information included
       – Call count information included
       – Only covers executed code*
  31. Understanding the structure
     ■ Structure at code level is the object graph
       – Interaction between classes and objects
       – References between objects on the Java heap
     ■ Best fit in UML is the class and/or object diagram
  32. Generating a Class or Object Diagram
     ■ Static Analysis
       – Provide tool with access to necessary source
       – Run analysis!
     ■ Runtime Analysis
       – Run the application
         • Code coverage testing
         • QA or Performance Testing
         • Production!
       – Generate a system or heap dump
  33. Generated Class Diagram: Static Analysis
     ■ Static Analysis Results:
       – Accurate diagram
       – Limited to available source
  34. Runtime Analysis: System Dump Information
     ■ JavaGrep:
       – Static attributes:
         • matchString java.lang.String “for”
         • fileList java.util.ArrayList size is 1, one entry: “..srcJavaGrep”
         • _pattern java.util.regex.Pattern “sfors” compiled = true
  35. Runtime Analysis: System Dump Information
     ■ fileList java.util.ArrayList:
       – Instance attributes:
         • elementData java.lang.Object[10] single entry of “..srcJavaGrep.java”
         • size int 1
         • modCount int 1
  36. Runtime Analysis: System Dump Information
     ■ FileScanner
       – Object instance attributes:
         • _fileName java.lang.String “..srcJavaGrep.java”
         • _fReader java.io.LineNumberReader object @ 0x228165e0
         • _pattern java.util.regex.Pattern “sfors” compiled = true
         • _matchLines java.util.ArrayList object @ 0x22864410
         • _matchCount int 4
         • _lineCount int 41
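A system dump is read offline with a dump analysis tool, but the kind of information shown in the last three slides (field name, declared type, current value, static or instance) can be approximated in-process with reflection. The sketch below is that approximation only and is not how a dump is parsed; `FieldDump` and `ScanState` are hypothetical names, and `ScanState` merely borrows three field names from the slide.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an object whose state we want to inspect.
class ScanState {
    private final String _fileName;
    private final int _matchCount;
    private final int _lineCount;

    ScanState(String fileName, int matchCount, int lineCount) {
        _fileName = fileName;
        _matchCount = matchCount;
        _lineCount = lineCount;
    }
}

// Hypothetical helper: lists an object's declared fields with type and value.
class FieldDump {
    static Map<String, String> describe(Object target) throws IllegalAccessException {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Field field : target.getClass().getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // skip compiler- or agent-generated fields
            }
            field.setAccessible(true); // allow reading private fields
            String kind = Modifier.isStatic(field.getModifiers()) ? "static" : "instance";
            Object value = field.get(target); // target is ignored for static fields
            attributes.put(field.getName(),
                    kind + " " + field.getType().getName() + " = " + value);
        }
        return attributes;
    }
}
```

Each entry corresponds to an attribute in an object diagram, and fields holding references to other objects give the links between instances. Unlike a dump, this only works from inside the running JVM and only for classes the caller is allowed to reflect on.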
  37. Generated Class Diagram: Runtime Analysis
     ■ Runtime Analysis Results:
       – The same accurate diagram
       – Methods could be added from sequence diagram
       – Scope to add further classes/objects
  38. Digging into Something More Complex...
     ■ Apache Tomcat 7.0.27
     ■ org.apache.catalina.startup.Bootstrap.main()
       – init()
       – load()
       – start()
     ■ Uses reflection to call:
       – org.apache.catalina.startup.Catalina.*
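Reflective calls matter for archaeology because the call target exists only as a string at compile time, so a source-level call graph tends to stop at the `invoke`. The sketch below shows the general pattern with a stand-in class; it is not Tomcat's code, and `FakeCatalina` and `ReflectiveBootstrap` are hypothetical names chosen for illustration.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the class that is only ever reached reflectively.
class FakeCatalina {
    static final List<String> calls = new ArrayList<>();

    public void load() { calls.add("load"); }
    public void start() { calls.add("start"); }
}

// Hypothetical launcher: knows its target only by class and method name.
class ReflectiveBootstrap {
    private final Object daemon;

    ReflectiveBootstrap(String className) throws ReflectiveOperationException {
        Class<?> daemonClass = Class.forName(className);
        daemon = daemonClass.getDeclaredConstructor().newInstance();
    }

    // Looks up a public no-argument method by name and invokes it.
    void call(String methodName) throws ReflectiveOperationException {
        Method method = daemon.getClass().getMethod(methodName);
        method.invoke(daemon);
    }
}
```

A typical use would be `new ReflectiveBootstrap(FakeCatalina.class.getName())` followed by `call("load")` and `call("start")`. A runtime trace records both the launcher methods and the reflectively invoked ones, which is why runtime analysis is the more practical route for code structured this way.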
  39. Digging into Something More Complex...
  40. Digging into Something More Complex...
     ■ Digging into “server”:
  41. Summary
     ■ It's possible to use static and/or runtime tooling to understand an application
     ■ Allows you to close the development lifecycle
       – Even for legacy and 3rd party code
     ■ Allows you to migrate legacy systems where original requirements are lost
       – Definition of existing system as source of requirements for new implementation
     ■ Allows you to debug problems
       – Difference between design and implementation may well be your issue
  42. References
     ■ Get Products and Technologies:
       – IBM Monitoring and Diagnostic Tools for Java:
         • https://www.ibm.com/developerworks/java/jdk/tools/
     ■ Learn:
       – Health Center InfoCenter:
         • http://publib.boulder.ibm.com/infocenter/hctool/v1r0/index.jsp
     ■ Discuss:
       – IBM on Troubleshooting Java Applications Blog:
         • https://www.ibm.com/developerworks/mydeveloperworks/blogs/troubleshootingjava/
       – Health Center Forum:
         • http://www.ibm.com/developerworks/forums/forum.jspa?forumID=1461
       – IBM Java Runtimes and SDKs Forum:
         • http://www.ibm.com/developerworks/forums/forum.jspa?forumID=367&start=0
  43. Copyright and Trademarks
     © IBM Corporation 2012. All Rights Reserved.
     IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., and registered in many jurisdictions worldwide.
     Other product and service names might be trademarks of IBM or other companies.
     A current list of IBM trademarks is available on the Web – see the IBM “Copyright and trademark information” page at URL: www.ibm.com/legal/copytrade.shtml
