Getting Unstuck: Working with Legacy Code and Data

In this presentation, given to the IASA in 2007, Cory Foy covers common challenges in dealing with legacy code and data, along with some tools and techniques for handling them.

    1. Getting Unstuck: Working with Legacy Code and Data
        Cory Foy – http://www.cornetdesign.com
    2. Goals
        • What is Legacy Code?
        • How do we change Legacy Code?
        • Common patterns for code bases
        • Does Legacy Code have to be code, or can it be something else, like a really long bullet on a PowerPoint slide, or perhaps a database?
        • Next Steps
    3. Legacy Code
        • How do you define Legacy Code?
        • Several definitions are possible:
           • Code we’ve gotten from somewhere else
           • Code you have to change, but don’t understand
           • Demoralizing code (Big Ball of Mud)
           • Code without unit tests
    4. Legacy Code
    5. Legacy Code
        • Code that needs to have behavior preserved
        • What is behavior?
           • The way in which someone behaves
           • The way in which a person, organism, or group responds to a specific set of conditions
           • The way that a machine operates or a substance reacts under a specific set of conditions
    6. Legacy Code
        • What’s the behavior of the following code?
    7. Legacy Code
        • Does the following code add behavior?
    8. Legacy Code
        • Now have we changed the behavior?
    9. How do we change Legacy Code?
        • Why would we want to change the code?
        • Four reasons to change software:
           • Adding a feature
           • Fixing a bug
           • Improving the design
           • Optimizing resource usage
        • Each has unique attributes
    10. Adding a feature / Fixing a bug
        • Causes the following changes:
           • Structure
           • Functionality (adding or replacing)
        • Need to be able to know the new functionality works
        • Need to be able to know that the system as a whole is still functioning appropriately
    11. Improving the Design
        • Causes the following changes:
           • Structure
        • Note that functionality is not listed above
        • Important to be able to know that all functionality works before and after the change
    12. Optimizing Resource Usage
        • Changes:
           • Resource usage
           • May cause structure change
        • Again, note that functionality is ideally not in the above list
        • Need to have a way to make sure functionality was not changed
        • Need to have a way to verify the optimization goals have been met (and stay met)
    13. Edit and Pray
        • Carefully plan the changes you are going to make
        • Make sure you understand the code to be modified
        • Make the changes
        • Run the system to make sure the change was made
        • Do some additional smoke testing to check that everything seems to be functioning
        • Pray you don’t get a call at 2am saying the system doesn’t work anymore
    14. Cover and Modify
        • Verify that the system is working by running the tests
        • Write tests to expose the behavior you want to add or change
        • Write code to make the test pass
        • Refactor duplication
        • Wash, rinse, repeat
        • Verify the system is still working by running the tests
    15. Feathers’ Legacy Code Change Algorithm
        • Michael Feathers discusses a Legacy Code Change Algorithm in Working Effectively with Legacy Code
        • Five steps:
           • Identify change points
           • Find test points
           • Break dependencies
           • Write tests
           • Make changes and refactor
        • Each step has its own common patterns and scenarios
    16. Patterns for the Change Algorithm
        • Identify Change Points
           • One of the key areas where architects and architecture come into play
           • If you aren’t sure where, put it in – you can refactor later (with unit test support)
    17. Patterns for the Change Algorithm
        • Identify Change Points
           • Scenarios
              • I don’t understand the code well enough to change it
                 • Notes / Sketching
                 • Listing Markup
                    • Separate Responsibilities
                    • Understand method structure
                    • Extract Methods
                    • Effect Sketch
                 • Scratch Refactoring
                 • Delete Unused Code
    18. Patterns for the Change Algorithm
        • Identify Change Points
           • Scenarios
              • My application has no structure
                 • Tell the story of the system
                 • Naked CRC (Class, Responsibility, and Collaborations)
                 • Conversation Scrutiny
    19. Patterns for the Change Algorithm
        • Find Test Points
           • Where can you write tests to exercise the behavior you want to add/change?
           • Important to have team standards for where unit tests should go
    20. Patterns for the Change Algorithm
        • Find Test Points
           • Scenarios
              • I need to make a change. What methods should I test?
                 • Reason about effects (Effect Sketch)
                 • Reasoning Forward (TDD)
                 • Effect propagation
                 • Effect reasoning
                 • Effect analysis
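An effect sketch is normally drawn by hand, but the reasoning behind it is plain reachability: start at the change point and follow every "can affect" arrow until nothing new turns up. The following is a hypothetical Python illustration of that idea, not a technique taken from the slides; the `EFFECTS` graph and every name in it (`discount_rate`, `invoice_total`, and so on) are invented for the example.

```python
from collections import deque
from typing import Dict, List

# Hypothetical effect sketch: an edge A -> B means "changing A can change B".
EFFECTS: Dict[str, List[str]] = {
    "discount_rate": ["compute_discount"],
    "compute_discount": ["invoice_total"],
    "invoice_total": ["print_invoice", "monthly_report"],
    "print_invoice": [],
    "monthly_report": [],
}


def affected_by(change_point: str, effects: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk: everything a change at change_point can reach."""
    seen = {change_point}
    reached: List[str] = []
    queue = deque([change_point])
    while queue:
        node = queue.popleft()
        for target in effects.get(node, []):
            if target not in seen:
                seen.add(target)
                reached.append(target)
                queue.append(target)
    return reached
```

Calling `affected_by("discount_rate", EFFECTS)` lists everything downstream of the change. In this made-up graph every effect flows through `invoice_total`, a narrowing where one set of tests can detect the whole change; Feathers calls such a spot a pinch point.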
    21. Patterns for the Change Algorithm
        • Find Test Points
           • Scenarios
              • I need to make many changes in one area – do I have to break all dependencies?
                 • Interception Points
                 • Higher-Level interception points
                 • Pinch Points (encapsulation boundary)
                 • Pinch Point Traps
    22. Patterns for the Change Algorithm
        • Break Dependencies
           • Generally the most difficult part of the process
           • We usually don’t have tests to tell us whether breaking dependencies will cause problems
    23. Patterns for the Change Algorithm
        • Break Dependencies
           • Scenarios
              • How do I know I’m not breaking anything?
                 • Hyperaware editing
                 • Single-goal editing
                 • Preserve Signatures
                 • Lean on the compiler
                 • Pair Programming (aka Real-Time Code Reviews)
    24. Patterns for the Change Algorithm
        • Break Dependencies
           • Scenarios
              • I can’t get this class into a test harness
                 • Irritating Parameters
                 • Hidden Dependencies
                 • Construction Blob
                 • Irritating Global Dependency
                 • Horrible Include Dependencies
                 • Onion Parameter
                 • Aliased Parameter
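A hidden dependency usually means the class builds something awkward (a mail server connection, say) inside its own constructor, so a test cannot construct it without dragging that along. One technique Feathers describes for this case is Parameterize Constructor: let callers pass the dependency in, while existing callers keep the old default. A minimal Python sketch, in which `ReminderService`, `SmtpMailSender`, and `FakeSender` are all hypothetical:

```python
from typing import List, Optional, Protocol, Tuple


class MailSender(Protocol):
    def send(self, to: str, body: str) -> None: ...


class SmtpMailSender:
    """Hypothetical production sender; stands in for something a test cannot use."""

    def send(self, to: str, body: str) -> None:
        raise RuntimeError("would contact a real mail server")


class ReminderService:
    def __init__(self, sender: Optional[MailSender] = None) -> None:
        # Before: self._sender = SmtpMailSender() was hard-coded here (a hidden dependency).
        # After: callers may pass a sender; existing callers still get the old default.
        self._sender = sender if sender is not None else SmtpMailSender()

    def remind(self, users: List[str]) -> int:
        for user in users:
            self._sender.send(user, "Your report is due")
        return len(users)


class FakeSender:
    """Test double that records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))
```

Production code keeps calling `ReminderService()` unchanged, while a test writes `ReminderService(FakeSender())` and inspects `sent`. Keeping the default is what makes this safe to do before any tests exist.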
    25. Patterns for the Change Algorithm
        • Break Dependencies
           • Scenarios
              • I can’t run this method in a test harness
                 • Hidden Methods
                 • “Helpful” language features
                 • Undetectable Side Effect
                    • Sensing variables
                    • Command/Query Separation
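An undetectable side effect typically comes from a method that computes something and acts on it in one step (for example, working out interest and posting it straight to the account), leaving a test nothing clean to check. Command/Query Separation splits it in two: a query that returns a value and changes nothing, and a command that changes state and returns nothing. A hypothetical Python sketch; the `Account` class and its integer-cents arithmetic are assumptions for illustration:

```python
class Account:
    """After Command/Query Separation (hypothetical example, amounts in cents)."""

    def __init__(self, balance_cents: int, rate_percent: int) -> None:
        self.balance_cents = balance_cents
        self.rate_percent = rate_percent

    def interest_due(self) -> int:
        # Query: computes a value and changes nothing, so tests can call it freely.
        return self.balance_cents * self.rate_percent // 100

    def apply_interest(self) -> None:
        # Command: changes state and returns nothing; it reuses the query.
        self.balance_cents += self.interest_due()
```

A test can now check `interest_due()` as often as it likes without moving money, and only needs a single check on `apply_interest()` to confirm the command wires the query in correctly.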
    26. Patterns for the Change Algorithm
        • Break Dependencies
           • Scenarios
              • I need to change a monster method and can’t write tests
                 • Introduce sensing variables
                 • Extract what you know
                 • Break out a method object
                 • Skeletonize Methods
                 • Find Sequences
                 • Extract to the current class first
                 • Extract small pieces
                 • Be prepared to redo extractions
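Introducing a sensing variable means adding a field that records a decision buried deep inside a long method, so a test can observe that decision before any refactoring starts. A hypothetical Python sketch; `InvoiceProcessor`, the 100-unit threshold, and the 10% discount are invented for illustration, and prices are integer cents to keep the arithmetic exact:

```python
from typing import List, Tuple


class InvoiceProcessor:
    def __init__(self) -> None:
        # Sensing variable: set when the bulk-discount branch runs, read only by tests.
        self.bulk_discount_applied = False

    def process(self, lines: List[Tuple[int, int]]) -> int:
        """Stand-in for a monster method; lines are (quantity, unit price in cents)."""
        self.bulk_discount_applied = False
        total = 0
        units = 0
        for quantity, unit_price in lines:
            total += quantity * unit_price
            units += quantity
        # ...imagine hundreds of lines of tangled legacy logic around here...
        if units >= 100:
            self.bulk_discount_applied = True
            total = total * 9 // 10  # 10% off, rounded down to whole cents
        return total
```

With the flag in place, tests can pin down exactly when the discount branch fires while pieces of the method are extracted. Once the extracted pieces are covered directly, the sensing variable can be removed.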
    27. Patterns for the Change Algorithm
        • Break Dependencies
           • Scenarios
              • It takes forever to make a change
                 • Understanding
                 • Lag Time
                 • Breaking Dependencies
                 • Build Dependencies
    28. Patterns for the Change Algorithm
        • Write Tests
           • Tests may be more difficult to write than normal unit tests
           • May have less-than-ideal scenarios
    29. Patterns for the Change Algorithm
        • Write Tests
           • Scenarios
              • I need to make a change, but don’t know what tests to write
                 • Characterization Tests
                 • Characterizing Classes
                 • Targeted Testing
              • Writing Characterization Tests
                 • Write tests for the area where you’ll be making the change. Write as many as you need to understand the code.
                 • Then write tests for the things you need to change
                 • If converting or moving functionality, write tests to verify the behavior on a case-by-case basis
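Feathers’ recipe for a characterization test is to call the code in a harness, write an assertion you know will fail, let the failure tell you what the code actually does, and then make that observed value the expectation. The result pins down current behavior, surprises included, rather than intended behavior. A sketch using Python’s `unittest`, where `legacy_format_phone` is a hypothetical stand-in for real legacy code:

```python
import unittest


def legacy_format_phone(raw: str) -> str:
    """Hypothetical legacy function whose exact behavior we want to pin down."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return digits  # surprising, but it is what the code does today


class CharacterizeFormatPhone(unittest.TestCase):
    # Characterization tests assert what the code *does*, not what it should do.
    def test_ten_digits_are_formatted(self):
        self.assertEqual(legacy_format_phone("555.123.4567"), "(555) 123-4567")

    def test_other_lengths_come_back_as_bare_digits(self):
        # Found by running the code: an 11-digit number loses its formatting.
        self.assertEqual(legacy_format_phone("1-555-123-4567"), "15551234567")
```

Run it with `python -m unittest` as usual. If the 11-digit behavior later turns out to be a bug, fixing it becomes a deliberate change that breaks a named test instead of a silent one.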
    30. DEMO: Change Algorithm at Work
        • Step through a common scenario, implementing the tests as we go
    31. Legacy Code isn’t just Code
        • Most applications aren’t just simple console apps
        • They deal with many dependencies:
           • File Systems
           • Registries
           • Databases
           • Hardware
    32. Legacy Code isn’t just Code
        • These dependencies can cause legacy problems of their own:
           • Database schemas
           • Existing data in the tables
           • Business logic in the database
           • No access to development data that mirrors production
        • In other words, Legacy Data
    33. Legacy Data
        • So where does this Legacy Data come from?
           • Flat Files
           • XML Documents
           • RDBs
           • Object DBs
           • Other DBs
           • Application Wrappers
           • Your DB
           • Many, many sources
    34. Legacy Data
        • Legacy data produces its own unique set of challenges:
           • Data quality
           • Data architecture problems
           • Database design problems
           • Process-related challenges
    35. Data Quality
        • Common Data Quality problems (source: http://www.agiledata.org/essays/legacyDatabases.html#DataProblems)
           • A single column is used for several purposes
           • The purpose of a column is determined by the value of one or more other columns
           • Inconsistent data values / formatting
           • Missing data / columns
           • Additional columns
           • Important attributes and relationships are hidden in text fields
           • Data values that stray from their field descriptions and business rules
           • Various key strategies for the same type of entity
           • Unrealized relationships between data records
           • One attribute is stored in several fields
           • Inconsistent use of special characters
           • Different data types for similar columns
           • Different levels of detail
           • Different modes of operation
           • Varying timeliness of data
           • Varying default values
           • Various representations
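Several of these problems (inconsistent values and formatting, various representations) can be spotted before any code depends on a column, simply by profiling its values. One common approach, shown here as a hypothetical Python sketch rather than anything from the slides, reduces each value to a shape (digits become `9`, letters become `A`, punctuation is kept) and counts the distinct shapes:

```python
from collections import Counter
from typing import Iterable


def value_shape(value: str) -> str:
    """Reduce a value to its format: digits -> '9', letters -> 'A', other characters kept."""
    return "".join("9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value)


def profile_column(values: Iterable[str]) -> Counter:
    """Count how many values share each shape; several shapes hint at inconsistent formatting."""
    return Counter(value_shape(v) for v in values)
```

For a made-up date column such as `["2007-01-31", "2007-02-01", "01/31/2007", "31-Jan-07"]`, the profile shows three different shapes, which is a strong hint that every consumer of that column needs the same normalization logic.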
    36. Data Architecture Problems
        • Common architectural problems may include:
           • Applications responsible for data cleansing (instead of the DB)
           • Different database paradigms
           • Different hardware platforms / storage
           • Fragmented / Redundant / Inaccessible data sources
           • Inconsistent semantics
           • Inflexible architecture
           • Lack of event notification
           • No security, or inefficient security
           • Varying timeliness of data sources
    37. Design Problems
        • There may be key design issues with the database:
           • A database encapsulation scheme exists, but it’s difficult to use
           • Ineffective (or no) naming conventions
           • Inadequate documentation
           • Original design goals at odds with current project needs
           • Inconsistent key strategy
           • Design goals at odds with data storage (treating relational DBs as object DBs, etc.)
    38. Design Problems
        • Example
           • An application that presented custom forms to users
           • Implementers could create custom forms with custom questions and validations
           • Beautiful OO architecture – Forms had Groups which had Items
           • Everything was rendered dynamically and could be updated on the fly
    39. Design Problems
        • Example
           • The Form, Group, Item and other “objects” were all stored as individual records in one database table
           • A user in the system had on average 74 forms with an average of 30 questions each. With a target of 20,000 users in the database, this would lead to over 50 million rows in the one table.
           • We identified one stored proc as one of the main culprits. It had something like the following:
    40. Design Problems
        • Example
           • INSERT INTO @tmpTable SELECT ot.myCol FROM OtherTable ot WHERE ot.bitMask & (144567 | 99435) = 0
           • Because the WHERE clause applies a bitwise operation to the column, no index can be used, so one of their most heavily used procs did a full table scan, degrading performance significantly (average page load time of over 7 seconds)
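The cost of that predicate can be seen in a query plan. The sketch below uses SQLite through Python’s standard `sqlite3` module as a stand-in; the original proc appears to be T-SQL (`@tmpTable` is a table variable), and plan wording differs between engines and versions. The table layout and the index on `bitMask` are assumptions built from the names in the query. Even with that index, the bitwise predicate is reported as a scan, while a plain equality test on the same column can search through the index.

```python
import sqlite3

MASK = 144567 | 99435  # the two constants from the proc, combined as the proc does


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> list:
    """Return the 'detail' text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OtherTable (id INTEGER PRIMARY KEY, myCol TEXT, bitMask INTEGER)")
conn.execute("CREATE INDEX IX_OtherTable_bitMask ON OtherTable (bitMask)")

# Bitwise expression on the column: the index cannot help, so every row is read.
bitwise_plan = query_plan(conn, "SELECT myCol FROM OtherTable WHERE bitMask & ? = 0", (MASK,))
# Plain equality on the same column: the planner can seek through the index.
equality_plan = query_plan(conn, "SELECT myCol FROM OtherTable WHERE bitMask = ?", (0,))
```

Printing `bitwise_plan` and `equality_plan` shows a `SCAN` entry for the first and a `SEARCH ... USING INDEX` entry for the second; the exact text depends on the SQLite version.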
    41. Working with Legacy Data
        • So how do you deal with legacy data?
        • Strategies:
           • Avoid it
           • Develop an Error Handling Strategy
           • Work Iteratively and Incrementally
           • Prefer Read-Only Legacy Access
           • Encapsulate Legacy Data Access
           • Introduce Data Adapters for Simple Data Access
           • Introduce a Staging Database for Complex Access
           • Adopt Existing Tools
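Encapsulating legacy access behind a data adapter keeps knowledge of the old column names and encodings in one place, and gives the error handling strategy an obvious home. A hypothetical Python sketch: the legacy column names (`CUST_NO`, `CUST_NM`, `ACT_FLG`, `JOIN_DT`), their encodings, and the `Customer` type are all invented for the example.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    active: bool
    joined: Optional[date]


# Hypothetical encodings found in the legacy data.
_ACTIVE_VALUES = {"Y": True, "1": True, "N": False, "0": False}
_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def _parse_date(raw: str) -> Optional[date]:
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None  # empty or unparseable: reported as missing rather than guessed


def customer_from_legacy_row(row: dict) -> Customer:
    """Adapter: the only code that knows the legacy column names and encodings."""
    flag = (row.get("ACT_FLG") or "").strip().upper()
    if flag not in _ACTIVE_VALUES:
        # Error handling policy (an assumption): fail loudly on values we don't understand.
        raise ValueError(f"unexpected ACT_FLG value: {row.get('ACT_FLG')!r}")
    return Customer(
        customer_id=int(row["CUST_NO"]),
        name=(row.get("CUST_NM") or "").strip(),
        active=_ACTIVE_VALUES[flag],
        joined=_parse_date(row.get("JOIN_DT") or ""),
    )
```

Failing on an unknown flag while treating an unparseable date as missing is one possible policy, not the only one; the point is that the decision lives in the adapter instead of being repeated across the application.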
    42. Working with Legacy Data
        • We couldn’t avoid the data – the proc had to be changed
        • So we developed an incremental five-step plan:
           • Add an IsValidRecord column to the table
           • Update the column based on the bitmask for each row
           • Change the proc to use the column instead of the bitmask
           • Make sure all tests are still passing
           • Introduce Update and Insert triggers to automatically populate the column
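The five steps can be rehearsed end to end against a throwaway database. This Python sketch uses SQLite through the standard `sqlite3` module rather than the production engine, reuses the table and column names from the proc shown earlier, and treats a row as valid when `bitMask & (144567 | 99435) = 0`, matching the proc’s filter. The sample rows, function names, trigger names, and the index on `IsValidRecord` are assumptions; the slides do not mention an index.

```python
import sqlite3

# The two constants from the proc, combined the same way the proc combines them.
MASK = 144567 | 99435


def build_demo_db() -> sqlite3.Connection:
    """Throwaway stand-in for the legacy table (schema and rows are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OtherTable (id INTEGER PRIMARY KEY, myCol TEXT, bitMask INTEGER NOT NULL)")
    conn.executemany("INSERT INTO OtherTable (myCol, bitMask) VALUES (?, ?)",
                     [("a", 0), ("b", 1), ("c", 256), ("d", 1 << 17)])
    return conn


def legacy_query(conn: sqlite3.Connection) -> list:
    """The original filter: a bitwise test on every row."""
    sql = "SELECT myCol FROM OtherTable WHERE bitMask & ? = 0 ORDER BY id"
    return [row[0] for row in conn.execute(sql, (MASK,))]


def backfill(conn: sqlite3.Connection) -> None:
    # Step 1: add the column (NOT NULL needs a default when the table already has rows).
    conn.execute("ALTER TABLE OtherTable ADD COLUMN IsValidRecord INTEGER NOT NULL DEFAULT 0")
    # Assumption, not from the slides: an index so the new filter can seek.
    conn.execute("CREATE INDEX IX_OtherTable_IsValidRecord ON OtherTable (IsValidRecord)")
    # Step 2: set the column for every existing row from its bitmask.
    conn.execute("UPDATE OtherTable SET IsValidRecord = ((bitMask & ?) = 0)", (MASK,))


def new_query(conn: sqlite3.Connection) -> list:
    """Step 3: the proc's filter rewritten against the new column."""
    sql = "SELECT myCol FROM OtherTable WHERE IsValidRecord = 1 ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


def add_triggers(conn: sqlite3.Connection) -> None:
    # Step 5: keep the column in sync for future inserts and bitmask updates.
    conn.executescript(f"""
        CREATE TRIGGER trg_OtherTable_ins AFTER INSERT ON OtherTable
        BEGIN
            UPDATE OtherTable SET IsValidRecord = ((NEW.bitMask & {MASK}) = 0) WHERE id = NEW.id;
        END;
        CREATE TRIGGER trg_OtherTable_upd AFTER UPDATE OF bitMask ON OtherTable
        BEGIN
            UPDATE OtherTable SET IsValidRecord = ((NEW.bitMask & {MASK}) = 0) WHERE id = NEW.id;
        END;
    """)
```

Step 4 is the comparison between the two filters: after `backfill`, `new_query(conn)` should equal `legacy_query(conn)`, and it should keep doing so after `add_triggers` as rows are inserted and updated. Because the old bitmask column is untouched, the old filter remains available as an oracle throughout.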
    43. Working with Legacy Data
        • Advantages
           • Required no change to application code
           • We could rapidly test the application
           • We could make incremental changes to see improvements
        • What made it work
           • Testing/QA database with production-like data
           • Regression tests to ensure functionality
           • Timing tests to show performance improvement
    44. Process Problems
        • Not all of the issues are technical:
           • Working with legacy data when you don’t have to
           • Data design drives your object model
           • Legacy data issues overshadow everything else
           • App developers ignore legacy issues
           • You choose not to refactor the legacy data sources
           • Politics
           • You are too focused on the data to see the software
    45. Refactoring Databases
        • Databases should not be left out of the refactoring process
           • “An interesting observation is that when you take a big design up front (BDUF) approach to development where your database schema is created early in the life of your project you are effectively inflicting a legacy schema on yourself. Don’t do this.”
        • Scott Ambler maintains a catalog of database refactorings
        • How do you refactor a database?
    46. Refactoring Databases
    47. Refactoring Databases
        • Implementing Database Refactoring in your organization
           • Start simple
           • Accept that iterative and incremental development is the norm
           • Accept that there is no magic solution to get you out of your existing mess
           • Adopt a 100% regression testing policy
           • Try it
    48. Next Steps
        • Dealing with legacy code is hard
           • Integration issues
           • Code issues
           • Political issues
        • There are ways out
        • Important to address pain points first
    49. Next Steps
        • So where can you go from here?
           • Working Effectively with Legacy Code by Michael Feathers
           • Agile Database Techniques by Scott Ambler
           • Refactoring Databases by Scott Ambler and Pramod Sadalage
           • http://www.agiledata.org
           • NUnit, JUnit, CppUnit, CppUnitLite, DbFit, FitNesse
           • http://www.cornetdesign.com
