Managing the Technical Debt lifecycle. In this presentation we explore the evolution of the metaphor, the value it brings to organizations, and the challenges to successful adoption.
The full audio and video can be viewed at http://blog.acrowire.com/td-webinar.
1. Webinar:
Managing Technical Debt
Audio and video of this presentation are available at the link below
http://blog.acrowire.com/td-webinar
2. Ted Theodoropoulos
President
Acrowire
ted@acrowire.com
Michael Milutis
Director of Marketing
Computer Aid, Inc. (CAI)
Michael_milutis@compaid.com
3. Ted Theodoropoulos
President of Acrowire Technology Consulting
•Application Development
•Business Process Improvement
•ALM/Tech debt assessments
Programming since 1982
•TI-99/4a using BASIC
Microsoft SQL Server Team
10 years at Bank of America
•Development Team Manager
•IT Auditor
•Senior VP in Operational Risk
Undergrad in Mathematics & MBA from UNC
Six Sigma Black Belt/CSM/MCP
4. PDU CREDITS
FOR THIS WEBINAR
The Project Management Institute
has accredited this webinar with PDUs
7. 1. Introduction
Outline
1.Introduction
2.What is technical debt?
3.Opportunities and challenges
4.Business impacts
5.Foursquare case study
6.Managing the lifecycle
9. 2. What is technical debt?
Evolution
Ward Cunningham
Invented the wiki in 1994
Coined the term at OOPSLA in 1992
Technical Debt includes those internal things that you choose not to do
now, but which will impede future development if left undone. This
includes deferred refactoring.
Technical Debt doesn't include deferred functionality, except possibly in
edge cases where delivered functionality is "good enough" for the
customer, but doesn't satisfy some standard (e.g., a UI element that isn't
fully compliant with some UI standard).
10. 2. What is technical debt?
Evolution
Jeff Sutherland
Cofounder of Scrum
Opined at Scrum Gathering in 2006
Described the following technical debt scenarios:
• The code is considered part of a core legacy system, in which its functionality is
connected to so many other parts of the system that it’s impossible to isolate any
one component.
• There is either no testing or minimal testing surrounding the code. Although it
may sound redundant, it is necessary to point out that without comprehensive
unit tests, it is impossible to refactor the code to a more manageable state.
• There is highly compartmentalized knowledge regarding the core/legacy system,
supported by only one or two people in the company.
11. 2. What is technical debt?
Evolution
Steve McConnell
Author/Software Engineer
Proposed First Taxonomy in 2007
I. Debt incurred unintentionally due to low quality work
II. Debt incurred intentionally
II.A. Short-term debt, usually incurred reactively, for tactical reasons
II.A.1. Individually identifiable shortcuts (like a car loan)
II.A.2. Numerous tiny shortcuts (like credit card debt)
II.B. Long-term debt, usually incurred proactively, for strategic reasons
12. 2. What is technical debt?
Evolution
Martin Fowler
Author/Software Engineer
Established TD Quadrants in 2009
13. 2. What is technical debt?
Evolution
Gartner, Inc
IT Research and Advisory
Estimated $500 Billion of “IT Debt” in 2010
Gartner Estimates Global 'IT Debt' to Be $500 Billion This Year, with
Potential to Grow to $1 Trillion by 2015
"The issue is not just that maintenance keeps on getting deferred, it is that
the lack of an application inventory and the absence of a structured review
process for the application portfolio. This means the IT management team
is simply never aware of the true scale of the problem”
14. 2. What is technical debt?
Current
Ted Theodoropoulos
Technical Debt Practitioner
Proposed “Stakeholder Perspective” at SEI in 2011
“Technical debt is any gap within the
technology infrastructure, or its
implementation, which has a material impact
on the required level of quality.”
15. 2. What is technical debt?
Stakeholder Perspective
Technical Environment
Internal: Executives, Business Development Team, Risk Managers, Infrastructure Team, Board of Directors, Internal Auditors
External: Customers, Analysts, Shareholders, Regulators, External Auditors
Stakeholders need better transparency and engagement around
issues affecting quality in the technical environment.
16. 2. What is technical debt?
Quality Requirements
• Gaps impacting required levels of quality represent technical debt
• Teams can “borrow” against the ideal solution to speed initial delivery
• Interest is then paid in the form of lower productivity and/or incremental risk
• Maintenance and enhancement activities become more onerous and expensive
• Interest compounds as workarounds are applied on top of workarounds
17. 2. What is technical debt?
Deficits and Surpluses
• The yellow section shows the gap in maintainability on which interest is paid
• Conversely, blue represents unneeded functionality that must be maintained
• Deficits and surpluses in application quality cost the organization money
• Ideally, the green area would fill the area within the dashed line
20. 3. Opportunities and Challenges
Opportunities
Prioritization
• Business leaders always want to build new stuff
• Quantifying gaps in dollars levels the playing field
• Getting the business to recognize the value of refactoring is difficult
• New initiatives can be prioritized based on ROI against debt reduction
21. 3. Opportunities and Challenges
Opportunities
Transparency
Know what is beneath the surface!
22. 3. Opportunities and Challenges
Opportunities
Risk Management
Know what is beneath the surface!
24. 3. Opportunities and Challenges
Challenges
Concept Fragmentation
Vendor Support Debt
Quality Debt
Pairing Debt
Documentation Debt
Configuration Management Debt
Testing Debt
Legacy Debt
Access Control Debt
SEO Debt
Platform Experience Debt
Refactoring Debt
Data Quality Debt
Design Debt
“Cruft is technical debt!”
-Ted Theodoropoulos
“Cruft isn’t technical debt!”
-Uncle Bob Martin
25. 3. Opportunities and Challenges
Challenges
Unknown Future State
• No standards organization currently manages the concept
• Uncertainty around where technical debt is headed
• Adoption will be hampered by this uncertainty
• SEI is leading efforts to move the concept forward
27. 4. Business Impacts
Platform Stability
• Technical debt often takes the form of fragile or difficult-to-maintain code
• It has a destabilizing effect on production systems
• This type of technical debt decreases agility and increases defects
• Increases risk of production issues with customer impact
• Decreases ability to seize market opportunities
• Increases fire drills, which hurts morale
• Lower employee satisfaction makes talent retention challenging
28. 4. Business Impacts
Cost of Change
• Technical debt typically compounds over time
• This phenomenon increases the cost of change (CoC) exponentially
• Customer responsiveness is inversely proportional to CoC
29. 4. Business Impacts
Technical Bankruptcy
• Unabated technical debt leads to ballooning interest payments
• Over time, the interest payments become all-consuming
• First there are no resources available for enhancements
• Then interest payments exceed the available resources
• This is known as technical bankruptcy
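The progression above can be sketched numerically. The following is a minimal illustration, not a model taken from the presentation: the function name, the per-period interest rate, and the growth rate are hypothetical assumptions chosen only to show how compounding debt can eventually consume a fixed team capacity.

```python
from typing import Optional


def first_bankrupt_period(
    capacity_hours: float,
    debt_hours: float,
    interest_rate: float,
    growth_rate: float,
    max_periods: int = 200,
) -> Optional[int]:
    """Return the first period in which interest exceeds team capacity.

    All parameters are hypothetical modelling assumptions:
    - capacity_hours: hours the team can spend per period
    - debt_hours: current principal, expressed in hours of rework
    - interest_rate: fraction of the principal paid as extra effort each period
    - growth_rate: fraction by which unpaid principal grows each period
    """
    for period in range(1, max_periods + 1):
        interest_hours = debt_hours * interest_rate
        if interest_hours > capacity_hours:
            # Interest alone exceeds capacity: technical bankruptcy.
            return period
        # Unpaid principal compounds (workarounds on top of workarounds).
        debt_hours *= 1 + growth_rate
    # Capacity was never exceeded within max_periods.
    return None


# Example with made-up numbers: debt doubles each period.
print(first_bankrupt_period(capacity_hours=100, debt_hours=100,
                            interest_rate=0.5, growth_rate=1.0))
```

With a growth rate of zero the function returns `None`, which corresponds to debt that is serviced but never allowed to compound.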
31. 5. Foursquare Case Study
Background
• In Spring 2011, Amazon had a major outage in AWS
• Multiple availability zones (AZs) were impacted
• While the outage was disappointing, it did not violate the SLA
• As Gartner points out below, there were no SLAs for the impacted services
Amazon’s SLA for EC2 is 99.95% for multi-AZ deployments. That means that
you should expect that you can have about 4.5 hours of total region
downtime each year without Amazon violating their SLA. Note, by the way,
that this outage does not actually violate their SLA. Their SLA defines
unavailability as a lack of external connectivity to EC2 instances, coupled
with the inability to provision working instances. In this case, EC2 was just
fine by that definition. It was EBS and RDS which weren’t, and neither of
those services have SLAs.
32. 5. Foursquare Case Study
Architecture
• Amazon is an infrastructure as a service (IaaS) provider
• IaaS consumers can design applications as they see fit
• Individual requirements dictate architecture
• If an app requires high availability (HA), the design must accommodate it
• Failing to satisfy requirements introduces risk into the environment
• Foursquare replicated across AZs instead of across data centers
• Best practices for HA were not followed
33. 5. Foursquare Case Study
Technical Debt
• Implementing full redundancy is not cheap
• Startup capital is a scarce resource and must be used wisely
• Replicating across AZs was cheaper than across data centers
• This architecture created a requirements gap which represents debt
• The principal of the technical debt is the cost to provide full HA
• The interest takes the form of the incremental risk
34. 5. Foursquare Case Study
Debt Calculation
• With an optimal design, the risk of an event is 0.5%
• Design shortcuts increased the risk to 4%
• The incremental risk associated with the design is 3.5%
• If an outage occurs, brand and investor confidence are damaged
• Additionally, users and market share will be lost
• The estimated cost of such an event is $1M
Incremental Risk: 4%-0.5% = 3.5%
Cost of Failure: $1,000,000
Interest: $35,000
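The arithmetic on this slide can be written out as follows. This is a minimal sketch: the function name `risk_interest` is an assumption made for illustration, while the figures are the ones quoted above.

```python
def risk_interest(baseline_risk: float, actual_risk: float,
                  cost_of_failure: float) -> float:
    """Interest expressed as incremental risk times the cost of a failure.

    baseline_risk and actual_risk are probabilities (0.005 means 0.5%).
    """
    incremental_risk = actual_risk - baseline_risk
    return incremental_risk * cost_of_failure


# Figures from the slide: 0.5% optimal risk, 4% actual risk, $1M cost of failure.
interest = risk_interest(baseline_risk=0.005, actual_risk=0.04,
                         cost_of_failure=1_000_000)
print(f"Interest: ${interest:,.0f}")
```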
35. 5. Foursquare Case Study
Prudent Debt
• Technical debt can be leveraged responsibly just like financial debt
• Assume the appropriate design costs an additional $100K to implement
• That relatively large investment would eliminate $35K in risk
• Such an investment would provide a 35% ROI
• Each dollar invested would give $0.35 back to the business
• Currently, paying off debt might be a questionable use of capital
Principal: $100,000
Interest: $35,000
Return on Investment: 35%
36. 5. Foursquare Case Study
Imprudent Debt
• Sometimes the risk/reward equation is out of balance
• Assume the appropriate design costs an additional $5K to implement
• That relatively small investment would eliminate the same $35K in risk
• Such an investment would provide a 700% ROI
• Each dollar invested would give $7 back to the business
• Currently, paying off debt would be a wise use of capital
Principal: $5,000
Interest: $35,000
Return on Investment: 700%
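The prudent and imprudent scenarios differ only in the principal. The comparison can be sketched as below, using the slides' own definition of ROI (interest eliminated divided by principal invested). The function name is hypothetical.

```python
def debt_payoff_roi(principal: float, interest: float) -> float:
    """ROI of paying off debt, as the slides define it: interest / principal."""
    if principal <= 0:
        raise ValueError("principal must be positive")
    return interest / principal


# Same $35K of interest, two different costs to remediate.
for label, principal in [("Prudent debt", 100_000), ("Imprudent debt", 5_000)]:
    roi = debt_payoff_roi(principal, interest=35_000)
    print(f"{label}: principal ${principal:,}, ROI {roi:.0%}")
```

Note that this ratio ignores how long the interest would otherwise be paid, so it is a prioritization aid rather than a full financial ROI.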
37. 5. Foursquare Case Study
Initial Focus
“Massive failure at Amazon Web Services causes havoc…”
-GeekWire
“Amazon Server Troubles Take Down Reddit, Foursquare & Hootsuite”
-Mashable
“Amazon EC2 Outage Hobbles Websites”
-Information Week
“Amazon Server Outage Blanks Popular Websites”
-Fox News
“Amazon’s Web Services outage: End of cloud innocence?”
-ZDNet
“Amazon Malfunction Raises Doubts About Cloud Computing”
-NY Times
38. 5. Foursquare Case Study
Retrospective Focus
“In short, if your systems failed in the Amazon cloud this week, it wasn't Amazon's fault.”
-O’Reilly Media
“Failing to plan is planning to fail”
-Acrowire
“The AWS story shows how important it is to think about engineering when you're designing systems for the cloud.”
-DataPipe
“We designed for failure from day one. Any of our instances, or any group of instances in an AZ, can be ‘shot in the head’ and our system will recover.”
-SmugMug
“Why were some websites impacted while others were not? For Netflix, the short answer is that our systems are designed explicitly for these sorts of failures.”
-Netflix
“Lessons from a cloud failure: It’s not Amazon, it’s YOU!”
-Webmonkey
40. 6. Manage the Lifecycle
Lifecycle Phases
[Figure: Technical Debt lifecycle phases]
41. 6. Manage the Lifecycle
Define
• Define what qualifies as technical debt in your organization
• Think through the implications of the defined boundaries
• Process must be collaborative and not done in a vacuum
• Will key stakeholders (e.g., audit, risk management, IT) buy into it?
Technical Environment
Internal: Executives, Business Development Team, Risk Managers, Infrastructure Team, Board of Directors, Internal Auditors
External: Customers, Analysts, Shareholders, Regulators, External Auditors
42. 6. Manage the Lifecycle
Define
Framework Alignment
43. 6. Manage the Lifecycle
Identify
Signs you might have it…
• Don’t we have documentation on the file layouts?
• I thought we had a test for that!
• If I change X, it is going to break Y… I think.
• Don’t touch that code. The last time we did, it took weeks to fix.
• The server is down. Where are the backups?
• Where is the email about that bug?
• We can’t upgrade. No one understands the code.
45. 6. Manage the Lifecycle
Measure
Calculating Principal
n = Number of resources required
R = Rate (hourly average) of resource
H = Hours required
C = Costs associated with benefits, payroll, recruitment (usually ~40% of hourly rate)
HC = Hardware Costs
SL = Software Licenses
MI = Migration and Implementation expenses (e.g. consulting engagements, training, etc)
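The formula itself is not reproduced in this transcript, so the sketch below is only one plausible reading of the variables listed above, not the presenter's equation. It assumes that labor cost is n × H × R, that H is hours per resource, that C is applied as a fractional uplift on the hourly rate (0.40 by default, following the "~40%" note), and that HC, SL, and MI are added as flat amounts. The function and parameter names are hypothetical.

```python
def estimate_principal(
    n: int,
    rate: float,
    hours: float,
    overhead_fraction: float = 0.40,
    hardware_costs: float = 0.0,
    software_licenses: float = 0.0,
    migration_costs: float = 0.0,
) -> float:
    """Estimate the principal of a technical debt item (assumed formula).

    n                  -> number of resources required (n)
    rate               -> average hourly rate of a resource (R)
    hours              -> hours required per resource (H), an assumption
    overhead_fraction  -> benefits, payroll, recruitment as a share of R (C)
    hardware_costs     -> HC
    software_licenses  -> SL
    migration_costs    -> MI
    """
    # Loaded labor cost: hourly rate plus overhead, for every resource-hour.
    labor = n * hours * rate * (1 + overhead_fraction)
    return labor + hardware_costs + software_licenses + migration_costs


# Made-up example: 2 people, 50 hours each, $100/hour, $5,000 in licenses.
print(f"${estimate_principal(2, 100, 50, software_licenses=5_000):,.0f}")
```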
46. 6. Manage the Lifecycle
Remediate
Prioritization
ROI
• Refactoring initiatives can be evaluated
• Quantifying gaps in dollars levels the playing field
• Getting the business to recognize the value of refactoring is difficult
• New initiatives can be prioritized based on ROI against debt reduction
47. 6. Manage the Lifecycle
Govern
Capital Structure
• Evaluate free cash flow volatility over time
• Determine appropriate technical debt to equity ratio
• Monitor your technical balance sheet diligently
• Establish centralized debt registration database
• Implement credit limits for high risk areas of the infrastructure
• Foster a risk management culture within the organization
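A centralized debt register with per-area credit limits could be as simple as the sketch below. The class names, fields, and example areas are hypothetical and only illustrate the idea; the presentation does not prescribe a data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DebtItem:
    area: str          # part of the infrastructure carrying the debt
    description: str
    principal: float   # estimated cost to remediate
    interest: float    # estimated recurring cost or risk of leaving it


@dataclass
class DebtRegister:
    # Maximum principal tolerated per area (the "credit limit").
    credit_limits: Dict[str, float]
    items: List[DebtItem] = field(default_factory=list)

    def register(self, item: DebtItem) -> None:
        """Record a new debt item in the central register."""
        self.items.append(item)

    def principal_by_area(self) -> Dict[str, float]:
        """Sum the registered principal for each area."""
        totals: Dict[str, float] = {}
        for item in self.items:
            totals[item.area] = totals.get(item.area, 0.0) + item.principal
        return totals

    def over_limit(self) -> List[str]:
        """Return the areas whose total principal exceeds their credit limit."""
        totals = self.principal_by_area()
        return sorted(
            area
            for area, limit in self.credit_limits.items()
            if totals.get(area, 0.0) > limit
        )


# Made-up example data.
debt_register = DebtRegister(credit_limits={"payments": 50_000, "reporting": 20_000})
debt_register.register(DebtItem("payments", "no failover for the gateway", 40_000, 6_000))
debt_register.register(DebtItem("payments", "untested settlement job", 15_000, 2_000))
debt_register.register(DebtItem("reporting", "manual export scripts", 5_000, 500))
print(debt_register.over_limit())
```

Areas without a configured limit are never flagged in this sketch; a stricter policy might treat a missing limit as zero.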
Editor's Notes
The web browser didn’t become popular until 1994, so inventing the wiki then is impressive.