Retiring Service Interfaces: A Retrospective on Two 10+ Year Old Services

JEFF NICKOLOFF
All in Geek Consulting
@allingeek
jeff@allingeek.com

Who am I?
• Former Amazonian
• Author of Docker in Action
• Independent Software Engineer
• Blogger
• Containerization and AWS consulting

Why should I care? 
(about service retirement)
• There are a surprising number of ways to change a
service interface
• Most of those changes will require some retirement
campaign

Background and Declarations
Microservices enable implementation iteration in isolation
while hindering interface iteration.
Amazon runs an amazing number of microservices.
My former team of eight owned hundreds of services.
Amazon has amazing tooling for service owners 
(but it won’t save you).
What follows is an account of a process that surely happens
all the time, everywhere that different people’s code
establishes runtime service dependencies.

A Long Story Short
• We retired two 10+ year old service interfaces each
having hundreds of unique consumers
• We did so in about 7 months
• We did not go out of business
• Other similar deprecation campaigns had been
attempted within the five years prior

What and Why
Two services that predated all of AWS:
• Ol’ McCruftyface suffered from nonsensical UX, an
RPC style interface, mutable entities, overly broad
ID space, “variadic” function definitions, and no
authentication.
• Blobby Cleartext used fake crypto, weak (never
rotated) keys, crappy write-through caching, file
re-streaming, and no authentication.

The Plan
• Identify clients that will need to migrate
• Identify active use-cases
• Document migration paths and timeline
• Open communication with clients and present plan
• Follow up regularly and monitor migration efforts
• Increase migration pressure
• Shut it down… eventually

Client Discovery
• Amazon has one package manager to rule them all
• Service dependencies are modeled
• Dependent packages have known owners
• Amazon has strong hardware ownership… Inspect
our service logs for IP addresses
• Additional challenges? 
Mixed ownership, old software, and infrequent usage
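
The log-inspection step above can be sketched roughly as follows. This is a minimal illustration, not the tooling the team used: the log line format, the sample IP addresses, the team names, and the `HOST_OWNERS` table are all hypothetical stand-ins for a real access log and a real hardware-ownership lookup.

```python
from collections import Counter

# Hypothetical access-log lines: "<timestamp> <client_ip> <operation>"
LOG_LINES = [
    "2015-03-01T10:00:00Z 10.0.0.5 GetBlob",
    "2015-03-01T10:00:02Z 10.0.0.5 PutBlob",
    "2015-03-01T10:01:10Z 10.0.0.9 GetBlob",
    "2015-03-02T08:30:00Z 10.0.1.7 GetBlob",
]

# Hypothetical hardware-ownership table: client IP -> owning team
HOST_OWNERS = {
    "10.0.0.5": "team-orders",
    "10.0.0.9": "team-search",
}


def calls_by_owner(log_lines, host_owners):
    """Count calls per owning team; IPs with no known owner go to 'unknown'."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        client_ip = parts[1]
        counts[host_owners.get(client_ip, "unknown")] += 1
    return counts


owners = calls_by_owner(LOG_LINES, HOST_OWNERS)
for team, calls in owners.most_common():
    print(team, calls)
```

The "unknown" bucket is the interesting one: it is the list of callers that still need an owner before anyone can be asked to migrate.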

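Analysis: Client Composition
[Figure: two bar charts on a 0 to 100 percent axis. “% Identifiable” groups clients as Identified, Partial, or Unknown. “% Grouped by Age” groups clients as > 10 yrs, > 5 yrs, > 1 yr, or < 1 yr. Bar values are not recoverable from the text.]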

Wait…
“Why didn’t you just make all of the client changes yourself?”
Hundreds of clients, unique release cycles, repositories,
permissions issues, and politics.

Migration Path and Assistance
• Analyze existing API and usage patterns from logs
• Document an internal client migration
• Prepare a migration matrix (before and after
mapping for discovered use-cases)
• Organize a migration assistance on-call rotation
• Establish lightweight procedures for assistance

Timeline and Communication
• Make your sales pitch (carrots)
• Provide a complete but concise explanation,
documentation, assistance options, timeline, and
milestones
• Establish clear rules for regular communication 
(frequency, medium, heartbeat, etc.)
• Highlight escalation paths and consequences of
compliance failure (sticks)

Carrots
• Better availability
• Clearer API UX
• Enhanced features
• Immutability
• Tighter latency guarantees
• Real data protection

“forgiveness > permission” 
— Larry Wall

Sticks and Secret Sticks
• Secretly reducing service redundancy and
throughput capacity (make the services worse)
• Failure to communicate - pageable
• Missed milestones - pageable
You’re supposed to own these clients, so own them.

Following Up… with Sticks
• Half your customers will comply quickly <3
• A quarter will talk to you and never act 
(missing milestones - page ’em)
• About a quarter will ignore you  
(until later - page ’em regularly)
• Some small percentage will never respond at all 
(wait for the hammer)

More Client Discovery…
Sticks
• We identified that some clients were not identifiable
• Fell back to weekly advertisements in org-wide
meetings
• Unscheduled, inconvenient scream tests
Scream tests have to be painful or they’ll be ignored.
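
A scream test can be as simple as refusing traffic during unannounced windows. The sketch below is one possible way to pick such windows, not the procedure used in this campaign. It assumes that "inconvenient" means weekday working hours (09:00 to 16:59), and the function names, window count, and duration are all hypothetical.

```python
import random
from datetime import datetime, timedelta


def pick_scream_windows(week_start, count, duration, rng):
    """Pick `count` random outage windows on weekdays during working hours.

    `week_start` is assumed to be a Monday at 00:00.
    """
    windows = []
    for _ in range(count):
        day = rng.randrange(5)                   # Monday..Friday
        minute = rng.randrange(9 * 60, 17 * 60)  # start between 09:00 and 16:59
        start = week_start + timedelta(days=day, minutes=minute)
        windows.append((start, start + duration))
    return sorted(windows)


def in_scream_test(now, windows):
    """Return True if the service should refuse requests at time `now`."""
    return any(start <= now < end for start, end in windows)


rng = random.Random(7)             # seeded only to make the sketch repeatable
week_start = datetime(2015, 3, 2)  # a Monday at 00:00
windows = pick_scream_windows(week_start, 3, timedelta(minutes=30), rng)
for start, end in windows:
    print(start, "to", end)
```

In a real campaign the windows would not be published; only the fact that scream tests are happening would be.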

Prioritizing Empathy
1. I acknowledge that the short term gains of ignoring
service debt are tempting.
2. I also acknowledge that your team’s needs are
important.
3. However, I propose that massive risks like this one
are more important, and that our customers share my
opinion.
4. I’m always going to prioritize customer needs over
your comfort.

A Hammer
I could always just turn them off, release the
hardware, and delete the service definitions.
“… but you can’t do that.”
“Are you sure?”

A Few Anecdotes
• Muddled ownership problems
• Reorg problems
• Unowned clients
• “This person just doesn’t want to work” problems
• “The painful test is painful” problems

Swinging the Hammer
It’s 10:32 am PST, we’ve only got a trickle of traffic,
the remaining known clients have been unresponsive for
months, we’ve extended the shutdown date by a month,
we’re wearing “deal with it” sunglasses inside.
Hit it.

Dousing the Final Fires
• Swinging the hammer will light a few fires. It
may take a few days for people to notice them.
• Deal with them and resist all urges to turn the
service back on. Salt the earth where it stood.
Say, “This is your life now.”

Lessons Learned 
… or suspicions confirmed
• Service adoption (really, any dependency) is debt
• Don’t transfer ownership of “complete” services
• Services that “just work” suffer the most drastic
knowledge rot
• Greater success will bring greater pain

Things I’d do again…
• Structured planning and communication
• Provide strong positive incentives
• Use operational pain as leverage
• Scream tests were very successful
• Swing the hammer

Things I’d do Next Time
• Improve communication consistency
• Increase awareness of scream test risk 
… but not of individual tests
• Escalate more quickly

Questions about services or Docker? Come talk in the hall!
JEFF NICKOLOFF
All in Geek Consulting
@allingeek
jeff@allingeek.com
