How MS Does Devops - DevOps Days Berlin 2018

DevOps at Microsoft
Data: Internal Microsoft engineering system activity, August 2018
372k Pull Requests per month
2m Git commits per month
78,000 Deployments per day
4.4m Builds per month
500m Test executions per day
500k Work items updated per day
5m Work items viewed per day
Azure DevOps is the toolchain of choice for Microsoft engineering with over 90,000 internal users
https://aka.ms/DevOpsAtMicrosoft

3,500
The Developer Division at Microsoft

800
The Azure DevOps team… spread out across 40 feature teams

3 weeks
Team Foundation Server (TFS)
Azure DevOps (formerly VSTS)

We are delivering value to customers at an increased velocity.
• More features in the 2016 calendar year (262 features)…
• Than the previous 4 years combined (256 features).
• 364 features in the 2017 calendar year!
https://www.visualstudio.com/en-us/articles/news/features-timeline
Chart: Features delivered per year (2012: 22, 2013: 58, 2014: 65, 2015: 111, 2016: 249, 2017: 364, 2018: 97)

Sprint 1
Aug 2010
VSTS Preview
Sprint 29
Jun 2012
VSTS GA
Sprint 64
Apr 2014
1ES
Sprint 67
Jun 2014
GVFS
Sprint 102
Jun 2016
Sprint 136
Jun 2018

Planning M1 M2
Specs
We knew exactly what to build…
and we knew it was right!
Photo by Jose Antonio Gallego Vázquez on Unsplash

Code Test & Stabilize
Code
Complete
We wrote all the code months before
we shipped.

Planning M1 M2
We had a perfect schedule and knew
exactly when it would be ready!

Planning
Customer feedback – we should
change the way a feature works. We
didn’t get it quite right…
… but we’re booked solid already.
M1

“Great feedback. Thanks! We’ll take a
look in planning for the next release. We
should get it to you….
in a few years.”

“Culture eats strategy for breakfast.”
Peter Drucker

Cross discipline
10-12 people
Self managing
Clear charter and goals
Intact for 12-18 months
Physical team rooms
Own features in production
Own deployment of features

Sticky Note Exercise - Self Forming Teams
• Employee choice, not manager driven
• Typically <20% change, but 100% get to make a choice
• Cross-pollinate talent and micro-culture

“We started off trying to set up a small anarchist community, but people wouldn't obey the rules.”
Alan Bennett

Let’s try to give our teams three things….
Autonomy, Mastery, and Purpose.
Intrinsic
vs
extrinsic motivators
https://www.youtube.com/watch?v=u6XAPnuFjJc

Chart axes: Alignment and Autonomy

“A customer can have a car painted any color he wants as long as it’s black”
Henry Ford
Autonomy

Sprint 134 | Sprint 135 | Sprint 136 (each sprint: Week 1, Week 2, Week 3)

S1 S2 S3 S4 S5 Stabilization S6
A
B
“Let’s do this Agile thing… but we should probably reserve some time to stabilize things.”
Seemed like a good idea at the time… (famous last words)

Code Test & Stabilize Code Test & Stabilize
Code
Complete
Planning

We all follow a simple rule we call the “Bug Cap”:
# engineers on your team x 5 = bug cap
Rule: If your bug count exceeds your bug cap… stop working on new features until you’re back under the cap.
Example: 10 x 5 = 50
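The Bug Cap arithmetic can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slide. The function names `bug_cap` and `may_work_on_features` are made up for this example and do not come from the talk.

```python
BUGS_PER_ENGINEER = 5  # multiplier taken from the slide


def bug_cap(engineers: int, per_engineer: int = BUGS_PER_ENGINEER) -> int:
    """Return the team's bug cap: number of engineers times the per-engineer allowance."""
    if engineers < 0:
        raise ValueError("engineers must be non-negative")
    return engineers * per_engineer


def may_work_on_features(open_bugs: int, engineers: int) -> bool:
    """True while the bug count does not exceed the cap (hypothetical helper)."""
    return open_bugs <= bug_cap(engineers)


print(bug_cap(10))                   # 50, matching the slide's example
print(may_work_on_features(50, 10))  # True: at the cap, not over it
print(may_work_on_features(51, 10))  # False: stop feature work until back under the cap
```

The comparison uses "exceeds" literally, so a team sitting exactly at its cap may still work on features.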

What we track
Live Site Health/Debt:
• Time to Detect, Time to Mitigate
• Incident prevention items
• Aging live site problems
• Customer support metrics (SLA, MPI, top drivers)
Engineering Health/Debt:
• Bug cap per engineer
• Aging bugs in important categories
• Pass rate & coverage
Velocity:
• Time to build
• Time to self test
• Time to deploy
• Time to learn (Telemetry pipe)
Things we don’t watch
• Team burndown
• Team velocity
• Original estimate
• Completed hours
• Team capacity
• # of bugs found
It is more about impact than activity

Sprint 134 | Sprint 135 | Sprint 136 (each sprint: Week 1, Week 2, Week 3)
At the end of a sprint, all teams send a “sprint mail” … communicating what they’ve
accomplished in the sprint, and what they’re planning to accomplish in the next sprint.

• Value delivered during the sprint
• Video demonstrating the value
• What the team is planning to accomplish in the next sprint

6 month plan
Each team comes in and reviews with leadership three things:
1. What is the plan for the next 3 sprints?
2. Is the team healthy?
3. Any risks or issues to highlight?

• Storyboard of the customer
experience
• High level execution plan – sprints,
not hours
• Feedback, feedback, feedback

“Plans are worthless, but planning is everything.”
Dwight Eisenhower

Sprint: 3 weeks
Plan: 3 sprints
Season: 6 months
Strategy: 12 months
Teams are responsible for the detail
Leadership is responsible for the big picture

Strategy
Features
Stories
Tasks
Alignment
The big picture in light of our
business goals
Autonomy
The detail about what we’ll deliver
to achieve our business goals

Planning timeline (quarters Q1 to Q4 over two years): Strategy for FY18, with a 6 month plan for FY18 H1.

Day in the
Life of an
Engineer
Photo by Goh Rhy Yan on Unsplash

Master
Week 1 Week 2 Week 3
Sprint Previous Sprint Next
175 commits/day
into Master
Release: Current Sprint x
Release: Sprint Previous x
https://aka.ms/releaseflow

Policies to keep master branch healthy (green)
• Required reviewers
• Build must pass
• Security plugins
• (opt-in) Run functional tests in the cloud
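As a rough illustration of how such merge policies gate a pull request, here is a minimal Python sketch. The `PullRequest` fields, the `blocking_policies` helper, and the default of one required approval are all assumptions made for this example. The slide does not say how many reviewers are required or how the security plugins report their results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PullRequest:
    """Hypothetical snapshot of a pull request's policy state."""
    approvals: int
    build_passed: bool
    security_checks_passed: bool


def blocking_policies(pr: PullRequest, required_approvals: int = 1) -> List[str]:
    """Return the policies that still block the merge (empty list means mergeable)."""
    blockers = []
    if pr.approvals < required_approvals:  # threshold is an assumption
        blockers.append("required reviewers")
    if not pr.build_passed:
        blockers.append("build must pass")
    if not pr.security_checks_passed:
        blockers.append("security plugins")
    return blockers


ready = PullRequest(approvals=2, build_passed=True, security_checks_passed=True)
broken = PullRequest(approvals=0, build_passed=False, security_checks_passed=True)
print(blocking_policies(ready))   # []
print(blocking_policies(broken))  # ['required reviewers', 'build must pass']
```

The opt-in functional test run is left out on purpose, since the slide presents it as optional rather than as a blocking policy.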

Fast and reliable signals
• All unit tests (L0/L1) run in Pull Request
• CI runs functional (L2) test suites
• Test reliability is actively managed
• Tests are trusted
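One way to picture the split between pull request and CI runs is a lookup from pipeline stage to test level. The sketch below is a simplified illustration. The stage names, the `STAGE_LEVELS` mapping, and the test names are hypothetical, and whether CI also re-runs L0/L1 tests is not stated on the slide.

```python
from typing import Dict, List, Set, Tuple

# Assumed mapping from pipeline stage to the test levels it runs.
STAGE_LEVELS: Dict[str, Set[str]] = {
    "pull_request": {"L0", "L1"},  # unit tests, per the slide
    "ci": {"L2"},                  # functional tests, per the slide
}

# Hypothetical test inventory: (test name, level).
TESTS: List[Tuple[str, str]] = [
    ("test_unit_a", "L0"),
    ("test_unit_b", "L1"),
    ("test_functional_c", "L2"),
]


def select_tests(stage: str) -> List[str]:
    """Return the names of the tests whose level is run in the given stage."""
    levels = STAGE_LEVELS[stage]
    return [name for name, level in TESTS if level in levels]


print(select_tests("pull_request"))  # ['test_unit_a', 'test_unit_b']
print(select_tests("ci"))            # ['test_functional_c']
```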

Quality ownership
Photo by Sebastian Grochowicz on Unsplash

Program Management Engineering

Program Management is responsible for:
WHAT we’re building, and
WHY we’re building it
Engineering is responsible for
HOW we’re building it, and that
we’re building it with QUALITY

Over 22 hours for nightly run and 2 days for the full run
Only ~60% of P0 runs passed 100%; each NAR suite had many failures
Test failure analysis was too costly
Took days to sift through failures before deployment could start

Tests should be written at the lowest level possible
Write once, run anywhere, including the production system
Product is designed for testability
Test code is product code; only reliable tests survive
Testing infrastructure is a shared service
Test ownership follows product ownership

Architecture diagram: Shared Platform Services (SPS) in North Central; TFS SU1 (North Central), TFS SU7 (Australia), and TFS SU0 (West Central), shown with AT, JA, and Blob components; Containerized Services.

• All code is deployed, but feature flags control exposure
– Reduces integration debt
• Flags provide runtime control down to individual user
• Users can be added or removed with no redeployment
• Mechanism for progressive experimentation & refinement
• Enables dark launch
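The per-user runtime control described above can be sketched with a small in-memory flag. This is a minimal illustration, not the team's actual flag service. The `FeatureFlag` class, its fields, and the flag name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureFlag:
    """Hypothetical in-memory flag; a real service would persist this state."""
    name: str
    enabled_for_all: bool = False
    enabled_users: Set[str] = field(default_factory=set)

    def enable_for(self, user: str) -> None:
        """Expose the feature to one user without redeploying."""
        self.enabled_users.add(user)

    def disable_for(self, user: str) -> None:
        """Hide the feature from one user again."""
        self.enabled_users.discard(user)

    def is_enabled(self, user: str) -> bool:
        return self.enabled_for_all or user in self.enabled_users


flag = FeatureFlag("new-dashboard")  # code is deployed, but dark for everyone
flag.enable_for("alice")
print(flag.is_enabled("alice"))  # True
print(flag.is_enabled("bob"))    # False
flag.disable_for("alice")
print(flag.is_enabled("alice"))  # False
```

Because the check happens at request time, widening exposure from a single user to everyone is a data change rather than a deployment.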

Application Insights Analytics (Project Kusto) for
• text search and queries over structured and semi-structured data
• high volume ingestion
• fast queries over very large data sets


Assume Breach - Use War Games to learn the attacks and practice response
• Double blind test
• Full disclosure at or near end
vs.
• Share tactics & lessons learned
• Continued evolution

Before → After
4-6 month milestones → 3-week sprints
Horizontal teams → Vertical teams
Personal offices → Team rooms
Long planning cycles → Continual Planning & Learning
PM, Dev, Test → PM & Engineering
Yearly customer engagement → Continual customer engagement
Feature branches → Everyone in master
20+ person teams → 8-12 person teams
Secret roadmap → Publicly shared roadmap
Bug debt → Zero debt
100 page spec documents → Specs in PPT
Private repositories → Open source
Deep organizational hierarchy → Flattened organization hierarchy
Success is a measure of install numbers → User satisfaction determines success
Features shipped once a year → Features shipped every sprint
