6. 1. Agile Infrastructure: Patrick Debois
(@patrickdebois) and Andrew Shafer
(@littleidea)
2. Velocity 2009: “10+ Deploys Per Day: Dev and
Ops Cooperation” by John Allspaw (@allspaw)
3. Lean startup
Roots of DevOps
http://itrevolution.com/the-convergence-of-devops/, by John Willis (@botchagalupe)
21. A key performance indicator (KPI)
is a business metric used to
evaluate factors that are crucial to
the success of an organization.
Definition of KPI
22. We sought out a single indicator that closely approximated our most
important activity: viewing. We discovered that a server-side metric
related to playback starts (the act of “clicking play”) had both a
predictable pattern and fluctuated significantly when UI/device/server
problems were happening. The Netflix streaming pulse was created.
We named it “SPS” for “starts per second”.
The Pulse of Netflix
http://techblog.netflix.com/2015/02/sps-pulse-of-netflix-streaming.html
25. What’s so special about SPS?
• SPS is easy to understand by all stakeholders
• One metric that covers different points of failure: server
problems, device problems, etc.
• Most important: it’s a clear KPI that indicates when user
experience is compromised
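To make the idea concrete, here is a minimal sketch of how a metric like SPS could be computed and watched. It assumes playback-start events arrive as Unix timestamps and uses a simple standard-deviation check against a recent baseline. The function names and the threshold are hypothetical, and this is not necessarily how Netflix detects anomalies.

```python
from collections import Counter
from statistics import mean, stdev


def starts_per_second(start_timestamps):
    """Count playback-start events in each one-second bucket."""
    return Counter(int(ts) for ts in start_timestamps)


def is_anomalous(current_sps, baseline_sps, threshold=3.0):
    """Flag a reading that deviates from the baseline mean by more than
    `threshold` standard deviations (an assumed, simplistic rule)."""
    mu = mean(baseline_sps)
    sigma = stdev(baseline_sps)
    if sigma == 0:
        return current_sps != mu
    return abs(current_sps - mu) / sigma > threshold


# Hypothetical data: five recent per-second readings as the baseline.
baseline = [100, 102, 98, 101, 99]
print(is_anomalous(40, baseline))   # a sharp drop in starts
print(is_anomalous(101, baseline))  # a normal reading
```

A single number with a predictable pattern is what makes a check this simple workable: a drop in SPS is visible whether the root cause is in the UI, a device, or a server.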
26. Who in your organization defines
KPIs?
• Product: what generates value
• Sales: what generates revenue
• Marketing: what generates traction
28. Analytics GA: A True Story
• BigPanda Analytics: gain insights into your alerts data
• Beta feature up until recently: feature toggled for selected
customers
• Built on top of ElasticSearch
29. 2 Months Ago….
Marketing & Product: We want to PR Analytics around the
end of September.
Engineering: OK, makes sense, but we need to run some tests
first. Let’s talk about KPIs.
30. The Problem
Engineering: Query latency looks A-OK. But… we have a bit
of a problem.
Apparently a high number of shards per node
compromises cluster stability.
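The arithmetic behind the shard problem can be sketched as follows. Each Elasticsearch index holds its primary shards plus one copy of each per replica, so the shard count grows with every index added. The index counts, shard settings, and node count below are made-up illustrations, not BigPanda's actual figures, and the one-index-per-customer layout is an assumption.

```python
def shards_per_node(indices, node_count):
    """Average shards per node for a list of (primaries, replicas) pairs.

    Each index contributes primaries * (1 + replicas) shards in total.
    """
    total = sum(primaries * (1 + replicas) for primaries, replicas in indices)
    return total / node_count


# Hypothetical cluster: 100 indices (say, one per customer) on 10 nodes.
print(shards_per_node([(5, 1)] * 100, node_count=10))  # 5 primaries, 1 replica each
print(shards_per_node([(1, 1)] * 100, node_count=10))  # 1 primary, 1 replica each
```

Under these assumed numbers, dropping from five primaries to one cuts the per-node shard count by a factor of five, which is why index configuration shows up among the possible solutions.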
31. Possible Solutions
• Gradually add customers instead of enabling the feature for all customers at once
Problem: sales wanted to enable the feature immediately for a long list of customers to increase
engagement
Problem: no control over customer sign-ups
• Add ElasticSearch nodes to the cluster when needed
Problem: more nodes = more money, more complexity
• Change our index configuration to create fewer shards
Problem: affects SLA (replication and performance)
33. What was the Solution?
2 tiers of SLA according to customer’s plan
• Sales: no lost opportunities
• Product & Marketing: feature launch is on time
• Ops: no midnight PD
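One way a two-tier SLA could translate into index configuration is sketched below. The slides do not say how the tiers were implemented, so the plan names and the shard and replica values are hypothetical. Only the `number_of_shards` and `number_of_replicas` setting names come from Elasticsearch itself.

```python
# Hypothetical mapping from customer plan to Elasticsearch index settings.
SLA_TIERS = {
    "premium": {"number_of_shards": 5, "number_of_replicas": 2},
    "standard": {"number_of_shards": 1, "number_of_replicas": 1},
}


def index_settings_for(plan):
    """Build a create-index request body for a customer's plan.

    Unknown plans fall back to the standard tier (an assumed default).
    """
    tier = SLA_TIERS.get(plan, SLA_TIERS["standard"])
    return {"settings": {"index": dict(tier)}}


print(index_settings_for("premium"))
print(index_settings_for("trial"))  # unknown plan, falls back to standard
```

Tying shard and replica counts to the plan keeps the total shard count down for most customers while preserving stronger replication and performance for those who pay for it.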
34. • Most of our effort today is spent on closing the gap between dev and
ops
• DevOps is not just about that gap - it’s about the business as a whole
• Whether we like it or not, it’s up to the business units to decide
whether the business is successful or not
• Conclusion: working closer to them is imperative and eventually very
gratifying
TL;DR