Atmosphere 2016 - Andreas Grabner - Metrics Driven-DevOps: Delivering High Quality Software like Facebook & Co

Becoming the next Uber is only possible if you bring your ideas to your end users faster. Some aspects of DevOps are perfect for that, as it only works if Ops and Dev work closely together. But what does this mean for you as a developer? Delivering code faster with a high chance of failing faster?

In my opinion we need to look at Key Technical Metrics such as Memory Usage per User or Request, # of SQLs, # of Service Calls, Transferred Bytes, and so on. These are metrics you need to track starting at your workstation, all the way through CI and into Ops. And don’t forget the Business: How often is the new feature really used? What does it cost to run it? Let these metrics act as Quality Gateways and stop builds early, before they crash your system faster than ever.
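
A minimal sketch of such a quality gateway, assuming the metrics have already been captured as plain numbers for one test run. The budget values and the names `Budget`, `evaluate_gate` and `BUDGETS` are hypothetical and chosen only for illustration; the measured values are the ones shown on slide 36 of the transcript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Upper limit for one technical metric (limits here are made up)."""
    name: str
    limit: float


def evaluate_gate(measured: dict[str, float], budgets: list[Budget]) -> list[str]:
    """Return one message per violated budget; an empty list means the gate passes."""
    violations = []
    for budget in budgets:
        value = measured.get(budget.name)
        if value is None:
            # A metric that was never captured cannot be trusted to be fine.
            violations.append(f"{budget.name}: no measurement captured")
        elif value > budget.limit:
            violations.append(f"{budget.name}: {value} exceeds budget {budget.limit}")
    return violations


# Hypothetical budgets for a single search request.
BUDGETS = [
    Budget("sql_statements", 10),
    Budget("service_calls", 5),
    Budget("transferred_kb", 50),
]

# Values from the failing search use case on slide 36.
measured = {"sql_statements": 171, "service_calls": 33, "transferred_kb": 99}

violations = evaluate_gate(measured, BUDGETS)
for message in violations:
    print("QUALITY GATE FAILED:", message)
```

In a CI job, a step like this would typically end with a non-zero exit code whenever `violations` is non-empty, so that the build stops before it is deployed.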

In this session we look at how companies like Facebook, CreditOne and Co apply metrics-driven DevOps. We look at use cases in which rapid deployments crashed, find the metrics that reveal the reason for the crash, and learn how to use these metrics to steer your pipeline to build better code and deploy faster, without failing faster!
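
As a rough illustration of how such metrics can be derived, the sketch below summarizes the calls recorded for one end-to-end request. The `Call` record, the `summarize` helper and the endpoint name `/clubs/details` are assumptions made for this example and not part of any specific tool; the counts mirror the failing search request on slide 36 (33 service calls of 3kB each, 171 SQL statements).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One recorded call inside a request trace (hypothetical record shape)."""
    kind: str            # "service" or "sql"
    target: str          # endpoint or statement text
    payload_kb: float = 0.0


def summarize(calls: list[Call]) -> dict:
    """Collapse a request trace into a few architectural metrics."""
    service = [c for c in calls if c.kind == "service"]
    sql = [c for c in calls if c.kind == "sql"]
    repeats = Counter(c.target for c in service)
    return {
        "service_calls": len(service),
        "sql_statements": len(sql),
        "payload_kb": sum(c.payload_kb for c in service),
        # The most frequently called endpoint hints at a call-in-a-loop pattern.
        "most_repeated": repeats.most_common(1)[0] if repeats else None,
    }


# Synthetic trace with the same shape as the numbers on slide 36.
trace = [Call("service", "/clubs/details", 3.0) for _ in range(33)]
trace += [Call("sql", "SELECT ...") for _ in range(171)]

summary = summarize(trace)
print(summary)
```

A single endpoint being hit 33 times within one search is the kind of signal that points to the architecture rather than to the infrastructure.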

Published in: Technology

  1. Metrics-Driven DevOps: Delivering High Quality Software Like Facebook & Co. Andreas Grabner, @grabnerandi, blog.dynatrace.com, http://bit.ly/dtpersonal
  2. https://dynatrace.github.io/ufo/ “In Your Face” Data!
  3. Availability dropped to 0%. Metrics-Based Decisions
  4. Update of Dependency Injection Library impacts Memory & CPU
  5. Object Churning impacting GC and eating CPU!
  6. App with Regular Load supported by 10 Containers. Twice the Load but 48 (=4.8x!) Containers! App doesn’t scale!! Does it really scale?
  7. Infrastructure, Container, Cloud ... Metrics!!
  8. #1: Where do people come from? #2: Who are they?
  9. Daily Deployments + Mkt Push. Increase # of unhappy users! Drop in Conversion Rate. Overall increase of Users!
  10. Satisfied Users click more content
  11. Tolerating Users click less content
  12. Frustrated Users mainly click on Support
  13. AND MANY MORE
  14. Vatican, 2005
  15. Vatican, 2013
  16. The Promise of Confidential, Dynatrace, LLC
  17. Boston Feb 2015! After #3 of 7 Blizzards! Total: 2.10m Snow!
  18. Smart Roofs Alert BEFORE It is too late!
  19. What they really want!
  20. 700 deployments / YEAR; 10+ deployments / DAY; 50 – 60 deployments / DAY; every 11.6 SECONDS
  21. Not only delivered fast but also delivering fast! Response Time vs. Conversions: -1000ms, +2%; -1000ms, +10%; +100ms, -1%
  22. Why most (will) fail!
  23. It’s not about blindly giving everyone Ops power to deploy changes only tested locally
  24. It’s not about blind automation of pushing more bad code on new stacks through a pipeline
  25. It’s not about blindly adding new features on top of existing ones without measuring their success
  26. I learning from others
  27. http://bit.ly/sharepurepath
  28. Example #1: Online Casino („DevOps Deployment“). 282! Objects on that page. 9.68MB Page Size. 8.8s Page Load Time. Most objects are images delivered from your main domain. Very long Connect time (1.8s) to your CDN.
  29. Example #2: Online Sports Club Search Service. Users and Response Time over time (20xx, 2014, 2015, 2016+): 1) Started as a small project 2) Slowly growing user base 3) Expanding to new markets: 1st performance degradation! 4) Adding more markets: performance becomes a business impact 5) Potentially start losing users
  30. Early 2015: Monolithic App. Can’t scale vertically endlessly! 2.68s Load Time. 94.09% CPU Bound.
  31. Proposal: Service approach! Front End to Cloud. Scale Backend in Containers!
  32. Testing the Backend Service alone scales well … 7:00 a.m.: Low Load and Service running on minimum redundancy. 12:00 p.m.: Scaled up service during peak load with failover of problematic node. 7:00 p.m.: Scaled down again to lower load and move to different geo location.
  33. Go live – 7:00 a.m.
  34. Go live – 12:00 p.m.
  35. What Went Wrong?
  36. Single search query end-to-end: 26.7s Load Time; 5kB Payload; 33! Service Calls; 99kB - 3kB for each call!; 171! Total SQL Count. Architecture Violation: Direct access to DB from frontend service.
  37. The fixed end-to-end use case. “Re-architect” vs. “Migrate” to Service-Orientation. 2.5s (vs 26.7); 5kB Payload; 1! (vs 33!) Service Call; 5kB (vs 99) Payload!; 3! (vs 177) Total SQL Count
  38. You measure it! from Dev (to) Ops
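  39. Scenario: Monolithic App with 2 Key Features. Re-architecture into „Services“ + Performance Fixes. Metrics from and for Dev(to)Ops. Use Case Tests and Monitors, with Service & App Metrics (# API Calls, # SQL, Payload, CPU) and Ops metrics (# ServInst, Usage, RT) per use case:
      Build 17: testNewsAlert OK, 1 API call, 5 SQL, 2kb, 70ms; Ops: 1 ServInst, 0.5% Usage, 7.2s RT. testSearch OK, 1 API call, 3 SQL, 5kb, 120ms; Ops: 1 ServInst, 63% Usage, 5.2s RT.
      Build 25: testNewsAlert OK, 1 API call, 4 SQL, 1kb, 60ms. testSearch OK, 34 API calls, 171 SQL, 104kb, 550ms. No Ops values shown.
      Build 26: testNewsAlert OK, 1 API call, 4 SQL, 1kb, 60ms; Ops: 1 ServInst, 0.6% Usage, 4.2s RT. testSearch OK, 2 API calls, 3 SQL, 10kb, 150ms; Ops: 5 ServInst, 75% Usage, 2.5s RT.
      Build 35: testNewsAlert: no values (-). testSearch OK, 2 API calls, 3 SQL, 10kb, 150ms; Ops: 8 ServInst, 80% Usage, 2.0s RT.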
  40. #1: Don’t Check In Bad Code. Step #1: Execute your Tests just as you always do ... Step #2: ... but CAPTURE Metrics!! Step #3: Verify Code works as intended, including your frameworks!
  41. #2: Stop Bad Builds in CI. #1: Analyzing every Unit, Integration & REST API test. #2: Key Architectural Metrics for each test. #3: Detecting regression based on measure per Checkin.
  42. #3: Monitor your Services/Users in Prod. #1: Usage. Tip: UEM Conversion! #2: Load vs Response. Tip: See unusual spikes. #3: Architectural Metrics: DB, Exceptions, Web Service Calls.
  43. #4: Metrics per Service in Ops. # SQLs per Search; # RESTs per Search; Payload per Search. Spot bad Deployment?
  44. #5: Understand your End Users. #1: Do my campaigns work? #2: Who are my users?
  45. #6: Optimize End User Behavior. #1: Are they using the features we built? #2: Is there a difference between Premium and Normal users? #3: Does Performance have a Behavior Impact?
  46. Build & Deliver Apps like the Unicorns! With a Metrics-Driven Pipeline! Dev&Test: Personal License to Stop Bad Code when it gets created! Tip: Don’t leave your IDE! Continuous Integration: Auto-Stop Bad Builds based on AppMetrics from Unit, Integration and Perf Tests. Tip: integrate with Jenkins, Bamboo ... Automated Tests: Identify Non-Functional Problems by looking at App Metrics. Tip: Feed data back into your test tool! Prod: Monitor Usage and Runtime Behavior per Service, User Action, Feature ... Tip: Stream to ELK, Splunk and Co ...
  47. 12:00 a.m. – 11:59 p.m.
  48. Questions. Slides: slideshare.net/grabnerandi; Get Tools: bit.ly/dtpersonal; YouTube Tutorials: bit.ly/dttutorials; Contact Me: agrabner@dynatrace.com; Follow Me: @grabnerandi; Read More: blog.dynatrace.com
  49. Andreas Grabner, Dynatrace Developer Advocate, @grabnerandi, http://blog.dynatrace.com
  50. Unicorns 2.0
  51. Adam Auerbach, @bugman31: “All-in Agile: across the pipeline”, “We don’t log bugs, we fix them!”, “Measure Built-Into your Pipeline”, “All manual testers: automate!” LEARN MORE: READ DYNATRACE BLOG FROM VELOCITY 2015
  52. Nita Awatramani: Technical Debt, Business Debt, Organizational Rust. 45% Apps Eliminated; 60% DCs Consolidated; 75% Virtualized; 8M$ Ann. Costs Slashed; 100% Verizon Agile. LEARN MORE: PERFORM 2015 VIDEO + UPCOMING WEBINAR!
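
The idea from slides 39 and 41, detecting regressions per check-in by comparing the metrics of each build, could be sketched as follows. The numbers are the testSearch values as read from the flattened table on slide 39; which row belongs to which build is an interpretation of that table. The 20% tolerance and the function name `find_regressions` are assumptions for illustration only.

```python
def find_regressions(baseline: dict[str, float],
                     current: dict[str, float],
                     tolerance: float = 0.2) -> dict[str, tuple[float, float]]:
    """Return metrics that grew by more than `tolerance` relative to the baseline.

    The result maps each regressed metric name to (baseline value, current value).
    """
    regressions = {}
    for metric, base in baseline.items():
        now = current.get(metric)
        if now is None:
            continue  # metric not captured for this build, nothing to compare
        if now > base * (1 + tolerance):
            regressions[metric] = (base, now)
    return regressions


# testSearch metrics per build, as read from slide 39.
build_17 = {"api_calls": 1, "sql": 3, "payload_kb": 5, "cpu_ms": 120}
build_25 = {"api_calls": 34, "sql": 171, "payload_kb": 104, "cpu_ms": 550}
build_26 = {"api_calls": 2, "sql": 3, "payload_kb": 10, "cpu_ms": 150}

# Build 25 regresses on every metric compared with build 17.
for metric, (before, after) in find_regressions(build_17, build_25).items():
    print(f"{metric}: {before} -> {after}")

# Build 26 shows no regression relative to build 25.
print(find_regressions(build_25, build_26))
```

Comparing against the last known good build rather than a fixed threshold keeps the check meaningful as the application grows, at the cost of letting slow drift go unnoticed.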
