Becoming the next Uber is only possible if you bring your ideas to your end users faster. DevOps is perfect for that, as it only works when Ops and Dev work closely together. But what does this mean for you as a developer? Delivering code faster, with a high chance of failing faster?
In my opinion we need to look at key technical metrics such as memory usage per user or request, # of SQLs, # of service calls, transferred bytes, ... These are metrics you need to track starting at your workstation, all the way through CI and into Ops – and don’t forget the business: How often is the new feature really used? What does it cost to run it? Let these metrics act as quality gates and stop builds early, before they crash your system faster than ever.
In this session we look at how companies like Facebook, CreditOne and Co apply metrics-driven DevOps. We look at use cases that crashed rapid deployments, identify the metrics that reveal the reason for each crash, and learn how to use these metrics to steer your pipeline: build better code and deploy faster, without failing faster!
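The quality-gate idea can be sketched in a few lines. This is an illustrative Python sketch, not any specific tool's API; the metric names and limits are assumptions, and the sample values come from the bad search use case discussed later in this talk (33 service calls, 171 SQLs, 99kB):

```python
# Illustrative limits for key technical metrics per use case; the names
# and numbers are assumptions, not taken from any specific APM tool.
LIMITS = {
    "sql_count": 10,         # of SQL statements
    "service_calls": 5,      # of internal service calls
    "payload_bytes": 50_000, # transferred bytes
}

def gate(metrics):
    """Return the list of violations; an empty list lets the build pass."""
    return [
        f"{name}={value} exceeds limit {LIMITS[name]}"
        for name, value in metrics.items()
        if name in LIMITS and value > LIMITS[name]
    ]

# The bad search use case: 33 service calls, 171 SQLs, 99kB transferred
print(gate({"sql_count": 171, "service_calls": 33, "payload_bytes": 99_000}))
```

A CI job would run this gate after the test stage and fail the build whenever the returned list is non-empty.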
40. „DevOps Deployment“ Example #1: Online Casino
- 282! objects on that page
- 9.68MB page size
- 8.8s page load time
- Most objects are images delivered from your main domain
- Very long connect time (1.8s) to your CDN
41. Example #2: Online Sports Club Search Service
[Chart: Response Time and number of Users over time, 20xx / 2014 / 2015 / 2016+]
1) Started as a small project
2) Slowly growing user base
3) Expanding to new markets – 1st performance degradation!
4) Adding more markets – performance becomes a business impact; potentially start losing users
42. Early 2015: Monolithic App
- 2.68s load time
- 94.09% CPU bound
- Can‘t scale vertically endlessly!
44. Testing the Backend Service alone: it scales well …
- 7:00 a.m.: low load, and service running on minimum redundancy
- 12:00 p.m.: scaled-up service during peak load, with failover of a problematic node
- 7:00 p.m.: scaled down again due to lower load, and moved to a different geo location
48. Single search query, end-to-end
- 26.7s load time
- 5kB payload
- 33! service calls – 99kB, i.e. 3kB for each call!
- 171! total SQL count
- Architecture violation: direct access to the DB from the frontend service
49. The fixed end-to-end use case – “Re-architect” vs. “Migrate” to service-orientation
- 2.5s (vs 26.7s) load time
- 1! (vs 33!) service call
- 5kB (vs 99kB) payload!
- 3! (vs 171!) total SQL count
52. Scenario: Monolithic App with 2 Key Features
Use case tests and monitors of service & app metrics – metrics from and for Dev(to)Ops. The builds span the re-architecture into „Services“ plus performance fixes:

Build    | Use Case      | Stat | # API Calls | # SQL | Payload | CPU   | #ServInst | Usage | RT   (Ops)
Build 17 | testNewsAlert | OK   | 1           | 5     | 2kb     | 70ms  | 1         | 0.5%  | 7.2s
         | testSearch    | OK   | 1           | 3     | 5kb     | 120ms | 1         | 63%   | 5.2s
Build 25 | testNewsAlert | OK   | 1           | 4     | 1kb     | 60ms  |           |       |
         | testSearch    | OK   | 34          | 171   | 104kb   | 550ms |           |       |
Build 26 | testNewsAlert | OK   | 1           | 4     | 1kb     | 60ms  | 1         | 0.6%  | 4.2s
         | testSearch    | OK   | 2           | 3     | 10kb    | 150ms | 5         | 75%   | 2.5s
Build 35 | testNewsAlert | -    | -           | -     | -       | -     | -         | -     | -
         | testSearch    | OK   | 2           | 3     | 10kb    | 150ms | 8         | 80%   | 2.0s
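The stop-the-build check behind these numbers can be sketched as a comparison of each test's metrics against the last known-good build. A hedged Python sketch: the tolerance, metric names and build data are illustrative, taken from the testSearch regression (1 API call and 3 SQLs jumping to 34 and 171):

```python
# Hedged sketch: compare each test's captured metrics in the current
# build against the last known-good build; values are illustrative.
TOLERANCE = 1.2  # allow 20% growth before flagging a regression

def regressions(baseline, current):
    """Return (test, metric, good_value, new_value) for every regression."""
    found = []
    for test, metrics in current.items():
        for name, value in metrics.items():
            good = baseline.get(test, {}).get(name)
            if good is not None and value > good * TOLERANCE:
                found.append((test, name, good, value))
    return found

last_good  = {"testSearch": {"api_calls": 1, "sql": 3, "payload_kb": 5}}
this_build = {"testSearch": {"api_calls": 34, "sql": 171, "payload_kb": 104}}

found = regressions(last_good, this_build)
print("STOP BUILD!" if found else "OK", found)
```

The same comparison works per check-in in CI and per deployment in Ops; only the source of the metrics changes.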
53. #1: Don’t Check In Bad Code
- Step #1: Execute your tests just as you always do ...
- Step #2: ... but CAPTURE metrics!!
- Step #3: Verify the code works as intended – including your frameworks!
54. #2: Stop Bad Builds in CI
- #1: Analyzing every unit, integration & REST API test
- #2: Key architectural metrics for each test
- #3: Detecting regressions based on measures per check-in
55. #3: Monitor your Services/Users in Prod
- #1: Usage – Tip: UEM conversion!
- #2: Load vs. response time – Tip: watch for unusual spikes
- #3: Architectural metrics – DB, exceptions, web service calls
56. #4: Metrics per Service in Ops
- # SQLs per search
- # RESTs per search
- Payload per search
- Can you spot the bad deployment?
57. #5: Understand your End Users
- #1: Do my campaigns work?
- #2: Who are my users?
58. #6: Optimize End User Behavior
- #1: Are they using the features we built?
- #2: Is there a difference between Premium and Normal users?
- #3: Does performance have a behavior impact?
59. Build & Deliver Apps like the Unicorns – with a Metrics-Driven Pipeline!
- Dev & Test: personal license to stop bad code when it gets created! Tip: Don’t leave your IDE!
- Continuous Integration: auto-stop bad builds based on app metrics from unit, integration & perf tests. Tip: integrate with Jenkins, Bamboo ...
- Prod: monitor usage and runtime behavior per service, user action, feature ... Tip: stream to ELK, Splunk and Co ...
- Automated Tests: identify non-functional problems by looking at app metrics. Tip: feed data back into your test tool!
64. Adam Auerbach (@bugman31)
- “All-in Agile: across the pipeline”
- “We don’t log bugs, we fix them!”
- “Measure Built-Into your Pipeline”
- “All manual testers: automate!”
LEARN MORE: READ DYNATRACE BLOG FROM VELOCITY 2015
Synthetic availability monitoring -> clearly something went wrong.
Even if the deployment seemed good because all features work and response time is the same as before: if your resource consumption goes up like this, the deployment is NOT GOOD, as you are now paying a lot of money for that extra compute power.
Monitor your end users after you deployed something
Monitoring user experience and impact on conversion rate
Another cool example of conversion rate compared to technical metrics
Yes – I am working for a tool vendor – BUT – you can try this with most of the tools in the APM, Tracing, Diagnostics space out there.
We all know that since then the world has been divided into two big camps. So – just out of curiosity: who is using Apple vs. Android? And who is using Windows?
The phones not only changed the way we communicate – they disrupted many other technologies as well. Look at this: the Pope election in 2005.
A SINGLE hand up, with an old phone taking a picture.
In 2013 the „picture“ is totally different. Smartphones and tablets are not only changing the way we communicate but also disrupting many other things, such as taking pictures, giving you light when you need it (flashlight), measuring your steps, ...
So – just to see the difference again in one picture!
Why are people so crazy about DevOps?
Because new innovative companies are disrupting existing markets by delivering better services faster
And IoT will open up a lot of new markets; we will see a lot of new companies delivering new services for new use cases.
This is Boston February 2015!
We know that everyone these days can build a globally successful business with just an idea and a laptop – such as Mark Z, or the two tech brothers (12 & 14) who founded GoDimensions and have developed 12 apps so far with more than 35k downloads.
Their idol? Steve Jobs!
Just to get the basics covered: I hope everyone has their own definition of DevOps by now. It means a lot of things, depending on whom you listen to or which blogs you follow.
In case you are a “DevOps virgin”, I definitely recommend checking out The Phoenix Project (the DevOps bible) and Continuous Delivery (which is what we actually all want to achieve): delivering software faster, with great quality, and without all the potential mistakes that a manual and rigid process brings with it.
They really follow the stories of the first generation Unicorn Companies
Several companies changed the way they develop and deploy software over the years. Here are some examples (numbers from 2011 – 2014):
Cars: from 2 deployments to 700
Flickr: 10+ per day
Etsy lets every new employee make a code change on their first day of employment and push it through the pipeline into production: THAT’S the right approach to the required culture change
Amazon: every 11.6s
Remember: these are very small changes – which is also a key goal of continuous delivery. The smaller the change, the easier it is to deploy, the less risk it has, the easier it is to test, and the easier it is to take it out in case it has a problem.
But it is not only about delivering features faster – it is also about delivering fast features!
These stats come from here: http://nft.atcyber.com/infographics/infographic-the-importance-of-web-performance-20140913
But don’t make the mistake of blindly following every unicorn out there.
Taken from http://www.hostingadvice.com/blog/cloud-66-devops-as-a-service/
It’s not about giving devs direct access to Ops deployments.
Because this is what might happen:
If „being DevOps“ just means increasing the number of deployments, then you are bound to fail. Here is an example of a bad web application: deploy it more frequently and you will just end up in more war rooms.
They had a monolithic app that couldn’t scale endlessly. Its popularity made them think about a re-architecture that would allow developers to make faster changes to their code. They were moving towards a service approach:
separating the frontend logic from the backend (search service). The idea was to also host these services potentially in the public cloud (frontend) and in a dynamic virtual environment (backend), to be able to scale better globally.
The backend search service team did a lot of testing on their backend services, scaling up and down on demand. All looked pretty good! They gave it a thumbs up!
On go-live day with the new architecture, everything looked good at 7 a.m., when not many folks were online yet.
By noon – when the real traffic started to come in – the picture was completely different. User experience across the globe was bad: response time jumped from 2.5s to 25s, and the bounce rate tripled from 20% to 60%.
The backend service itself was well tested. The problem was that they never looked at what happens under load end-to-end. It turned out that the frontend had direct access to the database to execute the initial query when somebody ran a search. The returned list of search result IDs was then iterated over in a loop, and for every element a „micro“ service call was made to the backend – which resulted in 33! service invocations for this particular use case, where the search returned 33 items. Lots of wasted traffic and resources, as these key architectural metrics show us.
They fixed the problem by understanding the end-to-end use cases and then defining backend service APIs that provided the data the frontend really needed. This reduced roundtrips, eliminated the architectural regression and improved performance and scalability.
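The anti-pattern and its fix can be sketched as follows. This is a hedged Python sketch with hypothetical stand-in functions; in the real system these were remote service calls and database queries, not local functions:

```python
# Hypothetical stand-ins for the frontend/backend interaction.

def db_query_ids(query):
    return list(range(33))  # pretend the search matched 33 items

def backend_get_item(item_id):
    """Fine-grained 'micro' call – one per result item."""
    return {"id": item_id}

def backend_search(query):
    """Coarse-grained API after the fix: one call returns everything."""
    return [{"id": i} for i in db_query_ids(query)]

# Before: the frontend queries the DB directly, then loops
# -> 1 direct SQL + 33 service calls for a single search
ids = db_query_ids("sports club")
results_before = [backend_get_item(i) for i in ids]

# After: a single service call end-to-end
results_after = backend_search("sports club")

assert results_before == results_after  # same data, 1 call instead of 33
```

The payoff is exactly what the metrics showed: identical functional results, but one network roundtrip instead of 33, and the query moved behind the service boundary.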
Lessons Learned!
If we monitor these key metrics in dev and in ops, we can make much better decisions about which builds to deploy.
We immediately detect bad changes and fix them. We stop builds from making it into production when these metrics tell us that something is wrong.
We can also take out features that nobody uses if we have usage insights for our services. In this case we monitor the % of visitors using a certain feature. If a feature is never used – even after we spent time improving its performance – it is time to take it out. This removes code that nobody needs and therefore reduces technical debt: less code to maintain – fewer tests to maintain – fewer bugs in the system!
And this is how it looks with the Dynatrace AppMon Test Automation feature. We automatically monitor every single test execution in your CI and analyze these metrics per test and per build. Every metric per test is baselined, so we automatically detect regressions. This allows us to STOP A BUILD before it moves to other phases in the pipeline.
In production we monitor the same metrics for our services, checking whether a recent deployment changed the # of SQL calls or the # of internal service calls for a particular feature. This helps us make sure we do not make bad deployments – or at least be aware of one right away and take countermeasures, e.g. rollback or fix.
With that we can make our users happy 24/7 – at any load
The good news is that we have stories about the next generation of unicorns – make sure you learn how these companies transformed their business.
Read the full story on my blog: http://apmblog.dynatrace.com/2015/05/29/velocity-2015-highlights-from-last-day/
Adam Auerbach from Capital One at Star West 2015 and Velocity 2015
Not a small software startup – but an established financial company that transformed the way they develop and, especially, „continuously test“ software!
Watch the full Verizon story from our PERFORM 2015 Recordings
There is also going to be a webinar in June where Nita goes into more detail.