Continuous Delivery only works if you combine automation with automated, metrics-driven quality gates that focus on architectural, scalability and performance metrics.
In this presentation I start with several dashboard examples that explain key metrics in production, and then show how to build these metrics into your delivery pipeline.
41. „DevOps Deployment“ Example #1: Online Casino
282! Objects on that page
9.68MB Page Size
8.8s Page Load Time
Most objects are images delivered from your main domain
Very long Connect time (1.8s) to your CDN
42. Example #2: Online Sports Club Search Service
Timeline: 20xx, 2014, 2015, 2016+ (chart of Response Time and Users)
1) Started as a small project
2) Slowly growing user base
3) Expanding to new markets – 1st performance degradation!
4) Adding more markets – performance becomes a business impact
5) Potentially start losing users
43. Early 2015: Monolithic App
Can‘t scale vertically endlessly!
2.68s Load Time
94.09% CPU Bound
45. Testing the Backend Service alone scales well …
7:00 a.m.: Low Load and Service running on minimum redundancy
12:00 p.m.: Scaled up service during peak load with failover of problematic node
7:00 p.m.: Scaled down again to lower load and move to different geo location
49. Single search query end-to-end
26.7s Load Time
5kB Payload
33! Service Calls
99kB - 3kB for each call!
171! Total SQL Count
Architecture Violation: Direct access to DB from frontend service
50. The fixed end-to-end use case
“Re-architect” vs. “Migrate” to Service-Orientation
2.5s (vs 26.7s)
5kB Payload
1! (vs 33!) Service Call
5kB (vs 99kB) Payload!
3! (vs 171) Total SQL Count
53. Metrics from and for Dev(to)Ops
Scenario: Monolithic App with 2 Key Features
Use Case Tests and Monitors Service & App Metrics

Build #   Use Case       Stat  # API Calls  # SQL  Payload  CPU    | Ops #ServInst  Ops Usage  Ops RT
Build 17  testNewsAlert  OK    1            5      2kb      70ms   | 1              0.5%       7.2s
Build 17  testSearch     OK    1            3      5kb      120ms  | 1              63%        5.2s
Build 25  testNewsAlert  OK    1            4      1kb      60ms   |
Build 25  testSearch     OK    34           171    104kb    550ms  |
Build 26  testNewsAlert  OK    1            4      1kb      60ms   | 1              0.6%       4.2s
Build 26  testSearch     OK    2            3      10kb     150ms  | 5              75%        2.5s
Build 35  testNewsAlert  -     -            -      -        -      | -              -          -
Build 35  testSearch     OK    2            3      10kb     150ms  | 8              80%        2.0s

Re-architecture into „Services“ + Performance Fixes
54. #1: Don’t Check In Bad Code
Step #1: Execute your Tests just as you always do ...
Step #2: ... but CAPTURE Metrics!!
Step #3: Verify Code works as intended – including your frameworks!
55. #2: Stop Bad Builds in CI
#1: Analyzing every Unit, Integration & REST API test
#2: Key Architectural Metrics for each test
#3: Detecting regression based on measure per Checkin
56. #3: Monitor your Services/Users in Prod
#1: Usage
Tip: UEM Conversion!
#2: Load vs Response
Tip: See unusual spikes
#3: Architectural Metrics: DB, Exceptions, Web Service Calls
57. #4: Metrics per Service in Ops
# SQLs per Search
# RESTs per Search
Spot bad Deployment?
Payload per Search
58. #5: Understand your End Users
#1: Do my campaigns work?
#2: Who are my users?
59. #6: Optimize End User Behavior
#1: Are they using the features we built?
#2: Is there a difference between Premium and Normal users?
#3: Does Performance have a Behavior Impact?
60. Build & Deliver Apps like the Unicorns! With a Metrics-Driven Pipeline!
Dev&Test: Personal License to Stop Bad Code when it gets created!
Tip: Don't leave your IDE!
Continuous Integration: Auto-Stop Bad Builds based on AppMetrics from Unit, Integration and Perf Tests
Tip: integrate with Jenkins, Bamboo ...
Prod: Monitor Usage and Runtime Behavior per Service, User Action, Feature ...
Tip: Stream to ELK, Splunk and Co ...
Automated Tests: Identify Non-Functional Problems by looking at App Metrics
Tip: Feed data back into your test tool!
65. Adam Auerbach
@bugman31
“All-in Agile: across the pipeline”
“We don’t log bugs, we fix them!”
“Measure Built-Into your Pipeline”
“All manual testers: automate!”
LEARN MORE: READ DYNATRACE BLOG FROM VELOCITY 2015
Get your own Dynatrace Personal License @ http://bit.ly/dtpersonal
Dynatrace Personal is the full Dynatrace AppMon Product. After 30 Days Trial Period it automatically converts to FREE FOR LIFE for your Local Workstation Apps
If you want to learn more check out my YouTube Tutorial Channel @ http://bit.ly/dttutorials
I love metrics! And I think we need to make metrics-based decisions. There are different types of metrics and different visualizations
This is what you see when you walk in our engineering labs. Did I break the build?
The most basic metric for everyone operating software. Did my last deployment break anything? Is the software still available from those locations where my users are accessing the software? Use Synthetic Monitoring: http://www.dynatrace.com/en/synthetic-monitoring/
Even if the deployment seemed good because all features work and response time is the same as before: if your resource consumption goes up like this, the deployment is NOT GOOD, because you are now paying a lot of money for that extra compute power.
Screenshot from Dynatrace AppMon
Another example. Deployment changed the memory allocation behavior of the app -> high object churn rate results in high GC -> results in high CPU!!
Screenshot from Dynatrace AppMon
If you test for scalability, make sure the application scales „linearly“, or at least as linearly as possible. Not like in this case, where twice the load required 4.8X the number of containers.
Screenshot from Dynatrace AppMon -> comparing two Transaction Flows!
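A quick way to put a number on this is to compare how much the resources grew with how much the load grew. The following is a minimal Python sketch, not something taken from AppMon. The function names, the 20% tolerance and the absolute numbers (100 vs 200 users, 10 vs 48 containers) are assumptions, chosen only to reproduce the "2x load, 4.8x containers" ratio from this example.

```python
def scaling_factor(load_before, load_after, units_before, units_after):
    """How much faster resources grew than load (1.0 = perfectly linear)."""
    load_growth = load_after / load_before
    resource_growth = units_after / units_before
    return resource_growth / load_growth


def scales_linearly(factor, tolerance=0.2):
    """Treat anything up to 1.0 + tolerance as 'linear enough' (assumed threshold)."""
    return factor <= 1.0 + tolerance


# Twice the load needing 4.8x the containers; absolute numbers are made up.
factor = scaling_factor(load_before=100, load_after=200,
                        units_before=10, units_after=48)
print(f"resource growth per unit of load growth: {factor:.1f}x")
print("scales linearly:", scales_linearly(factor))
```

A factor close to 1.0 means resources grow in step with load. Anything well above it is worth a look before the next traffic peak.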
Using Docker or other container or cloud environments? Make sure you know how to monitor them.
Screenshot from Dynatrace AppMon: https://community.dynatrace.com/community/display/DL/Docker+Monitor+Plugin
Monitor your end users after you deployed something. Learn who they are.
Screenshot from Dynatrace AppMon & UEM: http://www.dynatrace.com/en/user-experience-management/
Monitoring user experience and impact on conversion rate
Screenshot from Dynatrace AppMon & UEM
Understand user behavior depending on who they are and what they are doing.
Screenshot from https://github.com/Dynatrace/Dynatrace-UEM-PureLytics-Heatmap
Does the behavior change if they have a less optimal user experience?
Screenshot from https://github.com/Dynatrace/Dynatrace-UEM-PureLytics-Heatmap
Seems like users that have a frustrating experience are more likely to click on Support
Screenshot from https://github.com/Dynatrace/Dynatrace-UEM-PureLytics-Heatmap
Another cool example of conversion rate compared to technical metrics
Where do all these metrics come from?
They come from tools. I work for Dynatrace and we provide all these metrics – but there are also other tools out there that do that job
Let's take one quick step back – why are we doing all of this?
We all know that since then the world got divided into two big parties. So – just out of curiosity: who is using Apple vs Android? And who is using Windows?
The phones not only changed the way we communicate, they also disrupted many other technologies. Look at this! Pope Election in 2005.
A SINGLE hand up with an old phone taking a picture
In 2013 the „picture“ is totally different. Smartphones and tablets are not only changing the way we communicate but also disrupting many other things such as taking pictures, getting light when you need it (torchlight), measuring your steps, ...
So – just to see the difference again in one picture!
Why are people so crazy about DevOps?
Because new innovative companies are disrupting existing markets by delivering better services faster
And IoT will show us a lot of new markets, and we will see a lot of new companies that deliver new services for new use cases
This is Boston February 2015!
My IoT Example on smart roofs knowing how much pressure is on them already – notifying firefighters to clean them before it is too late: great use case for insurance companies.
We know that these days everyone can build a globally successful business with just an idea and a laptop. Such as Mark Z or the two tech brothers (12 & 14), founders of GoDimensions. They have developed 12 apps so far with more than 35k downloads.
Their idol? Steve Jobs!
Just to get the basics covered: I hope everyone has their own definition of DevOps by now. It is a lot of things, depending on whom you listen to or which blogs you follow.
In case you are a “DevOps Virgin” I definitely recommend checking out The Phoenix Project (the DevOps Bible) and Continuous Delivery (which is what we actually all want to achieve): delivering software faster with great quality and without all the potential mistakes that a manual and rigid process brings with it
They really follow the stories of the first generation Unicorn Companies
Several companies changed the way they develop and deploy software over the years. Here are some examples (numbers from 2011 – 2014)
Cars: from 2 deployments to 700
Flickr: 10+ per Day
Etsy: lets every new employee make a code change on their first day of employment and push it through the pipeline into production: THAT’S the right approach towards the required culture change
Amazon: every 11.6s
Remember: these are very small changes – which is also a key goal of continuous delivery. The smaller the change, the easier it is to deploy, the less risk it has, the easier it is to test and the easier it is to take it out in case it has a problem.
But it is not only about delivering features faster – it is also about delivering fast features!
These stats come from here: http://nft.atcyber.com/infographics/infographic-the-importance-of-web-performance-20140913
But don’t make the mistake of blindly following every unicorn out there
Taken from http://www.hostingadvice.com/blog/cloud-66-devops-as-a-service/
It's not about giving Devs Direct Access to Ops Deployments
If you just automate a process that hasn't yet had enough time for quality, you will just produce bad software, only faster
If you have the freedom to add more features more rapidly make sure you measure if they are used. If not – take them out. This avoids piling up Technical and Business Debt
We can solve this by taking the metrics we've seen before and measuring them from Dev to Ops
I get most of my stories from my Share Your PurePath program which is a free offering for our Dynatrace Free Trial & Personal License users: http://bit.ly/dtpersonal
Because this is what might happen:
If „Being DevOps“ just means you just increase the number of deployments then you are bound to fail. Here is an example of a bad web application. When deploying this more frequently you will end up in more war rooms
They had a monolithic app that couldn't scale endlessly. Their popularity caused them to think about re-architecture and about allowing developers to make faster changes to their code. They were moving towards a Service Approach
Separating frontend logic from the backend (search service). The idea was to also host these services potentially in the public cloud (frontend) and in a dynamic virtual environment (backend) to be able to scale better globally
The Backend Search Service Team did a lot of testing on their backend services. Scaling up and down on demand. All looked pretty good! They gave it a Thumbs Up!
On Go Live Date with the new architecture everything looked good at 7AM, when not many folks were online yet!
By noon, when the real traffic started to come in, the picture was completely different. User Experience across the globe was bad. Response Time jumped from 2.5s to 25s and the bounce rate tripled from 20% to 60%
The backend service itself was well tested. The problem was that they never looked at what happens under load „end-to-end“. It turned out that the frontend had direct access to the database to execute the initial query whenever somebody ran a search. The returned list of search result IDs was then iterated over in a loop, and for every element a „Micro“ Service call was made to the backend. For this particular use case, where the search returned 33 items, that meant 33! Service Invocations. Lots of wasted traffic and resources, as these Key Architectural Metrics show us
They fixed the problem by understanding the end-to-end use cases and then defining backend service APIs that provide exactly the data the frontend needs. This reduced roundtrips, eliminated the architectural regression and improved performance and scalability
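The difference between the two designs is easy to see when you simply count service calls. The sketch below is a toy Python model, not the customer's code. `CountingSearchBackend` and both `search_*` functions are hypothetical names, and the only number taken from the story is the 33 search results.

```python
class CountingSearchBackend:
    """Hypothetical stand-in for the backend search service; it only counts calls."""

    def __init__(self):
        self.calls = 0

    def get_details(self, result_id):
        self.calls += 1
        return {"id": result_id}

    def get_details_batch(self, result_ids):
        self.calls += 1
        return [{"id": rid} for rid in result_ids]


def search_chatty(backend, result_ids):
    # One service call per search result: the pattern described in the story.
    return [backend.get_details(rid) for rid in result_ids]


def search_batched(backend, result_ids):
    # One service call that returns everything the frontend needs.
    return backend.get_details_batch(result_ids)


result_ids = list(range(33))  # the search in the story returned 33 items

chatty = CountingSearchBackend()
chatty_result = search_chatty(chatty, result_ids)

batched = CountingSearchBackend()
batched_result = search_batched(batched, result_ids)

print("service calls, one per result:", chatty.calls)
print("service calls, batched API:", batched.calls)
```

Both variants return the same data. Only the number of roundtrips differs, which is exactly the kind of metric a functional test alone never shows.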
Lessons Learned!
Got this story also covered here: https://www.infoq.com/articles/Diagnose-Microservice-Performance-Anti-Patterns
If we monitor these key metrics in dev and in ops we can make much better decisions on which builds to deploy
We immediately detect bad changes and fix them. We will stop builds from making it into Production in case these metrics tell us that something is wrong.
We can also take features out that nobody uses if we have usage insights for our services. Like in this case, where we monitor the % of Visitors using a certain feature. If a feature is never used, even after we spent time improving its performance, it is about time to take this feature out. This removes code that nobody needs and therefore reduces technical debt: less code to maintain, fewer tests to maintain, fewer bugs in the system!
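Here is a small sketch of how such a usage check could look in Python. Everything in it is an assumption for illustration: the data layout (one set of used features per visit), the 1% cut-off, and the sample data, which is constructed so that it mirrors the 0.5% and 63% usage figures from the Build 17 row of the slide.

```python
def feature_usage(visits, feature):
    """Percentage of visits in which the given feature was used at least once."""
    if not visits:
        return 0.0
    used = sum(1 for features in visits if feature in features)
    return 100.0 * used / len(visits)


def removal_candidates(visits, features, min_usage_pct=1.0):
    """Features used in fewer than min_usage_pct percent of visits (assumed cut-off)."""
    return [f for f in features if feature_usage(visits, f) < min_usage_pct]


# 200 made-up visits: 1 uses both features, 125 only search, 74 neither.
visits = [{"search", "newsAlert"}] + [{"search"}] * 125 + [set()] * 74

for name in ("newsAlert", "search"):
    print(f"{name}: used in {feature_usage(visits, name):.1f}% of visits")
print("candidates for removal:", removal_candidates(visits, ["newsAlert", "search"]))
```

The cut-off is a business decision, not a technical one. The point is only that the decision is based on measured usage rather than on a gut feeling.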
To sum it up:
It starts with engineers. Do not even check in bad code. Look at the data that tools such as Dynatrace give you!
And this is what it looks like with the Dynatrace AppMon Test Automation Feature. We automatically monitor every single test execution in your CI and analyze these metrics per Test and per Build. We automatically detect regressions because every metric per Test is baselined. This allows us to STOP A BUILD before it moves to other phases in the pipeline
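To make the idea of a metrics-based build gate concrete, here is a minimal Python sketch. It is not how AppMon computes its baselines. It just compares one test run against a fixed baseline with an assumed 25% tolerance. The numbers are the testSearch values for Build 17 and Build 25 as I read them from the slide table, and the function names are hypothetical.

```python
# testSearch metrics as read from the slide table (Build 17 vs Build 25):
# API calls, SQL statements, payload in kB, CPU time in ms.
BASELINE = {"api_calls": 1, "sql_count": 3, "payload_kb": 5, "cpu_ms": 120}
CANDIDATE = {"api_calls": 34, "sql_count": 171, "payload_kb": 104, "cpu_ms": 550}


def find_regressions(baseline, candidate, tolerance=0.25):
    """Metrics where the candidate exceeds the baseline by more than the tolerance."""
    regressions = {}
    for name, base_value in baseline.items():
        limit = base_value * (1 + tolerance)
        if candidate[name] > limit:
            regressions[name] = (base_value, candidate[name])
    return regressions


def build_may_proceed(baseline, candidate):
    """True if no metric regressed beyond the tolerance."""
    return not find_regressions(baseline, candidate)


regressions = find_regressions(BASELINE, CANDIDATE)
for name, (before, after) in regressions.items():
    print(f"testSearch regression in {name}: {before} -> {after}")
print("build may proceed:", build_may_proceed(BASELINE, CANDIDATE))
```

In a CI job you would typically turn a failed gate into a non-zero exit code so that Jenkins, Bamboo or whatever server you use marks the build as failed.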
Once deployed into production make sure you keep monitoring key metrics per feature.
Screenshot from Dynatrace AppMon & UEM where we can monitor usage and key metrics per Feature / Business Transaction
In Production we monitor the same metrics for our services, checking whether a recent deployment changed the # of SQL calls for a particular feature or the # of internal Service Calls. This helps us make sure that we do not make bad deployments, or at least become aware of them right away so we can take countermeasures, e.g. rollback or fix
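The same comparison works in production if you look at per-request averages before and after a deployment. The Python sketch below assumes you can export per-request samples from your monitoring tool. The sample values, the metric names and the 1.5x alert ratio are all made up for illustration.

```python
from statistics import mean

# Hypothetical per-request samples for the search feature (made-up values).
before_deploy = [{"sql_count": 3, "service_calls": 1} for _ in range(5)]
after_deploy = [{"sql_count": 9, "service_calls": 1} for _ in range(5)]


def change_ratio(before, after, metric):
    """Average of the metric after the deployment divided by the average before."""
    return mean(s[metric] for s in after) / mean(s[metric] for s in before)


def suspicious_metrics(before, after, metrics, max_ratio=1.5):
    """Metrics whose per-request average grew by more than max_ratio (assumed limit)."""
    return [m for m in metrics if change_ratio(before, after, m) > max_ratio]


flagged = suspicious_metrics(before_deploy, after_deploy,
                             ["sql_count", "service_calls"])
print("metrics to investigate after this deployment:", flagged)
```

If a metric shows up here right after a deployment, that is the trigger to look closer and decide between rollback and fix.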
Make sure you understand your end users
Screenshot from Dynatrace AppMon & UEM: http://www.dynatrace.com/en/user-experience-management/
And learn to understand „how your users tick!“
Screenshot taken from https://github.com/Dynatrace/Dynatrace-UEM-PureLytics-Heatmap
If we do all that we can build a beautiful pipeline where quality metrics are enforced along the way!!
With that we can make our users happy 24/7 – at any load
The good news is that we have stories about the next generation of Unicorns. Make sure you learn how these companies transformed their business
Read the full story on my blog: http://apmblog.dynatrace.com/2015/05/29/velocity-2015-highlights-from-last-day/
Adam Auerbach from Capital One at Star West 2015 and Velocity 2015
Not a small software startup -> but an established financial company that transformed the way they develop and especially „Continuously Test“ Software!
Watch the full Verizon story from our PERFORM 2015 Recordings
There is also going to be a webinar in June where Nita goes into more detail