Rob.Jahn@Dynatrace.com
[Diagram: your app/container running on cloud-native infrastructure]
Proof – autonomous cloud survey (median vs 95th percentile)
Verdict: The majority are not “cloud native” (yet)
Metric – Median vs. 95th percentile:
• Business-impacting deployments: 3 out of 10 vs. 1 out of 10
• Hotfixes per production deployment: 3 vs. 0
• MTTR (mean time to repair): 4.8 days vs. ~4 hours
• Code to production (commit cycle time): 2.5 weeks vs. 2 days
1. Complexity
2. Manual operations
3. Lack of information
4. People
5. Identifying root cause
MULTI-PLATFORM MONITORING → QUALITY SHIFT-LEFT → DEPLOYMENT ENABLE-RIGHT → SELF-HEALING
Key differentiators – unbreakable continuous delivery blueprint
• Automate operations (self-healing) – auto-mitigate bad deployments in production
• Automate deployment (enable-right) – push “monitoring-as-code” and “content events” for auto-validation and auto-alerting
• Automate quality (shift-left) – automate the pipeline and stop bad code changes before they reach prod
• Automated monitoring – full-stack monitoring with context and dependency information for infrastructure and transactions
Monitoring Solution Capabilities
Automated Rollout | Full Stack Monitoring | Tags & Metadata
• Full-stack layers: Applications, Services, Processes, Hosts, Data Centers
• Example tags: Env: production, App: crm, Container: blue, Service: front-end, Namespace: prod, Support-team: alpha-dog, Process-group: tomcat, owner: dev-one, region: eastus
Monitoring Solution Capabilities, continued
Understands Dependencies | Event Context | API
• Event context: 1. Deployments 2. Configuration changes 3. Testing start/stop 4. Maintenance start/stop
• API: 1. Push events 2. Query problems 3. Query topology 4. Query time-series metrics
Blueprint capability – quality shift-left: automate the pipeline and stop bad code changes before they reach prod.
[Diagram: delivery flow – (1) code/config change → CI/CD → (3) staging → (5) end users, with monitoring at steps (2) and (4)]
{
"lowerBound": 100,
"upperBound": 1000,
"_comment": "global configuration environment-wide",
"timeseries": [
{
"timeseriesId": "service.responsetime",
"aggregation": "avg",
"entityIds": "SERVICE-3211ABE8813B9239",
"lowerBound": 20,
"upperBound": 300
}
]
}
The monitoring spec above drives the YES/NO decision at the quality gate before the release reaches end users.
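A gate that consumes a spec like the JSON above can be sketched in a few lines of Python. This is a minimal sketch, not the actual product logic; the helper name `evaluate_spec` and the rule that per-series bounds override the spec-wide defaults are assumptions for illustration.

```python
# Sketch: evaluate observed metric values against a monitoring spec.
# Assumes values were already fetched from the monitoring API.

def evaluate_spec(spec, observed):
    """Return (passed, violations) for observed {timeseriesId: value}."""
    violations = []
    for ts in spec.get("timeseries", []):
        # Per-series bounds override the spec-wide defaults (assumption).
        lower = ts.get("lowerBound", spec.get("lowerBound"))
        upper = ts.get("upperBound", spec.get("upperBound"))
        value = observed.get(ts["timeseriesId"])
        if value is None:
            continue  # no data for this series; skip rather than fail
        if not (lower <= value <= upper):
            violations.append((ts["timeseriesId"], value, lower, upper))
    return (len(violations) == 0, violations)

spec = {
    "lowerBound": 100,
    "upperBound": 1000,
    "timeseries": [
        {
            "timeseriesId": "service.responsetime",
            "aggregation": "avg",
            "entityIds": "SERVICE-3211ABE8813B9239",
            "lowerBound": 20,
            "upperBound": 300,
        }
    ],
}

ok, bad = evaluate_spec(spec, {"service.responsetime": 412})
# 412 ms exceeds the 300 ms upper bound, so the gate answers NO
```

The gate's YES/NO answer is simply whether every observed series stays inside its bounds.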
Pitometer Specfile structure:
#1 Metric source & query
#2 Grading details & metric score
#3 Total scoring across objectives
Example objectives:
• Allocated bytes (from Prometheus): > 2 GB → 0 points; < 2 GB → 20 points
• Conversion rate (Dynatrace): < 2% → 0 points; < 5% → 10 points; > 5% → 20 points
Grader result: value 3 GB → score 0; value 3.9% → score 10. Total score: 10 → result is a fail.
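The scoring behind such a specfile can be sketched as threshold bands that map an observed value to points, summed into a total score. This is a simplification, not the actual Pitometer implementation; the pass mark of 20 is an assumed value for illustration, while the thresholds and example values come from the slide above.

```python
# Sketch of objective-based grading: ordered (upper_limit, points)
# bands; the first band whose limit exceeds the value wins, otherwise
# the fallback points apply.

def grade(value, bands, fallback):
    for limit, points in bands:
        if value < limit:
            return points
    return fallback

# Objective 1: allocated bytes in GB (Prometheus) – lower is better
mem_score = grade(3.0, [(2.0, 20)], fallback=0)          # 3 GB -> 0 points

# Objective 2: conversion rate in % (Dynatrace) – higher is better
conv_score = grade(3.9, [(2.0, 0), (5.0, 10)], fallback=20)  # 3.9% -> 10

total = mem_score + conv_score   # 10
passed = total >= 20             # assumed pass mark: fails, as on the slide
```

With a 3 GB heap and a 3.9% conversion rate the total is 10, so the gate reports a fail, matching the worked example above.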
Build Pipeline
1. Build code
2. Run unit tests
3. Create artifact that contains the perf spec
Release Pipeline
1. Deploy code
2. Performance test
3. Deployment event
4. Pitometer quality gate
[Diagram, steps 1–6: source code (code + perf spec) feeds the pipelines; the Keptn Pitometer service makes API calls against full-stack monitoring data in Dynatrace, collected from the application under test]
https://github.com/dt-demos/pitometer-web-service
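The "deployment event" step above can be sketched as building an event payload the release pipeline would push to the monitoring API. Field names follow the spirit of Dynatrace's event API, but treat the exact shape as illustrative and check your tool's documentation; the helper name and tag values are assumptions, and the network POST is deliberately omitted.

```python
# Sketch: construct a deployment event payload in the release pipeline.
import json

def deployment_event(version, entity_tags, ci_link):
    return {
        "eventType": "CUSTOM_DEPLOYMENT",
        "deploymentName": f"Deploy version {version}",
        "deploymentVersion": version,
        "source": "Azure DevOps release pipeline",
        # attachRules tie the event to monitored entities by tag,
        # e.g. the Env/App/Service tags shown earlier (illustrative).
        "attachRules": {"tagRule": [{
            "meTypes": ["SERVICE"],
            "tags": entity_tags,
        }]},
        "ciBackLink": ci_link,
    }

payload = deployment_event("123", ["app:crm", "env:production"],
                           "https://dev.azure.com/...")
body = json.dumps(payload)
# The pipeline would POST `body` to the monitoring tool's events
# endpoint with an API token header; omitted to keep the sketch offline.
```

Pushing this event gives the monitoring tool the context it needs to correlate a later problem with the deployment that caused it.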
Four checks against the application, cloud infrastructure, and database, driven by the monitoring tool:
Check 1 – Is bad coding leading to higher costs?
• Metrics: memory usage; bytes sent/received; overall CPU; CPU per transaction type
Check 2 – New dependencies? On purpose? Services connecting accurately? Number of container instances needed?
• Metrics: number of incoming/outgoing dependencies; number of instances running on containers
Check 3 – Are we jeopardizing our SLAs? Does load balancing work? Difference between canaries?
• Metrics: response time (percentiles); throughput & performance per instance/canary
Check 4 – Did we introduce new “hidden” exceptions?
• Metrics: total exceptions; exceptions by class & service
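Check 3 – comparing canaries on response-time percentiles – can be sketched as a simple regression test between two sample sets. The sample values, percentile method, and 20% tolerance are all illustrative assumptions, not figures from the deck.

```python
# Sketch: fail the canary check if its p95 response time regresses
# more than a tolerance factor beyond the baseline's p95.

def percentile(samples, p):
    """Nearest-rank percentile; good enough for a gate sketch."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[k]

def canary_regressed(baseline_ms, canary_ms, p=95, tolerance=1.2):
    """True if the canary's p95 exceeds 1.2x the baseline's p95."""
    return percentile(canary_ms, p) > tolerance * percentile(baseline_ms, p)

baseline = [120, 130, 125, 140, 150, 135, 128, 145, 138, 132]
canary   = [120, 132, 260, 280, 300, 140, 150, 290, 310, 135]
regressed = canary_regressed(baseline, canary)
# True: the canary's p95 (310 ms) is far above 1.2 x baseline p95 (150 ms)
```

Percentiles rather than averages matter here: a canary can look fine on mean response time while its tail latency quietly violates the SLA.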
Blueprint capability – deployment enable-right: push “monitoring-as-code” and “content events” for auto-validation and auto-alerting.
[Diagram, steps 1–4: release automation and deployment automation send context – “green” deployment, deploy “version 123”, adjust memory by “100MB” – to the monitoring tools, which show all dependencies and augment the data for richer context]
Integrating the DevOps toolchain: pushed events carry context fields such as “last seen by”, source, app owner, and runbooks.
Blueprint capability – self-healing operations: auto-mitigate bad deployments in production.
Self-healing automation – problem evolution vs. user impact:
1. CPU exhausted? Add a new service instance!
2. High garbage collection? Adjust/revert memory settings!
3. Issue with BLUE only? Switch back to GREEN!
4. Hung threads? Restart service!
Impact mitigated?
5. Still ongoing? Initiate rollback!
Still ongoing? Escalate – mark bad commits, update tickets.
2:00 a.m. alert? Auto-mitigate!
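The escalation ladder above can be sketched as event-driven runbook automation: map a detected problem signature to a mitigation action and escalate when mitigation does not clear the impact. Signature and action names here are illustrative placeholders, not a real runbook API.

```python
# Sketch: dispatch a problem signature to its runbook action, then
# fall back to rollback and on-call escalation if impact persists.

RUNBOOK = [
    ("cpu_exhausted",        "add_service_instance"),
    ("high_garbage_collect", "revert_memory_settings"),
    ("blue_only_issue",      "switch_to_green"),
    ("hung_threads",         "restart_service"),
]

def mitigate(problem, still_impacting):
    """Return the ordered list of actions taken for a problem.

    still_impacting is a callback that re-checks user impact after the
    actions taken so far (e.g. by querying the monitoring API).
    """
    actions = [act for sig, act in RUNBOOK if sig in problem["signatures"]]
    if not actions or still_impacting(actions):
        actions.append("initiate_rollback")
        if still_impacting(actions):
            actions.append("escalate_to_oncall")
    return actions

# A 2 a.m. alert: BLUE deployment misbehaving, mitigation clears it
problem = {"signatures": {"blue_only_issue"}}
taken = mitigate(problem, still_impacting=lambda acts: False)
# -> ["switch_to_green"]
```

The key design point is that a human is paged only at the bottom of the ladder, after the cheap, reversible mitigations have already been tried.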
[Diagram: application / cloud infrastructure monitoring tool] Add the new metric to the quality gate – think automation here, too!
{
  "lowerBound": 100,
  "upperBound": 1000,
  "_comment": "global configuration environment-wide",
  "timeseries": [
    {
      "timeseriesId": "service.responsetime",
      "aggregation": "avg",
      "entityIds": "SERVICE-3211ABE8813B9239",
      "lowerBound": 20,
      "upperBound": 300
    }
  ]
}
Check out Azure Logic Apps and Azure Automation.
https://medium.com/@sashman90/ops-mitigation-triangle-300c81d97df6
Unbreakable delivery pipeline in action – builds #17 and #18 flow through Staging → Approve Staging → Production → Approve Production, each stage with CI/CD. Along the way the pipeline pushes context, runs an auto-quality gate (performance gates as code), pushes context again, auto-validates, and auto-remediates (ops as code: SLA, remediation).
https://cloudblogs.microsoft.com/opensource
keptn.sh – open-source framework for the unbreakable pipeline and more
CORE CAPABILITIES
• Automated multi-stage unbreakable pipelines
• Self-healing blue/green deployments
• Event-driven runbook automation
DESIGN PRINCIPLES
• GitOps-based collaboration
• Operator patterns for all logic components
• Monitoring-and-operations-as-code
• Built on and for Kubernetes
• Event-driven and serverless
• Pluggable tooling
Questions?
N+1 Architectural Anti-Pattern – “works” well within a single process:
• 1 call to Quote Service = 44 calls to Product Service
• 1 call to Quote Service = 87 calls to the Product DB
26k database calls in one transaction (per-statement counts: 809, 3956, 4347, 4773, 3789, 3915, 4999, 33)
Payload Flood Architectural Anti-Pattern – [chart: individual payloads of 18 MB, 21 MB, 20 MB, and 31.6 MB versus a 69 MB total]
Using AI and Automation to Build Resiliency into Azure DevOps
Editor's Notes

  • #3 Feeling the pressure to keep up with customer and competitive demand in the market, or the squeeze to deliver more business value at an increasingly fast pace? While the cloud, containerization, and microservices offer efficiency and scaling, many organizations aren’t quite prepared for the additional complexity of cloud-native technologies and demands. In the context of software delivery pipelines, AI, automation, and monitoring facilitate the management of complex modern delivery platforms and help build resiliency into your deployments for better performance.
  • #4 Review how AI, automation, and modern monitoring can facilitate management of complex modern delivery platforms and help build resiliency into your deployments, increasing your feature velocity. Review key performance indicators and an implementation map to get you there. Demo concepts of automated quality gates.
  • #5 Show of hands: How long does it take from final code commit to reach production? How many hotfixes in a typical release? How long does it take to repair a production problem? Is it improving with cloud technologies? Many organizations aren’t quite prepared for the additional complexity of cloud-native technologies and demands.
  • #6 Results from ACM survey of Dynatrace enterprise customers