DevOps Master Certified
AppDynamics Training Session
NOT FOR DISTRIBUTION © www.codvatechlabs.com
Agenda
• Role of Observability/Monitoring in DevOps/SRE
• AppDynamics Architecture
• AppDynamics – On-Prem vs SaaS
• Architecture Walkthrough - PHP Application
• AppDynamics UI Walkthrough
• Hands On - Java Agent and Machine Agent Installation
• Live Troubleshooting Session
• Q/A
Challenges with Traditional NOC
High Alert Noise
✓ No alert prioritization (all alerts are converted directly into incidents)
✓ High volume of incidents due to lack of event prioritization
Underutilized NOC
✓ NOC engineers play only an alert escalation & follow-up role (purely L1)
✓ No technical input during high-severity incidents (P1 outages), resulting in high MTTR (Mean Time To Resolution)
✓ 80% of the work revolves around manually monitoring alerts and watching line graphs on screen
Problem Management
✓ The mindset is focused only on alert resolution rather than problem management
✓ Lack of RCA (Root Cause Analysis) & CAPA (Corrective Action & Preventive Action) practice for repetitive high-severity incidents
Scalability Issues
✓ Unable to scale rapidly because of multiple manual processes during infrastructure expansion
✓ High chance of gaps in monitoring coverage due to manual processes & the lack of a feedback system
SLA Issues
✓ Service Level Agreements (SLAs) are not business-aligned & focus only on infrastructure availability
✓ Lack of SLIs (Service Level Indicators) & SLOs (Service Level Objectives), resulting in inefficient SLA tracking
✓ SLIs, rather than SLAs, are the best way to track availability & performance
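The SLI/SLO point above can be made concrete with a small calculation. This is a minimal sketch, assuming a request-based availability SLI (good requests divided by total requests over one measurement window) and an error budget defined as 1 minus the SLO target; the function and field names are hypothetical and not part of AppDynamics or any ITSM tool.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float              # measured fraction of good requests (0..1)
    slo_target: float       # objective, e.g. 0.999 for "99.9% of requests succeed"
    error_budget: float     # allowed fraction of bad requests = 1 - slo_target
    budget_consumed: float  # share of the error budget already spent (1.0 = all of it)
    slo_met: bool


def availability_report(good_requests: int, total_requests: int, slo_target: float) -> SloReport:
    """Compute a request-based availability SLI and compare it with an SLO."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    sli = good_requests / total_requests
    error_budget = 1.0 - slo_target
    budget_consumed = (1.0 - sli) / error_budget
    return SloReport(sli, slo_target, error_budget, budget_consumed, sli >= slo_target)


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 400 of them failed, SLO of 99.9%.
    report = availability_report(good_requests=999_600, total_requests=1_000_000, slo_target=0.999)
    print(f"SLI={report.sli:.4%}  budget consumed={report.budget_consumed:.0%}  SLO met={report.slo_met}")
```

Reporting the share of error budget consumed, rather than only "up or down", is what lets an SLI-based report say how close a service is to breaching its objective.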
SRE Roadmap
Collect Data | Correlate and Triage | Identify Trends | Predict, Notify & Act
SRE Golden Signals (Alerting, Troubleshooting, Tuning & Capacity Planning)
Monitoring, Auditing, Troubleshooting & Security (Compute | Storage | Network | Application)
Start Monitoring CIs (Configuration Items)
Work toward 100% monitoring coverage using continuous monitoring (immutable Infrastructure as Code)
Monitoring Data Sources
▪ SolarWinds (Compute, Storage & Network)
▪ Dynatrace (APM)
▪ Synthetic Monitoring
Design & implement a CMDB (Configuration Management Database) as the single source of truth for the entire infrastructure
Trends & Anomalies
▪ Capacity Planning
▪ Cost Recommendations
▪ Continuous compliance (detect deviations from a “golden baseline”)
▪ Release-to-release benchmarks
▪ Toil – Automate repetitive tasks
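Continuous compliance boils down to comparing what a CI actually looks like with an approved "golden baseline". A minimal sketch, assuming both the baseline and the observed state are flat dictionaries of setting name to value (in practice the baseline might come from the CMDB); the setting names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

MISSING = "<missing>"  # marker for a setting that is absent on one side

# Each deviation: (setting name, expected baseline value, observed value)
Deviation = Tuple[str, Any, Any]


def detect_deviations(baseline: Dict[str, Any], observed: Dict[str, Any]) -> List[Deviation]:
    """Return every setting whose observed value differs from the golden baseline,
    including settings missing on either side, sorted by setting name."""
    deviations: List[Deviation] = []
    for key in sorted(baseline.keys() | observed.keys()):
        expected = baseline.get(key, MISSING)
        actual = observed.get(key, MISSING)
        if expected != actual:
            deviations.append((key, expected, actual))
    return deviations


if __name__ == "__main__":
    # Hypothetical settings for one host.
    golden = {"ntp_enabled": True, "ssh_root_login": False, "jdk_version": "11"}
    host = {"ntp_enabled": True, "ssh_root_login": True, "jdk_version": "11", "telnet_enabled": True}
    for name, expected, actual in detect_deviations(golden, host):
        print(f"{name}: expected {expected}, found {actual}")
```

Running the same comparison on every CI after each change is what turns a one-off audit into continuous compliance.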
Problem Management
▪ Publish the Top N noise-maker CIs
▪ Post-mortem culture using Problem Management (learning from failure)
▪ Implement custom self-healing for IT infrastructure & services
▪ Publish SLI, SLO & SLM reports
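Publishing the Top N noise-maker CIs is essentially a frequency count over alert records. A minimal sketch, assuming each alert has already been reduced to the name of the CI that raised it; the CI names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_noise_makers(alert_ci_names: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n CIs that raised the most alerts, with their alert counts."""
    return Counter(alert_ci_names).most_common(n)


if __name__ == "__main__":
    # Hypothetical alert stream for one reporting period.
    alerts = ["db01", "web01", "db01", "app02", "db01", "web01"]
    for ci, count in top_noise_makers(alerts, n=3):
        print(f"{ci}: {count} alerts")
```

The resulting list is a natural input for problem management: the CIs at the top are the first candidates for RCA, threshold tuning or self-healing.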
Event Management
▪ Design & implement an AIOps-based layer that collects data (metrics/events) from multiple data sources & presents it in a single pane of glass
▪ Design & build service models
▪ Build event correlation (topology/stream) to reduce alert noise
▪ Consolidate monitoring tools
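Topology- and stream-based correlation can be illustrated in a few lines. This is a simplified sketch, not how any particular AIOps product works: it assumes a hypothetical CI-to-service map (the topology part) and groups alerts for the same service that arrive within an assumed 300-second window of the previous alert in the group (the stream part), so several raw alerts become one correlated event.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    ci: str          # configuration item that raised the alert
    timestamp: int   # epoch seconds
    message: str


@dataclass
class CorrelatedEvent:
    service: str
    first_seen: int
    last_seen: int
    alerts: List[Alert] = field(default_factory=list)


def correlate(alerts: List[Alert], ci_to_service: Dict[str, str], window_s: int = 300) -> List[CorrelatedEvent]:
    """Group alerts whose CIs belong to the same service and that arrive
    within window_s seconds of the previous alert in that group."""
    open_events: Dict[str, CorrelatedEvent] = {}
    closed: List[CorrelatedEvent] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        # Topology step: map the CI to its service; unknown CIs form their own group.
        service = ci_to_service.get(alert.ci, alert.ci)
        event = open_events.get(service)
        # Stream step: extend the open event only if the alert is close enough in time.
        if event is not None and alert.timestamp - event.last_seen <= window_s:
            event.alerts.append(alert)
            event.last_seen = alert.timestamp
        else:
            if event is not None:
                closed.append(event)
            open_events[service] = CorrelatedEvent(service, alert.timestamp, alert.timestamp, [alert])
    closed.extend(open_events.values())
    return sorted(closed, key=lambda e: e.first_seen)


if __name__ == "__main__":
    topology = {"db01": "payments", "app01": "payments", "web09": "catalog"}  # hypothetical
    raw = [
        Alert("db01", 0, "CPU high"),
        Alert("app01", 60, "slow downstream calls"),
        Alert("web09", 90, "HTTP 5xx"),
        Alert("db01", 1000, "CPU high"),
    ]
    for ev in correlate(raw, topology):
        print(ev.service, ev.first_seen, len(ev.alerts))
```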
Incident Management
▪ Integrate monitoring events with ITSM ticketing
▪ Robust automated alert notification (PagerDuty | AlarmPoint)
▪ Define SLIs, SLOs & SLMs
▪ Ensure data is available during production outages
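Prioritizing alerts before they reach ITSM ticketing and paging addresses the "all alerts become incidents" problem. A minimal sketch, assuming an ITIL-style impact × urgency matrix (1 = high, 3 = low) that maps to the P1/P2/P3 levels used in this deck; the matrix values and action names are hypothetical placeholders, not PagerDuty, AlarmPoint or ITSM API calls.

```python
from typing import Dict, Optional, Tuple

# Hypothetical priority matrix: (impact, urgency) -> priority, where 1 = high
# and 3 = low. Combinations not listed stay events and never become incidents.
PRIORITY_MATRIX: Dict[Tuple[int, int], str] = {
    (1, 1): "P1",
    (1, 2): "P2", (2, 1): "P2",
    (1, 3): "P3", (2, 2): "P3", (3, 1): "P3",
}

# Hypothetical notification actions per priority (None = no incident).
ROUTING: Dict[Optional[str], Tuple[str, ...]] = {
    "P1": ("page_on_call", "create_itsm_ticket", "engage_incident_management"),
    "P2": ("page_on_call", "create_itsm_ticket", "engage_incident_management"),
    "P3": ("create_itsm_ticket",),
    None: ("record_event_only",),
}


def route_alert(impact: int, urgency: int) -> Tuple[Optional[str], Tuple[str, ...]]:
    """Prioritize an alert and decide which notification actions to take."""
    if impact not in (1, 2, 3) or urgency not in (1, 2, 3):
        raise ValueError("impact and urgency must each be 1, 2 or 3")
    priority = PRIORITY_MATRIX.get((impact, urgency))  # None -> event only, no incident
    return priority, ROUTING[priority]


if __name__ == "__main__":
    for impact, urgency in [(1, 1), (2, 2), (3, 3)]:
        print((impact, urgency), route_alert(impact, urgency))
```

Keeping the matrix as data rather than code makes it easy to review with the business when the SLA/SLO definitions change.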
Site Reliability Engineering - Landscape
SRE/DevOps Team Structure
SRE Level (L1): Improve MTTD (Mean Time To Detect)
▪ Virtual team for live 24x7 monitoring (availability & performance)
▪ Automated alert escalation to the L2 NOC support team (P1 | P2 | P3 incidents)
▪ Track escalated alerts until resolution
▪ Engage Incident Management for P1 & P2 incidents
▪ Engage the NOC Dev team when a monitoring gap (missed detection) is found
▪ Perform scheduled health check-ups
▪ Daily scheduled reports (Availability | Performance | Outage, etc.)
▪ Other BAU activities
SRE Level (L2): Improve MTTR (Mean Time To Resolution)
▪ Provide L2 analysis for all incidents
▪ Escalate open incidents to L3/Product SMEs
▪ Analyze & fix monitoring alerts
▪ Runbooks: step-by-step guides for resolving incidents
▪ Incident response reports
▪ Post-mortem reports (RCA and the tasks to perform to avoid future outages)
▪ Engage the NOC Dev team for repetitive tasks
Note: This team will include L2 engineers / SMEs from the OS, App, DB, Middleware & Network domains.
SRE (Tools & Automation SMEs): Improve MTBF (Mean Time Between Failures)
▪ Monitor every possible metric in the environment
▪ Design & configure a robust monitoring system (continuous monitoring)
▪ Work on new monitoring opportunities
▪ Automate runbooks (self-healing)
▪ Toil – Automate repetitive tasks (shift from a manual to an automated approach)
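The three team goals above (MTTD, MTTR, MTBF) are averages over incident timestamps. A minimal sketch, assuming each incident records when the failure started, when it was detected and when service was restored; definitions vary between organizations, and here MTTR is assumed to run from detection to resolution (some teams measure it from failure start instead). The `Incident` record is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    started: datetime    # when the failure actually began
    detected: datetime   # when monitoring/alerting raised it
    resolved: datetime   # when service was restored


def _mean(deltas: List[timedelta]) -> timedelta:
    if not deltas:
        raise ValueError("need at least one interval to average")
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: List[Incident]) -> timedelta:
    """Mean Time To Detect: average of (detected - started)."""
    return _mean([i.detected - i.started for i in incidents])


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean Time To Resolution, measured here from detection to resolution."""
    return _mean([i.resolved - i.detected for i in incidents])


def mtbf(incidents: List[Incident]) -> timedelta:
    """Mean Time Between Failures: average uptime between the end of one
    incident and the start of the next (needs at least two incidents)."""
    ordered = sorted(incidents, key=lambda i: i.started)
    return _mean([nxt.started - prev.resolved for prev, nxt in zip(ordered, ordered[1:])])


if __name__ == "__main__":
    t0 = datetime(2021, 7, 1, 0, 0)  # hypothetical incident history
    history = [
        Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=35)),
        Incident(t0 + timedelta(hours=10), t0 + timedelta(hours=10, minutes=15), t0 + timedelta(hours=11, minutes=15)),
    ]
    print("MTTD:", mttd(history), "MTTR:", mttr(history), "MTBF:", mtbf(history))
```

L1 work shortens the detected-minus-started gap, L2 work shortens resolved-minus-detected, and the automation team's work lengthens the gaps between incidents.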
AppDynamics Architecture
AppDynamics Architecture – On-Prem vs SaaS Platform
• AppDynamics On-Prem Reference Architecture
• AppDynamics SaaS Reference Architecture
AppDynamics UI Walkthrough
• Application Flow Map
• Transaction Scorecard
• Business Transactions
• Transaction Snapshots
• Errors and Exceptions
• Dashboards
• Alerting
• Reports
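The Transaction Scorecard summarizes what share of business-transaction calls were normal, slow, very slow, stalled or errored. The sketch below is a simplified model of that idea, not AppDynamics' implementation: it judges each call against the mean and standard deviation of a baseline sample, and the multipliers (3 and 4 standard deviations) and the 45-second stall limit are assumptions for illustration, since transaction thresholds are configurable on the controller.

```python
from collections import Counter
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple

# Assumed thresholds, for illustration only; check the values configured on your controller.
SLOW_SD = 3.0        # "slow": above baseline mean + 3 standard deviations
VERY_SLOW_SD = 4.0   # "very slow": above baseline mean + 4 standard deviations
STALL_MS = 45_000.0  # "stall": call took 45 seconds or longer

CATEGORIES = ("normal", "slow", "very slow", "stall", "error")


def classify(response_ms: float, failed: bool, base_mean: float, base_sd: float) -> str:
    """Put one call into a scorecard category; errors take precedence."""
    if failed:
        return "error"
    if response_ms >= STALL_MS:
        return "stall"
    if response_ms > base_mean + VERY_SLOW_SD * base_sd:
        return "very slow"
    if response_ms > base_mean + SLOW_SD * base_sd:
        return "slow"
    return "normal"


def scorecard(baseline_ms: List[float], calls: Iterable[Tuple[float, bool]]) -> Dict[str, float]:
    """Percentage of calls per category, judged against a baseline sample
    of response times (needs at least two baseline points for stdev)."""
    base_mean, base_sd = mean(baseline_ms), stdev(baseline_ms)
    counts = Counter(classify(ms, failed, base_mean, base_sd) for ms, failed in calls)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no calls to score")
    return {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}


if __name__ == "__main__":
    baseline = [100, 110, 90, 100, 100]  # hypothetical response times in ms
    calls = [(100, False), (125, False), (200, False), (50_000, False), (100, True)]
    print(scorecard(baseline, calls))
```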
Hands-On Session:
• Set up and configure the Java agent for a Java application
• Set up and configure the Machine agent for OS monitoring
• Let's troubleshoot a live application issue
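Attaching the Java agent uses the standard JVM `-javaagent:<path>` option, with controller and naming settings passed as `-D` system properties (or set in the agent's `controller-info.xml`). The sketch below only assembles such a command line; the `appdynamics.*` property names are assumed to match those documented for the AppDynamics Java agent and should be verified against the documentation for your agent version, and every host, path, name and key value is a hypothetical placeholder.

```python
import shlex
from typing import Dict, List


def appd_java_agent_args(agent_jar: str, props: Dict[str, str]) -> List[str]:
    """Build JVM options that attach the agent jar and pass its settings as -D properties."""
    args = [f"-javaagent:{agent_jar}"]
    args += [f"-D{key}={value}" for key, value in props.items()]
    return args


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    props = {
        "appdynamics.controller.hostName": "controller.example.com",
        "appdynamics.controller.port": "443",
        "appdynamics.controller.ssl.enabled": "true",
        "appdynamics.agent.applicationName": "demo-app",
        "appdynamics.agent.tierName": "web-tier",
        "appdynamics.agent.nodeName": "web-node-1",
        "appdynamics.agent.accountName": "customer1",
        "appdynamics.agent.accountAccessKey": "<access-key>",
    }
    jvm_args = appd_java_agent_args("/opt/appdynamics/javaagent/javaagent.jar", props)
    # The -javaagent and -D options must come before -jar so the JVM applies them.
    print(shlex.join(["java", *jvm_args, "-jar", "app.jar"]))
```

Passing settings as `-D` properties keeps one agent installation reusable across nodes, with only the node and tier names changing per JVM.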
Q/A
Feel free to reach out to us with any queries.
✓ Website : https://www.codvatechlabs.com
✓ Email Id : learn@codvatechlabs.com
Ref :
https://docs.appdynamics.com/21.7/en