Redefine Triage by Learning the Golden Nuggets of APM From Noted "APM Best Practices" Author Michael Sydor

Increase your APM proficiency. Learn how to identify and harness KPIs to make sense of your APM "big data," and find out how these techniques help you prepare for an upgrade to the new features and functionality in the latest APM release.

For more information on DevOps solutions from CA Technologies, please visit: http://bit.ly/1wbjjqX

Published in: Technology

  1. ca Opscenter
     Redefine Triage by Learning the Golden Nuggets of APM from Noted “APM Best Practices” Author Michael Sydor
     Michael Sydor
     OCX14S, #CAWorld
     CA Technologies, Service Assurance
  2. © 2014 CA. ALL RIGHTS RESERVED.
     Abstract
     Increase your APM proficiency. Learn how to identify and harness KPIs to make sense of your APM "big data," and find out how these techniques help you prepare for an upgrade to the new features and functionality in the latest APM release.
     Michael Sydor, CA Technologies, Sr. Engineering Services Architect
  3. Agenda
     1. WHY SO MANY METRICS WITH APM?
     2. WHAT WE ARE LEARNING WITH ADVANCED BEHAVIORAL ANALYTICS (ABA)
     3. HOW TO FIND KPIS
     4. HOW TO GENERATE A CUSTOM ABA CONFIGURATION
  4. Typical APM Cluster
     Dozens to hundreds of applications
       – 2800 JVMs/CLRs
     Up to 5M metrics, every 15 seconds
     Large applications span multiple data centers
       – 2-8 APM clusters, typical
       – 30-70 EM Collectors for a nationwide portal application
     12M to 28M metrics, every 15 seconds
     … certainly sounds like big data!!!
  5. The Experts …
     "[T]he problem ... [w]ith this bulk acquisition of data on everybody [is that the NSA has] inundated their analysts with data. Unless they do a very focused attack, they're buried in information and that's why they can't succeed."
       – Bill Binney, former high-ranking National Security Agency (NSA) official, mathematician and codebreaker
     "The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning .... [W]e may construe them in self-serving ways that are detached from their objective reality."
       – Nate Silver
  6. What is Big Data?
     APM information is “big” … but it is not “big data” without enrichment.
     5M Metrics that you don’t fully understand
     OR
     5M Metrics that you don’t fully understand + Enrichment (Trouble management, Version control, Time of ____ constraints, Air traffic advisories, Weather forecast, AP news updates, Marketing campaigns)
     Correlation, Trends, Insights, Anomalies
  7. Challenges for Big Data
     Data variety
       – Different sources give different perspectives. Does your data have a significant perspective?
     Validation
       – Is the data source meaningful/predictive?
     Consistency
       – Are the values trustworthy?
     Data structure and nomenclature
       – Mapping, Transformation
     Temporal impedance mismatch
       – APM: Real-time with 15-second reporting interval
       – Trouble management: +15-30 minutes later
       – Stock ticker: +15-30 minutes later
       – Air traffic advisories: +30-60 minutes later
       – Version control: days to weeks in advance
       – Marketing campaign assessment: 2-4 weeks later
  8. KPI Management Maturity
     SGCM (Platform): Stalls, GC settings, Concurrency, Memory management trends
     APC (Application): Availability, Performance, Capacity
     EKB (Transaction): Errors, Key resource performance, Business transaction survey
     [Chart axes: Value, KPI maturity]
  9. What We Are Learning with APM Advanced Behavioral Analytics
  10. Advanced Behavioral Analytics Logical Architecture
      APM Cluster (5M Metrics) → 100k Metrics (via RegEx) → Anomaly engine → Anomalies, Alerts
      Why only 100k Metrics? Why not 5M?
  11. RegEx = Regular Expression
      analytics.metricfeed.process.3 = Custom Metric Host (Virtual)|Custom Metric Process (Virtual)|Custom Business Application Agent (Virtual)
      analytics.metricfeed.metric.3 = By Business Service|[^|]+|[^|]+|[^|]+:.+
  12. RegEx is hard … but easy to validate.
  13. Metricfeed.3
      [Chart: metricfeed.3, x-axis 1 to 12, y-axis 0 to 200]
  14. Suspects Identified via Baseline Technique
      [Chart: suspects via baseline techniques, average RT only, x-axis 1 to 6, y-axis 0 to 18]
  15. Metric Count TypeView
  16. What is an Application?
      Front-ends
        – Browser? Webservice? Messaging?
      Back-ends
        – Databases, Webservices, Messaging, Mainframes, Trading_Partners
      Muck-in-the-middle
        – Software quality, stability and scalability
      We want to identify KPIs for each of these elements:
        – Helps us build a useful dashboard for operations
        – Helps expose what the resources are really doing
        – Helps us define acceptance criteria, to act proactively
        – Helps us to triage really effectively
  17. How to Find KPIs
  18. Capacity KPIs – “Tree Rings”
  19. Performance KPIs
      High-volume + significant response time
  20. Create a simple alert and threshold (ConnectionStatus).
  21. Create a simple alert, find restart and threshold (MetricCount).
      “UP” – but not actually doing anything!!!
  22. Understanding Your Environment
      Identify the KPIs.
        – Availability
            Agent ConnectionStatus
            Number of live metrics (MetricCount)
        – Performance
            High-volume components with significant response time
            NOT “Top 10 Response Time”
        – Capacity
            Highest-volume components
      Don’t wait for production.
        – Make it part of your pre-production review.
        – Manage the application lifecycle by trending KPIs.
  23. KPI Evolution
      Good                       Better (additional)               Best (additional)
      Stalls                     Availability – connected status   Errors
      GC settings                Availability – metric count       Key resource performance
      Concurrency                Suspect performance               Business transaction survey
      Memory management (graph)  Suspect capacity
      Platform: Coarse information ... but not really APM
      Application, transactions, resources: The APM Advantage
  24. How to Generate a Custom ABA Configuration
  25. Details are on the community site as blog updates.
      Search on each of the following keywords:
        – “average”, “responses”, “errors”, “Stalls”, “Stalled”
      Copy each result to a text file (notepad is best).
      Feed the files to ./build_config.py.
      Copy the resulting Regular Expressions to your Analytics.properties file.
        – 96:: hot property – changes detected in about a minute
        – 95:: recycle MOM
  26. SORT on this column
  27. <CTRL><A> <CTRL><C> <CTRL><V>
  28. Example Execution
  29. Resources
      Community site
        – Cookbook: APM HealthCheck
        – Understanding Which Metrics Matter (KPI discussion)
        – Cookbook: Application Audit
            More details on the baseline techniques and process
        – Blog entries
            Redefine Triage by Learning the Golden Nuggets of APM
            What are KPIs and how can I get some quick?!
            Big Data - What does it mean for APM
            Why Does ABA Find Anomalies When There Is Nothing Wrong In Production?
      APM best practices
        – Realizing Application Performance Management
        – Available on Amazon.com and Apress.com
            Baselines, Test Plans, App Audits, Triage, Firefighting
            Organizational Models, Service Catalogs
  30. For More Information
      To learn more about DevOps, please visit: http://bit.ly/1wbjjqX
  31. For Informational Purposes Only
      Terms of this Presentation
      © 2014 CA. All rights reserved. All trademarks referenced herein belong to their respective companies. This presentation provided at CA World 2014 is intended for information purposes only and does not form any type of warranty. Some of the specific slides with customer references relate to customer's specific use and experience of CA products and solutions so actual results may vary.
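
Slide 12 says the metric-feed expressions are easy to validate. One way to do that outside the product is to run the expression from slide 11 against a handful of metric paths and see which ones it selects. The sketch below is an illustration only. It assumes the pipe separators have to be escaped for Python's re module (the slide text shows them bare, and the deck does not state the escaping the properties file expects), and the sample metric paths are invented.

```python
import re

# Expression from slide 11 (analytics.metricfeed.metric.3), with the literal
# pipe separators escaped for Python's re module. ASSUMPTION: this escaping
# is for the sketch only; the form the properties file expects may differ.
METRIC_FEED_3 = r"By Business Service\|[^|]+\|[^|]+\|[^|]+:.+"

# Hypothetical metric paths, invented for illustration.
SAMPLES = [
    "By Business Service|Trading|Place Order|via Browser:Average Response Time (ms)",
    "By Business Service|Trading|Place Order:Average Response Time (ms)",  # one segment short
    "Frontends|Apps|Portal:Average Response Time (ms)",                    # different root node
]


def matching_paths(pattern, paths):
    """Return the paths that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [path for path in paths if compiled.fullmatch(path)]


if __name__ == "__main__":
    for path in matching_paths(METRIC_FEED_3, SAMPLES):
        print(path)
```

Python reads an unescaped | as alternation, which is why the sketch escapes the separators before testing. Counting how many paths each expression selects is also a quick way to see whether a feed stays near the 100k metric budget mentioned on slide 10.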
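
Slides 19 and 22 describe performance KPIs as high-volume components with significant response time, as opposed to a "Top 10 Response Time" list. The sketch below contrasts the two selections. The deck gives no formula, so the thresholds and the ranking by invocations multiplied by average response time are assumptions made for illustration, and the component names and numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    invocations: int        # responses per reporting interval
    avg_response_ms: float  # average response time in milliseconds


def top_by_response_time(components, n):
    """The "Top N Response Time" view that slide 22 warns against."""
    return sorted(components, key=lambda c: c.avg_response_ms, reverse=True)[:n]


def performance_kpi_candidates(components, min_invocations, min_response_ms):
    """High-volume components with significant response time.

    ASSUMPTION: both thresholds and the ranking by total time spent
    (invocations * average response time) are illustrative choices.
    """
    picked = [
        c for c in components
        if c.invocations >= min_invocations and c.avg_response_ms >= min_response_ms
    ]
    return sorted(picked, key=lambda c: c.invocations * c.avg_response_ms, reverse=True)


# Hypothetical components, invented for illustration.
COMPONENTS = [
    Component("ReportExport", 3, 9000.0),
    Component("Login", 12000, 250.0),
    Component("Search", 40000, 120.0),
    Component("HealthPing", 90000, 2.0),
]

if __name__ == "__main__":
    print([c.name for c in top_by_response_time(COMPONENTS, 1)])
    print([c.name for c in performance_kpi_candidates(COMPONENTS, 1000, 100.0)])
```

With this invented data, the slowest component is one that is rarely called, while the candidates chosen by volume and response time are the ones that carry most of the load.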
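
Slides 20 to 22 pair two availability KPIs: the agent ConnectionStatus and the number of live metrics (MetricCount), the second catching an agent that is "UP" but not actually doing anything. A minimal sketch of that combined check follows. The boolean connection flag, the baseline count, and the min_ratio threshold are hypothetical simplifications, not the product's alert definitions.

```python
def availability_state(connected, metric_count, baseline_count, min_ratio=0.5):
    """Classify an agent from its connection flag and live metric count.

    ASSUMPTION: min_ratio is a hypothetical threshold, the fraction of the
    baseline metric count below which a connected agent is treated as idle.
    """
    if not connected:
        return "DOWN"
    if baseline_count > 0 and metric_count < baseline_count * min_ratio:
        return "UP_BUT_IDLE"
    return "UP"


if __name__ == "__main__":
    # Hypothetical readings: (connected, live metric count, baseline count)
    readings = [(True, 1800, 2000), (True, 40, 2000), (False, 0, 2000)]
    for connected, count, baseline in readings:
        print(connected, count, availability_state(connected, count, baseline))
```

A connection check alone would report the second reading as healthy; the metric count is what separates it from an agent that is doing real work.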
