Nastel AutoPilot Proactive Application Analytics

A presentation on Nastel AutoPilot's capabilities for advanced application analytics. Based on Complex Event Processing (CEP), it provides early warning of potential or actual problems across multiple data sources, and it does so in real time.

Published in: Technology, Business

Nastel AutoPilot Proactive Application Analytics

  1. 1. Nastel Technologies Confidential. AutoPilot: Middleware-centric Application Performance Monitoring with Advanced Performance Analytics
  4. 4. Challenges many of our customers face
     • Competitive pressures: ability to react to a volatile market; rapid changes in demand; need to retain customers and keep service levels high
     • Requirement for sustainable cost reduction: offshoring and outsourcing; de-duplication of overlapping products and roles; need to accomplish more for less
     • Regulatory challenges: Dodd-Frank, Basel III, HIPAA and more; need to manage risk
  5. 5. Nastel helps address Competitive Pressures
     • Identifies issues that could prevent systems from handling rapid changes in order volume
     • Reduces the number and duration of outages
     • Callout (CloudCEP): AutoPilot's Complex Event Processing helps manage competitive pressures by providing automated problem detection, reducing the number and duration of outages
  8. 8. Nastel helps address Competitive Pressures
     • Big Data: if you don't master the exploitation of big data, your competitors will
     • If you master big data, you can resolve problems faster, improve service levels and retain customers
     • Understand customer behaviour: see the patterns, learn how your users make use of your apps, and from this design ones that better meet their needs before your competitors do
     • Callout (CloudCEP): AutoPilot is almost unique in understanding application performance data and analytics, both web and legacy. This capability was baked into AutoPilot from the ground up and is provided as close to real time as possible.
  9. 9. Nastel helps address Cost Reduction
     • Requirement for sustainable cost reduction: improve the effectiveness of offshore teams by avoiding eyes-on-screen monitoring
     • Callout (CloudCEP): Offshore team effectiveness improved. No eyes-on-screen monitoring is necessary, as AutoPilot alerts a human only when absolutely necessary, resulting in improved utilization of IT resources.
  10. 10. Nastel helps address Cost Reduction
     • Improve the effectiveness of offshore teams by avoiding eyes-on-screen monitoring
     • Reduce the number of tools required for monitoring and management
     • Start by consolidating their data into AutoPilot for consistency
     • Callout (CloudCEP): The number of tools can be reduced. AutoPilot supports all major middleware platforms with a unified monitoring platform.
     • Diagram labels: Cloud, Servers, Application Servers, TIBCO, WMQ, System Z, DataPower, Solace, DB, CEP, J2EE/.NET
  11. 11. Nastel helps address Cost Reduction
     • Improve the effectiveness of offshore teams by avoiding eyes-on-screen monitoring
     • Reduce the number of tools required for monitoring and management
     • Improve productivity by eliminating false-positive alerts
     • Callout: AutoPilot improves productivity by using CEP to calculate a trend. Instead of raising false alerts at T1, T2, T3 and T4, CEP dynamically creates its own metrics from the events it receives from collectors (agents/probes), turns them into actionable information, and correctly alerts on the trend at T5. The result is more effective staff utilization.
     • Chart: CPU over Time against a Threshold, with points T1, T2, T3, T4 and T5
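The contrast on this slide, momentary spikes versus a sustained trend, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the moving-average window and the CPU figures are hypothetical and are not AutoPilot's API or its actual CEP rules.

```python
from collections import deque


def static_alerts(samples, threshold):
    """Indices where a single raw sample exceeds the threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


def trend_alerts(samples, threshold, window=3):
    """Indices where the moving average of the last `window` samples
    exceeds the threshold (a derived metric, assumed for illustration)."""
    recent = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(samples):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > threshold:
            alerts.append(i)
    return alerts


# Hypothetical CPU % samples: isolated spikes first, then a sustained climb.
cpu = [40, 95, 42, 45, 96, 41, 44, 97, 43, 92, 94, 96]

spike_alerts = static_alerts(cpu, threshold=90)
sustained_alerts = trend_alerts(cpu, threshold=90, window=3)
print(spike_alerts, sustained_alerts)
```

A raw threshold fires on every spike, while the moving-average rule stays quiet until the high values persist for a full window.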
  12. 12. Nastel helps address Regulatory Challenges
     • Segregation of duties, privileged access, recertification
     • Callout: AutoPilot helps enterprises control segregation of duties and privileged access via a single security model employed across all middleware. This helps reduce risk.
     • Screenshot (user account form): User name: Albert Mavashev; Password expires in: 30 days; Account disabled; Audit account; Account locked; LDAP; Inherit permissions from owner: √; Administrator@Acme.com; checked groups: WMQ Group, DataPower Group, Solace Group, TIBCO RV Group, TIBCO EMS Group
  13. 13. Nastel helps address Regulatory Challenges
     • Segregation of duties, privileged access, recertification
     • Provides vital insight into compliance with regulatory standards
     • Callout: AutoPilot automatically tracks applications across the enterprise, capturing vital insight into compliance with regulatory standards. Its real-time performance monitoring enables you to stay compliant with your internal and external commitments.
     • Diagram labels: TradeStart, Missing Verification, TradeEnd, CustomerAccess
  14. 14. Active Real-Time Dashboard
  17. 17. Middleware-Centric Application Performance Monitoring
     • INFRASTRUCTURE: Storage, Servers, Databases, Network
     • Messaging Middleware, Application Servers, Enterprise Service Bus, SOA Appliances
     • APPLICATIONS: Trading Equities, Claims Processing, Funds Transfers, Order Handling, Payments Processing
     • TRANSACTIONAL MONITORING: Trade Auditing, Cust ID Tracking, Balance Authorization, Failed TX, Lost TX, Validation
     • OPERATIONAL MONITORING
     • CEP Policy Engine; Repository
     • Business Service Views for Line of Business; Real-time Views for Operations
  18. 18. AutoPilot Architecture: Foundation for building Elastic APM
     • Diagram labels: Domain Server (CEP); CEP Server PROD; CEP Server QA; CEP Server DEV; Pub-sub over IP; PMDB; Grid; Fail-over; State
     • Policies: Business Rules, Analytics, Actions, Notifications, Desired state
     • Monitors: Sampling, Events, Transactions, Streaming, Data sources
     • Facts: Events, Event payload, Metrics, KPIs & KBIs, Derived Metrics
     • Flow labels: Monitors, Facts, KPIs, KBIs, Policies, Objectives, Goals, Users, Dashboard, Alerts, Notifications
  19. 19. Active Data Grid: In-memory cache with persistence
     • Elastic APM: just-in-time deployment across CEP instances
     • Diagram labels: CEP Instance, Policies, Data Sources, Persistent Store
  20. 20. CEP: Complex Event & Metric Processing
     • Diagram labels: Events & Metrics; AutoPilot CEP; Policies: Rules & Situation Analysis; Compound Event / Predicted Situation; KPIs, Events, Actions and Notifications
     • Rules processing speed: a single CEP engine running on a 64-bit quad-CPU server with 4 GB of memory can process 2M rules per second. Because CEP is a virtual machine, it can scale up linearly: adding an additional CEP engine doubles the speed.
  21. 21. Metrics: some of the derived facts we provide

     Metric               Short Description
     Value                Current value
     Update-Count         Times value updated (changed or same)
     Change-Count         Times value changed
     Reset-Count          Number of resets
     Previous-Value       Previous value
     Time-Created         Time created
     Last-Updated         Time last updated
     Last-Changed         Time last changed
     Update-Age           Time since update
     Change-Age           Time since change
     Time-Difference      Time difference in ms between fact publisher (origin) and subscriber
     Min                  Overall minimum since reset
     Max                  Overall maximum since reset
     MAvg                 Moving average
     Counter              Last actual value for a counter type, versus the delta reported
     Time-Since-Reset     Time since reset
     Change-Latency       Time between latest changes
     Update-Latency       Time between latest updates
     Update-Velocity      Rate of update
     History-Size         Number of facts in history store
     History-Max-Size     Maximum number of history samples
     History-Time         Time represented by history
     History-Avg          Average of values in history facts
     History-EMAvg        Exponential moving average of values in history facts
     History-Max          Maximum value in fact history
     History-Min          Minimum value in fact history
     History-Variance     Variance of values in fact history
     History-Deviation    Standard deviation of values in fact history
     History-Dev-Mean     Number of standard deviations from the mean
     History-Bound        Upper bound based on Chebyshev inequality
     History-Band-High    High band based on Bollinger bands
     History-Band-Low     Low band based on Bollinger bands
     History-RSI          Relative Strength Index
     History-SO-K         Stochastic oscillator
     History-CAvg         Average percent change in history (based on % change)
     History-CVariance    Variance of values in fact history (based on % change)
     History-CDeviation   Standard deviation of values in fact history (based on % change)
     History-CBound       Upper bound based on Chebyshev inequality (based on % change)
     History-CDev-Mean    Number of standard deviations from the mean (based on % change)
     History-CBand-High   High band based on Bollinger bands (based on % change)
     History-CBand-Low    Low band based on Bollinger bands (based on % change)
     History-CAvg-Gain    Average percent gain
     History-CAvg-Loss    Average percent loss
     History-CAD-Ratio    Ratio of advances to declines
     History-HROC        Historical rate of change, percent
     History-IROC         Instantaneous rate of change, percent
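To make a few of these derived facts concrete, here is a minimal sketch of how values such as History-Avg, History-EMAvg, History-Dev-Mean and History-Bound could be computed from a history of samples. The formulas are assumptions chosen for illustration (population variance, a fixed smoothing factor `ema_alpha`, and a Chebyshev bound of mean + k·σ with k = 1/√`tail_prob`); the slides do not state how AutoPilot computes them, and `derived_facts` is a hypothetical name.

```python
import statistics


def derived_facts(history, current, ema_alpha=0.5, tail_prob=0.05):
    """Compute a handful of history-based facts for one metric.

    Assumed definitions, not taken from the product documentation.
    """
    mean = statistics.fmean(history)
    variance = statistics.pvariance(history)
    deviation = variance ** 0.5

    # Exponential moving average, seeded with the oldest sample.
    ema = history[0]
    for value in history[1:]:
        ema = ema_alpha * value + (1 - ema_alpha) * ema

    # Chebyshev: at most 1/k^2 of values lie more than k deviations from
    # the mean, so k = 1/sqrt(tail_prob) bounds that fraction by tail_prob.
    k = (1.0 / tail_prob) ** 0.5

    return {
        "History-Avg": mean,
        "History-EMAvg": ema,
        "History-Variance": variance,
        "History-Deviation": deviation,
        "History-Dev-Mean": (current - mean) / deviation if deviation else 0.0,
        "History-Bound": mean + k * deviation,
    }


facts = derived_facts([10, 12, 11, 13, 14], current=16)
print(facts)
```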
  22. 22. Situation Detection & Event Generation
     • Diagram labels: Events & Metrics; Policies; Compound Event; Trigger Action; Send Event
     • Context Sensitive Application Views
     • Integration with Event Management
     • Business Activity Dashboards
     • Business Event Processing
  23. 23. Complex Event Processing Capabilities
     • Decouples rule evaluation from physical event structure: changes to the event patterns or structure do not break rules
     • Simulations and replay can be accomplished easily: live recording and replay of actual event feeds; no need for actual event sources; rules can be tested with simulations before going live
     • White Board aids the design and development of rules based on transient data (real-time events)
     • Evaluations can be performed on statistics computed from real-time feeds
  24. 24. USE CASE: TREND ANALYSIS
  25. 25. Ways to detect performance trends
     • Measure relevant application performance indicators: orders filled, failed, missed; JVM GC activity, memory, I/O
     • Create a baseline for each relevant indicator: 1-60 second sampling for a near real-time baseline; 1, 10, 15 min, daily, weekly, monthly for short- and long-term baselines; samples can range anywhere from 1 to 60 seconds depending on the level of required resolution
     • Apply analytics to determine trends and behavior: can vary from simple to complex; prefer the KISS approach (Keep It Simple, Stupid)
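The baseline step above can be pictured as rolling fine-grained samples up into coarser windows. The sketch below assumes fixed-interval samples and simple averaging per window; `rollup` and the sample values are hypothetical and only illustrate the idea.

```python
import statistics


def rollup(samples, sample_interval_s, window_s):
    """Average fixed-interval samples into consecutive, non-overlapping
    windows. A trailing partial window is dropped."""
    per_window = window_s // sample_interval_s
    if per_window < 1:
        raise ValueError("window must be at least one sample long")
    return [
        statistics.fmean(samples[i:i + per_window])
        for i in range(0, len(samples) - per_window + 1, per_window)
    ]


# Hypothetical response times (ms) sampled every 10 seconds.
response_ms = [100, 110, 90, 200, 210, 190]
baseline_30s = rollup(response_ms, sample_interval_s=10, window_s=30)
print(baseline_30s)
```

The same function applied with larger windows (10 min, 15 min, daily) would yield the longer-term baselines the slide mentions.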
  26. 26. 3 Simple methods to detect trends (no complex math required)
     • Bollinger Bands: determine high and low bands based on the available baseline; the bands define a normal channel, typically within 2 standard deviations of the mean; compute STDDEV, mean, current sample
     • % Change: sample to sample, day-to-day, week-to-week, etc.
     • Velocity: number of measured units per unit of time. Example: response time rises from 10 to 20 seconds over a 5-second interval, which gives (20 - 10) / 5 = 2 units/sec.
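All three methods reduce to a few lines of arithmetic. The sketch below is a plain illustration: the function names are hypothetical, and the band width of 2 standard deviations follows the "typically within 2 standard deviations" wording on the slide.

```python
import statistics


def bollinger_bands(baseline, width=2.0):
    """Return (low, high) bands: mean -/+ `width` standard deviations."""
    mean = statistics.fmean(baseline)
    deviation = statistics.pstdev(baseline)
    return mean - width * deviation, mean + width * deviation


def percent_change(previous, current):
    """Percent change from `previous` to `current` (previous must be non-zero)."""
    return (current - previous) / previous * 100.0


def velocity(start_value, end_value, interval_seconds):
    """Measured units gained per second over the interval."""
    return (end_value - start_value) / interval_seconds


low, high = bollinger_bands([10, 12, 11, 13, 14])
print(low, high)
print(percent_change(50, 60))  # day-to-day or sample-to-sample change
print(velocity(10, 20, 5))     # the slide's example: (20 - 10) / 5
```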
  27. 27. Typical Usage
     • High Band: given a set of metrics, alert when one or more are above the High band for at least 2 samples; this indicates abnormal activity over a period of time. Caution: abnormal can become the new normal.
     • % Change: useful indicator for near real-time monitoring of resources (such as heap, memory, CPU, storage); useful indicator for long-term trends (daily, weekly)
     • Velocity: very useful for monitoring metrics that measure usage of resources with a finite upper bound (memory, storage, table space, etc.); measuring velocity can help estimate when upper limits will be reached
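Two of these usages lend themselves to a short sketch: requiring a sustained breach of the High band before alerting, and using velocity to estimate the time left before a finite limit is reached. Both helpers are hypothetical, and the linear extrapolation is an assumption (real growth is rarely perfectly linear).

```python
def sustained_breach(samples, high_band, min_samples=2):
    """True if at least `min_samples` consecutive samples exceed the band."""
    run = 0
    for value in samples:
        run = run + 1 if value > high_band else 0
        if run >= min_samples:
            return True
    return False


def seconds_until_limit(current, limit, velocity_per_second):
    """Linear estimate of the time left before `limit` is reached.

    Returns None when usage is flat or shrinking, since the limit is
    then never reached under this assumption.
    """
    if velocity_per_second <= 0:
        return None
    return max(0.0, (limit - current) / velocity_per_second)


print(sustained_breach([10, 16, 11, 12], high_band=15))  # one-off spike
print(sustained_breach([10, 16, 17, 12], high_band=15))  # two in a row
print(seconds_until_limit(current=70, limit=100, velocity_per_second=0.5))
```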
  28. 28. Required instrumentation
     • Data collectors: attempt to collect all relevant indicators within the same time tick (response time, GC activity, memory usage, CPU usage)
     • Build a history for each collected metric: either in memory for near real-time analysis, or in storage for the short and long term (minutes, hours, days)
     • Pattern matching, analytics: need to scan and pattern match application metrics (such as finding all applications whose GC is above the High Bollinger Band for 2+ samples); run as a continuous query, which is executed as metrics are collected and updated
     • Actionable outcome: alerts, notifications, actions; visualization, dashboards
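The pieces listed here (a per-metric history, a pattern such as "GC above the High Bollinger Band for 2+ samples", and evaluation on every update) can be combined in a small in-memory sketch. The class, its parameters and the sample data are hypothetical; this is not how AutoPilot implements continuous queries, only one way to picture them.

```python
import statistics
from collections import defaultdict, deque


class GcBandMonitor:
    """Keeps a bounded history per application and re-evaluates the
    'above High band for N consecutive samples' rule on every update."""

    def __init__(self, history_size=30, min_history=5, breaches_needed=2, width=2.0):
        self.min_history = min_history
        self.breaches_needed = breaches_needed
        self.width = width
        self.history = defaultdict(lambda: deque(maxlen=history_size))
        self.breaches = defaultdict(int)

    def update(self, app, gc_value):
        """Record one sample; return True if `app` now matches the rule."""
        hist = self.history[app]
        if len(hist) >= self.min_history:
            high = statistics.fmean(hist) + self.width * statistics.pstdev(hist)
            self.breaches[app] = self.breaches[app] + 1 if gc_value > high else 0
        # The new sample joins the history, so the band adapts over time.
        hist.append(gc_value)
        return self.breaches[app] >= self.breaches_needed

    def matching_apps(self):
        """All applications currently matching the rule."""
        return sorted(a for a, n in self.breaches.items() if n >= self.breaches_needed)


monitor = GcBandMonitor()
for gc_ms in [5, 6, 5, 6, 5, 20, 25]:   # hypothetical GC durations (ms)
    monitor.update("orders", gc_ms)
for gc_ms in [5, 6, 5, 6, 5, 6, 5]:
    monitor.update("billing", gc_ms)
print(monitor.matching_apps())
```

Because each breaching sample is added to the history, the band widens as the abnormal values persist, which mirrors the caution that abnormal can become the new normal.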
  29. 29. Example: Monitoring Java Application by examining GC Activity
     • Java application running in a standalone JVM container
     • Monitoring JVM GC (Garbage Collection) as a byproduct of application activity
     • Sample GC every 10 seconds: # GC samples; GC duration (ms); GC CPU usage %; avg. GC CPU usage (since JVM startup); JVM heap utilization %
  30. 30. Example 1: Java Application, Sudden Spike in Activity
  31. 31. Example 2: Java Application, Adjustment to new workload (The New Normal)
  32. 32. Resource Leak Detection: Detecting Leaks using Trend Analysis (Java Example)
  33. 33. Typical causes of Java leaks
     • Programming errors, bugs
     • Unchecked array, list, hash map growth
     • Not closing JDBC Prepared Statements
     • Not closing sockets, file handles
     • Thread leaks, handle leaks
     • Class loader leaks
     • Resources allocated outside the JVM
  34. 34. Leaking Chart Pattern: Detecting Resource Accumulation
     • Chart: VM Heap Usage %
  35. 35. Detecting Resource Leaks using Momentum Oscillator
     • Chart annotations: leak pattern detected; Momentum Oscillator trending higher; heap not yet exhausted
     • Momentum Oscillator: takes values between 0 and 100 and is derived from the difference between the sum of all recent gains and the sum of all recent losses in the underlying metric. A value of 50 means that the net difference of gains and losses is zero.
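One common oscillator that fits this description is the simple (unsmoothed) Relative Strength Index, computed as 100 × gains / (gains + losses). The sketch below uses that formula as an assumption; the slide does not say which exact formula AutoPilot applies, and the heap figures are made up.

```python
def momentum_oscillator(samples, period=14):
    """Simple RSI-style oscillator over the last `period` changes.

    Returns 50.0 when gains and losses cancel out (or nothing changed),
    values near 100 when the metric almost only rises, and values near 0
    when it almost only falls.
    """
    recent = samples[-(period + 1):]
    deltas = [b - a for a, b in zip(recent, recent[1:])]
    gains = sum(d for d in deltas if d > 0)
    losses = sum(-d for d in deltas if d < 0)
    if gains + losses == 0:
        return 50.0
    return 100.0 * gains / (gains + losses)


# Healthy heap: GC reclaims what was allocated, so gains equal losses.
print(momentum_oscillator([50, 52, 50, 52, 50]))
# Leaking heap: each GC reclaims less than was allocated.
print(momentum_oscillator([50, 53, 52, 56, 55, 59]))
```

A heap that saw-tooths around a stable level hovers near 50, while a leak pushes the value toward 100 well before the heap is exhausted.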
  36. 36. Conclusion: Monitoring Elastic Environments
     • Elastic applications can't be monitored using static models: static thresholds; static data/transaction flow models
     • Complex systems layered on top of complex systems: too many constantly changing variables; makes root cause analysis very difficult; requires extensive cross-technology expertise
     • Preferred approach: Holistic Application Monitoring
     • Granular data collection: application and infrastructure metrics
     • Analytics, automated baselines: real-time and historical
     • Resource monitoring coupled with Transaction Profiling
     • Visualization that connects different teams: Application Support, DevOps, IT Support
  37. 37. Some of our valued Clients
     • Delivering value since 1994
     • Over 200 customers
     • Client labels: SEMC; De Post - La Poste
     • Captions: Customer for 7 years; Customer for 10 years; Customer for 11 years
