Inside Azure Diagnostics


** Session from Pittsburgh Tech Fest - June 2014 **

Published in: Technology

  • Cloud Services (Web/Worker Roles)
  • Successful projects share one common trait . . . and it is not what you might think:
    Latest hot language
    Hot platform
    Smartest people
    Agile vs. waterfall
  • The #1 problem I see over & over
  • Multiple servers – more difficult to handle
    Keep locally? Hard
    What if a server dies?
    Need a central location
  • Configure in Visual Studio
    Show the declarative way
    Show where the file is located – bin and root for Web and Worker, respectively (D:\Projects\Demos\JustAzure\AzureDiagnostics\AzureDiagnostics\csx\Release

    Show in storage – show using AMS.
    Show the file in blob storage (pghtechfest14)
  • Can you spot the potential problem?
  • Show viewing data in Visual Studio
    Show LinqPad
    Show AMS
  • Log all calls to external services
    Include as much detail as possible (destination, method, timing info, result, etc.)
    Log details of transient faults
    Number of retry actions
    Cause of the fault
    Did the application fail over to a secondary instance?
    Detect an emerging problem!
    Partition telemetry data by date (or hour) – reduce impact of data aggregation or reporting
    Use a different storage account!
    Remove old / non-relevant telemetry data
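The logging guidance above (destination, timing, result, retry count, cause of the fault) can be sketched as a small wrapper around any external call. This is a minimal illustration, not code from the talk: `ExternalCallLogger`, `Execute`, and the `isTransient` callback are hypothetical names, and the linear back-off is an assumed policy. It uses only `System.Diagnostics.Trace`, so the statements would reach the diagnostics trace log only if a suitable trace listener is configured.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: wraps an external call and logs timing, result, and retries.
public static class ExternalCallLogger
{
    public static T Execute<T>(string destination, Func<T> call,
        Func<Exception, bool> isTransient, int maxRetries = 3, int delayMs = 200)
    {
        for (int attempt = 0; ; attempt++)
        {
            Stopwatch timer = Stopwatch.StartNew();
            try
            {
                T result = call();
                // Success: record where we called, how long it took, and how many retries it needed.
                Trace.TraceInformation("Call to {0} succeeded in {1} ms after {2} retries",
                    destination, timer.ElapsedMilliseconds, attempt);
                return result;
            }
            catch (Exception ex)
            {
                if (!isTransient(ex) || attempt >= maxRetries)
                {
                    // Systemic fault, or retries exhausted: log the cause and give up.
                    Trace.TraceError("Call to {0} failed after {1} retries in {2} ms: {3}",
                        destination, attempt, timer.ElapsedMilliseconds, ex);
                    throw;
                }

                // Transient fault: log the cause, then wait a little longer before each retry.
                Trace.TraceWarning("Transient fault calling {0} (attempt {1}): {2}",
                    destination, attempt + 1, ex.Message);
                Thread.Sleep(delayMs * (attempt + 1));
            }
        }
    }
}
```

A caller would pass the external operation as a lambda, for example `ExternalCallLogger.Execute("orders-db", () => LoadOrders(), ex => ex is TimeoutException)`, where `LoadOrders` stands in for whatever the application actually calls.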
  • Detect before issues impact your users
    Poll data sources, monitor, and alert
    Centralized repository

    Transient vs. Systemic
    Transient: SQL Database throttling
    Systemic: bug in the code; no retries will fix

    Recover First
    The right data can help speed up the process . . . even with Microsoft support
    Your problem . . . Your solution

    Root Cause Analysis
    What, why, and how to fix going forward
  • We Don’t Know What We Don’t Know
    Incredibly hard to find problems solely by looking at code

    Preemptive vs. Reactionary
    Regular analysis of telemetry data can help find problems before they become severe.

    Recovery & Root Cause Analysis
    What is failing?
    Are we making it better or worse?
    What caused this problem in the first place?
  • 7 days
  • ISO 8601 standard for duration -
  • 7 days
  • 7 days
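The notes above mention a 7-day value and the ISO 8601 duration format without saying which setting they refer to. As a hedged sketch of the format itself, .NET's `XmlConvert` can convert between `TimeSpan` and ISO 8601 duration strings; the `Iso8601Duration` wrapper class is a hypothetical name used only for illustration.

```csharp
using System;
using System.Xml;

// Hypothetical wrapper around XmlConvert for ISO 8601 (xs:duration) strings.
public static class Iso8601Duration
{
    // Formats a TimeSpan as an ISO 8601 duration string.
    public static string Format(TimeSpan value)
    {
        return XmlConvert.ToString(value);
    }

    // Parses an ISO 8601 duration string back into a TimeSpan.
    public static TimeSpan Parse(string text)
    {
        return XmlConvert.ToTimeSpan(text);
    }
}
```

With this helper, `Iso8601Duration.Format(TimeSpan.FromDays(7))` yields `P7D`, and `Iso8601Duration.Parse("PT1M")` yields a one-minute `TimeSpan`.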
  • Inside Azure Diagnostics

    1. Inside Azure Diagnostics – Pittsburgh Tech Fest, June 7th, 2014
    3. 1 / The need for diagnostic data in cloud applications
       2 / Data we can monitor
       3 / Using the Microsoft Azure Diagnostic Agent
       4 / Real-world guidance for troubleshooting Microsoft Azure apps
    4. node.js, C#, Java, Agile vs. Waterfall
    5. Diagnostics Data / Telemetry
    6. Scenario
    7. You’re kidding? Right?
    8. Scenario – We have a problem
    9. Scenario – Resolution
       o Step 0 – Enable Azure diagnostics
         • Set key performance counters
       o Step 1 – Add logging statements around key functionality
         • Especially external services
       o Step 3 – Test, test, test
       o Step 4 – Analyze
       o Step 5 – Fix it
    11. worker roles, web roles
    12. worker roles, web roles; Diagnostic Data – 4x
    14. Diagnostic Item                         Table Name                            Blob Container Name
        Windows Event Logs                      WADWindowsEventLogsTable
        Performance Counters                    WADPerformanceCountersTable
        Trace Log Statements                    WADLogsTable
        Azure Diagnostic Infrastructure Logs    WADDiagnosticInfrastructureLogsTable
        Custom Logs (e.g. log4net, NLog, etc.)                                        <custom>
        IIS Logs                                WADDirectoriesTable*                  wad-iis-logfiles
        IIS Failed Request Logs                 WADDirectoriesTable*                  wad-iis-failedreqlogfiles
        Crash Dumps                             WADDirectoriesTable*

        * The location of the blob log file is specified in the Container field and the name of the blob in the RelativePath field. The AbsolutePath field contains the name of the file as it existed on the role instance.
    16. o Trace logs
        o IIS logs
        o Infrastructure logs
        o No transfer
        o OnStart()
        o Overrides default
        o diagnostics.wadcfg
        o Overrides imperative
    17. // Requires: using System; using Microsoft.WindowsAzure.Diagnostics;
        //           using Microsoft.WindowsAzure.ServiceRuntime;
        // Goes inside a class derived from RoleEntryPoint.
        public override bool OnStart()
        {
            // Create the DiagnosticMonitorConfiguration object to use for configuring the monitoring agent.
            DiagnosticMonitorConfiguration config = DiagnosticMonitor.GetDefaultInitialConfiguration();

            // Performance Counter configuration
            config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
            {
                CounterSpecifier = @"\Processor(_Total)\% Processor Time",
                SampleRate = TimeSpan.FromSeconds(30)
            });
            config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

            // Log configuration
            config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Information;
            config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

            // Event Log configuration
            config.WindowsEventLog.DataSources.Add("Application!*");
            config.WindowsEventLog.DataSources.Add("System!*");
            config.WindowsEventLog.ScheduledTransferLogLevelFilter = LogLevel.Warning;
            config.WindowsEventLog.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

            // Start the diagnostic monitor with the new configuration
            DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);

            return base.OnStart();
        }

        Impacts local agent only!
    18. Deployment ID
    19. Declarative Configuration using Visual Studio
    20. 1. wad-control-container
        2. Imperative code
        3. Declarative configuration
        4. Default configuration
    22. Instruct WAD to transfer specific data sources to storage
        Overwrites current diagnostic configuration
    23. Additional host-level data – not DiagnosticAgent.exe
    25. Query Azure Diagnostic Data
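When querying the diagnostics tables, filtering on PartitionKey by time range avoids scanning the whole table. The sketch below assumes that the WAD tables use a PartitionKey made of the UTC tick count, zero-padded to 19 digits; that layout is an assumption to verify against your own tables, not something stated in the slides. `WadQuery`, `ToPartitionKey`, and `BuildFilter` are hypothetical names.

```csharp
using System;

// Hypothetical helper for building time-range filters over WAD tables.
public static class WadQuery
{
    // Assumed key layout: UTC ticks, zero-padded to 19 digits.
    public static string ToPartitionKey(DateTime utc)
    {
        if (utc.Kind != DateTimeKind.Utc)
        {
            throw new ArgumentException("Expected a UTC timestamp.", "utc");
        }
        return utc.Ticks.ToString("D19");
    }

    // Builds a table-storage filter string covering [fromUtc, toUtc).
    public static string BuildFilter(DateTime fromUtc, DateTime toUtc)
    {
        return string.Format("PartitionKey ge '{0}' and PartitionKey lt '{1}'",
            ToPartitionKey(fromUtc), ToPartitionKey(toUtc));
    }
}
```

The resulting string can be supplied as the filter of a table query against, for example, WADLogsTable, using whichever storage client or tool is at hand.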
    26. Vital information
        Day-to-day operational data
    27. Compute node resource usage
        Windows Event logs
        Database queries response times
        Application specific exceptions
        Database connection & cmd failures
        Microsoft Azure Storage Analytics
        The process for Azure-hosted solutions is not that different from traditional, on-premises solutions.