Inside Azure Diagnostics
Pittsburgh Tech Fest
June 7th, 2014
17
COLUMBUS, OH OCTOBER 17, 2014 CLOUDDEVELOP.ORG
1 / The need for diagnostic data in cloud applications
2 / Data we can monitor
3 / Using the Microsoft Azure Diagnostic...
node.js
C#
Java
Agile
- vs -
Waterfall
Diagnostics Data / Telemetry
Scenario
You’re kidding? Right?
Scenario
We have a problem
Resolution
o Step 0 – Enable Azure diagnostics
• Set key performance counters
o Step 1 – Add logging statements around key...
worker roles
web roles
worker roles
web roles
Diagnostic Data – 4x

Diagnostic Item Table Name Blob Container Name
Windows Event Logs WADWindowsEventLogsTable
Performance Counters WADPerform...
o Trace logs
o IIS logs
o Infrastructure logs
o No transfer
o OnStart()
o Overrides default
o diagnostics.wadcfg
o Overrid...
public override bool OnStart()
{
// Create the DiagnosticMonitorConfiguration object to use for configuring the monitoring...
Deployment ID
Declarative Configuration using Visual Studio
1. wad-control-container
2. Imperative code
3. Declarative configuration
4. Default configuration
Instruct WAD to transfer specific data sources to storage
Overwrites current diagnostic configuration
http://msdn.microsof...
Additional host-level data – not DiagnosticAgent.exe
Query Azure Diagnostic Data
o Vital information
o Day-to-day operational data
Compute node resource usage
Windows Event logs
Database queries response times
Application specific exceptions
Database co...
o http://bit.ly/1eomek9
o http://bit.ly/1mVHN3u
o http://bit.ly/1k1YkjI
o http://bit.ly/Q33mkU
o http://bit.ly/1qp4omC
www.JustAzure.com
** Session from Pittsburgh Tech Fest - June 2014 **

  • Cloud Services (Web/Worker Roles)
  • Successful projects share one common trait . . Not what you might think
    Latest hot language
    Hot platform
    Smartest people
    Agile vs. waterfall
    Money


    http://assets.bitnami.com/assets/windows_azure_logo-metro.png
    http://technologiesreview.com/wp-content/uploads/2011/02/AWS_LOGO_CMYK.png
    http://www.istockphoto.com/stock-photo-35165202-portrait-of-male-college-student.php?st=73c78c9
  • The #1 problem I see over & over
  • Multiple servers – more difficult to handle
    Keep locally? Hard
    What if a server dies?
    Need a central location
  • Configure in Visual Studio
    Show the declarative way
    Show where the file is located – bin and root for Web and Worker respectively (D:\Projects\Demos\JustAzure\AzureDiagnostics\AzureDiagnostics\csx\Release\roles\WorkerRole1\approot)

    Show in storage – show using AMS.
    Show the file in blob storage (pghtechfest14)
  • Can you spot the potential problem?
  • http://azure.microsoft.com/en-us/documentation/articles/cloud-services-how-to-monitor/
  • Show viewing data in Visual Studio
    Show LinqPad
    Show AMS
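When querying the WAD tables from a tool such as LinqPad, it usually helps to filter on PartitionKey rather than Timestamp. The sketch below assumes the commonly described convention that WAD table partition keys are a "0" followed by the UTC tick count of the event time. That convention is not stated in the slides, and the class and method names here are hypothetical.

```csharp
using System;

public static class WadPartitionKey
{
    // Assumption: WAD tables use "0" + UTC ticks (19 digits in total) as the PartitionKey.
    public static string FromUtc(DateTime utc)
    {
        return utc.Ticks.ToString("D19");
    }

    // Builds a table-storage filter string covering the range [fromUtc, toUtc).
    public static string BuildFilter(DateTime fromUtc, DateTime toUtc)
    {
        return string.Format(
            "PartitionKey ge '{0}' and PartitionKey lt '{1}'",
            FromUtc(fromUtc), FromUtc(toUtc));
    }
}

public static class Program
{
    public static void Main()
    {
        DateTime from = new DateTime(2014, 6, 7, 0, 0, 0, DateTimeKind.Utc);
        Console.WriteLine(WadPartitionKey.BuildFilter(from, from.AddHours(1)));
    }
}
```

A filter on PartitionKey lets table storage seek to the range directly instead of scanning the whole table.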
  • Log all calls to external services
    Include as much detail as possible (destination, method, timing info, result, etc.)
    Log details of transient faults
    Number of retry actions
    Cause of the fault
    Did the application fail over to a secondary instance?
    Detect an emerging problem!
    Partition telemetry data by date (or hour) – reduce impact of data aggregation or reporting
    Use a different storage account!
    Remove old / non-relevant telemetry data
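The logging guidance above can be sketched as a small wrapper around calls to external services. This is a minimal illustration and not code from the session: ExternalCallLogger, its parameters, and the "OrdersDb" example are hypothetical names. It records the destination, method, timing, result, and retry count through System.Diagnostics.Trace, which in a role is typically routed to WADLogsTable when the diagnostics trace listener is configured.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ExternalCallLogger
{
    // Hypothetical helper: runs an external call, retries transient faults,
    // and logs destination, method, elapsed time, and retry count.
    public static T Execute<T>(string destination, string method, Func<T> call,
                               Func<Exception, bool> isTransient, int maxRetries = 3)
    {
        for (int attempt = 0; ; attempt++)
        {
            Stopwatch timer = Stopwatch.StartNew();
            try
            {
                T result = call();
                Trace.TraceInformation("{0}.{1} succeeded in {2} ms after {3} retries",
                    destination, method, timer.ElapsedMilliseconds, attempt);
                return result;
            }
            catch (Exception ex) when (isTransient(ex) && attempt < maxRetries)
            {
                // Transient fault: log the cause and the retry number, then back off.
                Trace.TraceWarning("{0}.{1} transient fault after {2} ms (retry {3} of {4}): {5}",
                    destination, method, timer.ElapsedMilliseconds, attempt + 1, maxRetries, ex.Message);
                Thread.Sleep(TimeSpan.FromMilliseconds(100 * (attempt + 1)));
            }
            catch (Exception ex)
            {
                // Systemic fault or retries exhausted: log and rethrow.
                Trace.TraceError("{0}.{1} failed after {2} ms and {3} retries: {4}",
                    destination, method, timer.ElapsedMilliseconds, attempt, ex);
                throw;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        int calls = 0;
        // Simulated external call that is throttled twice before succeeding.
        string value = ExternalCallLogger.Execute("OrdersDb", "GetOrder",
            () =>
            {
                if (++calls < 3) throw new TimeoutException("simulated throttling");
                return "order-42";
            },
            ex => ex is TimeoutException);
        Console.WriteLine(value);
    }
}
```

Logging the retry count on success as well as on failure is what makes an emerging problem visible: a call that still succeeds but needs more retries each day shows up in the telemetry before users notice.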
  • Detect before issues impact your users
    Poll data sources, monitor, and alert
    Centralized repository

    Transient vs. Systemic
    Transient: SQL Database throttling
    Systemic: bug in the code; no retries will fix

    Recover First
    Right data can help speed up the process . . . Even with Microsoft support
    Your problem . . . Your solution

    Root Cause Analysis
    What, why, and how to fix going forward
  • We Don’t Know What We Don’t Know
    Incredibly hard to find problems solely by looking at code

    Preemptive vs. Reactionary
    Regular analysis of telemetry data can help find problems before they become severe.

    Recovery & Root Cause Analysis
    What is failing?
    Are we making it better or worse?
    What caused this problem in the first place?


    http://blogs.msdn.com/b/windowsazure/archive/2012/03/09/summary-of-windows-azure-service-disruption-on-feb-29th-2012.aspx
  • http://blogs.msdn.com/b/windowsazure/archive/2012/03/09/summary-of-windows-azure-service-disruption-on-feb-29th-2012.aspx
  • http://blogs.msdn.com/b/windowsazure/archive/2012/03/09/summary-of-windows-azure-service-disruption-on-feb-29th-2012.aspx
  • 7 days
    Co-admin
  • ISO 8601 standard for duration - http://en.wikipedia.org/wiki/ISO_8601
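For reference, ISO 8601 durations such as PT1M (one minute) or P7D (seven days) are the format used for period values like scheduledTransferPeriod in the declarative diagnostics.wadcfg file. Whether that is the exact context of this note is an assumption. In .NET, System.Xml.XmlConvert converts between TimeSpan and this format, as in this short sketch (the DurationDemo class is illustrative only):

```csharp
using System;
using System.Xml;

public static class DurationDemo
{
    public static void Main()
    {
        // TimeSpan -> ISO 8601 duration string
        string oneMinute = XmlConvert.ToString(TimeSpan.FromMinutes(1));

        // ISO 8601 duration string -> TimeSpan
        TimeSpan sevenDays = XmlConvert.ToTimeSpan("P7D");

        Console.WriteLine(oneMinute);
        Console.WriteLine(sevenDays.TotalDays);
    }
}
```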
  • 7 days
    Co-admin
  • 7 days
    Co-admin
  • Inside Azure Diagnostics

    1. 1. Inside Azure Diagnostics Pittsburgh Tech Fest June 7th, 2014
    2. 2. 17 COLUMBUS, OH OCTOBER 17, 2014 CLOUDDEVELOP.ORG
    3. 3.
        1 / The need for diagnostic data in cloud applications
        2 / Data we can monitor
        3 / Using the Microsoft Azure Diagnostic Agent
        4 / Real-world guidance for troubleshooting Microsoft Azure apps
    4. 4. node.js C# Java Agile - vs - Waterfall
    5. 5. Diagnostics Data / Telemetry
    6. 6. Scenario
    7. 7. You’re kidding? Right?
    8. 8. Scenario
        We have a problem
    9. 9. Resolution
        o Step 0 – Enable Azure diagnostics
            • Set key performance counters
        o Step 1 – Add logging statements around key functionality
            • Especially external services
        o Step 3 – Test, test, test
        o Step 4 – Analyze
        o Step 5 – Fix it
        Scenario
    10. 10.
    11. 11. worker roles web roles
    12. 12. worker roles web roles Diagnostic Data – 4x
    13. 13.
    14. 14.
        Diagnostic Item                         | Table Name                           | Blob Container Name
        Windows Event Logs                      | WADWindowsEventLogsTable             |
        Performance Counters                    | WADPerformanceCountersTable          |
        Trace Log Statements                    | WADLogsTable                         |
        Azure Diagnostic Infrastructure Logs    | WADDiagnosticInfrastructureLogsTable |
        Custom Logs (e.g. log4net, NLog, etc.)  |                                      | <custom>
        IIS Logs                                | WADDirectoriesTable*                 | wad-iis-logfiles
        IIS Failed Request Logs                 | WADDirectoriesTable*                 | wad-iis-failedreqlogfiles
        Crash Dumps                             | WADDirectoriesTable*                 |
        * Location of the blob log file is specified in the Container field and name of the blob in the RelativePath field. The AbsolutePath field contains the name of the file as it existed on the role instance.
    15. 15.
    16. 16.
        o Trace logs
        o IIS logs
        o Infrastructure logs
        o No transfer
        o OnStart()
        o Overrides default
        o diagnostics.wadcfg
        o Overrides imperative
    17. 17.
        // Requires: using System; using Microsoft.WindowsAzure.Diagnostics; using Microsoft.WindowsAzure.ServiceRuntime;
        public override bool OnStart()
        {
            // Create the DiagnosticMonitorConfiguration object to use for configuring the monitoring agent.
            DiagnosticMonitorConfiguration config = DiagnosticMonitor.GetDefaultInitialConfiguration();

            // Performance Counter configuration
            config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
            {
                CounterSpecifier = @"\Processor(_Total)\% Processor Time",
                SampleRate = TimeSpan.FromSeconds(30)
            });
            config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

            // Log configuration
            config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Information;
            config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

            // Event Log configuration
            config.WindowsEventLog.DataSources.Add("Application!*");
            config.WindowsEventLog.DataSources.Add("System!*");
            config.WindowsEventLog.ScheduledTransferLogLevelFilter = LogLevel.Warning;
            config.WindowsEventLog.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

            // Start the diagnostic monitor with the new configuration
            DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);

            return base.OnStart();
        }
        Impacts local agent only!
    18. 18. Deployment ID
    19. 19. Declarative Configuration using Visual Studio
    20. 20.
        1. wad-control-container
        2. Imperative code
        3. Declarative configuration
        4. Default configuration
    21. 21.
    22. 22.
        Instruct WAD to transfer specific data sources to storage
        Overwrites current diagnostic configuration
        http://msdn.microsoft.com/en-us/library/gg433075.aspx
    23. 23. Additional host-level data – not DiagnosticAgent.exe
    24. 24.
    25. 25. Query Azure Diagnostic Data
    26. 26.
        o Vital information
        o Day-to-day operational data
    27. 27.
        Compute node resource usage
        Windows Event logs
        Database queries response times
        Application specific exceptions
        Database connection & cmd failures
        Microsoft Azure Storage Analytics
        Process for Azure hosted solutions is not that different from traditional, on-premises solutions.
    28. 28.
    29. 29.
    30. 30.
    31. 31.
        o http://bit.ly/1eomek9
        o http://bit.ly/1mVHN3u
        o http://bit.ly/1k1YkjI
        o http://bit.ly/Q33mkU
        o http://bit.ly/1qp4omC
    32. 32. www.JustAzure.com