No, this session is not about greener IT. Learn how to use the RoleEnvironment and the diagnostics provided by Windows Azure. Communication between roles, logging, and automatic upscaling of your application are just some of the things you can do once you know how the Windows Azure environment works.
1. Taking Care of a Cloud Environment: Windows Azure Maarten Balliauw, RealDolmen, @maartenballiauw, http://blog.maartenballiauw.be
2. Who am I? Maarten Balliauw Antwerp, Belgium www.realdolmen.com Focus on web ASP.NET, ASP.NET MVC, PHP, Azure, VSTS, … MVP ASP.NET http://blog.maartenballiauw.be http://twitter.com/maartenballiauw
3. Agenda Windows Azure Environment Fabric Controller Windows Azure Guest OS Fabric Agent Diagnostic Monitor Interacting with the Environment Interacting with the Fabric Monitoring and Diagnostics Management API Bringing it all together: Automatic scaling Takeaways Q&A
6. Fabric Controller Communicates with every server within the Fabric Interacts with a “Fabric Agent” on each machine Monitors every VM, application and instance Service Management is performed by the Fabric Controller
7. Fabric Controller Manages the life cycle of Azure services Allocates resources Provisioning Deployment Monitoring Manages VMs and physical machines in the fabric Based on a state machine 1 heartbeat = comparing services’ goal states with the current node states, then trying to move each node to its goal state if possible
8. Fabric Controller Resource allocation based on # update and fault domains OS features/versions Network channels Available load balancers Resource allocation is transactional Deployments and upgrades Automatically Optional: manual through service portal Maintenance Standard health and failure monitoring Reported by Fabric Agent Discovered by Fabric Controller
9. Networking VIP automatically registered in load balancers Load balancer traffic only to role instances in goal state Upgrades can be done by VIP swap Web Role VIP Web Role
11. Windows Azure Environment Fabric Controller Virtual machine Windows Azure Guest OS (http://bit.ly/aZqSdp) Fabric agent Diagnostic monitor Your web/worker role instance
12. Windows Azure Guest OS Based on Windows Server 2008 Enterprise 3 current versions Windows Azure Guest OS 1.0 (Release 200912-01) Windows Azure Guest OS 1.1 (Release 201001-01) Windows Azure Guest OS 1.2 (Release 201003-01) Similar environment as W2K8 server Filesystem Performance counters Event logs ...
    <ServiceConfiguration serviceName="service-name" osVersion="WA-GUEST-OS-1.2_201003-01">
13. Fabric Agent Runs on every node Separate process Reports current instance’s operational status to FC Goal state Failures Health
14. Diagnostic Monitor Runs on every node Separate process Performs automatic and on-demand diagnostics transfer
16. Interacting with the Fabric Through Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment What it provides... Read the deployment id Read configuration values from ServiceConfiguration.cscfg Get references to local resources Request a recycle of the role instance Capture events Configuration changes Status checks (where FC checks FA) Get the current role instance And a list of all the other role instances in the current role And even a list of all the roles in the deployment (i.e. other web/worker roles)
17. Use Cases Read the deployment id Can be used to call the Management API Read configuration values Configure your application through ServiceConfiguration.cscfg Allow your configuration to be modified through the Windows Azure portal Get references to local resources Local, temporary storage on a role instance Use for caching data Use for temporary file processing ... Request a recycle of the role instance e.g. after a configuration change or a specific event
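The use cases above can be sketched in a few lines of C#. This is a minimal sketch, not code from the session: the setting name "WelcomeMessage" and the local resource name "ScratchSpace" are hypothetical and would have to be declared in the service definition and configuration files.

```csharp
using System;
using System.IO;
using Microsoft.WindowsAzure.ServiceRuntime;

public static class EnvironmentSketch
{
    public static void Describe()
    {
        // RoleEnvironment is only usable inside the (development) fabric.
        if (!RoleEnvironment.IsAvailable) return;

        // Deployment id, e.g. for later calls to the Management API.
        string deploymentId = RoleEnvironment.DeploymentId;

        // "WelcomeMessage" is a hypothetical setting in ServiceConfiguration.cscfg.
        string message = RoleEnvironment.GetConfigurationSettingValue("WelcomeMessage");

        // "ScratchSpace" is a hypothetical local resource from the service definition.
        LocalResource scratch = RoleEnvironment.GetLocalResource("ScratchSpace");
        File.WriteAllText(Path.Combine(scratch.RootPath, "work.tmp"), message);

        Console.WriteLine("Deployment {0}, scratch space at {1}",
            deploymentId, scratch.RootPath);
    }

    public static void Recycle()
    {
        // Asks the fabric to restart this role instance.
        RoleEnvironment.RequestRecycle();
    }
}
```

Local storage is temporary, so anything written to it should be treated as a cache that can disappear when the instance is moved or reimaged.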
18. Use Cases Capture events RoleEnvironment_Changing and RoleEnvironment_Changed Respond to changes in the Environment Configuration change Topology changes RoleEnvironment_StatusCheck Inform fabric controller of the current state Intercept FA status reporting Implement your own status reporting conditions “SetBusy” RoleEnvironment_Stopping What to do when the current role is stopping? E.g. unmounting drives, resource cleanup, ...
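As an illustration of the events listed above, the following sketch wires up Changing, StatusCheck and Stopping in a role's OnStart(). The `tooBusy` flag is a hypothetical placeholder for whatever condition your application uses; the choice to recycle on every configuration change is an assumption, not a recommendation from the session.

```csharp
using System.Linq;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint
{
    // Hypothetical flag, set elsewhere when this instance should not receive traffic.
    private volatile bool tooBusy;

    public override bool OnStart()
    {
        RoleEnvironment.Changing += (sender, e) =>
        {
            // Cancelling the change makes the fabric recycle this instance,
            // so it restarts with the new configuration values.
            if (e.Changes.Any(c => c is RoleEnvironmentConfigurationSettingChange))
            {
                e.Cancel = true;
            }
        };

        RoleEnvironment.StatusCheck += (sender, e) =>
        {
            // Report "busy" so the load balancer stops sending requests here.
            if (tooBusy) e.SetBusy();
        };

        RoleEnvironment.Stopping += (sender, e) =>
        {
            // Release resources here, e.g. unmount drives or flush buffers.
        };

        return base.OnStart();
    }
}
```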
19. Use Cases Get all role instances in the current role Status checks Know about endpoints Inter-role communication?
21. Inter-Role Communication Scenario: chat application Users get connected to different worker roles Worker roles should relay messages to other users Implement separate worker roles Internal endpoint Looping other roles and relaying
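The "looping other roles and relaying" step can be sketched as a small discovery helper. This is a sketch under assumptions: the endpoint name passed in (for example "ChatEndpoint") is hypothetical and must match an internal endpoint declared in the service definition, and the actual relaying protocol is left out.

```csharp
using System.Collections.Generic;
using System.Net;
using Microsoft.WindowsAzure.ServiceRuntime;

public static class PeerDiscovery
{
    // Returns the internal endpoints of all other instances in the current role.
    public static IList<IPEndPoint> FindPeers(string endpointName)
    {
        var peers = new List<IPEndPoint>();
        RoleInstance me = RoleEnvironment.CurrentRoleInstance;

        foreach (RoleInstance instance in me.Role.Instances)
        {
            // Skip ourselves: messages only need to go to the other instances.
            if (instance.Id == me.Id) continue;

            RoleInstanceEndpoint endpoint;
            if (instance.InstanceEndpoints.TryGetValue(endpointName, out endpoint))
            {
                peers.Add(endpoint.IPEndpoint);
            }
        }
        return peers;
    }
}
```

A worker would call `PeerDiscovery.FindPeers("ChatEndpoint")` each time it relays a message, or refresh the list when a topology change is reported through RoleEnvironment.Changed.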
23. Diagnostics: Single Server vs. the Cloud Single Server Static Environment Single well-known instance Traceable local transactions Local Access Feasible All in one TS session Data & tools co-located In-Place Changes Cloud Dynamic Environment Multi-instance, elastic capacity Distributed transactions Local Access Infeasible Many nodes Distributed, scaled-out data Service Upgrades
24. Monitoring and Diagnostics Through Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitor What it provides... API for monitoring & data collection for cloud apps Support standard diagnostics API Manage all role instances or one specific instance Scalable: built on WA storage and used by WA components Developer in control
25. Windows Azure Diagnostics Configuration Role Instance Role Data collection (traces, logs, crash dumps) Quota enforcement Diagnostic Monitor Local directory storage Windows Data Sources IIS Logs & Failed Request Logs Perf Counters Windows Event Logs
26. Windows Azure Diagnostics Request upload Role Instance Windows Azure Storage Role Diagnostic Monitor Local directory storage Windows Data Sources Scheduled or on-demand upload
28. Development Fabric Windows Azure Diagnostics Windows Azure Hosted Service Diagnostic Manager Some diagnostics application Controller Code Configure
29. Feature Summary Local data buffering Configurable trace, perf counter, Windows event log, IIS log & file buffering Local buffer quota management Query & modify config from the cloud or from the desktop per role instance Transfer to WA Storage Scheduled & on-demand Filter by data type, verbosity & time range Transfer completion notification Query & modify from the cloud and from the desktop per role instance
31. Sample: Activate WA Diagnostics
    // This is done for you automatically by
    // Windows Azure Tools for Visual Studio

    // Add a reference to Microsoft.WindowsAzure.Diagnostics
    using Microsoft.WindowsAzure.Diagnostics;

    // Activate diagnostics in the role's OnStart() method
    public override bool OnStart()
    {
        // Use the connection string contained in the
        // application configuration setting named
        // "DiagnosticsConnectionString".
        // If the value of this setting is
        // "UseDevelopmentStorage=true", development storage is used.
        DiagnosticMonitor.Start("DiagnosticsConnectionString");
        ...
    }
32. Sample: Web.Config Changes
    <!-- This is automatically inserted by VS. The listener routes
         System.Diagnostics.Trace messages to Windows Azure Diagnostics. -->
    <system.diagnostics>
      <trace>
        <listeners>
          <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener, Microsoft.WindowsAzure.Diagnostics, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35"
               name="AzureDiagnostics">
            <filter type="" />
          </add>
        </listeners>
      </trace>
    </system.diagnostics>
33. Sample: Generate Diagnostics Data
    string myRoleInstanceName = RoleEnvironment.CurrentRoleInstance.Id;

    // Trace with standard .NET tracing APIs
    System.Diagnostics.Trace.TraceInformation(
        "Informational trace from " + myRoleInstanceName);

    // Capture full crash dumps
    CrashDumps.EnableCollection(true);

    // Or capture mini crash dumps instead
    CrashDumps.EnableCollection(false);
34. Sample: Enable Local Data Buffering
    // Managed traces, IIS logs, failed request logs,
    // crash dumps and WA diagnostics internal logs are buffered
    // in local storage by default. Other data sources must be
    // added explicitly.
    DiagnosticMonitorConfiguration diagConfig =
        DiagnosticMonitor.GetDefaultInitialConfiguration();

    // Add performance counter monitoring
    PerformanceCounterConfiguration procTimeConfig =
        new PerformanceCounterConfiguration();

    // Run typeperf.exe /q to query for counter names
    procTimeConfig.CounterSpecifier = @"\Processor(*)\% Processor Time";
    procTimeConfig.SampleRate = System.TimeSpan.FromSeconds(1.0);
    diagConfig.PerformanceCounters.DataSources.Add(procTimeConfig);

    // Continued on next slide...
35. Sample: Enable Local Data Buffering
    // Continued from previous slide...

    // Add event collection from the Windows Event Log
    // Syntax: <Channel>!<xpath query>
    // http://msdn.microsoft.com/en-us/library/dd996910(VS.85).aspx
    diagConfig.WindowsEventLog.DataSources.Add("System!*");

    // Restart diagnostics with this custom local buffering
    // configuration
    DiagnosticMonitor.Start(
        "DiagnosticsConnectionString", diagConfig);
36. Sample: Web.Config Changes
    <!-- You can optionally enable IIS failed request tracing.
         This has some performance overhead.
         A service upgrade is required to toggle this setting. -->
    <system.webServer>
      <tracing>
        <traceFailedRequests>
          <add path="*">
            <traceAreas>
              <add provider="ASP" verbosity="Verbose" />
              <add provider="ASPNET"
                   areas="Infrastructure,Module,Page,AppService"
                   verbosity="Verbose" />
              <add provider="ISAPI Extension" verbosity="Verbose" />
              <add provider="WWW Server" verbosity="Verbose" />
            </traceAreas>
            <failureDefinitions statusCodes="200-599" />
          </add>
        </traceFailedRequests>
      </tracing>
    </system.webServer>
37. Sample: Scheduled Data Transfer
    // Start off with the default initial configuration
    DiagnosticMonitorConfiguration dc =
        DiagnosticMonitor.GetDefaultInitialConfiguration();

    // Collect the Application event log and transfer it every 5 minutes
    dc.WindowsEventLog.DataSources.Add("Application!*");
    dc.WindowsEventLog.ScheduledTransferPeriod =
        System.TimeSpan.FromMinutes(5.0);

    DiagnosticMonitor.Start("DiagnosticsConnectionString", dc);
38. Sample: On-Demand Data Transfer
    // On-demand transfer of buffered files.
    // This code can live in the role, or on the desktop,
    // or even in another service.
    var ddm = new DeploymentDiagnosticManager(
        storageAccount, deploymentID);
    var ridm = ddm.GetRoleInstanceDiagnosticManager(
        roleName, roleInstanceName);

    var dataBuffersToTransfer = DataBufferName.Logs;

    OnDemandTransferOptions transferOptions =
        new OnDemandTransferOptions();
    transferOptions.From = DateTime.MinValue;
    transferOptions.To = DateTime.UtcNow;
    transferOptions.LogLevelFilter = LogLevel.Critical;

    Guid requestID = ridm.BeginOnDemandTransfer(
        dataBuffersToTransfer, transferOptions);
40. Storage Considerations Standard WA Storage costs apply for transactions, storage & bandwidth Data Retention Local buffers are aged out by the Diagnostic Monitor according to configurable quotas You control data retention for data in table/blob storage You should manage cleanup of this! Query Performance on Tabular Data Partitioned by high-order bits of the tick count Query by time is efficient Filter by verbosity level at transfer time
41. Common Diagnostic Tasks Performance measurement Resource usage Troubleshooting and debugging Problem detection Quality of Service Metrics Capacity planning Traffic analysis (users, views, peak times) Billing Auditing
43. Management API Through Microsoft.Samples.WindowsAzure.Management.* What it provides... X509 client certificates for authentication View, create, delete, swap, … deployments Edit configuration (and change instance count) List and view properties for hosted services, storage accounts and affinity groups Also exists as PowerShell scripts MSBuild tasks (CI & auto-deploy anyone?)
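Changing the instance count comes down to editing the `count` attribute of a role's `Instances` element in the service configuration and submitting the result through the Management API. The sketch below covers only the XML editing step with System.Xml.Linq; the class and method names are hypothetical, and the call that uploads the new configuration is deliberately left out.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ConfigurationEditor
{
    // XML namespace used by ServiceConfiguration.cscfg files.
    private static readonly XNamespace Ns =
        "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration";

    // Returns a copy of the configuration XML with the instance count changed.
    public static string SetInstanceCount(string cscfgXml, string roleName, int count)
    {
        XDocument doc = XDocument.Parse(cscfgXml);

        XElement role = doc.Root.Elements(Ns + "Role")
            .FirstOrDefault(r => (string)r.Attribute("name") == roleName);
        if (role == null)
            throw new ArgumentException("Unknown role: " + roleName, "roleName");

        XElement instances = role.Element(Ns + "Instances");
        if (instances == null)
            throw new InvalidOperationException("Role has no Instances element.");

        instances.SetAttributeValue("count", count);
        return doc.ToString();
    }
}
```

The returned XML would then be sent as the new deployment configuration, using a management certificate for authentication.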
46. Auto-Scaling As easy as doing this? <Instances minInstances="3" maxInstances="10" /> Unfortunately: no… “When” should it scale? “How” should it scale? “Who” / “What” is responsible for scaling?
47. Auto-Scaling – “When” Different for every application Based on performance counters Based on queue length / workload Based on the weather? Weight of metrics Trends in metric data Answer: “Sensors” “Scaling logic provider”
48. Auto-Scaling - Sensors Sensors provide metrics and trend Performance counter Queue length Custom
49. Auto-Scaling – “How” Average topology change takes up to 15 minutes What if your load goes up too fast? Weight of metrics Trends in metric data Answer: “Scaling logic provider”
50. Auto-Scaling – Scaling logic Scaling logic provider uses sensor data to suggest an action (up/fast-up/down/stable) To implement per application Just a suggestion!
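To make the idea of a scaling logic provider concrete, here is a minimal sketch that combines weighted sensor values and trends into one of the four suggested actions. The `SensorReading` shape and all thresholds (0.8, 0.3, 0.1) are hypothetical illustrations, not values from the session; each application would tune its own.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum ScalingAction { Down, Stable, Up, FastUp }

// One sensor reading; a hypothetical shape, not taken from the demo code.
public class SensorReading
{
    public double Load { get; set; }    // current metric, normalised to 0..1
    public double Trend { get; set; }   // change in Load since the previous sample
    public double Weight { get; set; }  // relative importance of this sensor
}

public static class ScalingLogicProvider
{
    // Suggests an action from the weighted average load and trend.
    public static ScalingAction Suggest(IEnumerable<SensorReading> readings)
    {
        List<SensorReading> list = readings.ToList();
        double totalWeight = list.Sum(r => r.Weight);
        if (totalWeight <= 0) return ScalingAction.Stable;

        double load = list.Sum(r => r.Load * r.Weight) / totalWeight;
        double trend = list.Sum(r => r.Trend * r.Weight) / totalWeight;

        // High load that is still climbing quickly: scale up aggressively.
        if (load > 0.8 && trend > 0.1) return ScalingAction.FastUp;
        if (load > 0.8) return ScalingAction.Up;
        // Only scale down when load is low and not rising.
        if (load < 0.3 && trend <= 0) return ScalingAction.Down;
        return ScalingAction.Stable;
    }
}
```

Because the result is only a suggestion, the caller remains free to ignore it, for example while a previous topology change is still in progress.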
51. Auto-Scaling – “Who”/“What” A dedicated server / worker role? At least two workers for WA SLA: costs! The application itself? Master election (which role instance is responsible?) Answer: Approach will differ per application…
52. Auto-Scaling - Responsibilities More approaches are possible Dedicated worker On-premises monitoring app The app itself Master election based on RoleEnvironment.Roles
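One simple way to do master election, sketched here as an assumption rather than as the approach used in the demo, is to let every instance sort the known instance ids and treat the lowest one as master. All instances see the same list, so they reach the same answer without talking to each other.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MasterElection
{
    // True when currentInstanceId sorts first among all known instance ids.
    public static bool IsMaster(string currentInstanceId,
                                IEnumerable<string> allInstanceIds)
    {
        string master = allInstanceIds
            .Concat(new[] { currentInstanceId }) // make sure we are in the list
            .OrderBy(id => id, StringComparer.Ordinal)
            .First();

        return string.Equals(master, currentInstanceId, StringComparison.Ordinal);
    }
}
```

Inside a role, the ids could come from `RoleEnvironment.CurrentRoleInstance.Role.Instances.Select(i => i.Id)`, and the election should be repeated whenever the topology changes, since the master instance itself may be removed when scaling down.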
54. Auto-Scaling Demo Scaling based on custom sensor # users logged in Monitoring done by the app itself Which brings everything together: Master election Role Environment Performance counters Diagnostics API Queue length Storage API Scaling (changing # instances in config) Management API
56. Takeaways Windows Azure Environment components Fabric Controller Windows Azure Guest OS Fabric Agent Diagnostic Monitor All components provide interaction Interacting with the Fabric Monitoring and Diagnostics Management API Bringing it all together gives you power!