Taking care of a cloud environment



No, this session is not about greener IT. Learn how to use the RoleEnvironment and the diagnostics provided by Windows Azure. Communication between roles, logging and automatic upscaling of your application are just some of the things you can do once you know how the Windows Azure environment works.

Published in: Technology


  • http://snarfed.org/space/windows_azure_details
  • http://azure.snagy.name/blog/?p=93

    1. Taking Care of a Cloud Environment: Windows Azure
       Maarten Balliauw, RealDolmen
       @maartenballiauw
       http://blog.maartenballiauw.be
    2. Who am I?
       • Maarten Balliauw
       • Antwerp, Belgium
       • www.realdolmen.com
       • Focus on web
         • ASP.NET, ASP.NET MVC, PHP, Azure, VSTS, …
       • MVP ASP.NET
       • http://blog.maartenballiauw.be
       • http://twitter.com/maartenballiauw
    3. Agenda
       • Windows Azure Environment
         • Fabric Controller
         • Windows Azure Guest OS
         • Fabric Agent
         • Diagnostic Monitor
       • Interacting with the Environment
         • Interacting with the Fabric
         • Monitoring and Diagnostics
         • Management API
       • Bringing it all together: Automatic scaling
       • Takeaways
       • Q&A
    4. Windows Azure Environment
       Where will my application live?
    5. Windows Azure environment
    6. Fabric Controller
       • Communicates with every server within the Fabric
       • Interacts with a “Fabric Agent” on each machine
       • Monitors every VM, application and instance
       • Service Management is performed by the Fabric Controller
    7. Fabric Controller
       • Manages the life cycle of Azure services
         • Allocates resources
         • Provisioning
         • Deployment
         • Monitoring
       • Manages VMs and physical machines in the fabric
       • Based on a state machine
         • Each heartbeat compares the services’ goal states with the current node states and tries to move each node to its goal state where possible
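
The heartbeat described on slide 7 can be pictured as a small reconciliation loop. The following is only a sketch of that idea, not the Fabric Controller's actual implementation: the `NodeState` values, the `GoalStateReconciler` class and the one-step-per-heartbeat rule are all assumptions made for illustration, since the slides do not name the real states or transitions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node states; the slides do not name the real ones.
public enum NodeState { Off = 0, Booting = 1, Running = 2 }

public static class GoalStateReconciler
{
    // One heartbeat: compare each node's current state with its goal state
    // and move it a single step closer. Returns the ids of nodes that changed.
    public static List<string> Heartbeat(
        IDictionary<string, NodeState> current,
        IDictionary<string, NodeState> goal)
    {
        var changed = new List<string>();
        foreach (KeyValuePair<string, NodeState> pair in goal)
        {
            NodeState now;
            if (!current.TryGetValue(pair.Key, out now)) now = NodeState.Off;
            if (now == pair.Value) continue; // already in goal state

            current[pair.Key] = Step(now, pair.Value);
            changed.Add(pair.Key);
        }
        return changed;
    }

    // Assumption: moving up takes one state per heartbeat,
    // moving down goes straight to the goal.
    private static NodeState Step(NodeState now, NodeState goal)
    {
        return goal > now ? now + 1 : goal;
    }
}

public static class ReconcilerDemo
{
    public static void Main()
    {
        var current = new Dictionary<string, NodeState> { { "node-1", NodeState.Off } };
        var goal = new Dictionary<string, NodeState> { { "node-1", NodeState.Running } };

        // Keep sending heartbeats until nothing changes any more.
        int heartbeats = 0;
        while (GoalStateReconciler.Heartbeat(current, goal).Count > 0) heartbeats++;
        Console.WriteLine("node-1 is {0} after {1} heartbeats", current["node-1"], heartbeats);
    }
}
```

The point of the sketch is that the controller never issues "do X" commands directly; it only keeps nudging the observed state toward the declared goal, which is why a failed node is simply picked up again on a later heartbeat.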
    8. Fabric Controller
       • Resource allocation based on
         • Number of update and fault domains
         • OS features/versions
         • Network channels
         • Available load balancers
       • Resource allocation is transactional
       • Deployments and upgrades
         • Automatically
         • Optional: manual through service portal
       • Maintenance
       • Standard health and failure monitoring
         • Reported by Fabric Agent
         • Discovered by Fabric Controller
    9. Networking
       • VIP automatically registered in load balancers
       • Load balancer traffic only to role instances in goal state
       • Upgrades can be done by VIP swap
       (Diagram labels: Web Role, VIP, Web Role)
    10. Windows Azure Environment
    11. Windows Azure Environment
        • Fabric Controller
        • Virtual machine
          • Windows Azure Guest OS (http://bit.ly/aZqSdp)
          • Fabric agent
          • Diagnostic monitor
          • Your web/worker role instance
    12. Windows Azure Guest OS
        • Based on Windows Server 2008 Enterprise
        • 3 current versions
          • Windows Azure Guest OS 1.0 (Release 200912-01)
          • Windows Azure Guest OS 1.1 (Release 201001-01)
          • Windows Azure Guest OS 1.2 (Release 201003-01)
        • Environment similar to a Windows Server 2008 machine
          • Filesystem
          • Performance counters
          • Event logs
          • ...
        <ServiceConfiguration serviceName="service-name" osVersion="WA-GUEST-OS-1.2_201003-01">
    13. Fabric Agent
        • Runs on every node
        • Separate process
        • Reports current instance’s operational status to FC
          • Goal state
          • Failures
          • Health
    14. Diagnostic Monitor
        • Runs on every node
        • Separate process
        • Performs automatic and on-demand diagnostics transfer
    15. Interacting with the Fabric
        What can I do to make the most out of it?
    16. Interacting with the Fabric
        • Through Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment
        • What it provides...
          • Read the deployment id
          • Read configuration values from ServiceConfiguration.cscfg
          • Get references to local resources
          • Request a recycle of the role instance
          • Capture events
            • Configuration changes
            • Status checks (where FC checks FA)
          • Get the current role instance
            • And a list of all the other role instances in the current role
            • And even a list of all the roles in the deployment (i.e. other web/worker roles)
    17. Use Cases
        • Read the deployment id
          • Can be used with the Management API
        • Read configuration values
          • Configure your application through ServiceConfiguration.cscfg
          • Allow your configuration to be modified through the Windows Azure portal
        • Get references to local resources
          • Local, temporary storage on a role instance
          • Use for caching data
          • Use for temporary file processing
          • ...
        • Request a recycle of the role instance
          • E.g. after a configuration change or a specific event
    18. Use Cases
        • Capture events
          • RoleEnvironment_Changing and RoleEnvironment_Changed
            • Respond to changes in the environment
            • Configuration changes
            • Topology changes
          • RoleEnvironment_StatusCheck
            • Inform the fabric controller of the current state
            • Intercept FA status reporting
            • Implement your own status reporting conditions
            • “SetBusy”
          • RoleEnvironment_Stopping
            • What to do when the current role is stopping?
            • E.g. unmounting drives, resource cleanup, ...
    19. Use Cases
        • Get all role instances in the current role
          • Status checks
          • Know about endpoints
          • Inter-role communication?
    20. Inter-Role Communication
        Demo
    21. Inter-Role Communication
        • Scenario: chat application
        • Users get connected to different worker roles
        • Worker roles should relay messages to other users
        • Implement separate worker roles
          • Internal endpoint
          • Looping over the other role instances and relaying
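
The relay step from the chat scenario (loop over the other instances and forward the message over an internal endpoint) might look roughly like the sketch below. It is not the demo code from the session. To stay self-contained it works on plain `IPEndPoint` values and a `send` delegate, both of which are assumptions; in a real role the endpoint list would presumably be built from `RoleEnvironment.CurrentRoleInstance.Role.Instances` and each instance's `InstanceEndpoints` entry for the internal endpoint.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class MessageRelay
{
    // Every instance except the one that received the message is a relay target.
    public static List<IPEndPoint> RelayTargets(IEnumerable<IPEndPoint> allInstances, IPEndPoint self)
    {
        return allInstances.Where(ep => !ep.Equals(self)).ToList();
    }

    // Forwards the message to all other instances through the supplied transport.
    // 'send' is a hypothetical hook; a real role might use WCF or raw sockets here.
    public static int Relay(
        string message,
        IEnumerable<IPEndPoint> allInstances,
        IPEndPoint self,
        Action<IPEndPoint, string> send)
    {
        List<IPEndPoint> targets = RelayTargets(allInstances, self);
        foreach (IPEndPoint target in targets) send(target, message);
        return targets.Count;
    }
}

public static class RelayDemo
{
    public static void Main()
    {
        // Made-up internal addresses for three worker role instances.
        var self = new IPEndPoint(IPAddress.Parse("10.0.0.1"), 9000);
        var all = new[]
        {
            self,
            new IPEndPoint(IPAddress.Parse("10.0.0.2"), 9000),
            new IPEndPoint(IPAddress.Parse("10.0.0.3"), 9000)
        };

        int sent = MessageRelay.Relay("hello", all, self,
            (target, msg) => Console.WriteLine("{0} <- {1}", target, msg));
        Console.WriteLine("Relayed to {0} instances", sent);
    }
}
```

Excluding the receiving instance matters: without it, every relayed message would be relayed again and the instances would echo it to each other indefinitely.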
    22. Monitoring and Diagnostics
        What is my application doing?
    23. Diagnostics: Single Server vs. the Cloud
        • Single Server
          • Static Environment
            • Single well-known instance
            • Traceable local transactions
          • Local Access Feasible
            • All in one TS session
            • Data & tools co-located
            • In-Place Changes
        • Cloud
          • Dynamic Environment
            • Multi-instance, elastic capacity
            • Distributed transactions
          • Local Access Infeasible
            • Many nodes
            • Distributed, scaled-out data
            • Service Upgrades
    24. Monitoring and Diagnostics
        • Through Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitor
        • What it provides...
          • API for monitoring & data collection for cloud apps
          • Supports standard diagnostics APIs
          • Manage all role instances or one specific instance
          • Scalable: built on WA storage and used by WA components
          • Developer in control
    25. Windows Azure Diagnostics
        (Diagram labels: Configuration; Role Instance; Role; Data collection (traces, logs, crash dumps); Quota enforcement; Diagnostic Monitor; Local directory storage; Windows Data Sources: IIS Logs & Failed Request Logs, Perf Counters, Windows Event Logs)
    26. Windows Azure Diagnostics
        (Diagram labels: Request upload; Role Instance; Windows Azure Storage; Role; Diagnostic Monitor; Local directory storage; Windows Data Sources; Scheduled or on-demand upload)
    27. Windows Azure Diagnostics
        (Diagram labels: Development Fabric; Windows Azure Hosted Service)
    28. Windows Azure Diagnostics
        (Diagram labels: Development Fabric; Windows Azure Hosted Service; Diagnostic Manager; Some diagnostics application; Controller Code; Configure)
    29. Feature Summary
        • Local data buffering
          • Configurable trace, perf counter, Windows event log, IIS log & file buffering
          • Local buffer quota management
          • Query & modify config from the cloud or from the desktop per role instance
        • Transfer to WA Storage
          • Scheduled & on-demand
          • Filter by data type, verbosity & time range
          • Transfer completion notification
          • Query & modify from the cloud and from the desktop per role instance
    30. Feature Matrix
    31. Sample: Activate WA Diagnostics
        // This is done for you automatically by
        // Windows Azure Tools for Visual Studio
        // Add a reference to Microsoft.WindowsAzure.Diagnostics
        using Microsoft.WindowsAzure.Diagnostics;

        // Activate diagnostics in the role's OnStart() method
        public override bool OnStart()
        {
            // Use the connection string contained in the
            // application configuration setting named
            // "DiagnosticsConnectionString".
            // If the value of this setting is
            // "UseDevelopmentStorage=true", development storage is used.
            DiagnosticMonitor.Start("DiagnosticsConnectionString");
            ...
        }
    32. Sample: Web.Config Changes
        <!--
          This is automatically inserted by VS. The listener routes
          System.Diagnostics.Trace messages to
          Windows Azure Diagnostics.
        -->
        <system.diagnostics>
          <trace>
            <listeners>
              <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener, Microsoft.WindowsAzure.Diagnostics, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35" name="AzureDiagnostics">
                <filter type="" />
              </add>
            </listeners>
          </trace>
        </system.diagnostics>
    33. Sample: Generate Diagnostics Data
        string myRoleInstanceName =
            RoleEnvironment.CurrentRoleInstance.Id;
        // Trace with standard .NET tracing APIs
        System.Diagnostics.Trace.TraceInformation(
            "Informational trace from " + myRoleInstanceName);

        // Capture full crash dumps
        CrashDumps.EnableCollection(true);
        // ...or capture mini crash dumps instead
        CrashDumps.EnableCollection(false);
    34. Sample: Enable Local Data Buffering
        // Managed traces, IIS logs, failed request logs,
        // crash dumps and WA diagnostics internal logs are buffered
        // in local storage by default. Other data sources must be
        // added explicitly.
        DiagnosticMonitorConfiguration diagConfig =
            DiagnosticMonitor.GetDefaultInitialConfiguration();
        // Add performance counter monitoring
        PerformanceCounterConfiguration procTimeConfig =
            new PerformanceCounterConfiguration();
        // Run typeperf.exe /q to query for counter names
        procTimeConfig.CounterSpecifier =
            @"\Processor(*)\% Processor Time";
        procTimeConfig.SampleRate = System.TimeSpan.FromSeconds(1.0);
        diagConfig.PerformanceCounters.DataSources.Add(procTimeConfig);
        // Continued on next slide...
    35. Sample: Enable Local Data Buffering
        // Continued from previous slide...

        // Add event collection from the Windows Event Log
        // Syntax: <Channel>!<xpath query>
        // http://msdn.microsoft.com/en-us/library/dd996910(VS.85).aspx
        diagConfig.WindowsEventLog.DataSources.Add("System!*");
        // Restart diagnostics with this custom local buffering
        // configuration
        DiagnosticMonitor.Start(
            "DiagnosticsConnectionString",
            diagConfig);
    36. Sample: Web.Config Changes
        <!--
          You can optionally enable IIS failed request tracing.
          This has some performance overhead.
          A service upgrade is required to toggle this setting.
        -->
        <system.webServer>
          <tracing>
            <traceFailedRequests>
              <add path="*">
                <traceAreas>
                  <add provider="ASP" verbosity="Verbose" />
                  <add provider="ASPNET"
                       areas="Infrastructure,Module,Page,AppService"
                       verbosity="Verbose" />
                  <add provider="ISAPI Extension" verbosity="Verbose" />
                  <add provider="WWW Server" verbosity="Verbose" />
                </traceAreas>
                <failureDefinitions statusCodes="200-599" />
              </add>
            </traceFailedRequests>
          </tracing>
        </system.webServer>
    37. Sample: Scheduled Data Transfer
        // Start off with the default initial configuration
        DiagnosticMonitorConfiguration dc =
            DiagnosticMonitor.GetDefaultInitialConfiguration();
        dc.WindowsEventLog.DataSources.Add("Application!*");
        dc.WindowsEventLog.ScheduledTransferPeriod =
            System.TimeSpan.FromMinutes(5.0);
        DiagnosticMonitor.Start("DiagnosticsConnectionString", dc);
    38. Sample: On-Demand Data Transfer
        // On-demand transfer of buffered files.
        // This code can live in the role, or on the desktop,
        // or even in another service.
        var ddm = new DeploymentDiagnosticManager(
            storageAccount,
            deploymentID);
        var ridm = ddm.GetRoleInstanceDiagnosticManager(
            roleName,
            roleInstanceName);
        var dataBuffersToTransfer = DataBufferName.Logs;
        OnDemandTransferOptions transferOptions =
            new OnDemandTransferOptions();
        transferOptions.From = DateTime.MinValue;
        transferOptions.To = DateTime.UtcNow;
        transferOptions.LogLevelFilter = LogLevel.Critical;
        Guid requestID = ridm.BeginOnDemandTransfer(
            dataBuffersToTransfer,
            transferOptions);
    39. Cerebrata Diagnostics Manager
        Demo
    40. Storage Considerations
        • Standard WA Storage costs apply for transactions, storage & bandwidth
        • Data Retention
          • Local buffers are aged out by the Diagnostic Monitor according to configurable quotas
          • You control data retention for data in table/blob storage
          • You should manage cleanup of this!
        • Query Performance on Tabular Data
          • Partitioned by high-order bits of the tick count
          • Query by time is efficient
          • Filter by verbosity level at transfer time
    41. Common Diagnostic Tasks
        • Performance measurement
        • Resource usage
        • Troubleshooting and debugging
        • Problem detection
        • Quality of Service Metrics
        • Capacity planning
        • Traffic analysis (users, views, peak times)
        • Billing
        • Auditing
    42. Management API
        How do I manage my deployments?
    43. Management API
        • Through Microsoft.Samples.WindowsAzure.Management.*
        • What it provides...
          • X509 client certificates for authentication
          • View, create, delete, swap, … deployments
          • Edit configuration (and change instance count)
          • List and view properties for hosted services, storage accounts and affinity groups
        • Also exists as
          • PowerShell scripts
          • MSBuild tasks (CI & auto-deploy anyone?)
    44. Using the management API
    45. Auto-Scaling
        Bringing it all together
    46. Auto-Scaling
        • As easy as doing this?
          <Instances minInstances="3" maxInstances="10" />
        • Unfortunately: no…
          • “When” should it scale?
          • “How” should it scale?
          • “Who” / “What” is responsible for scaling?
    47. Auto-Scaling – “When”
        • Different for every application
          • Based on performance counters
          • Based on queue length / workload
          • Based on the weather?
        • Weight of metrics
        • Trends in metric data
        • Answer:
          • “Sensors”
          • “Scaling logic provider”
    48. Auto-Scaling – Sensors
        • Sensors provide metrics and trend
          • Performance counter
          • Queue length
          • Custom
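
A sensor that reports both a metric and a trend, as slide 48 describes, could be sketched along these lines. The `Sensor` class, its sliding window and the `tolerance` threshold are assumptions for illustration and are not taken from the demo; the samples could come from a performance counter, a queue length or any custom source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Trend { Falling, Flat, Rising }

// Hypothetical sensor: keeps the last 'window' samples of some metric.
public class Sensor
{
    private readonly Queue<double> samples = new Queue<double>();
    private readonly int window;
    private readonly double tolerance;

    public Sensor(int window, double tolerance)
    {
        if (window < 2) throw new ArgumentOutOfRangeException("window");
        this.window = window;
        this.tolerance = tolerance;
    }

    // Adds a sample and drops the oldest one once the window is full.
    public void Record(double value)
    {
        samples.Enqueue(value);
        if (samples.Count > window) samples.Dequeue();
    }

    // The metric: average of the samples currently in the window.
    public double Metric
    {
        get { return samples.Count == 0 ? 0.0 : samples.Average(); }
    }

    // The trend: newest sample compared with the oldest one in the window.
    public Trend CurrentTrend
    {
        get
        {
            if (samples.Count < 2) return Trend.Flat;
            double delta = samples.Last() - samples.First();
            if (delta > tolerance) return Trend.Rising;
            if (delta < -tolerance) return Trend.Falling;
            return Trend.Flat;
        }
    }
}

public static class SensorDemo
{
    public static void Main()
    {
        var usersLoggedIn = new Sensor(3, 1.0);
        foreach (double sample in new[] { 10.0, 20.0, 30.0 }) usersLoggedIn.Record(sample);
        Console.WriteLine("{0} ({1})", usersLoggedIn.Metric, usersLoggedIn.CurrentTrend);
    }
}
```

Averaging over a window smooths out short spikes, while the trend lets the scaling logic react to a steady climb before the average itself crosses a threshold.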
    49. Auto-Scaling – “How”
        • A typical topology change takes up to 15 minutes
        • What if your load goes up too fast?
        • Weight of metrics
        • Trends in metric data
        • Answer:
          • “Scaling logic provider”
    50. Auto-Scaling – Scaling logic
        • Scaling logic provider uses sensor data to suggest an action (up/fast-up/down/stable)
        • To be implemented per application
        • Just a suggestion!
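
As a rough illustration of a scaling logic provider that turns sensor data into one of the four suggestions from slide 50, consider the sketch below. The thresholds, parameter names and decision rules are assumptions chosen for the example; the slides stress that this logic has to be written per application.

```csharp
using System;

// The four suggestions named on the slide.
public enum ScaleAction { Down, Stable, Up, FastUp }

public static class ScalingLogic
{
    // Suggests an action from the current load, the previous load and two
    // hypothetical thresholds. It only suggests; the caller decides whether to act.
    public static ScaleAction Suggest(double load, double previousLoad, double low, double high)
    {
        bool rising = load > previousLoad;

        // Above the high threshold: scale up, and faster if load is still climbing.
        if (load > high) return rising ? ScaleAction.FastUp : ScaleAction.Up;

        // Below the low threshold and not climbing back: capacity can be released.
        if (load < low && !rising) return ScaleAction.Down;

        return ScaleAction.Stable;
    }
}

public static class ScalingLogicDemo
{
    public static void Main()
    {
        // Example: load in percent, scale down below 20, scale up above 80.
        Console.WriteLine(ScalingLogic.Suggest(90, 70, 20, 80));
        Console.WriteLine(ScalingLogic.Suggest(50, 50, 20, 80));
    }
}
```

Because a topology change can take many minutes, distinguishing "up" from "fast-up" gives the caller a way to add several instances at once when the load is both high and still rising.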
    51. Auto-Scaling – “Who”/“What”
        • A dedicated server / worker role?
          • At least two workers for WA SLA: costs!
        • The application itself?
          • Master election (which role instance is responsible?)
        • Answer:
          • Approach will differ per application…
    52. Auto-Scaling – Responsibilities
        • More approaches are possible
          • Dedicated worker
          • On-premise monitoring app
          • The app itself
            • Master election based on RoleEnvironment.Roles
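
One simple way to elect a master without any coordination is for every instance to apply the same deterministic rule to the same list of instance ids. The sketch below picks the id that sorts first; this rule and the sample ids are assumptions, not necessarily what the demo used. In a real role the ids would presumably be read from the instances exposed through `RoleEnvironment.Roles`, and the local id from `RoleEnvironment.CurrentRoleInstance.Id`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MasterElection
{
    // Every instance sees the same id list, so every instance picks the same master.
    // Returns null when the list is empty.
    public static string ElectMaster(IEnumerable<string> instanceIds)
    {
        return instanceIds.OrderBy(id => id, StringComparer.Ordinal).FirstOrDefault();
    }

    // True when the calling instance is the one responsible for scaling decisions.
    public static bool IsMaster(string myInstanceId, IEnumerable<string> instanceIds)
    {
        return string.Equals(myInstanceId, ElectMaster(instanceIds), StringComparison.Ordinal);
    }
}

public static class MasterElectionDemo
{
    public static void Main()
    {
        // Made-up instance ids for illustration only.
        var ids = new[] { "Worker_2", "Worker_0", "Worker_1" };
        Console.WriteLine("Master: {0}", MasterElection.ElectMaster(ids));
        Console.WriteLine("Am I master? {0}", MasterElection.IsMaster("Worker_1", ids));
    }
}
```

When the topology changes and the current master disappears, the next evaluation of the same rule yields a new master automatically, so no explicit hand-over is needed.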
    53. Auto-Scaling
        Demo
    54. Auto-Scaling Demo
        • Scaling based on custom sensor
          • Number of users logged in
        • Monitoring done by the app itself
        • Which brings everything together:
          • Master election → Role Environment
          • Performance counters → Diagnostics API
          • Queue length → Storage API
          • Scaling (changing number of instances in config) → Management API
    55. Takeaways
        What to remember?
    56. Takeaways
        • Windows Azure Environment components
          • Fabric Controller
          • Windows Azure Guest OS
          • Fabric Agent
          • Diagnostic Monitor
        • All components provide interaction
          • Interacting with the Fabric
          • Monitoring and Diagnostics
          • Management API
        • Bringing it all together gives you power!
    57. Q&A
        Any questions?
    58. Thank you!
        @maartenballiauw
        maarten.balliauw@realdolmen.com